
COMP 206: Computer Architecture and Implementation
Montek Singh
Wed, Sep 28, 2005
Topic: Pipelining -- Intermediate Concepts (Multicycle Operations; Exceptions)



Outline
Multi-cycle operations
Floating-point operations
Structural and data hazards
Interrupts, Faults and Exceptions
Precise exceptions
Complications in pipelines

READING: Appendix A

Pipelining Multicycle Operations


Assume five-stage pipeline
  Third stage (execution) has two functional units, E1 and E2
  Instruction goes through either E1 or E2, but not both
E1 and E2 are not pipelined
  Stage delay of E1 = 2 cycles
  Stage delay of E2 = 4 cycles
  No buffering on inputs of E1 and E2
Stage delay of other stages = 1 cycle
Consider an instruction sequence of five instructions
  Instructions 1, 3, 5 need E1
  Instructions 2, 4 need E2

Space-Time Diagram: Multicycle Operations


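Entries are instruction numbers; "-" means the stage is idle in that cycle.

Stage  Delay   1   2   3   4   5   6   7   8   9  10  11  12  13
IF       1     1   2   3   4   5   5   5   -   -   -   -   -   -
ID       1     -   1   2   3   4   4   4   5   -   -   -   -   -
E1       2     -   -   1   1   3   3   -   -   5   5   -   -   -
E2       4     -   -   -   2   2   2   2   4   4   4   4   -   -
MEM      1     -   -   -   -   1   -   3   2   -   -   5   4   -
WB       1     -   -   -   -   -   1   -   3   2   -   -   5   4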

Out-of-order completion
  3 finishes before 2, and 5 finishes before 4
Instructions may be delayed after entering the pipeline because of structural hazards
  Instructions 2 and 4 both want to use E2 unit at same time
  Instruction 4 stalls in ID unit
  This causes instruction 5 to stall in IF unit
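The stall behaviour described on this slide can be checked with a small simulation. The following is a minimal sketch, not the course's own model: the function name `schedule` and the dictionary layout are illustrative assumptions, and contention for MEM and WB is deliberately not modelled (it does not arise in this five-instruction example).

```python
# Latency of the two non-pipelined execution units, as given on the slide.
LATENCY = {"E1": 2, "E2": 4}

def schedule(units):
    """Return, per instruction, the first cycle it spends in each stage.

    Assumptions (hypothetical model): in-order issue, no buffering in front
    of E1/E2, and an instruction holds its stage until the next one is free.
    """
    unit_free = {name: 1 for name in LATENCY}  # first cycle each unit is idle
    rows = []
    for unit in units:
        prev = rows[-1] if rows else None
        # IF is freed when the previous instruction moves on to ID.
        fetch = prev["ID"] if prev else 1
        # ID is freed when the previous instruction enters its execution unit.
        decode = max(fetch + 1, prev["EX"]) if prev else fetch + 1
        # Wait in ID until the required execution unit is idle.
        execute = max(decode + 1, unit_free[unit])
        unit_free[unit] = execute + LATENCY[unit]
        mem = execute + LATENCY[unit]
        rows.append({"unit": unit, "IF": fetch, "ID": decode,
                     "EX": execute, "MEM": mem, "WB": mem + 1})
    return rows

rows = schedule(["E1", "E2", "E1", "E2", "E1"])
completion_order = sorted(range(1, 6), key=lambda i: rows[i - 1]["WB"])
print(completion_order)  # [1, 3, 2, 5, 4]
```

Under these assumptions instruction 4 enters ID in cycle 5 but cannot start E2 until cycle 8, and instruction 5 is held in IF until cycle 8, which reproduces the out-of-order completion noted above.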

Floating-Point Operations in MIPS


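[Figure: MIPS pipeline with FP functional units. IF and ID feed four parallel execution paths: EX (integer unit), M1-M7 (FP multiplier), A1-A4 (FP adder), and DIV (FP divider, 25 cycles). All paths rejoin at MEM and WB.]
Annotations on the figure
  WAW hazards possible; WAR hazards not possible
  Out-of-order completion; has ramifications for exceptions
  Longer operation latency implies more frequent stalls for RAW hazards
  Structural hazard: DIV is not fully pipelined
  Structural hazard: instructions have varying running times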

Structural Hazard on WB Unit


Instruction                  1    2    3    4    5    6    7    8    9    10   11
DIV.D (issued at t = -16)    D    D    D    D    D    D    D    D    D    MEM  WB
MUL.D F0, F4, F6             IF   ID   M1   M2   M3   M4   M5   M6   M7   MEM  WB
integer instruction          -    IF   ID   EX   MEM  WB   -    -    -    -    -
integer instruction          -    -    IF   ID   EX   MEM  WB   -    -    -    -
ADD.D F2, F4, F6             -    -    -    IF   ID   A1   A2   A3   A4   MEM  WB
integer instruction          -    -    -    -    IF   ID   EX   MEM  WB   -    -
integer instruction          -    -    -    -    -    IF   ID   EX   MEM  WB   -
L.D F2, 0(R2)                -    -    -    -    -    -    IF   ID   EX   MEM  WB

This is the worst-case scenario: several instructions reach WB in the same cycle, yet the maximum steady-state number of write ports needed is 1
Don't replicate resources; detect and serialize access as needed
Early resolution
  Track use of WB in ID stage (using a shift register that acts as a reservation register), and stall instructions there
  Simplifies pipeline control; all stalls occur in ID
  Adds shift register and write-conflict logic
Late resolution
  Stall instructions at entry to MEM or WB stage
  Complicates pipeline control (two stall locations)
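The early-resolution scheme can be sketched as a shift register of reservation bits that is consulted in ID. This is a minimal illustration under stated assumptions: the class `WritePortReservation`, the helper `run`, and the ID-to-WB distances (9, 6, and 3 cycles, read off a five-stage diagram with a 7-stage multiplier and 4-stage adder) are hypothetical, and DIV.D and the integer instructions are left out for brevity.

```python
class WritePortReservation:
    """Shift register: slots[k] is True when WB is taken k cycles from now."""

    def __init__(self, depth=32):
        self.slots = [False] * depth

    def tick(self):
        # One clock cycle passes: every reservation moves one slot closer.
        self.slots = self.slots[1:] + [False]

    def try_issue(self, cycles_until_wb):
        # Called for the instruction in ID; False means "stall in ID".
        if self.slots[cycles_until_wb]:
            return False
        self.slots[cycles_until_wb] = True
        return True

# Assumed distance from ID to WB for each opcode (illustrative values).
ID_TO_WB = {"MUL.D": 9, "ADD.D": 6, "L.D": 3}

def run(arrivals):
    """arrivals: (cycle the instruction first reaches ID, opcode), in program order."""
    port = WritePortReservation()
    issued = {}
    pending = list(arrivals)
    cycle = 1
    while pending:
        ready_cycle, op = pending[0]
        if cycle >= ready_cycle and port.try_issue(ID_TO_WB[op]):
            issued[op] = cycle
            pending.pop(0)
        port.tick()
        cycle += 1
    return issued

issued = run([(2, "MUL.D"), (5, "ADD.D"), (8, "L.D")])
print(issued)  # {'MUL.D': 2, 'ADD.D': 6, 'L.D': 10}
```

With these assumed distances all three instructions would otherwise write back in cycle 11; the reservation bits push ADD.D back by one cycle and L.D by two, so each gets the single write port to itself.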

WAW Hazards
DIV.D (issued at t = -16)
MUL.D F0, F4, F6
integer instruction
integer instruction
ADD.D F2, F4, F6
L.D F2, 0(R2)
[Figure: pipeline timing diagram for this sequence over cycles 1-13, with stall cycles marked "s"; ADD.D and L.D both write F2]

WAW hazard arises only when no instruction between ADD.D and L.D uses the result computed by ADD.D
  Adding an instruction like ADD.D F8,F2,F4 before L.D would stall the pipeline long enough (because of the RAW hazard) to avoid the WAW hazard
  Can happen through a branch/trap (example in HP3, Section A.9)
  Rare situation, but must still handle correctly
Hazard resolution (two options)
  Delay the issue of L.D until ADD.D enters MEM, or
  Cancel write of ADD.D

RAW Hazards
L: L.D F4, 0(R2)
M: MUL.D F0, F4, F6
A: ADD.D F2, F0, F8
S: S.D 0(R2), F2
D: DIV.D F12, F4, F8
[Figure: space-time diagram over cycles 1-19 with rows IF, ID, EX, Mult, Add, Div, MEM, WB; each entry names the instruction (L, M, A, S, or D) occupying that stage in that cycle]

Longer delays of FP operations increase the number of stalls in response to RAW hazards
Two methods for reducing stalls
  Compiler could have moved instruction D between instructions M and A, which would allow D to complete earlier; or hardware could detect this possibility and issue instruction D out of order
  ID stage is a bottleneck because instructions wait there for their operands to be available; could add buffers (reservation stations) to functional units and let instructions await their operands there

Responsibilities of ID (all stalls in ID)


Three sets of checks
  Structural hazards
    Check for availability of FP unit
    Ensure WB unit will be available when needed
  RAW hazards
    Stall current instruction until its source registers are not listed as pending destinations in a pipeline register that will not be available when current instruction needs the result
  WAW hazards
    If any instruction in adder, divider, or multiplier has same register destination as current instruction, stall current instruction
Hazards between FP and integer instructions
  Integer and FP instructions use disjoint sets of registers, except for FP-integer register moves
  FP load-stores can conflict with integer load-stores in MEM stage
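The three sets of checks can be written down as one decision function evaluated in ID. This is only a sketch: the names `InFlight` and `id_stall_reason`, the `ready_in` bookkeeping, and the default `needed_in=1` (operands needed one cycle after ID) are assumptions made for illustration, not details taken from the slides.

```python
from dataclasses import dataclass

@dataclass
class InFlight:
    dest: str      # destination register of an instruction already issued
    ready_in: int  # cycles until its result can be forwarded

def id_stall_reason(srcs, dest, unit_free, wb_free, in_flight, needed_in=1):
    """Return why the instruction in ID must stall, or None if it may issue."""
    # 1. Structural hazards: functional unit and WB port.
    if not unit_free:
        return "structural: functional unit busy"
    if not wb_free:
        return "structural: WB port already reserved"
    # 2. RAW hazards: a source is still a pending destination and will not
    #    be available by the time this instruction needs it.
    for op in in_flight:
        if op.dest in srcs and op.ready_in > needed_in:
            return f"RAW on {op.dest}"
    # 3. WAW hazards: an in-flight instruction has the same destination.
    for op in in_flight:
        if op.dest == dest:
            return f"WAW on {op.dest}"
    return None

# Hypothetical pipeline state: a multiply writing F0 and an add writing F2.
in_flight = [InFlight(dest="F0", ready_in=6), InFlight(dest="F2", ready_in=3)]
raw = id_stall_reason(["F0", "F8"], "F10", True, True, in_flight)
waw = id_stall_reason(["R2"], "F2", True, True, in_flight)
ok = id_stall_reason(["F4", "F6"], "F12", True, True, in_flight)
print(raw, waw, ok)  # RAW on F0 WAW on F2 None
```

Keeping all three checks in one place mirrors the design choice on this slide: every stall happens in ID, so later stages never need to hold an instruction back.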

MIPS R4000 Floating-Point Pipeline


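Stage  Functional unit  Description
A      FP adder         Mantissa ADD stage
D      FP divider       Divide pipeline stage
E      FP multiplier    Exception test stage
M      FP multiplier    First stage of multiplier
N      FP multiplier    Second stage of multiplier
R      FP adder         Rounding stage
S      FP adder         Operand shift stage
U                       Unpack FP numbers

Reservation tables (x = stage in use during that cycle, - = idle)

Add/Subtract
Stage  1  2  3  4
A      -  x  x  -
D      -  -  -  -
E      -  -  -  -
M      -  -  -  -
N      -  -  -  -
R      -  -  x  x
S      -  x  -  x
U      x  -  -  -

Multiply
Stage  1  2  3  4  5  6  7  8
A      -  -  -  -  -  -  x  -
D      -  -  -  -  -  -  -  -
E      -  x  -  -  -  -  -  -
M      -  x  x  x  x  -  -  -
N      -  -  -  -  -  x  x  -
R      -  -  -  -  -  -  -  x
S      -  -  -  -  -  -  -  -
U      x  -  -  -  -  -  -  -

Divide
[Figure: divide reservation table, drawn through cycle 36]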


Instruction Mixes in FP Pipeline: Adds Only


[Figure: Add/Subtract reservation table, marked to show why a second add cannot start one or two cycles after the first]
  Can't initiate another add on cycle 2: stage conflict
  Can't initiate another add on cycle 3: stage conflict
[Figure: reservation table for a stream of adds over cycles 1-19; successive adds are labelled x and y]

Forbidden latencies: 1 and 2
Steady-state utilization (cycles 4 through 18) = (5*7)/(8*15) = 35/120 = 29.17%
Total utilization (cycles 1 through 19) = (5+5*7+2)/(8*19) = 42/152 = 27.63%
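Forbidden latencies and steady-state utilization can both be derived mechanically from a reservation table. The sketch below assumes the add uses stages U, S+A, A+R, R+S in cycles 1 to 4 and the multiply uses U, E+M, M, M, M, N, N+A, R in cycles 1 to 8; these tables are an assumption chosen to agree with the seven and ten stage-uses per operation in the utilization figures, and the function names are illustrative.

```python
from fractions import Fraction

NUM_STAGES = 8  # A, D, E, M, N, R, S, U

# Assumed reservation tables: stage -> cycles (1-based) in which it is used.
ADD_TABLE = {"U": [1], "S": [2, 4], "A": [2, 3], "R": [3, 4]}
MUL_TABLE = {"U": [1], "E": [2], "M": [2, 3, 4, 5], "N": [6, 7], "A": [7], "R": [8]}

def forbidden_latencies(table):
    """Latencies at which a second operation would collide with the first."""
    return sorted({b - a for cycles in table.values()
                   for a in cycles for b in cycles if b > a})

def collision_vector(table):
    """Entry i (1-based) is 1 when latency i is forbidden, 0 when allowed."""
    forbidden = forbidden_latencies(table)
    return [1 if lat in forbidden else 0
            for lat in range(1, max(forbidden, default=0) + 1)]

def steady_state_utilization(table, interval):
    """Fraction of stage-cycles busy when one operation starts every `interval` cycles."""
    busy = sum(len(cycles) for cycles in table.values())
    return Fraction(busy, NUM_STAGES * interval)

print(forbidden_latencies(ADD_TABLE))                 # [1, 2]
print(float(steady_state_utilization(ADD_TABLE, 3)))  # 0.2916666666666667
print(forbidden_latencies(MUL_TABLE))                 # [1, 2, 3]
print(float(steady_state_utilization(MUL_TABLE, 4)))  # 0.3125
```

Two uses of the same stage that are d cycles apart forbid latency d, which is why the add (S in cycles 2 and 4, A in cycles 2 and 3) rules out latencies 2 and 1. The smallest allowed latency then gives the initiation interval used in the steady-state figure.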

FP Pipeline: Multiplies Only


[Figure: reservation table for a stream of multiplies over cycles 1-28; successive multiplies are labelled x, y, and z]
Collision vector:
  1 indicates forbidden latency
  0 indicates allowed latency
Steady-state utilization (cycles 5-24) = (5*10)/(8*20) = 50/160 = 31.25%
Total utilization (cycles 1-28) = (5+5*10+5)/(8*28) = 60/224 = 26.79%

FP Pipeline: Adds and Multiplies


[Figure: reservation table for an interleaved stream of multiplies (labelled m and n) and adds (labelled a and b) over cycles 1-28]
Note out-of-order completion
Steady-state utilization (cycles 6-21) = (4*17)/(8*16) = 68/128 = 53.13%
Total utilization (cycles 1-28) = (12+4*17+22)/(8*28) = 102/224 = 45.54%


Interrupts, Faults, or Exceptions


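Exception type   Sync vs. async   User request vs. coerced   Within vs. between instr.   Resume vs. terminate
I/O request      Async            Coerced                    Between instr.              Resume
OS call          Sync             User request               Between instr.              Resume
Breakpoint       Sync             User request               Between instr.              Resume
Power fail       Async            Coerced                    Within instr.               Terminate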

Synchronous, coerced interrupts that occur within instructions and after which execution must resume are the hardest to implement
See Figure A.27 in HP3


Precise Interrupts (Sequential Processor)


When interrupt occurs, state of interrupted process is saved, including PC (= u), registers, and memory
Interrupt is precise if the following three conditions hold
  All instructions preceding u have been executed, and have modified the state correctly
  All instructions following u are unexecuted, and have not modified the state
  If the interrupt was caused by an instruction, it was caused by instruction u, which is either completely executed (overflow) or completely unexecuted (VM page fault)
Precise interrupts are desirable if software is to fix up error that caused interrupt and execution has to be resumed
  Easy for external interrupts, could be complex and costly for internal
  Imperative for some interrupts (VM page faults, IEEE FP standard)

Problems on Sequential Processors


Instruction modifies state early, then causes an interrupt
  State change must be undone
  Example: First operand of VAX instruction uses autodecrement addressing mode, which writes a register. Trying to access second operand causes a page fault. Since instruction execution cannot be completed, we must restore the register written by autodecrement to its original value
Long-running instructions
  Not enough to be able to restore state, must make progress from interrupt to interrupt
  Example: MVC on IBM 360 copies 256 bytes
    No virtual memory, so interrupts not allowed to stop MVC
  Example: MVC on IBM 370 copies 256 bytes
    Has virtual memory, so first access all pages involved; after that, no interrupts allowed
  Example: MVCL on IBM 370 copies up to 2^24 bytes
    Has VM; two addresses and length are in registers
    Registers saved and restored on interrupts (making progress)


Interrupts in MIPS Pipeline


Pipeline stage   Problem exceptions
IF               Page fault on instruction fetch; misaligned memory access; memory-protection violation
ID               Undefined or illegal opcode
EX               Arithmetic exception
MEM              Page fault on data fetch; misaligned memory access; memory-protection violation
WB               None

How do we stop and restart execution on an interrupt to keep it precise?
What problems do delayed branches cause?
What happens if multiple exceptions occur in the pipeline?
Can exceptions occur out-of-order?
What problems do multi-cycle instructions cause?


MIPS Integer Pipeline, Single Interrupt


Instruction   1   2   3   4   5   6   7   8   9   10
u-2           F   D   X   M   W   -   -   -   -   -
u-1           -   F   D   X   M   W   -   -   -   -
u             -   -   F   D   X   M   W   -   -   -
u+1           -   -   -   F   D   X   M   W   -   -
u+2           -   -   -   -   F   D   X   M   W   -
TRAP          -   -   -   -   -   F   D   X   M   W

Force TRAP instruction in pipeline on next IF
Turn off all writes for faulting instruction and subsequent instructions
After exception-handling routine in OS receives control, save PC of faulting instruction
When exception has been handled, the RFE instruction reloads PC and restarts sequential instruction execution


Complications with Delayed Branches


Instruction    1   2   3   4   5   6   7   8   9
1 branch       F   D   X   M   W   -   -   -   -
2 delay slot   -   F   D   X   M   W   -   -   -
u BTA          -   -   F   D   X   M   W   -   -
u+1            -   -   -   F   D   X   M   W   -
u+2            -   -   -   -   F   D   X   M   W

Suppose instruction 2 causes an exception (e.g., a page fault) after the taken branch completes (determining that the branch outcome is true)
  Instruction 2 cannot complete
  Neither can instruction u
On restart, we do not have sequential execution
  We must remember two PC values: 2 and u


Complications with Multiple Exceptions

Instruction   1   2   3   4   5   6
LW            F   D   X   M   W   -
ADD           -   F   D   X   M   W

In the same cycle (cycle 4), LW takes a data page fault and ADD takes an arithmetic exception
On an unpipelined machine, LW's exception would occur first
  Handle the page fault
  Restart execution
  ADD will cause arithmetic exception to occur; handle it then


Complications with Out-of-order Exceptions


Instruction   1   2   3   4   5   6
LW            F   D   X   M   W   -
ADD           -   F   D   X   M   W

LW takes data page fault, ADD takes instruction page fault
  In the pipeline, ADD's fault (IF, cycle 2) is raised before LW's (MEM, cycle 4)
  Relative timing differs between unpipelined and pipelined machines
To maintain precise interrupts, we need to consider both when they occur and the instructions that caused them
  Post exceptions in exception status vector, turn off state modifications, and check vector in WB unit
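The exception status vector idea can be sketched as follows: each instruction carries its own list of posted exceptions down the pipeline, and nothing is acted on until that instruction reaches WB. The function name `first_exception_handled` and the data layout are illustrative assumptions; the model is an ideal five-stage pipeline without stalls, and the step of turning off writes for the faulting instruction is noted in a comment but not simulated.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def first_exception_handled(program, faults):
    """Simulate posting faults and checking them in WB.

    program: instruction names in program order.
    faults:  {(instruction, stage): description} of faults that will occur.
    Returns (order in which faults were raised, first exception handled).
    """
    status = {name: [] for name in program}  # one status vector per instruction
    raised = []
    last_cycle = len(program) + len(STAGES) - 1
    for cycle in range(last_cycle):
        for index, name in enumerate(program):  # oldest instruction first
            stage_no = cycle - index
            if not 0 <= stage_no < len(STAGES):
                continue
            stage = STAGES[stage_no]
            if (name, stage) in faults:
                # Post the exception; real hardware would also disable
                # state-modifying writes for this instruction from here on.
                status[name].append(faults[(name, stage)])
                raised.append(name)
            if stage == "WB" and status[name]:
                # The vector is checked in WB, so handling is in program order.
                return raised, (name, status[name][0])
    return raised, None

raised, handled = first_exception_handled(
    ["LW", "ADD"],
    {("LW", "MEM"): "data page fault", ("ADD", "IF"): "instruction page fault"},
)
print(raised)   # ['ADD', 'LW']
print(handled)  # ('LW', 'data page fault')
```

ADD's fault is raised first, but LW reaches WB first, so LW's page fault is the one handled; this matches the order an unpipelined machine would see.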


Complications with Multicycle Operations

Entries are the clock cycles (1-28) each instruction spends in each stage.

Instruction          F   D   X      M    W
DIVF F0, F2, F4      1   2   3-26   27   28
ADDF F10, F10, F8    2   3   4-7    8    9
SUBF F12, F12, F14   3   4   5-8    9    10

Instructions are independent (no hazards) and therefore issue immediately
Differences in running times cause out-of-order termination
DIVF throws arithmetic exception late in its execution
  At that point, ADDF and SUBF have both completed execution and destroyed one of their operands
Can we maintain precise interrupts under these conditions?


FP Pipeline Exceptions: Solns. 1 and 2


Settle for imprecise interrupts (CRAY, with checkpointing)
  Done on Alpha 21064 and 21164, IBM Power-1 and Power-2, MIPS R8000 by supporting a fast imprecise mode and a slow precise mode
  Not an option if you have to support virtual memory or IEEE floating point standard
Software finishes certain instructions (SPARC)
  Keep enough state around for trap handler to create a precise sequence for exception and finish work for some instruction stages
  Only FP instructions cause this problem
[Figure: timing of four overlapping instructions over cycles 1-19; entries are the clock cycles spent in each stage]

Instruction   F   D   X      M    W
1             1   2   3-17   18   19
2             2   3   4-11   12   13
3             3   4   5-12   13   14
4             4   5   6-9    10   11


FP Pipeline Exceptions: Solns. 3 and 4


Stalling (MIPS R2000/3000, MIPS R4000, Pentium)
  An instruction is allowed to issue only if it is certain that all the instructions before the issuing instruction will complete without causing an exception
  To prevent excessive stalling, FP units must decide on possibility of exceptions early in pipeline
General methods (PowerPC 620, MIPS R10000)
  Reorder buffer, history file, future file
  An instruction is allowed to finalize its writes only when all previously issued instructions are complete
  More naturally used in connection with ILP (Chapter 4)
  Significant complexity (to be discussed later)
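The rule that an instruction finalizes its writes only after all earlier instructions are complete is the core of a reorder buffer. The following is a minimal sketch under stated assumptions: the class `ReorderBuffer` and its methods are hypothetical names, results are plain Python values, and only in-order commit is modelled (no renaming, forwarding, or history/future file).

```python
from collections import deque

class ReorderBuffer:
    def __init__(self):
        self.entries = deque()  # oldest instruction at the left
        self.registers = {}     # architectural register file

    def issue(self, name, dest):
        """Allocate an entry in program order."""
        entry = {"name": name, "dest": dest, "value": None,
                 "done": False, "exception": None}
        self.entries.append(entry)
        return entry

    def complete(self, entry, value=None, exception=None):
        """A functional unit finishes, possibly out of order."""
        entry.update(value=value, exception=exception, done=True)

    def commit(self):
        """Retire finished instructions in program order; stop at an exception."""
        while self.entries and self.entries[0]["done"]:
            head = self.entries.popleft()
            if head["exception"]:
                self.entries.clear()  # squash all younger instructions
                return head["name"], head["exception"]
            self.registers[head["dest"]] = head["value"]
        return None

rob = ReorderBuffer()
div = rob.issue("DIVF", "F0")
add = rob.issue("ADDF", "F10")
sub = rob.issue("SUBF", "F12")
rob.complete(add, value=1.5)   # ADDF and SUBF finish long before DIVF
rob.complete(sub, value=2.5)
early = rob.commit()           # nothing retires: DIVF is still executing
rob.complete(div, exception="arithmetic exception")
trap = rob.commit()
print(early, trap, rob.registers)  # None ('DIVF', 'arithmetic exception') {}
```

Because ADDF and SUBF never reach the head of the buffer before DIVF faults, F10 and F12 are left untouched, so the interrupt is precise even though the two later instructions finished executing first.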

