Chapter 4
Sections 4.5 – 4.8
Introduction
Single-cycle datapath
Simple!
Hardware replication?
Cycle time?
Multi-cycle datapath
More involved
Less HW replication of major units
Better performance if the delay of major functional
units is balanced!
Can we do any better?
Pipelining!
Pipelining
In Multi-cycle, only one major unit is used in each
cycle while other units are idle!
Why not use them to do something else?
Basically, start the next instruction before the
current one is finished!
The time required to execute one instruction
(Instruction latency) is not affected!
However, the number of instructions finished per
unit time (Throughput) is increased
Thus, Pipelining improves the throughput not
latency!
Most modern processors are pipelined!
Notes
As in multi-cycle, the cycle time is determined by
the slowest unit!
However, similar to single-cycle, we can get one
instruction done every cycle!
It is assumed that all instructions take the same
number of cycles!
Single Cycle Implementation:
[Figure: clock timing of lw, sw, and an R-type instruction, one instruction per long cycle (Cycle 1, Cycle 2), with part of a cycle marked as waste. A second timing diagram with ten shorter cycles shows lw going through IFetch, Dec, Exec, Mem, WB, then sw through IFetch, Dec, Exec, Mem, then the R-type instruction starting with IFetch.]
Pipeline Implementation:
[Figure: lw going through IFetch, Dec, Exec, Mem, WB; Inst 1 to Inst 4, in instruction order, each passing through IM, Reg, ALU, DM, and Reg.]
An instruction is completed every cycle, so CPI = 1 (similar to single-cycle)
Speedup = 1,000,000,000 / 200,000,800 = 4.99998
(very close to the number of stages)
Why Pipelining?
Example 1. Comparing pipelining to single-cycle
2) Determine the time required to finish executing the first 3
LOAD instructions and compute the speed up of pipelining
Single-cycle
TimeSC = 1000 x 3 = 3000 ps
Pipelining
TimePP = 200 x 5 +200 + 200 = 1400 ps
For 3 instructions
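The arithmetic in Example 1 can be sketched in a few lines of Python. The 1000 ps single-cycle time, the 200 ps stage time, and the five stages come from the example; the function names are only illustrative.

```python
def single_cycle_time(n_instructions, cycle_ps=1000):
    """Total time when every instruction takes one long cycle."""
    return n_instructions * cycle_ps


def pipelined_time(n_instructions, stage_ps=200, n_stages=5):
    """The first instruction fills the pipeline (n_stages cycles);
    every later instruction finishes one cycle after the previous one."""
    return stage_ps * n_stages + stage_ps * (n_instructions - 1)


def speedup(n_instructions):
    """Single-cycle time divided by pipelined time."""
    return single_cycle_time(n_instructions) / pipelined_time(n_instructions)


for n in (3, 1_000_000):
    print(n, single_cycle_time(n), pipelined_time(n), round(speedup(n), 5))
```

For 3 instructions the speedup is 3000 / 1400, about 2.14. For one million instructions it is 1,000,000,000 / 200,000,800, about 4.99998, which approaches the number of stages.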
[Figure: MIPS datapath (PC, instruction memory, register file, sign extend, ALU, data memory) divided by the IFetch/Dec, Dec/Exec, Exec/Mem, and Mem/WB pipeline registers, all driven by the system clock.]
Any problem?
Pipelined MIPS Datapath
[Figure: the pipelined datapath with its five stages labeled IF, ID, EX, MEM, and WB, separated by the IFetch/Dec, Dec/Exec, Exec/Mem, and Mem/WB pipeline registers.]
Example 2. Execution of LW instruction
(2) Instruction Decode and Read Registers: Store Reg[rs],
Reg[rt], the sign-extended offset, rd, rt, and the updated PC (why?) in the
ID/EX register
(3) Execute or Address Calculation: Store the branch address,
Reg[rt], the ALU result, and the zero flag in the EX/MEM register
(4) Memory Access: Store the data from memory into
MEM/WB register
(5) Write Back: Copy the data loaded in the MEM/WB
register to register file
Required data fields in the pipelining registers
Data fields are moved from one pipeline register to
another every clock cycle until they are no longer
needed
Pipeline Register   Data Fields                                                                           Register Size
IF/ID               Instruction and PC                                                                    64 bits
ID/EX               PC, Reg[rs], Reg[rt], sign-extended offset, rt, rd                                    138 bits
EX/MEM              Branch address, Zero, ALU result, Reg[rt], Destination register address (rt or rd)    102 bits
MEM/WB              ALU result, Data from memory, Destination register address                            69 bits
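The register sizes follow from adding up the widths of the listed fields. The sketch below does that sum, assuming 32-bit data and address values, 5-bit register numbers, and a 1-bit Zero flag; the dictionary layout is only illustrative. Under these assumptions the listed EX/MEM fields add up to 102 bits.

```python
# Assumed field widths in bits for this sketch.
WORD, REG_ADDR, FLAG = 32, 5, 1

PIPELINE_FIELDS = {
    "IF/ID": {"Instruction": WORD, "PC": WORD},
    "ID/EX": {"PC": WORD, "Reg[rs]": WORD, "Reg[rt]": WORD,
              "sign-extended offset": WORD, "rt": REG_ADDR, "rd": REG_ADDR},
    "EX/MEM": {"Branch address": WORD, "Zero": FLAG, "ALU result": WORD,
               "Reg[rt]": WORD, "Destination register": REG_ADDR},
    "MEM/WB": {"ALU result": WORD, "Data from memory": WORD,
               "Destination register": REG_ADDR},
}

# Size of each pipeline register = sum of its data field widths
# (control signals are not counted here).
sizes = {name: sum(fields.values()) for name, fields in PIPELINE_FIELDS.items()}

for name, bits in sizes.items():
    print(f"{name}: {bits} bits")
```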
Pipelined MIPS Control
All control signals can be determined during Decode stage while they
are needed in later stages!
Solution! Expand the pipeline registers to store and move the control
signals between stages until they are needed
Define the control signals and generate them in the decode stage
For the time being, no explicit write signals are required for the
pipeline registers since they are updated every cycle
Control signals needed in each stage
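As a rough sketch of how the control signals travel with an instruction, the Python below groups them by the stage that consumes them. The signal names are the ones used for the ID/EX register in Example 3; the grouping into EX, MEM, and WB and the per-instruction values follow the usual textbook MIPS control table and are assumptions here, as are the helper names. `None` marks a don't-care value.

```python
# Control values generated in the decode stage for four instruction classes.
CONTROL = {
    "R-type": dict(RegDst=1, ALUOp=0b10, ALUSrc=0, Branch=0,
                   MemRead=0, MemWrite=0, RegWrite=1, MemToReg=0),
    "lw": dict(RegDst=0, ALUOp=0b00, ALUSrc=1, Branch=0,
               MemRead=1, MemWrite=0, RegWrite=1, MemToReg=1),
    "sw": dict(RegDst=None, ALUOp=0b00, ALUSrc=1, Branch=0,
               MemRead=0, MemWrite=1, RegWrite=0, MemToReg=None),
    "beq": dict(RegDst=None, ALUOp=0b01, ALUSrc=0, Branch=1,
                MemRead=0, MemWrite=0, RegWrite=0, MemToReg=None),
}

# Which stage consumes which signals.
STAGE_SIGNALS = {
    "EX": ("RegDst", "ALUOp", "ALUSrc"),
    "MEM": ("Branch", "MemRead", "MemWrite"),
    "WB": ("RegWrite", "MemToReg"),
}

# Stages still ahead of an instruction held in each pipeline register.
LATER_STAGES = {
    "ID/EX": ("EX", "MEM", "WB"),
    "EX/MEM": ("MEM", "WB"),
    "MEM/WB": ("WB",),
}


def carried_in(pipeline_register, instr_class):
    """Control signals that must still be stored in `pipeline_register`."""
    signals = CONTROL[instr_class]
    return {name: signals[name]
            for stage in LATER_STAGES[pipeline_register]
            for name in STAGE_SIGNALS[stage]}


print(carried_in("ID/EX", "R-type"))
print(carried_in("MEM/WB", "lw"))
```

Each pipeline register only needs to keep the signals for the stages that follow it, so the control part of the registers shrinks from ID/EX to MEM/WB.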
MIPS Pipeline
Example 3. Given the code segment and the register
contents below, show the contents of the data and control
fields in the pipeline registers if the sixth instruction has
been fetched (i.e. the beginning of cycle 7)
Register   Contents
$1         1
$2         5
$3         3
$4         -6
$5         2
$6         7
$11        12
$12        -15
$13        10

Address      Instruction
0x00000000   lw  $10, 20($1)
0x00000004   sub $11,$1,$2
0x00000008   add $12,$3,$4
0x0000000c   lw  $13, 24($1)
0x00000010   add $3,$2,$1
0x00000014   sub $1,$5,$6
Example 3. Multi-cycle diagram
[Figure: pipeline diagram over time of the six instructions in order (lw $10, 20($1); sub $11,$1,$2; add $12,$3,$4; lw $13, 24($1); add $3,$2,$1; sub $1,$5,$6), each passing through IM, Reg, ALU, DM, and Reg.]
Example 3. Single-cycle diagram
[Figure: datapath snapshot labeled, from left to right, sub $1,$5,$6; add $3,$2,$1; lw $13, 24($1); add $12,$3,$4; sub $11,$1,$2.]
At the beginning of cycle 7, the sixth instruction is stored
in the IF/ID register, while the data and control for earlier
instructions have been pushed to the following pipeline registers and the
register file. Thus,
IF/ID register
No control signals are stored
Store the instruction sub $1,$5,$6 and PC+4
IF/ID.Instruction = 0x00A60822
IF/ID.PC = 0x00000018
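Which instruction occupies which pipeline register at the beginning of a given cycle can be worked out mechanically. The sketch below assumes one instruction is fetched per cycle with no stalls; the function and variable names are illustrative.

```python
PIPELINE_REGISTERS = ["IF/ID", "ID/EX", "EX/MEM", "MEM/WB"]

PROGRAM = [
    "lw $10, 20($1)",
    "sub $11,$1,$2",
    "add $12,$3,$4",
    "lw $13, 24($1)",
    "add $3,$2,$1",
    "sub $1,$5,$6",
]


def register_contents(program, cycle):
    """Map each pipeline register to the instruction it holds at the
    beginning of `cycle` (1-based), assuming no stalls."""
    contents = {}
    for depth, name in enumerate(PIPELINE_REGISTERS):
        # Instruction i (1-based) is fetched in cycle i, so the register at
        # position `depth` holds the instruction fetched depth + 1 cycles ago.
        index = cycle - 1 - depth
        in_range = 1 <= index <= len(program)
        contents[name] = program[index - 1] if in_range else None
    return contents


for name, instr in register_contents(PROGRAM, cycle=7).items():
    print(f"{name}: {instr}")
```

At the beginning of cycle 7 this gives sub $1,$5,$6 in IF/ID, add $3,$2,$1 in ID/EX, lw $13, 24($1) in EX/MEM, and add $12,$3,$4 in MEM/WB.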
ID/EX register
Store the information of add $3,$2,$1 and PC+4
ID/EX.PC = 0x00000014
ID/EX.RegRsContents = 0x00000005
ID/EX.RegRtContents = 0x00000001
ID/EX.RegRt = (00001)2
ID/EX.RegRd = (00011)2
ID/EX.SignExtend = 0x00001820
Control Information
ID/EX.MemToReg = 0
ID/EX.RegWrite = 1
ID/EX.MemRead = 0
ID/EX.MemWrite = 0
ID/EX.Branch = 0
ID/EX.ALUSrc = 0
ID/EX.RegDst = 1
ID/EX.ALUOp = (10)2
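The hexadecimal values in the IF/ID and ID/EX listings can be checked by encoding the instructions by hand. The sketch below assumes the standard MIPS R-type layout (opcode 0, then rs, rt, rd, shamt, funct) and the usual function codes for add and sub; the helper names are illustrative.

```python
# R-type function codes (opcode field is 0 for both).
FUNCT = {"add": 0x20, "sub": 0x22}


def encode_r_type(op, rd, rs, rt, shamt=0):
    """Encode `op rd, rs, rt` as a 32-bit MIPS R-type word."""
    return (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | FUNCT[op]


def sign_extend_16(word):
    """Sign-extend the low 16 bits of an instruction word to 32 bits."""
    imm = word & 0xFFFF
    return (imm | 0xFFFF0000) if (imm & 0x8000) else imm


sub_word = encode_r_type("sub", rd=1, rs=5, rt=6)   # sub $1,$5,$6
add_word = encode_r_type("add", rd=3, rs=2, rt=1)   # add $3,$2,$1

print(f"IF/ID.Instruction = 0x{sub_word:08X}")
print(f"ID/EX.SignExtend  = 0x{sign_extend_16(add_word):08X}")
```

The sign-extend unit works on the low 16 bits of every instruction, which is why an R-type instruction such as add still leaves a value (its rd, shamt, and funct bits) in ID/EX.SignExtend.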
EX/MEM register
[Figure: pipeline diagram of lw followed by Inst 1 to Inst 4, where every instruction uses the same memory (Mem) both for instruction fetch and for data access.]
Solution: Use two memories; Data and Instruction!
Structural Hazards
Single Register File!
One instruction is writing and the other is reading the register file?
Solution: Design the register file to write in the first half of the cycle and read in the second half!
[Figure: pipeline diagram over time (clock cycles) of an add that writes $1 (labeled "add $1,"), followed by Inst 1, Inst 2, and an add that reads $1 (labeled "add $2,$1,"); the first instruction writes the register file in the same cycle in which the last one reads it.]
[Figure: pipeline diagram of "add $1," followed by sub $4,$1,$5; and $6,$1,$7; or $8,$1,$9; xor $4,$1,$5.]
[Figure: "add $1," followed by two stalls, then sub $4,$1,$5 and and $6,$1,$7.]
[Figure: pipeline diagram of lw $1,5($s1) followed by sub $4,$1,$5; and $6,$1,$7; or $8,$1,$9; xor $4,$1,$5.]
[Figure: "lw $1," followed by two stalls, then sub $4,$1,$5 and and $6,$1,$7.]
Data Hazards
Example 4. How many cycles are actually required to
execute the following code? Assume the pipeline is
already full.
Ideally, and since the pipeline is full, each instruction
requires 1 cycle. Thus, we need 6 cycles (CPI = 6/6 = 1).
However, …
add $1, $2, $5
add $5, $3, $1     # register-use data hazard: adds 2 cycles by stalls
sub $10, $7, $8
[Figure: pipeline diagram of "add $1," followed by sub $4,$1,$5; and $6,$1,$7; or $8,$1,$9; xor $4,$1,$5.]
No Stalls!
Forwarding Hardware implementation
if (EX/MEM.RegWrite
and (EX/MEM.RegRd != 0)
and (EX/MEM.RegRd == ID/EX.RegRs))
then ForwardA = From EX/MEM
if (EX/MEM.RegWrite
and (EX/MEM.RegRd != 0)
and (EX/MEM.RegRd == ID/EX.RegRt))
then ForwardB = From EX/MEM
if (MEM/WB.RegWrite
and (MEM/WB.RegRd != 0)
and not (EX/MEM.RegWrite and (EX/MEM.RegRd != 0)
and (EX/MEM.RegRd == ID/EX.RegRs))
and (MEM/WB.RegRd == ID/EX.RegRs))
then ForwardA = From MEM/WB
if (MEM/WB.RegWrite
and (MEM/WB.RegRd != 0)
and not (EX/MEM.RegWrite and (EX/MEM.RegRd != 0)
and (EX/MEM.RegRd == ID/EX.RegRt))
and (MEM/WB.RegRd == ID/EX.RegRt))
then ForwardB = From MEM/WB
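The forwarding conditions can be modeled in software as a small function. This is a minimal sketch, not a hardware description: the `Latch` class and the return labels are made up for illustration, and the sketch gives EX/MEM priority over MEM/WB when both hold a result for the same register, since EX/MEM holds the more recent value.

```python
from dataclasses import dataclass


@dataclass
class Latch:
    """The two pipeline-register fields the forwarding unit looks at."""
    reg_write: bool = False
    reg_rd: int = 0  # destination register number


def forward_select(ex_mem, mem_wb, src_reg):
    """Choose the source of one ALU operand."""
    if ex_mem.reg_write and ex_mem.reg_rd != 0 and ex_mem.reg_rd == src_reg:
        return "EX/MEM"      # most recent result wins
    if mem_wb.reg_write and mem_wb.reg_rd != 0 and mem_wb.reg_rd == src_reg:
        return "MEM/WB"
    return "REGFILE"         # no hazard: use the value read in decode


def forwarding_unit(ex_mem, mem_wb, id_ex_rs, id_ex_rt):
    """Return the (ForwardA, ForwardB) selections."""
    return (forward_select(ex_mem, mem_wb, id_ex_rs),
            forward_select(ex_mem, mem_wb, id_ex_rt))


# sub $4,$1,$5 in EX while an instruction writing $1 is in MEM.
print(forwarding_unit(Latch(True, 1), Latch(False, 0), id_ex_rs=1, id_ex_rt=5))
```

The check against register 0 matters because $0 is hardwired to zero and must never be forwarded.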
Can the forwarding hardware be used with Load-use
data hazard?
[Figure: pipeline diagram of lw $1,4($2) followed by sub $4,$1,$5; and $6,$1,$7; or $8,$1,$9; xor $4,$1,$5.]
We still need 1 Stall for the instruction following the load?
How to stall the pipeline?
The control signals of the instruction in the decode stage are stored as
0’s (why?) in the ID/EX register, so we need a multiplexor for the control signals
Stall Implementation
Inside hazard detection unit
if (ID/EX.MemRead
and [(ID/EX.RegRt == IF/ID.RegRs) or
(ID/EX.RegRt == IF/ID.RegRt)])
then
PCWrite = 0
IF/IDWrite = 0
Select 0’s as control signals
Any Problem?
Do we need to stall in all cases?
How about j and jal that come immediately after load with rs and/or rt
fields being the same as the rt field of the load?
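The stall condition can be sketched as a small function that returns the three actions listed above. The function name and the dictionary keys are illustrative assumptions.

```python
def hazard_detection(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """Decide whether to stall for a load-use hazard.

    Returns the PCWrite and IF/IDWrite enables and whether the control
    signals written into ID/EX should be replaced by zeros (a bubble).
    """
    stall = bool(id_ex_mem_read) and id_ex_rt in (if_id_rs, if_id_rt)
    return {
        "PCWrite": 0 if stall else 1,      # hold the PC
        "IF/IDWrite": 0 if stall else 1,   # hold the fetched instruction
        "ZeroControls": stall,             # insert a bubble into ID/EX
    }


# lw $1,4($2) is in EX (rt = 1) while sub $4,$1,$5 is in decode.
print(hazard_detection(id_ex_mem_read=True, id_ex_rt=1, if_id_rs=1, if_id_rt=5))
```

Like the condition above, this sketch compares the register fields without checking whether the instruction in IF/ID actually reads them, so it can stall unnecessarily for instructions such as j and jal.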
Example 5. Consider the following code segment in C
A=B+E
C=B+F
Example 5. Ideally, each instruction requires 1 cycle after the
pipeline is full. Thus, we need (5+7-1) cycles.
CPI = 11/7 = 1.57
lw  $t1, 4($t0)      # loads B
lw  $t2, 12($t0)     # loads E
add $t3, $t1, $t2    # A = B + E (load-use data hazard: adds 1 cycle as a stall)
sw  $t3, 0($t0)      # stores A
lw  $t4, 16($t0)     # loads F
add $t5, $t1, $t4    # C = B + F (load-use data hazard: adds 1 cycle as a stall)
sw  $t5, 8($t0)      # stores C
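The two load-use hazards in Example 5 can be found automatically by checking each instruction against the one just before it. The sketch below assumes forwarding is available, so only an instruction that reads the register loaded by the immediately preceding lw costs one stall; the tiny parser handles only the lw, sw, and add forms used here.

```python
import re

PROGRAM = [
    "lw $t1, 4($t0)",
    "lw $t2, 12($t0)",
    "add $t3, $t1, $t2",
    "sw $t3, 0($t0)",
    "lw $t4, 16($t0)",
    "add $t5, $t1, $t4",
    "sw $t5, 8($t0)",
]


def parse(line):
    """Return (opcode, destination register or None, set of source registers)."""
    op, rest = line.split(None, 1)
    regs = re.findall(r"\$\w+", rest)
    if op == "sw":                        # sw reads both of its registers
        return op, None, set(regs)
    return op, regs[0], set(regs[1:])     # lw and add write their first register


def load_use_stalls(program):
    """Count instructions that read the result of the lw right before them."""
    stalls = 0
    for prev, curr in zip(program, program[1:]):
        prev_op, prev_dest, _ = parse(prev)
        _, _, curr_sources = parse(curr)
        if prev_op == "lw" and prev_dest in curr_sources:
            stalls += 1
    return stalls


N_STAGES = 5
ideal_cycles = N_STAGES + len(PROGRAM) - 1
stalls = load_use_stalls(PROGRAM)
print(ideal_cycles, stalls, ideal_cycles + stalls)
```

With one stall per load-use hazard, the 11 ideal cycles grow by 2.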
Solution
Ideally, the average CPI is 1 for each instruction
With no forwarding
Load-use hazards add two cycles
Register-use hazards add two cycles
[Figure: a branch followed by Inst1, Inst2, and Inst3 in the pipeline.]
Solution!
Once it is known that the instruction is a branch, stall the pipeline for 3
cycles? Is it actually a stall?
Control Hazards
[Figure: pipeline diagram of beq followed by three stalls before the next instructions (Inst) enter the pipeline.]
Reducing the Cost of Branch Hazard
[Figure: pipeline diagram of beq followed by a single stall and then lw.]
Performance Issues
Consider branching in loops? EXAMPLE?
Approach II – Dynamic Branch Prediction
2-bit Branch Predictor
Basically, we have four states
Two bits are used to store the prediction state
The prediction is changed only when it has been wrong twice
Example 7. Consider a certain program that has a
conditional branch instruction whose actual outcome
is given below when the program is executed.
T-T-N-T-T-N-T
List predictions for the following branch prediction
schemes and find the prediction accuracy.
1. Predict always taken
2. Predict always not taken
3. 1-bit predictor, initialized to predict taken
4. 2-bit predictor, initialized to weakly predict taken
Actual branch actions : T-T-N-T-T-N-T
Predict as always taken
Predictions : T-T-T-T-T-T-T
Accuracy = 5/7 = 71%
Predict as always not taken
Predictions : N-N-N-N-N-N-N
Accuracy = 2/7 = 29%
1-bit predictor initialized to predict taken
Predictions: T-T-T-N-T-T-N
Accuracy = 3/7 = 43%
2-bit predictor initialized to weakly predict taken
Predictions: T-T-T-T-T-T-T
Accuracy = 5/7 = 71%
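The four schemes in Example 7 can be simulated directly. The sketch below models the 2-bit predictor as a saturating counter (0 and 1 predict not taken, 2 and 3 predict taken, with 2 as "weakly taken"); that encoding and the function names are assumptions made for illustration.

```python
ACTUAL = "TTNTTNT"


def static_predictor(outcomes, guess):
    """Always predict the same direction ('T' or 'N')."""
    return guess * len(outcomes)


def one_bit_predictor(outcomes, initial="T"):
    """Predict whatever the branch did last time."""
    state, predictions = initial, []
    for actual in outcomes:
        predictions.append(state)
        state = actual
    return "".join(predictions)


def two_bit_predictor(outcomes, counter=2):
    """2-bit saturating counter: 0-1 predict not taken, 2-3 predict taken."""
    predictions = []
    for actual in outcomes:
        predictions.append("T" if counter >= 2 else "N")
        if actual == "T":
            counter = min(counter + 1, 3)
        else:
            counter = max(counter - 1, 0)
    return "".join(predictions)


def correct(predictions, outcomes):
    """Number of predictions that match the actual outcomes."""
    return sum(p == a for p, a in zip(predictions, outcomes))


schemes = {
    "always taken": static_predictor(ACTUAL, "T"),
    "always not taken": static_predictor(ACTUAL, "N"),
    "1-bit": one_bit_predictor(ACTUAL),
    "2-bit": two_bit_predictor(ACTUAL),
}
for name, preds in schemes.items():
    hits = correct(preds, ACTUAL)
    print(f"{name}: {'-'.join(preds)}  {hits}/{len(ACTUAL)}")
```

The 1-bit predictor is wrong twice for every isolated not-taken outcome, while the 2-bit counter only drops from strongly to weakly taken and keeps predicting taken.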
Pipelining Performance
Example 8. Let’s compare the performance of the single-cycle, multi-cycle, and
pipelined implementations of the MIPS processor given the operation times and
instruction mix below.
For the pipelined implementation, assume that:
1) The branch decision is made in the MEM stage. Branches are handled in the
pipelined implementation by stalling the pipeline.
2) Half of the load instructions incur a load-use hazard.
3) Forwarding is implemented.
4) The jump instruction is completed in the ID stage.
Exceptions & Interrupts
Exceptions and interrupts are unexpected events
that require a change in the flow of control
The two terms are used interchangeably,
depending on the ISA
Intel x86 uses the term interrupt only
In MIPS
Exceptions: any internal unexpected change in the flow (undefined
opcode, overflow, system calls)
Interrupts: the event is external (I/O controller request)
In MIPS, when an exception is generated, the
following sequence of steps is taken
The address of the offending instruction is saved into a
special register called the Exception Program Counter (EPC).
The cause of the exception is saved in a special register
called the Cause Register.
Control is transferred to the operating system by
loading a special address (0x8000 0180) into the PC.
The code loaded starting at this address
determines what actions will be taken by the operating system in
response to the exception based on the value found in the Cause
Register. The operating system may terminate the program or
resume execution using the value found in the EPC.
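The sequence of steps can be sketched as a toy model. Everything here is illustrative: the `Cpu` class, the numeric cause codes, and the handler policy are assumptions made for this sketch, not a description of a real operating system.

```python
from dataclasses import dataclass

EXCEPTION_VECTOR = 0x80000180  # address loaded into the PC on an exception

# Cause codes assumed for this sketch only.
CAUSE_UNDEFINED_OPCODE = 10
CAUSE_OVERFLOW = 12


@dataclass
class Cpu:
    pc: int = 0
    epc: int = 0
    cause: int = 0


def raise_exception(cpu, cause):
    """Hardware side: save state and jump to the handler."""
    cpu.epc = cpu.pc            # address of the offending instruction
    cpu.cause = cause           # why the exception happened
    cpu.pc = EXCEPTION_VECTOR   # transfer control to the operating system


def handle_exception(cpu):
    """OS side (hypothetical policy): inspect Cause, then terminate or resume."""
    if cpu.cause == CAUSE_UNDEFINED_OPCODE:
        return "terminate"
    cpu.pc = cpu.epc            # resume using the value saved in the EPC
    return "resume"


cpu = Cpu(pc=0x00400020)
raise_exception(cpu, CAUSE_OVERFLOW)
print(hex(cpu.pc), hex(cpu.epc), cpu.cause)
print(handle_exception(cpu), hex(cpu.pc))
```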
Overflow Exception
Modifications to the Datapath
Fallacies
Reading Assignment
Read the following from the textbook