
CS420/520 Computer Architecture I

Single Cycle Datapath & Control


Dr. Xiaobo Zhou Department of Computer Science

CS420/520 datapath.1

UC. Colorado Springs

Adapted from UCB97 & UCB03

The Big Picture: Where are We Now?

The Five Classic Components of a Computer


[Figure: the five components: Processor (Control and Datapath), Memory, Input, Output]

Today's Topic: Datapath Design


The Big Picture: The Performance Perspective


Performance of a machine is determined by:
- Instruction count
- Clock cycle time
- Clock cycles per instruction (CPI)

Processor design (datapath and control) will determine:
- Clock cycle time
- Clock cycles per instruction

In the next two lectures: single cycle processor
- Advantage: one clock cycle per instruction
- Disadvantage: long cycle time

[Figure: triangle relating Inst. Count, CPI, and Cycle Time]
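The three factors above combine multiplicatively into execution time (time = instruction count x CPI x clock cycle time). A minimal Python sketch of that product follows; the function name and the numbers are illustrative assumptions, not measurements from the lecture.

```python
def execution_time(instruction_count, cpi, cycle_time_ns):
    """Return total execution time in nanoseconds (count x CPI x cycle time)."""
    return instruction_count * cpi * cycle_time_ns


# Hypothetical workload: one million instructions on a single cycle
# processor (CPI = 1) with an assumed 10 ns clock cycle.
single_cycle_ns = execution_time(1_000_000, cpi=1, cycle_time_ns=10)
print(single_cycle_ns)
```

A single cycle design fixes CPI at 1, so any improvement must come from the instruction count or the cycle time.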


Review: Execution Cycle


- Instruction Fetch: obtain instruction from program storage
- Instruction Decode: determine required actions and instruction size
- Operand Fetch: locate and obtain operand data
- Execute: compute result value or status
- Result Store: deposit results in storage (Register/Memory) for later use
- Next Instruction: determine successor instruction

Datapath Design Procedure


5 steps to design a processor
1. Analyze instruction set => datapath requirements
2. Select set of datapath components & establish clock methodology
3. Assemble datapath meeting the requirements
4. Analyze implementation of each instruction to determine the setting of control points that effect the register transfer
5. Assemble the control logic

MIPS makes it easier
- Instructions are all the same size/length
- Source registers are always in the same place
- Immediates have the same size and location
- Operations are always on registers/immediates

Single cycle datapath => CPI = 1, clock cycle time (CCT) => long



The MIPS Instruction Formats


All MIPS instructions are 32 bits long. The three instruction formats:

R-type: op [31:26] 6 bits | rs [25:21] 5 bits | rt [20:16] 5 bits | rd [15:11] 5 bits | shamt [10:6] 5 bits | funct [5:0] 6 bits
I-type: op [31:26] 6 bits | rs [25:21] 5 bits | rt [20:16] 5 bits | immediate [15:0] 16 bits
J-type: op [31:26] 6 bits | target address [25:0] 26 bits

The different fields are:
- op: operation of the instruction
- rs, rt, rd: the source and destination register specifiers
- shamt: shift amount
- funct: selects the variant of the operation in the op field
- address / immediate: address offset or immediate value
- target address: target address of the jump instruction
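The field boundaries listed above can be checked with a few shifts and masks. The following is a minimal Python sketch; the function name `decode` and the dictionary layout are assumptions made for illustration.

```python
def decode(instr):
    """Split a 32-bit MIPS instruction word into all possible fields.

    Which fields are meaningful depends on the format (R, I, or J).
    """
    return {
        "op":     (instr >> 26) & 0x3F,   # bits 31..26
        "rs":     (instr >> 21) & 0x1F,   # bits 25..21
        "rt":     (instr >> 16) & 0x1F,   # bits 20..16
        "rd":     (instr >> 11) & 0x1F,   # bits 15..11 (R-type)
        "shamt":  (instr >> 6) & 0x1F,    # bits 10..6  (R-type)
        "funct":  instr & 0x3F,           # bits 5..0   (R-type)
        "imm16":  instr & 0xFFFF,         # bits 15..0  (I-type)
        "target": instr & 0x03FFFFFF,     # bits 25..0  (J-type)
    }


# R-type word built by hand: op=0, rs=1, rt=2, rd=3, shamt=0, funct=0x20 (add)
word = (0 << 26) | (1 << 21) | (2 << 16) | (3 << 11) | (0 << 6) | 0x20
fields = decode(word)
print(fields["rs"], fields["rt"], fields["rd"], hex(fields["funct"]))
```

Extracting every field unconditionally mirrors the hardware: the wires for rs, rt, rd, and imm16 always exist, and control decides which ones are used.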

The MIPS Subset


- ADD and SUBtract (R-type): op [31:26] | rs [25:21] | rt [20:16] | rd [15:11] | shamt [10:6] | funct [5:0]
  - add rd, rs, rt
  - sub rd, rs, rt
- OR Immediate (I-type): op [31:26] | rs [25:21] | rt [20:16] | immediate [15:0]
  - ori rt, rs, imm16
- LOAD and STORE (I-type)
  - lw rt, rs, imm16
  - sw rt, rs, imm16
- BRANCH (I-type)
  - beq rs, rt, imm16
- JUMP (J-type): op [31:26] | target address [25:0]
  - j target

The Design is to Represent: ALU Central Part of CPU


(1) Functional Specification
- Inputs: 2 x 32-bit operands A, B; 1-bit carry input Cin
- Outputs: 1 x 32-bit result S; 1-bit carry output Co
- Operations: ADD (A plus B plus Cin), SUB (A minus B minus Cin), AND, OR, XOR
- Performance: left unspecified for now!

(2) Block Diagram: understand the data and control flows

[Figure: ALU block with 32-bit inputs A and B, carry input Cin, 3-bit mode/function input M, 32-bit result S, and carry output Co]

4 Hardware Building Blocks


AND gate (c = a & b)

a  b  c = a & b
0  0  0
1  0  0
0  1  0
1  1  1

OR gate (c = a | b)

a  b  c = a | b
0  0  0
1  0  1
0  1  1
1  1  1

Inverter (c = !a)

a  c = !a
0  1
1  0

Multiplexor: if d == 0, c = a; otherwise c = b

d  c
0  a
1  b

A One Bit ALU


This 1-bit ALU will perform AND, OR, and ADD
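[Figure: 1-bit ALU. Inputs A, B, and CarryIn feed an AND gate, an OR gate, and a 1-bit Full Adder; a Mux controlled by ALUop selects Result; the adder produces CarryOut]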


A 1-bit ALU and a 4-bit ALU


This 1-bit ALU will perform AND, OR, and ADD.

[Figure: a 1-bit ALU (inputs A, B, CarryIn, ALUop; outputs Result, CarryOut) next to a 4-bit ALU built from four 1-bit ALUs. Bit i takes Ai, Bi, and CarryIni and produces Resulti and CarryOuti; each CarryOut feeds the next bit's CarryIn, from CarryIn0 up to CarryOut3]
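The chaining in the 4-bit ALU can be sketched in Python as follows. This is an illustrative model, not the lecture's implementation: the ALUop encoding (0 = AND, 1 = OR, 2 = ADD) and all function names are assumptions. The full adder uses the usual Sum and CarryOut equations for a one-bit full adder.

```python
def full_adder(a, b, carry_in):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ carry_in
    carry_out = (b & carry_in) | (a & carry_in) | (a & b)
    return s, carry_out


def alu_1bit(a, b, carry_in, alu_op):
    """One-bit ALU. alu_op: 0 = AND, 1 = OR, 2 = ADD (assumed encoding)."""
    s, carry_out = full_adder(a, b, carry_in)
    result = (a & b, a | b, s)[alu_op]   # the mux selecting Result
    return result, carry_out


def alu_ripple(a, b, carry_in, alu_op, width=4):
    """Chain `width` one-bit ALUs; each CarryOut feeds the next CarryIn."""
    result, carry = 0, carry_in
    for i in range(width):
        bit, carry = alu_1bit((a >> i) & 1, (b >> i) & 1, carry, alu_op)
        result |= bit << i
    return result, carry


print(alu_ripple(5, 6, 0, 2))   # 4-bit ADD of 5 and 6
```

All three operations are computed for every bit and the mux throws two of them away, which is how the hardware behaves as well.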


How About Subtraction?


Keep in mind the following:
- (A - B) is the same as A + (-B)
- 2's complement: take the inverse of every bit and add 1
- The bitwise inverse of B is !B: A + !B + 1 = A + (!B + 1) = A + (-B) = A - B

[Figure: 4-bit ALU for subtraction. A 2x1 Mux (Sel) picks B or !B as the second operand; labels shown: Subtract, CarryIn, Zero, Result, CarryOut]
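The identity A - B = A + !B + 1 can be tried out on 4-bit values. The sketch below is a hedged illustration in Python; the function name and the choice to return the raw carry out are assumptions.

```python
def subtract_4bit(a, b):
    """Compute a - b on 4 bits as a + !b + 1; returns (result, carry_out)."""
    mask = 0xF
    not_b = ~b & mask            # bitwise inverse of B, kept to 4 bits
    total = a + not_b + 1        # the "+ 1" plays the role of CarryIn = 1
    return total & mask, (total >> 4) & 1


print(subtract_4bit(7, 3))       # 7 - 3
print(subtract_4bit(3, 5))       # 3 - 5, result is the 4-bit pattern for -2
```

A negative result appears as its 2's complement bit pattern, so 3 - 5 yields 1110.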


Zero Detection Logic


- A = B is the same as A - B = 0
- Zero Detection Logic is just one BIG NOR gate
- Any non-zero input to the NOR gate will cause its output to be zero

a  b  c = a NOR b
0  0  1
1  0  0
0  1  0
1  1  0

[Figure: 4-bit ALU built from four 1-bit ALUs; Result0 through Result3 feed a NOR gate whose output is Zero]

The Disadvantage of Ripple Carry


- The adder we just built is called a Ripple Carry Adder
- The carry bit may have to propagate from LSB to MSB
- Worst case delay for an N-bit adder: 2N gate delays

[Figure: 4-bit ripple carry ALU; the carry travels from CarryIn0 through each 1-bit ALU to CarryOut3, with a detail of the CarryIn-to-CarryOut path inside one bit]

Carry Select Header


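Consider building an 8-bit ALU
- Simple: connect two 4-bit ALUs in series (ripple carry)

[Figure: the low 4-bit ALU takes A[3:0], B[3:0], and CarryIn and produces Result[3:0]; its carry feeds the high 4-bit ALU, which takes A[7:4] and B[7:4] and produces Result[7:4] and CarryOut]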


Carry Select Header (Continued)


Consider building an 8-bit ALU
- Expensive but faster: uses three 4-bit ALUs

[Figure: the low ALU takes A[3:0], B[3:0], and CarryIn and produces Result[3:0] and carry C4. Two more ALUs both take A[7:4] and B[7:4]: one with carry in 0 (outputs X[7:4], C0) and one with carry in 1 (outputs Y[7:4], C1). C4 drives Sel on two 2-to-1 MUXes that pick Result[7:4] and CarryOut]
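The three-ALU arrangement can be modeled as below. This Python sketch covers addition only and is an assumption-laden illustration: `add4` stands in for a 4-bit ALU in ADD mode, and the function names are invented here.

```python
def add4(a, b, carry_in):
    """Stand-in for a 4-bit ALU doing ADD: returns (sum, carry_out)."""
    total = (a & 0xF) + (b & 0xF) + carry_in
    return total & 0xF, total >> 4


def carry_select_add8(a, b, carry_in=0):
    """8-bit add using three 4-bit adders and two 2-to-1 muxes."""
    low, c4 = add4(a, b, carry_in)       # Result[3:0] and C4
    x, c0 = add4(a >> 4, b >> 4, 0)      # upper half, guessing C4 = 0
    y, c1 = add4(a >> 4, b >> 4, 1)      # upper half, guessing C4 = 1
    # C4 is the mux select: it picks the guess that turned out to be right.
    high, carry_out = (y, c1) if c4 else (x, c0)
    return (high << 4) | low, carry_out


print(carry_select_add8(0x0F, 0x01))
```

Both upper-half sums are ready by the time C4 arrives, so the upper half costs one mux delay instead of a second ripple through four bits.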



Combinational Logic Elements


- Adder: inputs A and B (32 bits each) and CarryIn; outputs Sum (32 bits) and CarryOut
- MUX: inputs A and B (32 bits each) and Select; one 32-bit output
- ALU: inputs A and B (32 bits each), OP, and carry in (CI); outputs Result (32 bits), Zero, Overflow, and carry out (Co)

Storage Element: Register File


Register File consists of 32 registers:
- Two 32-bit output busses: busA and busB
- One 32-bit input bus: busW

Register is selected by:
- RA selects the register to put on busA
- RB selects the register to put on busB
- RW selects the register to be written via busW when Write Enable is 1

Clock input (CLK)
- The CLK input is a factor ONLY during write operation
- During read operation, the register file behaves as a combinational logic block: RA or RB valid => busA or busB valid after access time

[Figure: register file of 32 32-bit registers with 5-bit selects RW, RA, RB, Write Enable, Clk, 32-bit input busW, and 32-bit outputs busA and busB]
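The read/write asymmetry described above (reads are combinational, writes happen on the clock edge) can be modeled with a small class. This is a sketch under assumptions: the class and method names are hypothetical, and a method call stands in for the clock edge.

```python
class RegisterFile:
    """Behavioral model of a file of 32 registers, 32 bits each."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, ra, rb):
        """Combinational read: returns (busA, busB) for selects RA and RB."""
        return self.regs[ra], self.regs[rb]

    def clock_edge(self, rw, bus_w, write_enable):
        """State changes only here, and only when Write Enable is 1."""
        if write_enable:
            self.regs[rw] = bus_w & 0xFFFFFFFF


rf = RegisterFile()
rf.clock_edge(rw=3, bus_w=42, write_enable=1)
rf.clock_edge(rw=4, bus_w=99, write_enable=0)   # ignored: Write Enable is 0
print(rf.read(3, 4))
```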


Storage Element: Idealized Memory


Memory (idealized)
- One input bus: Data In
- One output bus: Data Out

Memory word is selected by:
- Address selects the word to put on Data Out
- Write Enable = 1: Address selects the memory word to be written via the Data In bus

Clock input (CLK)
- The CLK input is a factor ONLY during write operation
- During read operation, the idealized memory behaves as a combinational logic block: Address valid => Data Out valid after access time

[Figure: memory block with inputs Write Enable, Address, Data In (32 bits), and Clk; output Data Out (32 bits)]


An Abstract View of the Critical Path


Register file and ideal memory:
- The CLK input is a factor ONLY during write operation
- During read operation, they behave as combinational logic: Address valid => Output valid after access time

Critical Path (Load Operation) =
  PC's Clk-to-Q
  + Instruction Memory's Access Time
  + Register File's Access Time
  + ALU to Perform a 32-bit Add
  + Data Memory Access Time
  + Setup Time for Register File Write
  + Clock Skew

[Figure: PC and Next Address logic feed the Ideal Instruction Memory (Instruction Address in, Instruction out); the Instruction supplies Rd, Rs, Rt (5 bits each) and Imm (16 bits) to the 32 32-bit Registers (Rw, Ra, Rb); buses A and B feed the ALU, whose output goes to the Ideal Data Memory (Data Address, Data In); all storage elements share Clk]

Clocking Methodology
[Figure: timing diagram showing Setup and Hold windows around each Clk edge, with "Don't Care" regions in between; below it, state elements feed combinational logic, which feeds state elements, all clocked by Clk]

- A clocking methodology defines when signals can be read and written.
- For simplicity, we assume an edge-triggered clocking methodology: all stored values are updated on a clock edge.
- All storage elements are clocked by the same clock edge.

Overview of the Instruction Fetch Unit


The common operations
- Fetch the instruction: mem[PC]
- Update the program counter:
  - Sequential code: PC <- PC + 4
  - Branch and Jump: PC <- "something else"

[Figure: PC (clocked by Clk) supplies Address to the Instruction Memory, which outputs the 32-bit Instruction Word; Next Address Logic computes the next PC]


The ADD Instruction


Format (R-type): op [31:26] 6 bits | rs [25:21] 5 bits | rt [20:16] 5 bits | rd [15:11] 5 bits | shamt [10:6] 5 bits | funct [5:0] 6 bits

add rd, rs, rt
- mem[PC]: fetch the instruction from memory
- R[rd] <- R[rs] + R[rt]: the actual operation
- PC <- PC + 4: calculate the next instruction's address


The Subtract Instruction


Format (R-type): op [31:26] 6 bits | rs [25:21] 5 bits | rt [20:16] 5 bits | rd [15:11] 5 bits | shamt [10:6] 5 bits | funct [5:0] 6 bits

sub rd, rs, rt
- mem[PC]: fetch the instruction from memory
- R[rd] <- R[rs] - R[rt]: the actual operation
- PC <- PC + 4: calculate the next instruction's address


Datapath for Register-Register Operations


R[rd] <- R[rs] op R[rt]    Example: add rd, rs, rt
- Ra, Rb, and Rw come from the instruction's rs, rt, and rd fields
- ALUctr and RegWr: produced by the control logic after decoding the instruction

[Figure: the instruction's Rd, Rs, Rt fields (5 bits each) drive Rw, Ra, Rb of the 32 32-bit Registers; busA and busB (32 bits) feed the ALU (ALUctr); the 32-bit Result returns on busW; RegWr and Clk control the write]


The OR Immediate Instruction


Format (I-type): op [31:26] 6 bits | rs [25:21] 5 bits | rt [20:16] 5 bits | immediate [15:0] 16 bits

ori rt, rs, imm16
- mem[PC]: fetch the instruction from memory
- R[rt] <- R[rs] or ZeroExt(imm16): the OR operation
- PC <- PC + 4: calculate the next instruction's address

ZeroExt(imm16): bits [31:16] = 0000000000000000 (16 bits) | bits [15:0] = immediate (16 bits)


Datapath for Logical Operations with Immediate


R[rt] <- R[rs] op ZeroExt[imm16]

[Figure: register-register datapath extended for immediates. A RegDst Mux selects Rw from Rd or Rt, since the destination is now rt; an ALUSrc Mux selects the ALU's second operand from busB or ZeroExt(imm16), widened from 16 to 32 bits; control signals: RegDst, RegWr, ALUctr, ALUSrc]

The Load Instruction


Format (I-type): op [31:26] 6 bits | rs [25:21] 5 bits | rt [20:16] 5 bits | immediate [15:0] 16 bits

lw rt, rs, imm16
- mem[PC]: fetch the instruction from memory
- Addr <- R[rs] + SignExt(imm16): calculate the memory address
- R[rt] <- Mem[Addr]: load the data into the register
- PC <- PC + 4: calculate the next instruction's address

SignExt(imm16):
- immediate bit 15 = 0: bits [31:16] = 0000000000000000 | bits [15:0] = immediate
- immediate bit 15 = 1: bits [31:16] = 1111111111111111 | bits [15:0] = immediate
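The difference between the zero extension used by ori and the sign extension used by lw can be sketched in Python as follows. The function names, including `lw_address`, are assumptions made for illustration.

```python
def zero_ext(imm16):
    """Widen a 16-bit immediate to 32 bits by filling the top with zeros."""
    return imm16 & 0xFFFF


def sign_ext(imm16):
    """Widen a 16-bit immediate to 32 bits by copying bit 15 upward."""
    imm16 &= 0xFFFF
    if imm16 & 0x8000:                 # bit 15 set: value is negative
        return imm16 | 0xFFFF0000
    return imm16


def lw_address(base, imm16):
    """Addr <- R[rs] + SignExt(imm16), wrapped to 32 bits."""
    return (base + sign_ext(imm16)) & 0xFFFFFFFF


print(hex(zero_ext(0xFFFC)), hex(sign_ext(0xFFFC)))
print(hex(lw_address(0x1000, 0xFFFC)))    # offset -4 from base 0x1000
```

The same 16 bits, 0xFFFC, mean 65532 to ori and -4 to lw, which is why the datapath needs a selectable extender.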

Datapath for Load Operations


R[rt] <- Mem[R[rs] + SignExt[imm16]]    Example: lw rt, rs, imm16

Format (I-type): op [31:26] 6 bits | rs [25:21] 5 bits | rt [20:16] 5 bits | immediate [15:0] 16 bits

[Figure: datapath extended for load. An Extender (controlled by ExtOp) widens imm16 from 16 to 32 bits; the ALUSrc Mux feeds it to the ALU; the ALU output drives Adr of the Data Memory (WrEn = MemWr, Data In, Clk); a MemToReg Mux selects busW from the ALU output or the memory output. Control signals: RegDst, RegWr, ALUctr, ExtOp, ALUSrc, MemWr, MemToReg]

The Store Instruction


Format (I-type): op [31:26] 6 bits | rs [25:21] 5 bits | rt [20:16] 5 bits | immediate [15:0] 16 bits

sw rt, rs, imm16
- mem[PC]: fetch the instruction from memory
- Addr <- R[rs] + SignExt(imm16): calculate the memory address
- Mem[Addr] <- R[rt]: store the register into memory
- PC <- PC + 4: calculate the next instruction's address


Datapath for Store Operations


Mem[R[rs] + SignExt[imm16]] <- R[rt]    Example: sw rt, rs, imm16

Format (I-type): op [31:26] 6 bits | rs [25:21] 5 bits | rt [20:16] 5 bits | immediate [15:0] 16 bits

[Figure: datapath for store. Same structure as the load datapath; busB (R[rt]) is routed to Data In of the Data Memory, and the ALU output drives Adr. Control signals: RegDst, RegWr, ALUctr, ExtOp, ALUSrc, MemWr, MemToReg]

The Branch Instruction


Format (I-type): op [31:26] 6 bits | rs [25:21] 5 bits | rt [20:16] 5 bits | immediate [15:0] 16 bits

beq rs, rt, imm16
- mem[PC]: fetch the instruction from memory
- Cond <- R[rs] - R[rt]: calculate the branch condition
- Calculate the next instruction's address:
  - if (Cond eq 0): PC <- PC + 4 + (SignExt(imm16) x 4)
  - else: PC <- PC + 4


Datapath for Branch Operations


beq rs, rt, imm16: we need to compare Rs and Rt!

Format (I-type): op [31:26] 6 bits | rs [25:21] 5 bits | rt [20:16] 5 bits | immediate [15:0] 16 bits

[Figure: datapath for branch. busA (Rs) and busB (Rt) feed the ALU; its Zero output goes to the Next Address Logic together with Branch and imm16 (16 bits); the Next Address Logic updates PC (clocked by Clk), which goes to the Instruction Memory. Control signals: RegDst, RegWr, ALUctr, ALUSrc, ExtOp, Branch]

Binary Arithmetics for the Next Address


In theory, the PC is a 32-bit byte address into the instruction memory:
- Sequential operation: PC<31:0> = PC<31:0> + 4
- Branch operation: PC<31:0> = PC<31:0> + 4 + SignExt[Imm16] * 4

The magic number "4" always comes up because:
- The 32-bit PC is a byte address
- And all our instructions are 4 bytes (32 bits) long

In other words:
- The 2 LSBs of the 32-bit PC are always zeros
- There is no reason to have hardware to keep the 2 LSBs

In practice, we can simplify the hardware by using a 30-bit PC<31:2>:
- Sequential operation: PC<31:2> = PC<31:2> + 1
- Branch operation: PC<31:2> = PC<31:2> + 1 + SignExt[Imm16]
- In either case: Instruction Memory Address = PC<31:2> concat "00"
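That the 30-bit PC produces the same instruction addresses as the 32-bit byte PC can be checked numerically. The Python sketch below is illustrative only: the function names are assumptions, and `take_branch` stands for the combined condition "Branch and Zero".

```python
MASK30 = (1 << 30) - 1
MASK32 = (1 << 32) - 1


def sign_ext(imm16, width):
    """Sign-extend a 16-bit immediate to `width` bits."""
    imm16 &= 0xFFFF
    if imm16 & 0x8000:
        return imm16 | (((1 << width) - 1) & ~0xFFFF)
    return imm16


def next_pc32(pc, imm16, take_branch):
    """32-bit byte PC: PC + 4, plus SignExt(imm16) * 4 on a taken branch."""
    nxt = pc + 4
    if take_branch:
        nxt += sign_ext(imm16, 32) * 4
    return nxt & MASK32


def next_pc30(pc30, imm16, take_branch):
    """30-bit word PC<31:2>: PC + 1, plus SignExt(imm16) on a taken branch."""
    nxt = pc30 + 1
    if take_branch:
        nxt += sign_ext(imm16, 30)
    return nxt & MASK30


pc = 0x00400010
back2 = 0xFFFE                      # imm16 = -2 instructions
print(hex(next_pc32(pc, back2, True)))
print(hex(next_pc30(pc >> 2, back2, True) << 2))   # concat "00"
```

Both lines print the same address, which is the point of dropping the two always-zero LSBs.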

Next Address Logic: Expensive and Fast Solution


Using a 30-bit PC:
- Sequential operation: PC<31:2> = PC<31:2> + 1
- Branch operation: PC<31:2> = PC<31:2> + 1 + SignExt[Imm16]
- In either case: Instruction Memory Address = PC<31:2> concat "00"

[Figure: two 30-bit Adders. One computes PC + 1; the other adds SignExt(imm16 = Instruction<15:0>) to that sum; a Mux controlled by Branch and Zero selects the next PC. Addr<31:2> from PC, concatenated with "00" (Addr<1:0>), addresses the Instruction Memory, which outputs Instruction<31:0>]

Next Address Logic: Cheap and Slow Solution


- Why is this slow? The address add cannot start until Zero (output of ALU) is valid.
- Does it matter that this is slow in the overall scheme of things? Probably not here. The critical path is the load operation.

[Figure: a single 30-bit Adder. A Mux controlled by Branch and Zero selects 0 or SignExt(imm16 = Instruction<15:0>) as the second operand, with the Adder's Carry In providing the "+ 1". Addr<31:2> from PC, concatenated with "00" (Addr<1:0>), addresses the Instruction Memory, which outputs Instruction<31:0>]

The Jump Instruction


Format (J-type): op [31:26] 6 bits | target address [25:0] 26 bits

j target
- mem[PC]: fetch the instruction from memory
- PC<31:2> <- PC<31:28> concat target<25:0>: calculate the next instruction's address
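The concatenation for the jump address can be written out with shifts and masks. This is a minimal Python sketch on the 30-bit PC<31:2>; the function name is an assumption.

```python
def jump_pc30(pc30, target26):
    """PC<31:2> <- PC<31:28> concat target<25:0>, on the 30-bit word PC."""
    upper4 = (pc30 >> 26) & 0xF          # PC<31:28> is the top 4 of 30 bits
    return (upper4 << 26) | (target26 & 0x03FFFFFF)


pc = 0x10000008                           # byte address of the jump
new_pc30 = jump_pc30(pc >> 2, 0x100)
print(hex(new_pc30 << 2))                 # byte address after concat "00"
```

Only the low 28 bits of the byte address come from the instruction, so a jump stays within the current 256 MB region.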


Instruction Fetch Unit


j target: PC<31:2> <- PC<31:28> concat target<25:0>

[Figure: Instruction Fetch Unit. The branch path (two 30-bit Adders, SignExt of imm16 = Instruction<15:0>, and a Mux selected by Branch and Zero) is followed by a second Mux controlled by Jump, which selects PC<31:28> (4 bits) concatenated with Target = Instruction<25:0> (26 bits). Addr<31:2> from PC, concatenated with "00" (Addr<1:0>), addresses the Instruction Memory, which outputs Instruction<31:0>]

Putting it All Together: A Single Cycle Datapath


We have everything except the control signals (underlined in the figure), to be continued ...

[Figure: complete single cycle datapath. The Instruction Fetch Unit (inputs Branch, Jump, Zero, Clk) outputs Instruction<31:0>, split into Rs <21:25>, Rt <16:20>, Rd <11:15>, and Imm16 <0:15>. The RegDst Mux selects Rw from Rd (1) or Rt (0); the 32 32-bit Registers drive busA and busB; the Extender (ExtOp) and the ALUSrc Mux feed the ALU (ALUctr, Zero); the ALU output drives Adr of the Data Memory (WrEn = MemWr, Data In, Clk); the MemtoReg Mux selects busW. Control signals: Branch, Jump, RegDst, RegWr, ALUctr, ExtOp, ALUSrc, MemWr, MemtoReg]

Summary of Datapath Design


5 steps to design a processor
1. Analyze instruction set => datapath requirements
2. Select set of datapath components & establish clock methodology
3. Assemble datapath meeting the requirements
4. Analyze implementation of each instruction to determine the setting of control points that effect the register transfer
5. Assemble the control logic

MIPS makes it easier
- Instructions are all the same size
- Source registers are always in the same place
- Immediates have the same size and location
- Operations are always on registers/immediates

Single cycle datapath => CPI = 1, clock cycle time (CCT) => long

What is next? Implementing control

The Big Picture: Where are We Now?

The Five Classic Components of a Computer


[Figure: the five components: Processor (Control and Datapath), Memory, Input, Output]

Topic: Designing the Control for the Single Cycle Datapath


An Abstract View of the Control Implementation

[Figure: abstract view. The Control block receives the Instruction from the Ideal Instruction Memory and Conditions from the Datapath, and sends Control Signals to the Datapath. The Datapath contains PC and Next Address logic, the 32 32-bit Registers (Rw, Ra, Rb selected by Rd, Rs, Rt, 5 bits each), the ALU (inputs A and B), and the Ideal Data Memory (Data Address, Data In, Data Out); all storage elements share Clk]

The ADD Instruction


Format (R-type): op [31:26] 6 bits | rs [25:21] 5 bits | rt [20:16] 5 bits | rd [15:11] 5 bits | shamt [10:6] 5 bits | funct [5:0] 6 bits

add rd, rs, rt
- mem[PC]: fetch the instruction from memory
- R[rd] <- R[rs] + R[rt]: the actual operation
- PC <- PC + 4: calculate the next instruction's address


Instruction Fetch Unit at the Beginning of Add / Subtract


- Fetch the instruction from Instruction memory: Instruction <- mem[PC]
- This is the same for all instructions

Control inputs at this point: Jump = previous, Branch = previous, Zero = previous

[Figure: Instruction Fetch Unit with the same structure as before: PC, two 30-bit Adders, SignExt of imm16, the Branch/Zero Mux, the Jump Mux, and the Instruction Memory producing Instruction<31:0>]

The Single Cycle Datapath during Add and Subtract


Format (R-type): op [31:26] | rs [25:21] | rt [20:16] | rd [15:11] | shamt [10:6] | funct [5:0]

R[rd] <- R[rs] + / - R[rt]

Control signal values:
- Branch = 0, Jump = 0
- RegDst = 1, RegWr = 1
- ALUSrc = 0, ExtOp = x, ALUctr = Add or Subtract
- MemWr = 0, MemtoReg = 0

[Figure: single cycle datapath annotated with these control values]

Instruction Fetch Unit at the End of Add and Subtract


- PC <- PC + 4
- This is the same for all instructions except Branch and Jump

Control inputs: Jump = 0, Branch = 0, Zero = x

[Figure: Instruction Fetch Unit annotated with these values; both Muxes select the PC + 1 path]

The Single Cycle Datapath during Or Immediate


Format (I-type): op [31:26] | rs [25:21] | rt [20:16] | immediate [15:0]

R[rt] <- R[rs] or ZeroExt[Imm16]

Control signal values:
- Branch = 0, Jump = 0
- RegDst = 0, RegWr = 1
- ALUSrc = 1, ExtOp = 0, ALUctr = Or
- MemWr = 0, MemtoReg = 0

[Figure: single cycle datapath annotated with these control values]

The Single Cycle Datapath during Load


Format (I-type): op [31:26] | rs [25:21] | rt [20:16] | immediate [15:0]

R[rt] <- Data Memory {R[rs] + SignExt[imm16]}

Control signal values:
- Branch = 0, Jump = 0
- RegDst = 0, RegWr = 1
- ALUSrc = 1, ExtOp = 1, ALUctr = Add
- MemWr = 0, MemtoReg = 1

[Figure: single cycle datapath annotated with these control values]

The Single Cycle Datapath during Store


Format (I-type): op [31:26] | rs [25:21] | rt [20:16] | immediate [15:0]

Data Memory {R[rs] + SignExt[imm16]} <- R[rt]

Control signal values:
- Branch = 0, Jump = 0
- RegDst = x, RegWr = 0
- ALUSrc = 1, ExtOp = 1, ALUctr = Add
- MemWr = 1, MemtoReg = x

[Figure: single cycle datapath annotated with these control values]

The Single Cycle Datapath during Branch


Format (I-type): op [31:26] | rs [25:21] | rt [20:16] | immediate [15:0]

if (R[rs] - R[rt] == 0) then Zero <- 1 ; else Zero <- 0

Control signal values:
- Branch = 1, Jump = 0
- RegDst = x, RegWr = 0
- ALUSrc = 0, ExtOp = x, ALUctr = Subtract
- MemWr = 0, MemtoReg = x

[Figure: single cycle datapath annotated with these control values]

Instruction Fetch Unit at the End of Branch

Format (I-type): op [31:26] | rs [25:21] | rt [20:16] | immediate [15:0]

if (Zero == 1) then PC = PC + 4 + SignExt[imm16]*4 ; else PC = PC + 4

Assume Zero = 1 to see the interesting case.

Control inputs: Jump = 0, Branch = 1, Zero = 1

[Figure: Instruction Fetch Unit annotated with these values; the Branch/Zero Mux selects the branch target and the Jump Mux passes it to PC]

The Single Cycle Datapath during Jump

Format (J-type): op [31:26] | target address [25:0]

Nothing to do! Make sure control signals are set correctly!

Control signal values:
- Branch = 0/x, Jump = 1
- RegDst = x, RegWr = 0
- ALUSrc = x, ExtOp = x, ALUctr = x
- MemWr = 0, MemtoReg = x

[Figure: single cycle datapath annotated with these control values]

Instruction Fetch Unit at the End of Jump

Format (J-type): op [31:26] | target address [25:0]

PC <- PC<31:28> concat target<25:0> concat "00"

Control inputs: Jump = 1, Branch = 0/x, Zero = x

[Figure: Instruction Fetch Unit annotated with these values; the Jump Mux selects PC<31:28> concatenated with Instruction<25:0>]

Step 4: Given Datapath: ---> Control

[Figure: the Inst Memory (input Adr) outputs Instruction<31:0>, split into Op <31:26>, Func <5:0>, Rs <21:25>, Rt <16:20>, Rd <11:15>, and Imm16 <0:15>. The Control block takes Op and Func and drives Branch, Jump, RegWr, RegDst, ExtOp, ALUSrc, ALUctr, MemWr, and MemtoReg into the DATA PATH; Zero comes out of the DATA PATH]

The Truth Table for the Main Control


[Figure: op (6 bits) enters the Main Control, which outputs RegDst, ALUSrc, ..., and ALUop (3 bits); ALUop and func (6 bits) enter the ALU Control (Local), which outputs ALUctr (3 bits)]

op                  00 0000   00 1101   10 0011   10 1011   00 0100   00 0010
                    R-type    ori       lw        sw        beq       jump
RegDst              1         0         0         x         x         x
ALUSrc              0         1         1         1         0         x
MemtoReg            0         0         1         x         x         x
RegWrite            1         1         1         0         0         0
MemWrite            0         0         0         1         0         0
Branch              0         0         0         0         1         0
Jump                0         0         0         0         0         1
ExtOp               x         0         1         1         x         x
ALUop (Symbolic)    R-type    Or        Add       Add       Subtract  xxx
ALUop<2>            1         0         0         0         0         x
ALUop<1>            0         1         0         0         0         x
ALUop<0>            0         0         0         0         1         x
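The truth table for the main control is a lookup from opcode to signal values, which a dictionary captures directly. The Python sketch below is illustrative: the names `CONTROL`, `main_control`, and the use of `None` for don't-care (x) are assumptions, while the values are copied from the table.

```python
X = None  # don't care

CONTROL = {
    0b000000: dict(name="R-type", RegDst=1, ALUSrc=0, MemtoReg=0, RegWrite=1,
                   MemWrite=0, Branch=0, Jump=0, ExtOp=X, ALUop=0b100),
    0b001101: dict(name="ori", RegDst=0, ALUSrc=1, MemtoReg=0, RegWrite=1,
                   MemWrite=0, Branch=0, Jump=0, ExtOp=0, ALUop=0b010),
    0b100011: dict(name="lw", RegDst=0, ALUSrc=1, MemtoReg=1, RegWrite=1,
                   MemWrite=0, Branch=0, Jump=0, ExtOp=1, ALUop=0b000),
    0b101011: dict(name="sw", RegDst=X, ALUSrc=1, MemtoReg=X, RegWrite=0,
                   MemWrite=1, Branch=0, Jump=0, ExtOp=1, ALUop=0b000),
    0b000100: dict(name="beq", RegDst=X, ALUSrc=0, MemtoReg=X, RegWrite=0,
                   MemWrite=0, Branch=1, Jump=0, ExtOp=X, ALUop=0b001),
    0b000010: dict(name="jump", RegDst=X, ALUSrc=X, MemtoReg=X, RegWrite=0,
                   MemWrite=0, Branch=0, Jump=1, ExtOp=X, ALUop=X),
}


def main_control(op):
    """Return the control signal settings for a 6-bit opcode."""
    return CONTROL[op]


print(main_control(0b100011)["name"], main_control(0b100011)["MemtoReg"])
```

Signals that change state (RegWrite, MemWrite) and the PC selects (Branch, Jump) are never don't-care, since a wrong value there would corrupt the machine.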

The Truth Table for RegWrite


op          00 0000   00 1101   10 0011   10 1011   00 0100   00 0010
            R-type    ori       lw        sw        beq       jump
RegWrite    1         1         1         0         0         0

RegWrite = R-type + ori + lw
         = !op<5> & !op<4> & !op<3> & !op<2> & !op<1> & !op<0>    (R-type)
         + !op<5> & !op<4> & op<3> & op<2> & !op<1> & op<0>       (ori)
         + op<5> & !op<4> & !op<3> & !op<2> & op<1> & op<0>       (lw)

[Figure: six AND gates, each fed by op<5> .. op<0>, decode R-type, ori, lw, sw, beq, and jump; an OR gate combines R-type, ori, and lw into RegWrite]
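The sum-of-products equation for RegWrite can be evaluated bit by bit to confirm that it matches the truth table. A minimal Python sketch, with the function name assumed for illustration:

```python
def reg_write(op):
    """Evaluate RegWrite = R-type + ori + lw from the six opcode bits."""
    b = [(op >> i) & 1 for i in range(6)]    # b[i] is op<i>
    n = [1 - bit for bit in b]               # n[i] is !op<i>
    r_type = n[5] & n[4] & n[3] & n[2] & n[1] & n[0]   # 00 0000
    ori    = n[5] & n[4] & b[3] & b[2] & n[1] & b[0]   # 00 1101
    lw     = b[5] & n[4] & n[3] & n[2] & b[1] & b[0]   # 10 0011
    return r_type | ori | lw


for opcode in (0b000000, 0b001101, 0b100011, 0b101011, 0b000100, 0b000010):
    print(format(opcode, "06b"), reg_write(opcode))
```

Each product term is one AND gate of the decoder and the final `|` is the OR gate, which is the structure a PLA implements.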

PLA Implementation of the Main Control


[Figure: PLA. The AND plane decodes op<5> .. op<0> into the product terms R-type, ori, lw, sw, beq, and jump; the OR plane combines them into RegWrite, ALUSrc, RegDst, MemtoReg, MemWrite, Branch, Jump, ExtOp, ALUop<2>, ALUop<1>, and ALUop<0>]

PLA Implementation of the Main Control: Drawing II


[Figure: the same PLA drawn as a grid, with inputs op5, op4, op3, op2, op1, op0 and outputs RegWrite, MemWrite, ...]

Putting it All Together: A Single Cycle Processor


[Figure: complete single cycle processor. Main Control takes op = Instr<31:26> (6 bits) and outputs RegDst, ALUSrc, ..., and ALUop (3 bits); ALU Control takes ALUop and func = Instr<5:0> (6 bits) and outputs ALUctr (3 bits). These drive the single cycle datapath: the Instruction Fetch Unit (Branch, Jump, Zero, Clk) outputs Instruction<31:0>, split into Rs <21:25>, Rt <16:20>, Rd <11:15>, and Imm16 <0:15>; the RegDst Mux, the 32 32-bit Registers (Rw, Ra, Rb, busA, busB, busW, RegWr), the Extender (ExtOp, imm16 = Instr<15:0>), the ALUSrc Mux, the ALU (ALUctr, Zero), the Data Memory (WrEn = MemWr, Adr, Data In), and the MemtoReg Mux]

Putting it All Together: An Extended View


Where to get more information?


- CO2: Chapter 4.1 to 4.5 (pp. 210-236); Chapter 5.1 to 5.3
- CO3: Chapter 3.1 to 3.3 (pp. 160-176); Chapter 5.1 to 5.4
- David Patterson and John Hennessy, Computer Organization & Design: The Hardware / Software Interface, Morgan Kaufmann Publishers; CO2 (2nd edition) and CO3 (3rd edition)
- One of the best PhD theses on processor design: Manolis Katevenis, Reduced Instruction Set Computer Architecture for VLSI, PhD Dissertation, EECS, U.C. Berkeley, 1982.
- For a reference on the MIPS architecture: Gerry Kane, MIPS RISC Architecture, Prentice Hall.


CO review: Twos Complement Representation


- 2's complement representation of negative numbers (signed): bitwise inverse and add 1
- The MSB is always "1" for a negative number => sign bit
- Biggest 4-bit binary number: 7
- Smallest 4-bit binary number: -8

Decimal   Binary   Bitwise Inverse (1's Comp.)   2's Complement   Decimal
0         0000     1111                          0000             0
1         0001     1110                          1111             -1
2         0010     1101                          1110             -2
3         0011     1100                          1101             -3
4         0100     1011                          1100             -4
5         0101     1010                          1011             -5
6         0110     1001                          1010             -6
7         0111     1000                          1001             -7
8         1000     0111                          1000             -8

Note: 8 (binary 1000) is an illegal positive number in 4 bits.

CO review: A One-bit Full Adder


- This is also called a (3, 2) adder
- Half Adder: no CarryIn (it adds only A and B)
- Truth table:

Inputs             Outputs
A   B   CarryIn    CarryOut   Sum   Comments
0   0   0          0          0     0 + 0 + 0 = 00
0   0   1          0          1     0 + 0 + 1 = 01
0   1   0          0          1     0 + 1 + 0 = 01
0   1   1          1          0     0 + 1 + 1 = 10
1   0   0          0          1     1 + 0 + 0 = 01
1   0   1          1          0     1 + 0 + 1 = 10
1   1   0          1          0     1 + 1 + 0 = 10
1   1   1          1          1     1 + 1 + 1 = 11

[Figure: 1-bit Full Adder block with inputs A, B, and CarryIn and output CarryOut]

CO review: Logic Diagrams for CarryOut


CarryOut = (B & CarryIn) | (A & CarryIn) | (A & B)

[Figure: logic diagram for CarryOut with inputs CarryIn, A, and B]

Sum = A XOR B XOR CarryIn

[Figure: logic diagram for Sum with inputs CarryIn, A, and B]


CO review: Overflow Detection


- Overflow: the result is too large (or too small) to represent properly
  - Example: -8 <= 4-bit binary number <= 7
- When adding operands with different signs, overflow cannot occur!
- Overflow occurs when adding:
  - 2 positive numbers and the sum is negative
  - 2 negative numbers and the sum is positive
- Optional homework exercise: prove you can detect overflow by: Carry into MSB != Carry out of MSB

Examples:
- 0111 (7) + 0011 (3) = 1010 (-6): carry into MSB = 1, carry out of MSB = 0
- 1100 (-4) + 1011 (-5) = 0111 (7): carry into MSB = 0, carry out of MSB = 1
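The rule "carry into MSB != carry out of MSB" can be tried on the two 4-bit examples from this slide. The Python sketch below is a hedged illustration; the function name and the default width are assumptions.

```python
def add_with_overflow(a, b, width=4):
    """Add two `width`-bit patterns; returns (result, overflow)."""
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    low_mask = mask >> 1                      # all bits below the MSB
    carry_into_msb = ((a & low_mask) + (b & low_mask)) >> (width - 1)
    total = a + b
    carry_out_of_msb = total >> width
    return total & mask, carry_into_msb ^ carry_out_of_msb


print(add_with_overflow(0b0111, 0b0011))      # 7 + 3
print(add_with_overflow(0b1100, 0b1011))      # -4 + -5
print(add_with_overflow(0b0011, 0b1110))      # 3 + -2, different signs
```

The first two calls report overflow and the third does not, matching the claim that operands with different signs cannot overflow.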


CO review: Overflow Detection Logic


- Carry into MSB != Carry out of MSB
- For an N-bit ALU: Overflow = CarryIn[N - 1] XOR CarryOut[N - 1]

X   Y   X XOR Y
0   0   0
0   1   1
1   0   1
1   1   0

[Figure: 4-bit ALU built from four 1-bit ALUs; CarryIn3 and CarryOut3 of the most significant bit feed an XOR gate whose output is Overflow]
