GroupBIPS
Group B
John Krasich
Liz Garcia
Kai Luo
Sam Kim
Table of Contents
Executive Summary
Introduction
Body
Conclusion
Appendices
Executive Summary
This report describes the design and implementation of our 16-bit multicycle microprocessor.
It covers our instruction set design, the implementation of each component, our design
in Xilinx, and how our team tested the processor. The appendices at the end of this report
contain our complete design, our process journal, and the test results.
Introduction
We were given the task of designing a “miniscule instruction set” general-purpose processor
that can execute programs stored in external memory. We chose to design a 16-bit multi-cycle
datapath. The processor must be capable of executing programs that are stored in external
memory using a 16-bit address bus and a 16-bit data bus. Further requirements that our design
must support include:
Body
Instruction Set
The instruction set was the first part of the project that our group worked on. We
needed a set of instructions that could perform general computations and that supported
Euclid’s Algorithm. After determining which 16 instructions we would use, we categorized
them into four instruction types: R-type, L-type, I-type, and J-type. As part of our
requirements, every instruction had to fit in 16 bits.
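As a rough illustration of why a small instruction set is enough for Euclid's Algorithm, the sketch below computes a greatest common divisor using only subtraction, a less-than comparison, and an equality test. The function name and the subtraction form of the algorithm are assumptions made for illustration; the report does not show the exact program the team ran.

```python
def gcd_by_subtraction(a: int, b: int) -> int:
    """Euclid's algorithm in its subtraction form (positive inputs only).

    Hypothetical helper: each step maps onto an operation the instruction
    set provides (sub, slt, beq), but this is not the team's program.
    """
    while a != b:        # equality test, as a beq would perform
        if a < b:        # less-than test, as slt would perform
            b = b - a    # sub
        else:
            a = a - b    # sub
    return a


print(gcd_by_subtraction(48, 18))  # prints 6
```

Two inputs are relatively prime exactly when this function returns 1.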
For the R-type instructions, we reserved bits 15:12 for the opcode. Bits 11:8 hold the
first source register, RS; bits 7:4 hold the second source register, RT; and the
remaining bits, 3:0, hold the destination register, RD. The R-type instructions perform
ALU operations, such as arithmetic and logical operations, as specified by the opcode. Our
R-type instructions are ADD, SUB, AND, OR, SLT, and JR.
For the L-type instructions, the opcode occupies bits 15:12, the register RD occupies
bits 11:8, and the remaining bits, 7:0, hold the immediate. The L-type instructions
perform the specified operation between register RD and the immediate and store the
result in the ALUOut register. The L-type instructions are addi and lui.
For the I-type instructions, we used bits 15:12 for the opcode, bits 11:8 for the first
source register, bits 7:4 for the second source register, and the remaining bits, 3:0, for
the immediate. The instructions assigned to this type are beq, sll, sra, lw, and sw. This
set of instructions makes our design more flexible. For branch operations, both source
registers are passed into the ALU and compared to determine whether the PC is offset by
the immediate amount or remains as is. In the execution stage of the memory access
instructions, the value in the source register is added to the immediate and the sum is
stored in the ALUOut register. For lw, memory is read at the address held in ALUOut and
the value is stored in the MemoryDataRegister (MDR). For sw, a value is stored in memory
at the address held in ALUOut. For the non-memory instructions, the source register RT
and the immediate are passed into the ALU in the execution stage, the ALU performs the
operation specified by the opcode, and the result is placed in the RS register.
Our J-type instructions are j and jal. Bits 15:12 are reserved for the opcode and the
remaining bits, 11:0, hold the immediate, which is the address of the procedure. Having
these instructions allowed us to go to and from a procedure and increase the speed of our
microprocessor.
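The field layout of the four instruction types can be summarized with a small decoder. This is a minimal sketch, assuming the 4-bit opcode sits in bits 15:12 as the format tables in the appendix show; the function name `decode` and the dictionary keys are hypothetical.

```python
def decode(word: int) -> dict:
    """Split a 16-bit instruction word into every field the four formats use."""
    return {
        "op":    (word >> 12) & 0xF,  # bits 15:12, shared by all formats
        "rs":    (word >> 8) & 0xF,   # bits 11:8 (RD for L-type)
        "rt":    (word >> 4) & 0xF,   # bits 7:4
        "rd":    word & 0xF,          # bits 3:0 (4-bit immediate for I-type)
        "imm8":  word & 0xFF,         # bits 7:0, L-type immediate
        "imm12": word & 0xFFF,        # bits 11:0, J-type immediate
    }


fields = decode(0x5E7F)  # listed in the appendix as: lui $sp 7F
print(hex(fields["op"]), hex(fields["rs"]), hex(fields["imm8"]))
```

Which fields are meaningful depends on the instruction type selected by the opcode; the decoder simply exposes all of them.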
Our ALU must be able to perform logical and arithmetic operations. The ALU takes two
inputs and outputs the result to our ALUOut register. We designed and implemented the
ALU by first building a 1-bit adder based on the following logic:
$$r = \overline{c_i}\,\overline{a}\,b + \overline{c_i}\,a\,\overline{b} + c_i\,\overline{a}\,\overline{b} + c_i\,a\,b \tag{1}$$
and
$$c_o = c_i\,a + c_i\,b + a\,b \tag{2}$$
Using the 1-bit adder along with additional logic gates to support the other operations, we
designed the 1-bit ALU shown in Figure 1a. Figure 1b is our complete 16-bit ALU schematic.
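The adder logic can be checked in software. The sketch below reads equation (1) as the usual sum-of-minterms form of a full adder (the sum is 1 when an odd number of inputs are 1) and equation (2) as the carry-out, then chains sixteen 1-bit adders into a ripple-carry adder. Function names are hypothetical, and the ripple-carry chaining is an assumption about how the 1-bit stages were connected.

```python
def full_adder(a: int, b: int, ci: int) -> tuple[int, int]:
    """1-bit full adder: returns (sum bit r, carry out co)."""
    na, nb, nci = a ^ 1, b ^ 1, ci ^ 1  # complemented inputs
    # Equation (1): the four minterms with an odd number of ones.
    r = (nci & na & b) | (nci & a & nb) | (ci & na & nb) | (ci & a & b)
    # Equation (2): carry out when at least two inputs are one.
    co = (ci & a) | (ci & b) | (a & b)
    return r, co


def ripple_add16(x: int, y: int, carry_in: int = 0) -> tuple[int, int]:
    """Add two 16-bit values by chaining 16 full adders, LSB first."""
    result, carry = 0, carry_in
    for i in range(16):
        bit, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= bit << i
    return result, carry


print(ripple_add16(0x00FF, 0x0001))  # prints (256, 0)
```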
The memory was designed to take an address as its input and output an instruction. It works
the way a stack memory would: data is pushed onto and popped from the stack, and new data is
placed on top of the stack. The memory was created in Xilinx using the memory block generator tool.
The general purpose registers take in two instruction fields for reading, one for writing, and
data coming in from memory. The output is connected to two 16-bit source registers, RS and RT.
The register file was implemented using the tools included in Xilinx.
Our control was designed to take a 4-bit opcode and send signals to the appropriate
component(s) to carry out a task, such as enabling a read or write or selecting which
arithmetic and/or logical operation to execute. This component was created in Xilinx using
Verilog code.
Our PCSource is a mux that takes in the ALU Result, ALUOut, and a shift left 2 instruction and
outputs one of these inputs back into the PC register.
Our Program Counter, PC, takes its input from a 4 x 1 mux whose inputs include ALUOut and RS
(used for jumps). The output signal of the PC is the current program count value.
Additional components included in our design are two sign extenders: one takes a 4-bit
input and the other takes an 8-bit input, and both output 16 bits. We also had
several registers, such as RS, RT, RD, ALUOut, and RegDst, as well as six muxes. Most of these
components were built using the tools that are already included in the Xilinx program. The
complete Xilinx schematic of our multi-cycle microprocessor can be seen in the appendices at
the end of this report.
Testing
Our testing methodology was to test each component and ensure that it worked properly
before creating the next component, and then to test the components in groups. For each
component, we verified proper functionality by observing the waveform at various clock
cycles in the ISim simulation tool and comparing it with our expectations.
Issues
We encountered several issues while creating our project. First, Xilinx would sometimes
crash, which caused us to lose some data and forced us to start over. This occurred
several times while creating the ALU. A second issue was that our code would at times get
mixed up with the MIPS instruction set. A third issue was that our project was word
addressed, not byte addressed, which caused the program to skip words; this made us use
more memory and created difficulties in other parts of our project. We fixed this by
adding zeros between each instruction. We also added more signals to block unwanted
signals that were coming in and created additional waveforms to catch any hidden errors.
A final issue was that when it came time to test the project on the FPGA board, the
project was too big to test on the hardware.
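The word-addressing issue can be illustrated with a short sketch. It assumes, as the RTL in the appendix states, that the PC advances by 2 on every fetch, and it models memory as a list indexed by word address; both helper names are hypothetical.

```python
def pad_program(words: list[int]) -> list[int]:
    """Place a zero word after every instruction word."""
    padded = []
    for word in words:
        padded.extend([word, 0x0000])
    return padded


def fetch_all(memory: list[int], count: int) -> list[int]:
    """Fetch `count` instructions from word-addressed memory with PC += 2."""
    pc, fetched = 0, []
    for _ in range(count):
        fetched.append(memory[pc])  # one word per address
        pc += 2                     # PC = PC + 2, as in the IF stage
    return fetched


program = [0x5E7F, 0x2001, 0x4101]
print(fetch_all(pad_program(program), 3) == program)  # prints True
```

Without the padding, the same fetch loop would read every other word and skip 0x2001, which is the behavior described above. The cost is that the program occupies twice as many memory words.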
Final Results
186 bytes of memory
54.444 ns
184.46 us
9312 gates
---------------------------
Number of IOs: 69
Conclusion
This project has helped us understand the important aspects of the design process and
apply the principles we learned throughout the quarter. Furthermore, working effectively
as members of a team allowed us to design, test, and complete a project that we believe
to be as efficient as possible.
The team meetings were very effective because each group member contributed work that
helped the team move forward in completing the design. Overall, our group learned not
only how to create a multi-cycle microprocessor, but also the importance of organization,
good planning, background knowledge of computer architecture, and dealing with issues as
they arise.
Although we did not get the opportunity to test our project on the FPGA board, we
feel that we have met all of the requirements outlined in the Final Project description.
This project has been a challenging yet valuable learning experience, and the skills and
knowledge gained from completing it will be applied in future courses and/or job
experience.
Appendices
Milestone 1
Register Description
The programmer will be able to use 16 registers. The registers will be used in the following manner:
Type Register
R-type
op rs rt rd
4 bit 4 bit 4 bit 4 bit
R-type: The 4-bit opcode determines the instruction, the following 8 bits determine the
two source registers, and the last 4 bits determine the destination register.
I-Type
op rs rt immediate
4 bit 4 bit 4 bit 4 bit
I-type: The 4-bit opcode determines the instruction, the following 8 bits determine the
two source registers, and the last 4 bits are used as the immediate for the instruction.
L-type
op rd immediate
4 bit 4 bit 8 bit
L-type: There is one destination register, and the 8-bit immediate is used to perform the instruction.
J-type
op immediate
4 bit 12 bit
J-type: There is a 4-bit opcode, and the remaining 12 bits are used as the immediate, which
is the address to jump to.
Machine Language
$v syscall
0x0000 print int
0x0001 print str
0x0002 read int
0x0003 read str
0x0004 exit
relPrime:
while:
j while #0xd004
exit:
jr $ra #0xef00
while2:
j while2 #0xd020
else:
j while2 #0xd020
exitA:
jr $ra #0xef00
exitB:
jr $ra #0xef00
.data
.text
syscall
syscall
syscall
Interrupt Detected:
.data
.asciiz
main:
syscall
Milestone 2
Step     Instruction type     Action
1: IF    All types            IR = Memory[PC]
                              PC = PC + 2
2: ID    R-type               RS = IR[11:8]
                              RT = IR[7:4]
                              RD = IR[3:0]
         L-type               RD = IR[11:8]
                              Imm = SE(IR[7:0])
         I-type               RS = IR[11:8]
                              RT = IR[7:4]
                              Imm = SE(IR[3:0])
                              Bam = shift_left(Imm)
         J-type               jal: $ra = jal ? PC : $ra
                              Imm = IR[11:0]
3: EX    I-type (LW and SW)   ALUOut = RT + Imm
4: MEM   I-type (LW)          MDR = Memory[ALUOut]
         I-type (SW)          Memory[ALUOut] = RS
5: WB    R-type               X
         I-type (LW)          RS = MDR
Each instruction begins with an identical Instruction Fetch (IF) stage. Here, the Instruction
Register is loaded from memory at the PC’s current address, and the PC is incremented by 2 bytes.
R-type instructions:
The Instruction Decode (ID) stage assigns registers RS, RT, and RD to bits [11:8], [7:4],
and [3:0] of the instruction, respectively.
The Execution (EX) stage passes RS and RT to the ALU, which performs the operation
specified by the opcode and stores the result in the ALUOut register.
The Memory (MEM) stage writes the result in ALUOut into the RD register.
L-type instructions:
The ID stage sets RD to bits [11:8] of the instruction and sends bits [7:0] to the 8-bit sign
extender as the immediate.
The EX stage performs the specified operation between RD and the immediate and stores
the result in the ALUOut register.
The MEM stage then stores the result in ALUOut back into RD.
I-type instructions:
Registers RS and RT are assigned to bits [11:8] and [7:4] of the instruction. The
immediate, bits [3:0], is passed through the 4-bit sign extender and additionally passed
to the shift left component.
For branches, the EX stage passes RS and RT into the ALU for equivalence or
non-equivalence tests. The result determines whether the PC becomes PC + the shifted
immediate or remains as is.
For memory access instructions, the EX stage adds the value in RT to the immediate
and stores that value into ALUOut.
For non-memory I-types, the EX stage passes RT and the immediate into the ALU,
which performs the operation specified by the opcode.
For lw, the MEM stage reads memory at the location specified by ALUOut and stores the
value into the MemoryDataRegister (MDR).
For sw, the MEM stage takes the value of RS and stores it in memory at the
address specified by ALUOut.
For non-memory types, the MEM stage places the result in ALUOut into the RS register.
Load instructions include a Write Back (WB) stage, where the value in the MDR is put
into the register RS.
J-type instructions:
If the instruction is a jal, the return address register ($ra) is given the PC’s
current address. The last 12 bits of a J-type instruction are used as the immediate.
For jr instructions, the EX stage assigns the value in $ra to the PC register.
For j and jal instructions, the EX stage concatenates the first 4 bits of the PC address with
the 12-bit immediate and passes the result into the PC register.
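The jump address calculation described for j and jal can be sketched as follows. This is an illustration, assuming "the first 4 bits of the PC" means the upper four bits (the journal entry on j instructions says the upper 4 bits are taken from the PC) and that `pc` is the value held after the IF stage; the function names are hypothetical.

```python
def jump_target(pc: int, imm12: int) -> int:
    """Concatenate the upper 4 bits of the PC with the 12-bit immediate."""
    return (pc & 0xF000) | (imm12 & 0x0FFF)


def execute_jump(pc: int, ra: int, imm12: int, link: bool) -> tuple[int, int]:
    """Return (new PC, new $ra); jal (link=True) saves the current PC in $ra."""
    if link:
        ra = pc  # $ra = jal ? PC : $ra
    return jump_target(pc, imm12), ra


new_pc, new_ra = execute_jump(pc=0x0016, ra=0x0000, imm12=0x020, link=True)
print(hex(new_pc), hex(new_ra))  # prints 0x20 0x16
```

Because only 12 bits come from the instruction, a jump can reach any address that shares the current PC's upper four bits.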
ALU: The ALU takes two inputs from two different muxes. The first is the ALUSrcA mux
(chooses RS or PC) and the second is the ALUSrcB mux (chooses RT, 2, the 4-bit sign
extend, the 8-bit sign extend, or the 4-bit sign extend & shift left 2). The ALU
executes an operation depending upon the opcode.
General-Purpose Registers: Take in two instruction fields for reading, an instruction field
for writing (from the RegDst mux), and data from the MemtoReg mux. They store both data and
addresses and output them to two 16-bit registers, RT and RS.
Output: RS, RT
Muxes
I or D Mux: The first 2-1 mux takes in the PC and ALUOut and decides which is
output to the memory.
Control Signals: I or D
RegDst Mux: The second 2-1 mux takes in two instruction fields and sends one of them to the
registers.
MemtoReg Mux: The third 2-1 mux takes in the ALUOut result and the memory data and
decides which is written to the registers.
ALUSrcA Mux: The fourth 2-1 mux takes in register RS and PC and outputs one or the
other to the ALU.
Input: RS, PC
Output: RS or PC
ALUSrcB Mux: The fifth mux, a 5-1 mux, takes in RT, 2, the sign extends, and a sign extend &
shift left 2 and outputs one of these inputs as the second input to the ALU.
Input: RT, 2, 4-bit sign extend, 8-bit sign extend, 4-bit sign extend & shift left 2
Output: RT or 2 or 4-bit sign extend or 8-bit sign extend or 4-bit sign extend & shift left 2
PCSource Mux: The sixth mux, a 3-1 mux, takes in the ALU Result, ALUOut, and a shift left 2
instruction and outputs one of the inputs back into the PC register.
The Control takes the 4-bit opcode of the instruction as an input and transmits one-, two-, and
four-bit signals to different components to execute the desired task.
There are two different sign extending components: one takes a 4-bit input and the other
takes an 8-bit input. Each sign extender takes its input and extends it to 16 bits by
repeating the most significant bit.
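Sign extension by repeating the most significant bit can be written in a few lines. The sketch below uses one hypothetical function, `sign_extend`, with the input width as a parameter, so the same code stands in for both the 4-bit and the 8-bit component.

```python
def sign_extend(value: int, bits: int) -> int:
    """Extend a `bits`-wide value to 16 bits by repeating its top bit."""
    low_mask = (1 << bits) - 1
    value &= low_mask                  # keep only the input bits
    if value & (1 << (bits - 1)):      # most significant input bit set?
        value |= 0xFFFF & ~low_mask    # fill the upper bits with ones
    return value


print(hex(sign_extend(0xF, 4)), hex(sign_extend(0x7F, 8)))  # prints 0xffff 0x7f
```

For example, the 4-bit immediate 0xF used by the beq instructions in the machine code listing extends to 0xFFFF, which is -1 in 16-bit two's complement.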
The instruction memory takes the 16 bit PC as an input and outputs the 16 bit data in memory
at the location of the PC. This data is the instruction of the task to be completed.
The data memory has two inputs, a 16-bit address in memory and a single-bit control signal to
write into memory. The address locates a word in memory, which is then either read or
written. The output is the 16-bit value at the addressed location in memory.
ALUOut: input signal: the result of the ALU. Output signal: the result of the ALU. The result of
the ALU is passed into the ALUOut register, then the output of the ALUOut register will be passed
Memory/Instruction: The input signal of the Instruction register is Memory Data, which comes
from the memory. The outputs of the Instruction register are Instruction[15-12],
Instruction[11-8], Instruction[7-4], and Instruction[3-0]. Instruction[15-12] is used as the
4-bit opcode. Instruction[11-8], Instruction[7-4], and Instruction[3-0] are used to perform
the functions required by the ALUop. The output signal of the Memory Data Register is passed
into a 2by1 Mux which is controlled by MemToReg. The input signal of the Memory Data Register
is the data that we want to write to the register. The output signal of the Memory Data
Register is exactly the same as the input signal, and it is written back into the register file.
Rs: The input signal is Reg[IR[11-8]], which is from the register file. The output signal is
Reg[IR[11-8]]. The register file stored the Reg[IR[11-8]] into Rs in the previous clock cycle and
then the Reg[IR[11-8]] will be passed into a 2by1 Mux in the next clock cycle.
Rt: The input signal is Reg[IR[7-4]], which is from the register file. The output signal is
Reg[IR[7-4]]. The register file stored the Reg[IR[7-4]] into Rt in the previous clock cycle and
then the Reg[IR[7-4]] will be passed into a 4by1 Mux in the next clock cycle.
Necessary Tests
To ensure the correctness of the RTL, the following tests will need to be passed:
add 0x0
sub 0x1
and 0x2
or 0x3
addi 0x4
lui 0x5
sll 0x6
sra 0x7
slt 0x8
beq 0x9
syscall 0xa
jr 0xe
j 0xc
jal 0xd
lw 0xb
sw 0xf
Instruction Set
Machine Code   Instruction         Comments
5E7F,          lui $sp 7F          Load the stack pointer
2001,          and $v $0 $0        INTERRUPT1: Set the v register to 0
4101,          addi $v 1           Add 1 to the v register
A000,          syscall             Reads the interrupt value
6DD4,          sll $d 4            Shifts the display
3DCD,          or $d $ir $d        ORs the display with the interrupt
2001,          and $v $0 $0        Set the v register to 0
A000,          syscall             Displays the new display register
900F,          beq $0 $0 -1        Waits for interrupt 2
9D0F,          beq $d $0 -1        INTERRUPT2: Does nothing if display is 0
00D2,          add $a0 $0 $d       Adds the display to the argument reg for
D020,          jal RELPRIME        Jump and link to relPrime
001D,          add $d $0 $v        Stores v into the display register
2001,          and $v $0 $0        Set the v register to 0
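The machine code words in this listing can be reproduced by packing the instruction fields into 16 bits. The sketch below is illustrative only: the encoder function names are hypothetical, and the field values in the example are read directly off the listed words (for instance, register number 0xE for $sp and 0xD for $d are inferred from 5E7F and 6DD4, not taken from a register table).

```python
def encode_r(op: int, rs: int, rt: int, rd: int) -> int:
    """R-type: op[15:12] rs[11:8] rt[7:4] rd[3:0]."""
    return (op << 12) | (rs << 8) | (rt << 4) | rd


def encode_i(op: int, rs: int, rt: int, imm4: int) -> int:
    """I-type: op[15:12] rs[11:8] rt[7:4] imm[3:0] (two's complement)."""
    return (op << 12) | (rs << 8) | (rt << 4) | (imm4 & 0xF)


def encode_l(op: int, rd: int, imm8: int) -> int:
    """L-type: op[15:12] rd[11:8] imm[7:0]."""
    return (op << 12) | (rd << 8) | (imm8 & 0xFF)


def encode_j(op: int, imm12: int) -> int:
    """J-type: op[15:12] imm[11:0]."""
    return (op << 12) | (imm12 & 0xFFF)


print(f"{encode_l(0x5, 0xE, 0x7F):04X}")  # lui $sp 7F    -> prints 5E7F
print(f"{encode_i(0x9, 0x0, 0x0, -1):04X}")  # beq $0 $0 -1 -> prints 900F
```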
Journal
[1/12/2011][9:00 AM]
It was a cold and cloudy day. Snow accumulation of approximately 5 inches. 18 degrees.
Worked on Milestone 1
These were chosen based on what we felt were most important for general purpose
programming
arguments, returns, stack pointer, return address, kernel, display, and interrupt
Distributed the remaining instruction bits for the R, I, J, and a new L type
L type originated from the need for a larger immediate value for the addi instruction
Instructions were assigned a type, following MIPS' example with a few exceptions
sll and sra were assigned as an I type due to a lack of a "shamt" in our R-Type
addi and lui were assigned the L type to allow a larger immediate value
we picked the most important syscalls we felt were needed, but additional syscalls may be
added in the future
[1/12/2011][7:00 PM]
It was decided that j instructions take the 12 least significant bits as the immediate and
grab the upper 4 bits from the PC
John drafted an excerpt for interrupt handling. The group decided the best way to handle the
interrupt for now is to simply print out the interrupt code
[1/17/2011][7:00PM]
Although slower than a pipelined design, the multicycle design avoids the special cases
(like a beq) and data hazards that exist when pipelining
Group drafted and discussed RTL design for the different instruction types
All ALU operations that use two registers begin with a '00'
ALU operations with one register and an immediate begin with '01'
The operations were grouped together this way so the control looks at fewer bits to distinguish
between similar functions
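The opcode grouping described in this entry can be checked against the opcode table in the Necessary Tests section. The sketch below looks only at the top two bits of the 4-bit opcode; the dictionary copies that table, while the function name and the class labels are hypothetical.

```python
# Opcode values as listed in the report's opcode table.
OPCODES = {
    "add": 0x0, "sub": 0x1, "and": 0x2, "or": 0x3,
    "addi": 0x4, "lui": 0x5, "sll": 0x6, "sra": 0x7,
    "slt": 0x8, "beq": 0x9, "syscall": 0xA, "lw": 0xB,
    "j": 0xC, "jal": 0xD, "jr": 0xE, "sw": 0xF,
}


def operand_class(opcode: int) -> str:
    """Classify an opcode by its two most significant bits."""
    prefix = (opcode >> 2) & 0b11
    if prefix == 0b00:
        return "two registers"
    if prefix == 0b01:
        return "register and immediate"
    return "other"


for name in ("add", "or", "addi", "sra", "beq"):
    print(name, operand_class(OPCODES[name]))
```

With the opcode values in that table, slt (0x8) lands in the "other" group even though it is listed among the R-type instructions, so a control unit built this way would need to treat it separately.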
The remaining members discussed the components needed for our design. This was based
upon the MIPS design discussed in class, with a few exceptions.
Because we use 2 types of instructions with different sized immediates, there must be two
different sign extenders.
For jump commands, a component is needed to concatenate the given immediate with the first
4 bits of the PC address
John and Sam formulated the necessary tests for the RTL design. The ALU, being such a major
component, is tested separately. Memory addressing is tested, along with reassigning the PC
through jumps or branches. Lastly, the control is tested. This ensures that the correct signals
are passed to every component
[1/25/2011][4:30PM]
Group met and drew up a schematic for the datapath. Determined control lines necessary
based on previous lecture material.
Tests were formulated to test each component individually, and then groups of components.
The groups were organized as ALU operations, including the ALU itself, the Control, and
Register File
Branches were the next test chosen, to make sure the PC control and registers work
The steps for instruction fetching and decode were chosen next, as every instruction relies on
this step
The Execution stage would be tested afterwards, testing for two registers and one register with
an immediate
[1/27/2011][10:04AM]
[2/4/2011][7:00PM]
Continued work on control. Coded it as a Verilog finite state machine, since this was
relatively easy once the syntax was learned.
[2/5/2011][7:00PM]
Finished control. The test bench doesn't assert values; instead, we compared the levels of
each control line to what we expected at certain stages.
Sam continued work on ALU, completed 1 bit level and began 16 bit level. Started writing test
bench at the 16 bit level, asserting expected values for a variety of tests.
[2/9/2011][8:37PM]
John has placed many of the components needed for the processor, including control, ALU and
registers. He has also begun work on the interrupts.
Liz and Kai wrote integrated test benches to use once the remaining components are complete.
Memo
[1/12/2011]
Laid the conceptual design for our assembler language, including syntax, instruction set,
instruction type, register allocation, and machine code conversion
[1/18/2011]
The design is currently established as a working multi-cycle datapath with thoughts of possibly
implementing a pipeline datapath instead. A more efficient opcode has also been
implemented.
[1/25/2011]
A draft of the datapath has been completed, committing to the multi-cycle design. Plans for
designing the individual components have been completed, as have our plans for unit testing
and integration testing
[2/4/2011]
The control has been completed. It functions as a finite state machine, where it activates
particular control lines at appropriate times. It has been tested by comparing values to what we
expected.
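A finite state machine of this kind can be modeled in software by stepping through the stages named in Milestone 2. The sketch below is not the team's Verilog: it assumes that branches and jumps finish after EX and that only lw uses the WB stage, which is how the stage descriptions in Milestone 2 read, and every name in it is hypothetical.

```python
# Assumption: these instructions need no MEM stage.
ENDS_AFTER_EX = {"beq", "j", "jal", "jr"}


def next_state(state: str, mnemonic: str) -> str:
    """Return the stage that follows `state` for the given instruction."""
    if state == "IF":
        return "ID"
    if state == "ID":
        return "EX"
    if state == "EX":
        return "IF" if mnemonic in ENDS_AFTER_EX else "MEM"
    if state == "MEM":
        return "WB" if mnemonic == "lw" else "IF"
    return "IF"  # after WB, fetch the next instruction


def stage_trace(mnemonic: str) -> list[str]:
    """List the stages one instruction passes through, starting at IF."""
    trace, state = ["IF"], "IF"
    while True:
        state = next_state(state, mnemonic)
        if state == "IF":  # back at fetch: the instruction is done
            return trace
        trace.append(state)


print(stage_trace("lw"))  # prints ['IF', 'ID', 'EX', 'MEM', 'WB']
```

A real control unit would also drive the control lines (ALUSrcA, ALUSrcB, MemtoReg, and so on) in each state; the trace only shows the order in which the states are visited.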
The ALU is partially completed. 16 1-bit ALUs have been strung together. Add, And, and Or work
as expected. A testbench is currently being built to exercise every operation the ALU must
perform.
[2/9/2011]
The ALU has been completed and thoroughly tested. The processor is complete except for
exceptions, interrupts, and memory. Every component has been tested individually, and a test
has been planned for integration once these are complete. Work on the interrupts has
started.