INTRODUCTION
This project report describes the design of a 16-bit RISC processor based on a simple
LOAD/STORE architecture. This chapter introduces the project, covering the background
research, research motivation, scope of work, and report organization. The objective that
led to the implementation of this project is also discussed.
1.1 Research Background
The microprocessor is one of the greatest inventions of the 20th century. It serves
everyday needs such as work and communication, and people now communicate at any time
and from anywhere. As a result, they rely on communication devices such as smartphones
and tablet computers, whose high-end applications require a high-end computing system.
RISC is one of the simplest and yet most popular processor architectures in the computing
industry. To deliver high-performance computing with low power consumption and small
area, a microprocessor system that meets those specifications must be designed.
In this project, a processor is designed based on the RISC approach. The design
philosophy of a RISC processor is to reduce the complexity of the ISA by limiting the
instruction set to a smaller number of more frequently used instructions, which yields
better efficiency in modern computing.
1.2 Project Background
The proposed RISC processor is based on a simple LOAD/STORE architecture and is
designed using Verilog HDL design entry. The design methodology is based on
1 Project Report on 16-bit RISC Processor| Sandeepani School of Embedded System Design
hierarchical modularity of RTL design methodology, so that each functional unit of the
processor can be modeled in a behavioral programming style and all functional blocks
can then be integrated into a system using a structural modeling technique.
A hardwired control approach is used to design the control unit, as opposed to the micro-
programmed control approach of a conventional Complex Instruction Set Computer (CISC)
processor. CISC processors have held a major share of the computing market for decades.
They support various addressing modes and data types. Their instructions are complex,
and the instruction length varies from one instruction to another. A CISC processor also
frequently accesses data in external memory to execute its instructions, which is very
slow.
A RISC processor, by comparison, operates on very few data types, supports only a few
simple addressing modes, and executes only simple instructions. It is mostly register
based. Most instructions operate on data held in the register file (so-called
register-to-register operation), which is faster than CISC's memory-to-memory operation.
Only load and store instructions access memory. Furthermore, RISC instructions have a
fixed length, so decoding them to generate the control signals is simpler than the
micro-programmed decoding technique used in CISC.
1.3 Objective
The objective of this project is to study, design, and validate a 16-bit RISC processor
based on a simple LOAD/STORE architecture. It covers the study of simple LOAD/STORE
architecture design and an investigation of how the processor executes its instructions.
1.4 Scope of Work
The scope of work in this project covers the design of a 16-bit RISC processor with a
4-stage implementation that can execute two main types of instructions: data processing
and single data transfer. The project covers design entry in Verilog HDL and synthesis
using the Xilinx ISE tool.
Chapter 2
INTRODUCTION OF RISC PROCESSOR
RISC stands for Reduced Instruction Set Computer. A RISC is a microprocessor design that
typically uses a pipelined architecture to improve processor performance. Generally
speaking, this means a faster machine, mostly through a higher MIPS rating (millions of
instructions per second; higher is better). It is important to note that a higher MIPS
rating does not always result in a faster machine. This chapter gives a brief
introduction to the RISC processor architecture and a comparison between RISC and CISC.
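The caveat about MIPS can be made concrete with a small calculation. The sketch below uses the standard relations MIPS = clock / (CPI x 10^6) and execution time = instruction count x CPI / clock. The two machines and all of their numbers are hypothetical and chosen only for illustration; they are not measurements from this project.

```python
def mips(clock_hz: float, cpi: float) -> float:
    """Millions of instructions per second for a given clock rate and average CPI."""
    return clock_hz / (cpi * 1e6)


def exec_time(instr_count: int, cpi: float, clock_hz: float) -> float:
    """Execution time in seconds for a program of instr_count instructions."""
    return instr_count * cpi / clock_hz


# Hypothetical machine A: simple instructions, so the same program needs more of them.
A_CLOCK, A_CPI, A_COUNT = 100e6, 1.0, 3_000_000
# Hypothetical machine B: complex instructions, fewer of them, but more cycles each.
B_CLOCK, B_CPI, B_COUNT = 100e6, 2.0, 1_000_000

if __name__ == "__main__":
    print("A:", mips(A_CLOCK, A_CPI), "MIPS,", exec_time(A_COUNT, A_CPI, A_CLOCK), "s")
    print("B:", mips(B_CLOCK, B_CPI), "MIPS,", exec_time(B_COUNT, B_CPI, B_CLOCK), "s")
```

With these assumed numbers, machine A has the higher MIPS rating yet takes longer to finish the program, because it executes three times as many instructions.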
The idea was inspired by the discovery that many of the features that were included in
traditional CPU designs to facilitate coding were being ignored by the programs that were
running on them. Also these more complex features took several processor cycles to be
performed. Additionally, the performance gap between the processor and main memory was
increasing. This led to a number of techniques to streamline processing within the CPU,
while at the same time attempting to reduce the total number of memory accesses.
When controller design became more complex in CISC machines and performance still fell
short of expectations, designers started looking for alternatives. It had been found that
speed suffers whenever the processor accesses memory. One way to improve CPI was
therefore to keep the instruction set very simple: simple not in the way it works but in
the way it looks. That is why a typical RISC architecture has very few instructions that
request data from memory, usually none other than Load and Store, and avoids
memory-oriented addressing modes. The complexity of controller design is reduced by
placing the operand and opcode bits at fixed positions in the instruction register.
Finally, pipelining added a new dimension in speed with the help of just a few additional
registers. A pipeline increases throughput by reducing CPI, so that an instruction can
effectively be executed in one clock cycle. Pipelining in any kind of architecture arises
from the inherent parallelism of instruction execution and from components that would
otherwise sit idle.
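The effect of pipelining on CPI can be illustrated with a cycle count. The sketch below assumes an ideal pipeline with no stalls or hazards, in which the first instruction takes as many cycles as there are stages and every later instruction completes one cycle after its predecessor. The function names and the 4-stage, 100-instruction example are illustrative assumptions.

```python
def cycles_unpipelined(n_instructions: int, stages: int) -> int:
    """Each instruction passes through every stage before the next one starts."""
    return n_instructions * stages


def cycles_pipelined(n_instructions: int, stages: int) -> int:
    """Ideal pipeline: fill it once, then retire one instruction per cycle."""
    if n_instructions == 0:
        return 0
    return stages + n_instructions - 1


def effective_cpi(n_instructions: int, stages: int) -> float:
    """Average cycles per instruction on the ideal pipeline."""
    return cycles_pipelined(n_instructions, stages) / n_instructions


if __name__ == "__main__":
    n, k = 100, 4
    print(cycles_unpipelined(n, k))  # 400 cycles without pipelining
    print(cycles_pipelined(n, k))    # 103 cycles with an ideal 4-stage pipeline
    print(effective_cpi(n, k))       # 1.03, approaching 1 as n grows
```

As the instruction count grows, the effective CPI approaches one, which is the sense in which an instruction is "effectively" executed in one clock cycle.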
The pipelined architecture can be further enhanced with the concept known as
superscalar execution, in which more than one execution unit is provided. While one unit
is busy with the current execution task, the fetch unit can fetch the next instruction,
which is then executed by another execution unit present in the system.
# Uniform instruction encoding (for example the op-code is always in the same bit position in
each instruction, which is always one word long), which allows faster decoding.
# A homogeneous register set, allowing any register to be used in any context and simplifying
compiler design.
# Simple addressing modes (complex addressing modes are replaced by sequences of simple
arithmetic instructions).
# Few data types supported in hardware (for example, some CISC machines had instructions
for dealing with byte strings. Others had support for polynomials and complex numbers. Such
instructions are unlikely to be found on a RISC machine).
Over many years, RISC instruction sets have tended to grow in size. Thus, some have
started using the term "load-store" to describe RISC processors, since this is the key element
of all such designs. Instead of the CPU itself handling many addressing modes, load-store
architecture uses a separate unit dedicated to handling very simple forms of load and store
operations. CISC processors are then termed "register-memory" or "memory-memory".
Today RISC CPUs (and microcontrollers) represent the vast majority of all CPUs in
use. The RISC design technique offers power in even small sizes, and thus has come to
completely dominate the market for low-power "embedded" CPUs. Embedded CPUs are by
far the largest market for processors. RISC had also completely taken over the market for
larger workstations for much of the 90s. After the release of the Sun SPARCstation the other
vendors rushed to compete with RISC based solutions of their own. Even the mainframe
world is now completely RISC based.
2.2.2 The bridge toward RISC (Historical factors)
The capabilities of CISC allowed more operations to be performed within the same
program size. During that period, program and data storage were given more importance
because the cost of memory was high.
An attempt was made to narrow the semantic gap, that is, the gap that existed between
machine instruction sets and high-level language constructs, by adding complicated
instructions and addressing modes in order to increase performance. Most of these
"improvements" were rejected by compiler writers on the grounds that they did not fit
well with the language requirements and were of only limited usefulness. At the same
time, research conducted by David Patterson and Donald Knuth showed that 85% of a
program's statements were assignments, conditionals, or procedure calls. Nearly 80% of
the assignment statements were MOVE instructions with no arithmetic operations.
As more and more capabilities were added to the processors, it became increasingly
difficult to support the higher clock speeds that would otherwise have been possible.
Complex instructions and addressing modes worked against higher clock speeds because of
the greater number of microscopic actions that had to be performed per instruction.
Moreover, RAM prices dropped sufficiently that system designers were under less pressure
to design instructions that did more than they were to design systems that were faster.
It was also becoming cost-effective to employ small amounts of higher-speed cache memory
to reduce memory latency, i.e. the waiting time between when a memory request is made and
when it has been satisfied.
CISC machine designers incorporated pre-fetching, pipelining and superscalar operation
in their designs but with instructions that were long and complex and operand access
depending on complex address arithmetic, it was difficult to make efficient use of these
new speed-up techniques. Furthermore, complex instructions and addressing modes hold
down clock speed compared to simple instructions. RISC machines were designed to
efficiently exploit the caching, pre-fetching, pipelining and superscalar methods that were
invented in the days of CISC machines.
Chapter 3
INTRODUCTION OF 16-bit RISC PROCESSOR
We implemented a 16-bit RISC microprocessor based on a simplified version of the
LOAD/STORE architecture. The processor has 16-bit instruction words and a 16 x 16-bit
data memory. Every instruction is completed in four clock cycles. An external clock is
used as the timing mechanism for the control and datapath units. This chapter includes a
summary of the main features of the processor, a description of the pins, a high-level
diagram, the sub-level blocks, and the instruction word formats.
Table 3.1 Signal Description of Top level block
0000 0 Addition
0001 1 Subtraction
0100 0 AND
0101 0 OR
0110 0 NOT
1000 0 NAND
1001 0 NOR
1010 0 XOR
1011 0 XNOR
Figure 3.2 shows the sub-level block diagram of the 16-bit RISC processor, which is
divided into three main parts: the instruction decoder, the arithmetic and logical unit
(ALU), and the data memory.
Figure 3.2 Sub level Block diagram
RST (reset) is an active-high synchronous signal. Each instruction requires 4 clock
cycles to complete; that is, one instruction cycle equals 4 clock cycles. Each
instruction has 4 machine cycles (instruction fetch, decode, execution, and write back).
The ALU is 16 bits wide and can perform 15 operations. The processor contains a
16 x 16-bit memory. The OPCODE is 4 bits wide. Input operand 1 is 4 bits wide and input
operand 2 (immediate data) is 8 bits wide. The ALU has 2 outputs: a 16-bit result and
a 1-bit CB.
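The field widths given above (4-bit OPCODE, 4-bit operand 1, 8-bit operand 2) add up to one 16-bit instruction word. A minimal sketch of packing and unpacking such a word follows. The bit positions are an assumption: the text states only the widths, so the layout used here (OPCODE in bits 15..12, operand 1 in bits 11..8, operand 2 in bits 7..0) and the function names are hypothetical.

```python
def encode(opcode: int, operand1: int, operand2: int) -> int:
    """Pack the three fields into a 16-bit word (assumed layout, see lead-in)."""
    if not (0 <= opcode <= 0xF and 0 <= operand1 <= 0xF and 0 <= operand2 <= 0xFF):
        raise ValueError("field out of range")
    return (opcode << 12) | (operand1 << 8) | operand2


def decode(word: int) -> tuple[int, int, int]:
    """Split a 16-bit word into (opcode, operand1, operand2)."""
    word &= 0xFFFF
    opcode = (word >> 12) & 0xF    # 4-bit OPCODE
    operand1 = (word >> 8) & 0xF   # 4-bit address of operand 1 in the 16-word memory
    operand2 = word & 0xFF         # 8-bit immediate data
    return opcode, operand1, operand2


if __name__ == "__main__":
    word = encode(0b0001, 0xA, 0x05)  # opcode 0001 on memory location 10 with immediate 5
    print(hex(word))                  # 0x1a05
    print(decode(word))               # (1, 10, 5)
```

A 4-bit operand 1 is exactly enough to address the 16 locations of the data memory, which is consistent with operand 1 being used as ADDRESS.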
Signal           Description
DATA_OUT         16-bit output data
ADDRESS          Address of the operand
R/W bar          Selects read or write operation
CS               Chip select
ALU_OPERATION    Operation to be performed by the ALU according to the OPCODE
ALU_OPR1         Data stored at the address of operand 1
ALU_OPR2         Immediate data
One instruction cycle equals 4 clock cycles. Each instruction has 4 machine cycles
(instruction fetch, decode, execution, and write back).
INIT: When the RST signal is asserted, the instruction decoder enters the initial state.
After RST is de-asserted, the state changes on every positive clock edge from init to
fetch, decode, execute, and then load.
Fetch: In this cycle the Fetch control signal goes low, the Decode control signal goes
high, and the other control signals (Execute and Load) are low. The OPCODE bits are
loaded into ALU_OPERATION, the OPERAND_1 bits into ADDRESS, and the OPERAND_2 bits into
ALU_OPR2. The R/W bar and CS signals are high while the WB signal is low.
Decode: In this cycle the Decode control signal goes low, the Execute control signal
goes high, and the other control signals (Fetch and Load) are low. The 16-bit data
coming from the memory (DATA_OUT) is loaded into ALU_OPR1. The R/W bar, CS, and WB
signals are low.
Execute: In this cycle the Execute control signal goes low, the Load control signal goes
high, and the other control signals (Fetch and Decode) are low. The 16-bit data coming
from the memory (DATA_OUT) is loaded into ALU_OPR1. The CS and WB signals are high and
the R/W bar signal is low. In this cycle the ALU performs the operation selected by
ALU_OPERATION on the 16-bit data OPR1 and OPR2, and the result of the operation is
available on ALU_OP and CB. ALU_OP is applied to DATA_IN of the memory through a
multiplexer.
Load: In this cycle the Load control signal goes low, the Fetch control signal goes high,
and the other control signals (Decode and Execute) are low. The CS, WB, and R/W bar
signals are low. The data available on DATA_IN is loaded into the memory at the location
given by ADDRESS.
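The state sequence described above can be sketched as a small software model. This is a behavioral illustration, not the project's Verilog: the names `State`, `NEXT_STATE`, `MEMORY_CONTROLS`, `next_state`, and `trace` are hypothetical, and the Load-to-Fetch transition is inferred from the statement that the Fetch control signal goes high during the Load cycle.

```python
from enum import Enum


class State(Enum):
    INIT = "init"
    FETCH = "fetch"
    DECODE = "decode"
    EXECUTE = "execute"
    LOAD = "load"


# State entered on the next positive clock edge while RST is de-asserted.
NEXT_STATE = {
    State.INIT: State.FETCH,
    State.FETCH: State.DECODE,
    State.DECODE: State.EXECUTE,
    State.EXECUTE: State.LOAD,
    State.LOAD: State.FETCH,  # assumed wrap-around to the next instruction
}

# (R/W bar, CS, WB) levels as stated in the text for each cycle.
MEMORY_CONTROLS = {
    State.FETCH: (1, 1, 0),
    State.DECODE: (0, 0, 0),
    State.EXECUTE: (0, 1, 1),
    State.LOAD: (0, 0, 0),
}


def next_state(state: State, rst: bool) -> State:
    """Synchronous reset has priority; otherwise advance through the cycle."""
    return State.INIT if rst else NEXT_STATE[state]


def trace(n_edges: int) -> list[State]:
    """States visited over n_edges clock edges after reset is released."""
    state, visited = State.INIT, []
    for _ in range(n_edges):
        state = next_state(state, rst=False)
        visited.append(state)
    return visited


if __name__ == "__main__":
    print([s.value for s in trace(5)])
```

Four consecutive edges cover fetch, decode, execute, and load, which matches the statement that one instruction cycle equals 4 clock cycles.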
The ALU performs all arithmetic and logical operations and is 16 bits wide. The output
of the ALU is registered. Figure 3.5 shows the top-level block of the ALU.
The arithmetic and logical unit is divided into two parts: a combinational circuit and
a sequential circuit.
1. Combinational circuit
It is made of an arithmetic circuit, a logical circuit, a shifter circuit, and a
multiplexer.
Arithmetic circuit
It is made of a full adder and a multiplexer. It performs all arithmetic operations,
such as addition, subtraction, addition with carry, increment, and decrement.
Logical circuit
It is made of AND, OR, XOR, NAND, NOR, NOT, and XNOR gates and a multiplexer. It
performs all logical operations.
Shifter circuit
It is made of multiplexers. It is divided into two circuits, an arithmetic shifter and
a logical shifter. The arithmetic shifter performs two operations, arithmetic right
shift and arithmetic left shift. The logical shifter performs two operations, logical
right shift and logical left shift.
2. Sequential circuit
It is made of a 16-bit register and one flip-flop. The output of the combinational
circuit is applied to the sequential circuit.
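The combinational part of the ALU can be sketched as a behavioral model. The opcodes below are the nine listed in the opcode rows earlier in this chapter; the second column of those rows is not interpreted here. Several points are assumptions rather than facts from the report: CB is modeled as carry-out for addition and as borrow for subtraction, NOT is applied to operand 1, and the shift helpers are not tied to any opcode because the report does not list shift opcodes. All function names are hypothetical.

```python
MASK = 0xFFFF  # 16-bit datapath


def alu(opcode: int, opr1: int, opr2: int) -> tuple[int, int]:
    """Return (ALU_OP, CB) for the listed opcodes."""
    a, b = opr1 & MASK, opr2 & MASK
    cb = 0
    if opcode == 0b0000:            # Addition
        total = a + b
        result, cb = total & MASK, total >> 16   # CB assumed to be carry-out
    elif opcode == 0b0001:          # Subtraction
        result, cb = (a - b) & MASK, int(a < b)  # CB assumed to be borrow
    elif opcode == 0b0100:          # AND
        result = a & b
    elif opcode == 0b0101:          # OR
        result = a | b
    elif opcode == 0b0110:          # NOT (assumed to act on operand 1)
        result = ~a & MASK
    elif opcode == 0b1000:          # NAND
        result = ~(a & b) & MASK
    elif opcode == 0b1001:          # NOR
        result = ~(a | b) & MASK
    elif opcode == 0b1010:          # XOR
        result = a ^ b
    elif opcode == 0b1011:          # XNOR
        result = ~(a ^ b) & MASK
    else:
        raise ValueError(f"opcode {opcode:04b} is not in the listed rows")
    return result, cb


def logical_shift_right(a: int) -> int:
    """Shift right by one bit, filling the top bit with 0."""
    return (a & MASK) >> 1


def arithmetic_shift_right(a: int) -> int:
    """Shift right by one bit, preserving the sign bit (bit 15)."""
    a &= MASK
    return (a >> 1) | (a & 0x8000)


if __name__ == "__main__":
    print(alu(0b0000, 0xFFFF, 1))                # (0, 1): sum wraps, carry set
    print(hex(logical_shift_right(0x8002)))      # 0x4001
    print(hex(arithmetic_shift_right(0x8002)))   # 0xc001
```

In the hardware described above, the returned pair would be captured by the 16-bit register and the flip-flop of the sequential circuit, which is why the ALU output is described as registered.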
3.3.3 DATA MEMORY
The data memory has a width of 16 bits and a depth of 16 locations. It is a single-port
memory which is used to provide 16-bit data and to store the 16-bit result of the ALU.
Figure 3.7 shows the top-level diagram of the data memory.
Write operation: When the CS control signal is high and the R/W bar control signal is
low, the data available at DATA_IN is written into the memory at the given address on
the positive edge of the clock signal.
Read operation: When the CS and R/W bar control signals are both high, the data stored
at the specified address is fetched on the positive edge of the clock signal and is made
available on the DATA_OUT signal.
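The read and write behavior above can be summarized in a short behavioral model. The class name `DataMemory` and the method `clock`, which stands in for one positive clock edge, are hypothetical. Leaving DATA_OUT unchanged when CS is low is an assumption, since the text does not describe the deselected case.

```python
class DataMemory:
    """Single-port memory: 16 locations, each 16 bits wide."""

    DEPTH = 16
    WIDTH_MASK = 0xFFFF

    def __init__(self) -> None:
        self.cells = [0] * self.DEPTH
        self.data_out = 0

    def clock(self, cs: int, rw_bar: int, address: int, data_in: int = 0) -> None:
        """Model one positive clock edge."""
        if not cs:
            return  # chip not selected: nothing changes (assumed behavior)
        address &= 0xF  # 4-bit address selects one of 16 locations
        if rw_bar:
            self.data_out = self.cells[address]                # read: CS = 1, R/W bar = 1
        else:
            self.cells[address] = data_in & self.WIDTH_MASK    # write: CS = 1, R/W bar = 0


if __name__ == "__main__":
    mem = DataMemory()
    mem.clock(cs=1, rw_bar=0, address=3, data_in=0x1234)  # write
    mem.clock(cs=1, rw_bar=1, address=3)                  # read back
    print(hex(mem.data_out))                              # 0x1234
```

Because the memory has a single port, a read and a write cannot occur on the same edge, which is consistent with the decoder using separate cycles to read the operand and to store the result.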