1.1 Research Background: Project Report On 16-Bit RISC Processor - Sandeepani School of Embedded System Design

Chapter 1
INTRODUCTION
This project report describes the design of a 16-bit RISC processor based on a simple
LOAD/STORE architecture. This chapter introduces the project and covers the research
background, research motivation, scope of work and report organization. The objective
that led to the implementation of this project is also discussed.
1.1 Research Background
The microprocessor is one of the greatest inventions of the 20th century. It serves everyday
needs such as work and communication, and people now communicate at any time and from
anywhere. As a result, people rely on communication devices such as smartphones and tablet
computers, and the high-end applications running on these devices need a high-end computing
system. RISC is one of the simplest and most popular processor architectures in the computing
industry. To achieve high computing performance with low power consumption and small area,
a microprocessor system that meets these specifications must be designed.
In this project, a processor is designed based on the RISC design approach. The design
philosophy of a RISC processor is to reduce the complexity of the instruction set architecture
(ISA) by limiting the instruction set to a smaller number of frequently used instructions,
which yields better efficiency in modern computing.
In addition, the throughput of a RISC processor is improved by implementing a pipeline
mechanism, and high speed is achieved because most operations are carried out on registers.
RISC architecture was first introduced by IBM in 1975 [16]. However, the RISC designs
introduced by university research teams, such as Berkeley’s RISC processor and Stanford’s
MIPS processor, gained higher popularity as publicly known RISC designs.
1.2 Project Background
The proposed RISC processor is based on a simple LOAD/STORE architecture and is designed
using Verilog HDL design entry. The design methodology is based on the hierarchical
modularity of RTL design, so that each functional unit of the processor can be modeled in a
behavioral style and all functional blocks are then integrated into a system using structural
modeling.
A hardwired control approach is applied to design the control unit, as opposed to the
micro-programmed control approach of a conventional Complex Instruction Set Computer (CISC)
processor. CISC processors have held a major share of the computing market for decades.
They support various addressing modes and data types. Their instructions are complex and
the instruction length varies from one instruction to another. A CISC processor also
frequently accesses data in external memory to execute its instructions, which is very slow.
In comparison, a RISC processor operates on very few data types, supports only a few simple
addressing modes and executes only simple instructions. It is mostly register based: most
instructions operate on data held in the register file (so-called register-to-register
operation), which is faster than the memory-to-memory operation of CISC. Only the load and
store instructions access memory. Furthermore, the RISC instruction length is fixed, so
decoding an instruction to generate the control signals is easier than with the
micro-programmed decoding of CISC.
1.3 Objective
The objective of this project is to study, design, and validate a 16-bit RISC processor
based on a simple LOAD/STORE architecture. It covers the study of simple LOAD/STORE
architecture design and an investigation of how the processor executes its instructions.
1.4 Scope of Work
The scope of work in this project covers the design of a 16-bit RISC processor with a
4-stage instruction cycle that can execute two main types of instructions: data processing
and single data transfer. The project covers design entry using Verilog HDL and synthesis
using the Xilinx ISE tool.
Chapter 2
INTRODUCTION OF RISC PROCESSOR
RISC stands for Reduced Instruction Set Computer. A RISC is a microprocessor that uses a
pipelined architecture to improve performance. Generally speaking, this means a faster
machine, mostly through a higher MIPS rating (MIPS stands for millions of instructions per
second, so higher is better). It is important to note that an improvement in MIPS does not
always result in a faster machine. This chapter gives a simple introduction to RISC
processor architecture and a comparison between RISC and CISC.
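The caveat about MIPS can be made concrete with a small calculation. The sketch below is illustrative only: the function names and the two hypothetical machines are assumptions, not measurements from this project. It uses the standard relations MIPS = clock rate / (CPI x 10^6) and execution time = instruction count x CPI / clock rate.

```python
def mips(clock_hz: float, cpi: float) -> float:
    """MIPS rating: millions of instructions completed per second."""
    return clock_hz / (cpi * 1e6)


def exec_time_s(instr_count: int, clock_hz: float, cpi: float) -> float:
    """Program run time: instructions x cycles per instruction / clock rate."""
    return instr_count * cpi / clock_hz


# Hypothetical machine A: simple instructions, so the program needs more of them.
a_mips = mips(100e6, 1.0)
a_time = exec_time_s(3_000_000, 100e6, 1.0)

# Hypothetical machine B: complex instructions, fewer of them, 4 cycles each.
b_mips = mips(100e6, 4.0)
b_time = exec_time_s(500_000, 100e6, 4.0)

# A has the higher MIPS rating, yet B finishes this program sooner.
print(a_mips, a_time)
print(b_mips, b_time)
```

With these made-up numbers, machine A has four times the MIPS rating of machine B but takes longer to run the program, because it has to execute six times as many instructions.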
There now seems to be an overwhelming case in favor of Reduced Instruction Set
Computers (RISC) as high performance computing engines. RISC processors, first developed
in the eighties, seem predestined to dominate the computer industry in the nineties and to
relegate old microprocessor architectures into oblivion.
2.1 Introduction of RISC Architecture

The reduced instruction set computer, or RISC, is a microprocessor CPU design
philosophy that favors a smaller and simpler set of instructions that all take about the same
amount of time to execute. The most common RISC microprocessors are ARM, DEC Alpha,
PA-RISC, SPARC, MIPS, and IBM's PowerPC.
The idea was inspired by the discovery that many of the features that were included in
traditional CPU designs to facilitate coding were being ignored by the programs that were
running on them. Also these more complex features took several processor cycles to be
performed. Additionally, the performance gap between the processor and main memory was
increasing. This led to a number of techniques to streamline processing within the CPU,
while at the same time attempting to reduce the total number of memory accesses.
When controller design became more complex in CISC and performance still fell short of
expectations, designers started looking for alternatives. It had been found that speed
suffers whenever the processor accesses memory. One way to improve CPI (cycles per
instruction) was therefore to keep the instruction set very simple: simple not in what it
does but in how it is encoded. That is why a typical RISC architecture has very few
instructions, and the processor requests data from memory only through Load and Store.
Complex addressing modes are avoided. The complexity of controller design is overcome by
fixing the positions of the operand and opcode bits in the instruction register. Finally,
pipelining added a new dimension of speed with the help of only a few additional registers.
A pipeline increases throughput by reducing the effective CPI, so that an instruction can
effectively be completed in one clock cycle. Pipelining, in any kind of architecture, arises
from the inherent parallelism of instruction execution and the idle states of components.
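The effect of pipelining on CPI can be checked with a short calculation. This is a minimal sketch of an ideal pipeline with no stalls or hazards; the function names and the instruction count are assumptions chosen for illustration.

```python
def cycles_unpipelined(n_instr: int, stages: int) -> int:
    """Each instruction occupies the whole datapath for `stages` cycles."""
    return n_instr * stages


def cycles_pipelined(n_instr: int, stages: int) -> int:
    """Ideal pipeline: `stages` cycles to fill, then one completion per cycle."""
    if n_instr == 0:
        return 0
    return stages + (n_instr - 1)


n, k = 1000, 4
cpi_unpipelined = cycles_unpipelined(n, k) / n   # 4000 cycles / 1000 instructions
cpi_pipelined = cycles_pipelined(n, k) / n       # 1003 cycles / 1000 instructions
print(cpi_unpipelined, cpi_pipelined)
```

For a long instruction stream the effective CPI of the ideal pipeline approaches 1, which is the sense in which an instruction is completed in one clock cycle even though each one still passes through every stage.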
The pipelined architecture can be further enhanced with the concept known as superscalar
execution, in which more than one execution unit is provided. While one unit is busy with
the current execution task, the fetch unit can fetch the next instruction, which is then
executed by another execution unit present in the system.
Features which are generally found in RISC designs are:
# Uniform instruction encoding (for example the op-code is always in the same bit position in
each instruction, which is always one word long), which allows faster decoding.
# A homogeneous register set, allowing any register to be used in any context and simplifying
compiler design.
# Simple addressing modes (complex addressing modes are replaced by sequences of simple
arithmetic instructions).
# Few data types supported in hardware (for example, some CISC machines had instructions
for dealing with byte strings. Others had support for polynomials and complex numbers. Such
instructions are unlikely to be found on a RISC machine).
Over many years, RISC instruction sets have tended to grow in size. Thus, some have
started using the term "load-store" to describe RISC processors, since this is the key element
of all such designs. Instead of the CPU itself handling many addressing modes, load-store
architecture uses a separate unit dedicated to handling very simple forms of load and store
operations. CISC processors are then termed "register-memory" or "memory-memory".
Today RISC CPUs (and microcontrollers) represent the vast majority of all CPUs in
use. The RISC design technique offers power in even small sizes, and thus has come to
completely dominate the market for low-power "embedded" CPUs. Embedded CPUs are by
far the largest market for processors. RISC had also completely taken over the market for
larger workstations for much of the 90s. After the release of the Sun SPARCstation the other
vendors rushed to compete with RISC based solutions of their own. Even the mainframe
world is now completely RISC based.
2.2 RISC Vs CISC

2.2.1 CISC Designs
An overriding characteristic of CISC machines is an approach to instruction set
architecture that emphasizes doing more with each instruction. As a result, CISC machines
have a wide variety of addressing modes. CISC machines take a “have it your way” approach
to the location and number of operands in various instructions. As a result instructions are of
widely varying length and execution times.
2.2.2 The bridge toward RISC (Historical factors)
The capabilities of CISC allowed more operations to be packed into the same program
size. During that period, compact program and data storage was given great importance
because the cost of memory was high.
An attempt was made to narrow the semantic gap, that is, the gap between machine
instruction sets and high level language constructs, by adding complicated instructions and
addressing modes in order to increase performance. Most of these “improvements” were
rejected by compiler writers on the grounds that they did not fit well with the language
requirements and were of only limited usefulness. At the same time, research conducted by
David Patterson and Donald Knuth showed that 85% of a program’s statements were
assignments, conditionals or procedure calls. Nearly 80% of the assignment statements were
MOVE instructions with no arithmetic operations.
As more and more capabilities were added to processors, it became increasingly
difficult to support the higher clock speeds that would otherwise have been possible. Complex
instructions and addressing modes worked against higher clock speeds because of the greater
number of microscopic actions that had to be performed per instruction. Moreover, RAM
prices dropped sufficiently that there was less pressure on system designers to design
instructions that did more than there was to design systems that were faster. It was also
becoming cost-effective to employ small amounts of higher-speed cache memory to reduce
memory latency, i.e. the waiting time between when a memory request is made and when it has
been satisfied.
2.3 Why RISC?

Since the earliest days of computing, various attempts have been made to increase
instruction execution rates by overlapping the execution of more than one instruction.
The most common ways of overlapping are pre-fetching, pipelining and superscalar
operation.
1) Pre-fetching: The process of fetching the next instruction or instructions into a
queue before the current instruction is complete is called pre-fetching. The earliest
16-bit microprocessor, the Intel 8086/8, pre-fetches up to six bytes following the byte
currently being executed into an on-board queue, thereby making them immediately
available for decoding and execution, without latency.
2) Pipelining: Pipelining instructions means starting or issuing an instruction prior to the
completion of the currently executing one. The current generation of machines carries
this to a considerable extent. The PowerPC 601 has 20 separate pipeline stages in
which various portions of various instructions are executing simultaneously.
3) Superscalar operation: Superscalar operation refers to a processor that can issue
more than one instruction simultaneously. The PPC 601 has independent integer,
floating-point and branch units, each of which can be executing an instruction
simultaneously.
CISC machine designers incorporated pre-fetching, pipelining and superscalar operation
in their designs but with instructions that were long and complex and operand access
depending on complex address arithmetic, it was difficult to make efficient use of these
new speed-up techniques. Furthermore, complex instructions and addressing modes hold
down clock speed compared to simple instructions. RISC machines were designed to
efficiently exploit the caching, pre-fetching, pipelining and superscalar methods that were
invented in the days of CISC machines.
Chapter 3
INTRODUCTION OF 16-bit RISC PROCESSOR
We implemented a 16-bit RISC microprocessor based on a simplified version of the
LOAD/STORE architecture. The processor has 16-bit instruction words and a 16 x 16-bit data
memory. Every instruction is completed in four clock cycles. An external clock is used as the
timing mechanism for the control and datapath units. This section includes a summary of the
main features of the processor, a description of the pins, a high level diagram, sub level
blocks, and the instruction word formats.
3.1 Top level diagram

RST (reset) is an active-high synchronous signal. The processor requires 4 clock cycles
to complete one instruction. OPCODE is 4 bits wide, input operand-1 is 4 bits wide and input
operand-2 (immediate data) is 8 bits wide. The ALU has 2 outputs: one is 16 bits wide and the
other is 1 bit wide. Figure 3.1 shows the top level of the RISC processor.
Figure 3.1 Top level Block diagram of 16-bit RISC Processor
Table 3.1 Signal Description of Top level block
Signal Name    Description
OPCODE         4 bits; specifies the operation performed by the ALU on the given operands
OPERAND_1      First operand; holds the address of the memory location where the first
               operand is stored
OPERAND_2      8-bit immediate data
CLK            Processor clock
CIN            Carry in for the ALU, used for arithmetic operations
RST            Reset (when RST is high, the ALU register and the instruction decoder are
               reset)
ALU_OP         16-bit data output from the ALU
CB             1-bit carry/borrow out from the ALU
3.2 Instruction Set Architecture (ISA)

The ISA of this processor consists of 15 instructions with a 4-bit fixed size operation
code. The instruction words are 16 bits long.
OPCODE    OPERAND-2    OPERAND-1
4 bits    8 bits       4 bits
The opcode consists of 4 bits and decides the operation to be performed. Operand-1
holds the memory address of the data on which the operation is to be performed.
Operand-2 is 8-bit immediate data.
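The field layout above can be illustrated with a small encode/decode helper. This is a sketch, not the project's Verilog: the report gives only the field order and widths, so the bit positions used here (opcode in bits 15..12, operand-2 in bits 11..4, operand-1 in bits 3..0) are an assumption read off the left-to-right order of the format, and the names `Instruction`, `encode` and `decode` are hypothetical.

```python
from typing import NamedTuple


class Instruction(NamedTuple):
    opcode: int      # 4 bits
    operand_2: int   # 8-bit immediate data
    operand_1: int   # 4-bit data memory address


def encode(opcode: int, operand_2: int, operand_1: int) -> int:
    """Pack the three fields into a 16-bit word (assumed bit positions)."""
    return ((opcode & 0xF) << 12) | ((operand_2 & 0xFF) << 4) | (operand_1 & 0xF)


def decode(word: int) -> Instruction:
    """Split a 16-bit instruction word into its three fields."""
    if not 0 <= word <= 0xFFFF:
        raise ValueError("instruction word must fit in 16 bits")
    return Instruction(
        opcode=(word >> 12) & 0xF,
        operand_2=(word >> 4) & 0xFF,
        operand_1=word & 0xF,
    )


# AND (opcode 0100) of memory location 3 with the immediate 0x0F.
word = encode(0b0100, 0x0F, 0x3)
print(hex(word), decode(word))
```

Because every field sits at a fixed position, decoding needs only shifts and masks, which is the property that makes a hardwired decoder simple.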
Table 3.2 Number of Operations
OPCODE    CIN    OPERATION
0000      0      Addition
0000      1      Addition with carry
0001      0      Subtraction with borrow
0001      1      Subtraction
0010      1      Increment by 1 [OPERAND_1]
0011      0      Decrement by 1 [OPERAND_1]
0100      0      AND
0101      0      OR
0110      0      NOT
1000      0      NAND
1001      0      NOR
1010      0      XOR
1011      0      XNOR
1100      0      Logical left shift of data [OPERAND_1]
1101      0      Logical right shift of data [OPERAND_1]
1110      0      Arithmetic left shift of data [OPERAND_1]
1111      0      Arithmetic right shift of data [OPERAND_1]
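The CIN column of Table 3.2 is consistent with a single 16-bit adder whose second input is selected per opcode, so that every arithmetic operation is computed as A + B' + CIN. The sketch below models that reading. It is an assumption, not the report's Verilog: the function name is hypothetical, and CB is returned as the raw carry out of the adder because the report does not state the polarity of the borrow flag.

```python
MASK16 = 0xFFFF


def alu_arith(opcode: int, a: int, b: int, cin: int) -> tuple[int, int]:
    """Return (ALU_OP, CB) for the arithmetic opcodes 0000 to 0011."""
    if opcode == 0b0000:        # addition (CIN=0) / addition with carry (CIN=1)
        addend = b & MASK16
    elif opcode == 0b0001:      # subtraction (CIN=1) / with borrow (CIN=0)
        addend = ~b & MASK16    # one's complement; CIN=1 completes two's complement
    elif opcode == 0b0010:      # increment: A + 0 + 1, so CIN must be 1
        addend = 0
    elif opcode == 0b0011:      # decrement: A + 0xFFFF + 0, so CIN must be 0
        addend = MASK16
    else:
        raise ValueError("not an arithmetic opcode")
    total = (a & MASK16) + addend + (cin & 1)
    return total & MASK16, (total >> 16) & 1


print(alu_arith(0b0000, 5, 3, 0))   # addition
print(alu_arith(0b0001, 5, 3, 1))   # subtraction
print(alu_arith(0b0001, 5, 3, 0))   # subtraction with borrow: 5 - 3 - 1
```

Under this reading, the table's pairing of CIN=1 with plain subtraction and CIN=0 with subtraction with borrow follows directly from two's complement arithmetic.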
3.3 Sub level Block diagram
Figure 3.2 shows the sub level block diagram of 16-bit RISC processor, which is
divided into three main parts, Instruction decoder, Arithmetic and logical unit and Data
memory.
Figure 3.2 Sub level Block diagram
RST (reset) is an active-high synchronous signal. One instruction cycle equals 4 clock
cycles, and each instruction passes through 4 machine cycles (instruction fetch, decode,
execution and write back). The 16-bit ALU can perform 15 operations. The data memory is
16 x 16 bits. OPCODE is 4 bits wide, input operand 1 is 4 bits wide and input operand 2
(immediate data) is 8 bits wide. The ALU has 2 outputs: ALU_OP is 16 bits wide and CB is
1 bit wide.
3.3.1 INSTRUCTION DECODER

The instruction decoder is an FSM that steps through four states per instruction. When
the RST signal is asserted, the instruction decoder enters the initial state. After RST is
de-asserted, the state changes on every positive clock edge from init to fetch, decode,
execute and then load. Figure 3.3 shows the top level diagram of the instruction decoder.
Figure 3.3 Top level diagram of Instruction decoder
Table 3.3 Signal Description of Instruction decoder

Signal Name      Description
OPCODE           4 bits; specifies the operation performed by the ALU on the given
                 operands
OPERAND_1        First operand; holds the address of the memory location where the
                 first operand is stored
OPERAND_2        8-bit immediate data
RST              Reset (when RST is high, the ALU register and the instruction decoder
                 are reset)
CLK              Processor clock
WB               Write back
DATA_OUT         16-bit output data
ADDRESS          Address of the operand
R/W bar          Selects read or write operation
CS               Chip select
ALU_OPERATION    Operation to be performed by the ALU according to the OPCODE
ALU_OPR1         Data stored at the address given by operand 1
ALU_OPR2         Immediate data
State diagram of Instruction decoder
One instruction cycle equals 4 clock cycles. Each instruction has 4 machine cycles
(instruction fetch, decode, execution and write back).
Figure 3.4 State diagram of Instruction decoder
INIT: When the RST signal is asserted, the instruction decoder enters the initial state.
After RST is de-asserted, the state changes on every positive clock edge from init to fetch,
decode, execute and then load.
Fetch: In this cycle the Fetch control signal becomes low, the Decode control signal becomes
high and the other control signals (Execute and Load) are low. The OPCODE bits are loaded
into ALU_OPERATION, the OPERAND_1 bits are loaded into ADDRESS and the OPERAND_2 bits
are loaded into ALU_OPR2. The R/W bar and CS signals are high while the WB signal is low.
Decode: In this cycle the Decode control signal becomes low, the Execute control signal
becomes high and the other control signals (Fetch and Load) are low. The 16-bit data coming
from the memory (DATA_OUT) is loaded into ALU_OPR1. The R/W bar, CS and WB signals are low.
Execute: In this cycle the Execute control signal becomes low, the Load control signal
becomes high and the other control signals (Fetch and Decode) are low. The 16-bit data coming
from the memory (DATA_OUT) is loaded into ALU_OPR1. The CS and WB signals are high and the
R/W bar signal is low. In this cycle the ALU performs the operation selected by
ALU_OPERATION on the 16-bit data OPR1 and OPR2, and the result is available on ALU_OP and
CB. ALU_OP is applied to DATA_IN of the memory through a multiplexer.
Load: In this cycle the Load control signal becomes low, the Fetch control signal becomes
high and the other control signals (Decode and Execute) are low. The CS, WB and R/W bar
signals are low. The data available on DATA_IN is loaded into the memory at the location
given by ADDRESS.
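The state sequence described above can be sketched as a small behavioral model. This is not the report's Verilog: the names `State`, `next_state`, `CONTROL` and `run` are hypothetical, and the control values are simply transcribed from the state descriptions (the report gives none for INIT, so it is left out of the table).

```python
from enum import Enum


class State(Enum):
    INIT = 0
    FETCH = 1
    DECODE = 2
    EXECUTE = 3
    LOAD = 4


# Order followed on successive positive clock edges while RST is low.
_NEXT = {
    State.INIT: State.FETCH,
    State.FETCH: State.DECODE,
    State.DECODE: State.EXECUTE,
    State.EXECUTE: State.LOAD,
    State.LOAD: State.FETCH,
}

# R/W bar, CS and WB values as listed in the text for each state.
CONTROL = {
    State.FETCH:   {"RW_BAR": 1, "CS": 1, "WB": 0},
    State.DECODE:  {"RW_BAR": 0, "CS": 0, "WB": 0},
    State.EXECUTE: {"RW_BAR": 0, "CS": 1, "WB": 1},
    State.LOAD:    {"RW_BAR": 0, "CS": 0, "WB": 0},
}


def next_state(state: State, rst: bool) -> State:
    """State after one positive clock edge; RST is synchronous and active high."""
    return State.INIT if rst else _NEXT[state]


def run(n_clocks: int, state: State = State.INIT) -> list[str]:
    """Names of the states visited over n_clocks edges with RST held low."""
    trace = []
    for _ in range(n_clocks):
        state = next_state(state, rst=False)
        trace.append(state.name)
    return trace


print(run(8))
```

After reset is released, the model visits fetch, decode, execute and load and then wraps back to fetch, so one instruction takes four clock cycles.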
3.3.2 ARITHMETIC AND LOGICAL UNIT
It performs all arithmetic and logical operations. It is a 16-bit unit. The output of the
ALU is registered. Figure 3.5 shows the top level block of the ALU.
Figure 3.5 Top level diagram of Arithmetic and Logical Unit
Table 3.4 Signal Description of Arithmetic and Logical unit

Signal Name      Description
ALU_OPERATION    4 bits; specifies the operation performed by the ALU on the given
                 operands
ALU_OPERAND_1    It comes from the memory
ALU_OPERAND_2    16-bit input, sign-extended immediate data
CIN              Carry in for the ALU, used for arithmetic operations
CLK              System clock
RST              If RST is high, the entire ALU is reset
ALU_OP           16-bit data output from the ALU
CB               1-bit carry/borrow out from the ALU
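Table 3.4 describes ALU_OPERAND_2 as the sign-extended form of the 8-bit immediate. Sign extension copies bit 7 of the immediate into bits 15..8 of the 16-bit operand. A minimal sketch, with a hypothetical function name:

```python
def sign_extend_8_to_16(imm8: int) -> int:
    """Widen an 8-bit two's complement value to 16 bits, preserving its sign."""
    imm8 &= 0xFF
    if imm8 & 0x80:              # sign bit set: fill the upper byte with ones
        return imm8 | 0xFF00
    return imm8                  # sign bit clear: the upper byte stays zero


print(hex(sign_extend_8_to_16(0x7F)))
print(hex(sign_extend_8_to_16(0x80)))
```

An immediate of 0xFF therefore reaches the ALU as 0xFFFF, which is -1 in 16-bit two's complement.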
The arithmetic and logical unit is divided into two parts: a combinational circuit and a
sequential circuit.
1. Combinational circuit
It is made of an arithmetic circuit, a logical circuit, a shifter circuit and a
multiplexer.
Arithmetic circuit
It is made of a full adder and a multiplexer. It performs all arithmetic operations:
addition, subtraction, addition with carry, increment and decrement.
Logical circuit
It is made of AND, OR, XOR, NAND, NOR, NOT and XNOR gates and a multiplexer. It
performs all logical operations.
Shifter circuit
It is made of multiplexers. It is divided into two circuits, an arithmetic shifter and a
logical shifter. The arithmetic shifter performs two operations, arithmetic right shift and
arithmetic left shift. The logical shifter performs two operations, logical right shift and
logical left shift.
2. Sequential circuit
It is made of a 16-bit register and one flip-flop. The output of the combinational
circuit is applied to the sequential circuit.
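The logical and shifter circuits described above can be modeled behaviorally with the opcodes of Table 3.2. The sketch below rests on several assumptions the report does not state: NOT is applied to the first operand, every shift moves the data by exactly one bit position, and the arithmetic left shift produces the same result as the logical left shift. The function names are hypothetical, and the carry/borrow output is not modeled.

```python
MASK16 = 0xFFFF


def alu_logic(opcode: int, a: int, b: int) -> int:
    """16-bit result of the logical opcodes of Table 3.2."""
    a &= MASK16
    b &= MASK16
    if opcode == 0b0100:
        result = a & b          # AND
    elif opcode == 0b0101:
        result = a | b          # OR
    elif opcode == 0b0110:
        result = ~a             # NOT (assumed to act on the first operand)
    elif opcode == 0b1000:
        result = ~(a & b)       # NAND
    elif opcode == 0b1001:
        result = ~(a | b)       # NOR
    elif opcode == 0b1010:
        result = a ^ b          # XOR
    elif opcode == 0b1011:
        result = ~(a ^ b)       # XNOR
    else:
        raise ValueError("not a logical opcode")
    return result & MASK16


def alu_shift(opcode: int, a: int) -> int:
    """16-bit result of the shift opcodes, assuming a shift by one position."""
    a &= MASK16
    if opcode in (0b1100, 0b1110):      # logical / arithmetic left shift
        return (a << 1) & MASK16
    if opcode == 0b1101:                # logical right shift: zero enters bit 15
        return a >> 1
    if opcode == 0b1111:                # arithmetic right shift: bit 15 is kept
        return (a >> 1) | (a & 0x8000)
    raise ValueError("not a shift opcode")


print(hex(alu_logic(0b1010, 0x00FF, 0x0F0F)))
print(hex(alu_shift(0b1111, 0x8001)))
```

The only difference between the two right shifts is the bit that enters position 15: zero for the logical shift and a copy of the sign bit for the arithmetic shift.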
3.3.3 DATA MEMORY
It has a width of 16 bits and a depth of 16 locations. It is a single-port memory which
is used to provide 16-bit data and to store the 16-bit result of the ALU. Figure 3.7 shows
the top level diagram of the data memory.
Table 3.5 Signal Description of Data memory

Signal Name    Description
ADDRESS        Provides the address where the data is stored or fetched
DATA_IN        Input data is available at this input signal
CS             Chip select; used to select the memory
CLK            Processor clock; provides the clock signal to the memory
R/W bar        Control signal for read and write operations
DATA_OUT       Data fetched from the memory is available at this signal
Write operation: When the CS control signal is high and the R/W bar control signal is low,
the data available at DATA_IN is loaded into the memory at the given address on the positive
edge of the clock signal.
Read operation: When the CS and R/W bar control signals are both high, the data stored at
the specified address in the memory is fetched on the positive edge of the clock signal and
is made available at the DATA_OUT signal.
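The read and write rules above can be captured in a small behavioral model of the 16 x 16-bit single-port memory. This is a sketch rather than the report's Verilog: the class and method names are hypothetical, and leaving DATA_OUT unchanged when CS is low is an assumption, since the report does not describe the deselected behavior.

```python
class DataMemory:
    """Behavioral model of a 16-location, 16-bit wide single-port memory."""

    DEPTH = 16

    def __init__(self) -> None:
        self.cells = [0] * self.DEPTH
        self.data_out = 0

    def posedge(self, cs: int, rw_bar: int, address: int, data_in: int = 0) -> None:
        """Apply one positive clock edge with the given control inputs."""
        if not cs:
            return                      # memory not selected: nothing changes
        address &= 0xF                  # 4-bit address selects one of 16 locations
        if rw_bar:
            self.data_out = self.cells[address]       # read: CS=1, R/W bar=1
        else:
            self.cells[address] = data_in & 0xFFFF    # write: CS=1, R/W bar=0


mem = DataMemory()
mem.posedge(cs=1, rw_bar=0, address=3, data_in=0xBEEF)   # write
mem.posedge(cs=1, rw_bar=1, address=3)                   # read back
print(hex(mem.data_out))
```

Because the memory has a single port, a read and a write cannot happen on the same clock edge, which is why the instruction cycle reads the operand and writes the result in different states.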
