VLSI Design: A Basic RISC Processor

by: Farrah A Djidjeli, Heiner Castro Gutierrez, Han Chen Lee, Mohd Taufiq Mohd Yusof, Saad Qayyum
University of Southampton
Faculty of Mathematics, Science and Engineering School of Electronics and Computer Science
2007
Table of contents
Introduction
1. Left buffer design
1.1. Design specifications
1.2. Design strategy
1.3. Design Process
1.3.1. Equal rise and fall time first inverter design
1.3.2. Clock, nReset and Test stages calculation
1.3.3. Selection of the number of stages and scale constant
1.3.4. Calculation of width of PMOS and NMOS for each signal
1.3.5. Designing leftbuf in Magic
1.3.6. Simulation of leftbuf cell using Magic
1.4. Design Optimization
2. Datapath Design
2.1. ALU (Arithmetic-Logic Unit) design
2.2. Bitslice Reg design
2.3. Bitslice IR design
2.4. Topcell design
2.5. Execution of Instructions
2.5.1. Instruction Mapping
3. Control Unit
3.1. Description of Control Signals
3.2. Synthesis of Control Unit
3.3. Place & Route
4. Microprocessor Test
Summarising and Conclusions
References
Appendix A Division of Labour
Appendix B Stimulus File for ALU simulation
Appendix C Stimulus File for bitslice reg simulation
Appendix D Stimulus File for bitslice IR simulation
Appendix E Stimulus File for topcell circuit
Appendix F Test Program verilog file
Introduction
This report describes the complete process followed to design a novel, basic microprocessor. First, the left buffer used in the design is described: the design strategy and the design process are explained, and simulation results are shown to verify its functionality. Second, the datapath implementation is presented. Each section of the datapath is explained in detail, and simulation results for each part are shown to validate the whole design. The execution of every instruction is then mapped onto the datapath to follow the behaviour of the system. Third, the Control Unit is briefly described, and the results of its synthesis are shown together with the place and route process with the datapath. Finally, the strategy for testing the microprocessor is presented.
1. Left buffer design

1.1. Design specifications
The leftbuf cell is a modification of the leftend cell with the addition of buffers, which makes it suitable for large designs that need better signal distribution. There are two reasons for modifying the leftend cell from the previous design: to improve signal speed and to avoid clock skew on long signal lines. The Clock, nReset and Test signals are buffered to meet three requirements. First, the number of buffer stages must be even so that the driven signal is not inverted. Second, the design must be able to drive all the cells in a row. Third, the leftbuf cell must be a leaf cell, so it contains no hierarchy. The cell is designed by optimizing the individual transistors inside the buffers. The potential for clock skew is assessed by comparing the output of a single D-type cell with that of a continuous row of D-type cells 2000 µm long. The circuit used to analyze clock skew is shown in Figure 1 [ 1 ].
Figure 1 Circuit configuration for testing clock skew
In Figure 1, OUT1 and OUT2 are the outputs of D-type flip-flops. The first stage (OUT1) drives only one flip-flop, whereas the clock of the second stage (Ck2) drives a long row of flip-flops. As a result, Ck2 is delayed with respect to Ck1, which may introduce a race hazard. Figure 2 shows the occurrence of clock skew in a poor design [ 1 ].
Figure 2 Good CLK design (left ) and poor CLK design with clock skew (right)
The left plot of Figure 2 shows the output signals of a good leftbuf design, while the right plot shows the hazard caused by clock skew. Two criteria can be identified from this figure. First, in the good design (left), OUT1 changes from zero to one after Ck2 has changed from zero to one. Ck2 lags Ck1 because of its heavier load, but it must still change before OUT1 does. Second, OUT2 changes from zero to one only on the second clock pulse, because the input of the second flip-flop is OUT1, which is sampled only at the next rising clock edge. The design aims to produce correct outputs, improve speed and avoid clock skew that might cause errors in later design stages.
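The two criteria above amount to a first-order hold-time check on the second flip-flop. The Python sketch below is only an illustration under stated assumptions: the clock-to-Q delay, skew and hold time are hypothetical parameters (the report gives no values for them), and wire and logic delays between the flip-flops are ignored.

```python
def out2_first_high_cycle(t_cq_ns: float, skew_ns: float, t_hold_ns: float = 0.0) -> int:
    """Cycle (1-based) in which OUT2 first goes high in the Figure 1 circuit.

    Assumes IN is already high before the first rising edge of Ck1. OUT1
    changes t_cq_ns after that edge, while the second flip-flop sees the
    same edge skew_ns later on Ck2. If OUT1's new value arrives before the
    second flip-flop's hold window closes, it is captured on the first edge:
    a race (hold-time) hazard.
    """
    hold_violated = t_cq_ns < skew_ns + t_hold_ns
    return 1 if hold_violated else 2


# Hypothetical numbers: a good design keeps the skew below the clock-to-Q delay.
print(out2_first_high_cycle(t_cq_ns=0.4, skew_ns=0.1))
# A heavily loaded Ck2 arriving after OUT1 has changed gives the poor behaviour.
print(out2_first_high_cycle(t_cq_ns=0.4, skew_ns=0.6))
```

The first call corresponds to the left plot of Figure 2 (OUT2 rises on the second pulse), the second to the right plot.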
1.2. Design strategy
Several strategies were used to achieve the objective of a hazard-free leftbuf cell. They are grouped into four main categories to simplify the analysis of the circuit.
1. Identifying the affected signals. The leftbuf cell has several input and output signals, but not all of them pass through a buffer. The leftbuf signals are the power signals (Vdd! and GND!), SDI, SDO, Clock, nReset and Test. Buffers are designed for only three signals: Clock, nReset and Test. These signals are used to activate the other cells and to define the sequence of operations.
2. Buffer stages. A buffer is built by connecting several inverters in series. An even number of inverters is required so that the output signal is not inverted. The transistor size increases from the first stage to the last, and the scale factor of each stage is determined by the load. The propagation delay can be reduced by increasing the number of stages, up to an optimum beyond which additional stages add delay.
3. Designing the buffer. The buffer is laid out with the Magic layout tool. The design starts with an inverter that has equal rise and fall times, measured with HSpice simulation; the PMOS and NMOS sizes are adjusted until the two times match. The design must also consider the area occupied by leftbuf with its buffers, so the transistors are arranged to use as little area as possible. The number of taps is also increased to account for the larger transistors.
4. Simulation. The final stage is the simulation of the leftbuf output with the HSpice simulator. The simulation should show no clock skew and only a small difference between rise and fall times.
1.3. Design Process
The leftbuf cell design consists of six steps, described below.
1.3.1. Equal rise and fall time first inverter design
In the existing cell library, the PMOS and NMOS widths are fixed: the PMOS width (Wp) is 2.4 µm, the NMOS width (Wn) is 1.4 µm, and both transistors have a length of 0.5 µm. For the leftbuf cell, the transistor widths are changed so that the rise and fall times are equal, which reduces timing effects when the buffer drives a large number of cells. The following procedure was used to minimize the difference between the rise and fall times.
a) The initial design, with Wn = 1.4 µm and Wp = 2.4 µm, is drawn in Magic. The NMOS width is initially kept fixed and the PMOS width is varied.
b) The cell is simulated in HSpice with the fixed NMOS width and a varying PMOS width. The stimulus in the .sp file is edited for the required input and output, and sweeping the PMOS width in the .spice file gives the width with the smallest difference between rise and fall times. The estimated PMOS width (Wp = 5.8 µm) is recorded.
c) The PMOS width is changed to 5.8 µm in the Magic file, and the file is saved and extracted again.
d) The newly extracted file is simulated again in HSpice to measure the actual difference between the rise and fall times. If the difference is large, the NMOS width is changed; otherwise, the inverter design is final.
e) If the difference is large, the NMOS width is swept in the .spice file using parameter sweeping, and the resulting width is recorded (Wn = 2.9 µm).
f) The Magic file is edited with the new value, and the inverter performance is analysed again with HSpice simulations.
In the final design, the rise and fall times are nearly equal. Table 1 lists the values and shows that the timing difference is far smaller with the new inverter design.
Table 1 Design parameters and time difference
Parameter          Initial design     Final design
PMOS width (Wp)    2.4 µm             5.8 µm
NMOS width (Wn)    1.4 µm             2.9 µm
Fall time          8.752×10^-11 s     8.5907×10^-11 s
Rise time          1.180×10^-10 s     8.5246×10^-11 s
Difference         3.048×10^-11 s     0.0661×10^-11 s
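The HSpice sweeps in steps b) and e) pick the width that minimizes |t_rise - t_fall|. The sketch below shows that selection with a first-order model in which each delay is inversely proportional to carrier mobility times transistor width. The mobility ratio of 2 is an assumption chosen so that the model agrees with the final Wp/Wn = 5.8/2.9; the model ignores parasitics, so it does not reproduce the intermediate step in which Wn was still 1.4 µm. All names are hypothetical.

```python
# Hypothetical first-order model: delay ~ C_load / (mobility * W), C_load normalised to 1.
MU_N_OVER_MU_P = 2.0  # assumed electron/hole mobility ratio


def rise_time(wp_um: float) -> float:
    """Relative rise time: the PMOS pulls up with hole mobility mu_n / ratio."""
    return MU_N_OVER_MU_P / wp_um


def fall_time(wn_um: float) -> float:
    """Relative fall time: the NMOS pulls down with electron mobility (normalised to 1)."""
    return 1.0 / wn_um


def best_wp(wn_um: float, candidates: list) -> float:
    """Mimic a parameter sweep: the PMOS width with the smallest |t_rise - t_fall|."""
    return min(candidates, key=lambda wp: abs(rise_time(wp) - fall_time(wn_um)))


sweep = [round(0.1 * k, 1) for k in range(24, 81)]  # 2.4 um to 8.0 um in 0.1 um steps
print(best_wp(2.9, sweep))  # 5.8
```

In the real flow the extracted layout, not a formula, is swept, which is why steps c) to f) re-extract and re-simulate after every width change.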
Figure 3 shows the rise and fall times of the final design, with the input signal in yellow and the output in red. The output shows an overshoot, possibly because the inverter was simulated without a load, but the rise and fall times are nearly equal.
Figure 3 Simulation of final inverter design
1.3.2. Clock, nReset and Test stages calculation
The number of stages required depends on the estimated total cell width of the design, which is 2000 µm. The nReset, Clock and Test signals must therefore be able to drive this total cell width without clock skew.
a) nReset
The nReset signal drives D-type cells, each with three input connections. The width of a D-type cell in the cell library is 26.1 µm, so

$$\text{WidthOfLoad} = \frac{2000\,\mu\text{m}}{26.1\,\mu\text{m}} = 76.628$$

Since nReset is connected to three inputs inside each D-type cell,

$$\text{TotalLoad} = 76.628 \times 3 = 229.885$$

The scale factor of each stage for 4 and 6 stages is

$$f_{4} = 229.885^{1/4} = 3.894, \qquad f_{6} = 229.885^{1/6} = 2.475$$
b) Clock
The Clock signal drives D-type cells, each with two input connections. The width of a D-type cell in the cell library is 26.1 µm, so

$$\text{WidthOfLoad} = \frac{2000\,\mu\text{m}}{26.1\,\mu\text{m}} = 76.628$$

Since Clock is connected to two inputs inside each D-type cell,

$$\text{TotalLoad} = 76.628 \times 2 = 153.256$$

$$f_{4} = 153.256^{1/4} = 3.518, \qquad f_{6} = 153.256^{1/6} = 2.313$$
c) Test
The Test signal drives SMux3 cells, each with two input connections. The width of an SMux3 cell in the cell library is 18.8 µm, so

$$\text{WidthOfLoad} = \frac{2000\,\mu\text{m}}{18.8\,\mu\text{m}} = 106.383$$

Since Test is connected to two inputs inside each SMux3 cell,

$$\text{TotalLoad} = 106.383 \times 2 = 212.766$$

$$f_{4} = 212.766^{1/4} = 3.819, \qquad f_{6} = 212.766^{1/6} = 2.443$$
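The calculations above follow one pattern: count the cells in the 2000 µm row, multiply by the inputs each cell presents, and take the N-th root for N stages. A short Python sketch that reproduces them is given below; the dictionary and function names are illustrative. Printed loads may differ from the values above in the last digit, because the report rounds the cell count to 76.628 before multiplying.

```python
ROW_WIDTH_UM = 2000.0  # estimated total cell width of one row

# signal: (width of the driven cell in um, inputs driven in each cell)
SIGNALS = {
    "nReset": (26.1, 3),  # D-type cell, three inputs
    "Clock": (26.1, 2),   # D-type cell, two inputs
    "Test": (18.8, 2),    # SMux3 cell, two inputs
}


def total_load(cell_width_um: float, inputs_per_cell: int) -> float:
    """Number of cells in the row times the inputs each cell presents."""
    return (ROW_WIDTH_UM / cell_width_um) * inputs_per_cell


def scale_factor(load: float, stages: int) -> float:
    """Per-stage scale factor f such that f ** stages == load."""
    return load ** (1.0 / stages)


for name, (width, inputs) in SIGNALS.items():
    load = total_load(width, inputs)
    print(f"{name}: load {load:.3f}, 4 stages {scale_factor(load, 4):.3f}, "
          f"6 stages {scale_factor(load, 6):.3f}")
```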
1.3.3. Selection of the number of stages and scale constant
The number of stages depends on the scale factor, whose value usually lies between 2.5 and 6. From the scale factors calculated in Section 1.3.2, four stages is the suitable choice for all three signals, since with six stages every scale factor falls below 2.5. However, the selected scale factor should be larger than the calculated value, to avoid an increase of this factor at the last stage. The selected scale factors are:
a) nReset: 4.3
b) Clock: 3.8
c) Test: 4.2
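The stage-count rule can be written as a filter: keep only even stage counts (so the signal is not inverted) whose scale factor lies in the usual 2.5 to 6 range. The function below is a hypothetical helper written for illustration, not part of the design flow described in the report.

```python
def even_stage_counts(load, f_min=2.5, f_max=6.0, max_stages=10):
    """Even numbers of stages N whose scale factor load ** (1/N) lies in [f_min, f_max]."""
    return [n for n in range(2, max_stages + 1, 2) if f_min <= load ** (1.0 / n) <= f_max]


# Total loads from Section 1.3.2; every signal gives [4].
for name, load in (("nReset", 229.885), ("Clock", 153.256), ("Test", 212.766)):
    print(name, even_stage_counts(load))
```

Two stages fail because the scale factor exceeds 6, and six or more fail because it drops below 2.5, which leaves four stages for every signal.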
1.3.4. Calculation of width of PMOS and NMOS for each signal
The width of each stage is calculated from the first-stage PMOS/NMOS widths (5.8 µm and 2.9 µm) and the scale factor f: the width of stage k is $W_k = W_1 f^{k-1}$.
Table 2 nReset, Clock and Test transistor widths (µm)
Signal (scale factor)   Stage   Wp       Wn
nReset (4.3)            1       5.8      2.9
                        2       24.94    12.47
                        3       107.24   53.62
                        4       461.14   230.57
Clock (3.8)             1       5.8      2.9
                        2       22.04    11.02
                        3       83.75    41.87
                        4       318.25   159.12
Test (4.2)              1       5.8      2.9
                        2       24.36    12.18
                        3       102.31   51.15
                        4       429.71   214.85
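Table 2 can be regenerated from the two first-stage widths and the selected scale factors. The sketch below applies $W_k = W_1 f^{k-1}$; the names are illustrative. It rounds to the nearest hundredth, whereas the table appears to truncate, so some last digits differ by 0.01.

```python
W1_P_UM = 5.8  # first-stage PMOS width from Section 1.3.1
W1_N_UM = 2.9  # first-stage NMOS width from Section 1.3.1


def stage_widths(scale, stages=4):
    """List of (Wp, Wn) per stage, each stage `scale` times wider than the previous one."""
    return [(W1_P_UM * scale ** k, W1_N_UM * scale ** k) for k in range(stages)]


for name, f in (("nReset", 4.3), ("Clock", 3.8), ("Test", 4.2)):
    print(name, [f"{wp:.2f}/{wn:.2f}" for wp, wn in stage_widths(f)])
```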
1.3.5. Designing leftbuf in Magic
The leftbuf cell is built from the transistor widths calculated in the previous steps. The following considerations were applied when designing it.
a) The buffers are placed in one row: the nReset buffer on the left, followed by the Clock and Test buffers. Keeping the three buffers separate makes the cell easier to modify if the simulation reveals any mistakes.
b) The cell width is reduced by using a zigzag transistor shape, which optimizes the layout of the design (see Figure 4).
Figure 4 The dimension of NMOS (left) and PMOS (right) in micrometers
The dimensions Yn (NMOS) and Yp (PMOS) are determined from the total transistor width. The values 5 and 10 are the numbers of zigzag segments in the horizontal direction for the NMOS and PMOS, respectively. Then
$$\text{TotalWidth}_{\text{NMOS}} = 2(19) + 4(27) + 5\,Y_n$$
$$\text{TotalWidth}_{\text{PMOS}} = 2(19) + 9(27) + 10\,Y_p$$

c) Adjacent stages are connected with metal1. Metal2 is used to connect each buffer output to the cell output, so that these connections can cross the metal1 wiring without shorting horizontal and vertical runs.
d) The numbers of contacts and taps are increased to match the larger transistor widths.
e) The source and drain regions follow the zigzag shape of the gate, so that current flows uniformly between source and drain along the whole transistor.
The overall design of the leftbuf cell is shown in Figure 5.
Figure 5 Leftbuf cell
1.3.6. Simulation of leftbuf cell using Magic
The final leftbuf design is simulated to check for clock skew; if skew occurs, the design is reworked. A Magic test cell with a single flip-flop load and a 2000 µm row load is built to test the output signals, following the circuit of Figure 1. A stimulus file is then created to test the behaviour of the outputs for rising and falling inputs.
Figure 6 Leftbuf cell with one load and 2000 micrometers load
Figure 7 shows the output signals (OUT1 and OUT2) for rising and falling inputs. OUT1 should rise when IN is high and Ck1 rises. OUT2 changes to 1 in the next clock cycle, because its D-type flip-flop samples OUT1.
Figure 7 Rise and fall time simulations
Figure 8 Rise time (0-10 nanoseconds)
Figure 9 Fall time (10-20 nanoseconds)
The simulation shows that the leftbuf signals do not suffer from clock skew. Figure 7 shows the behaviour of the signals for a rising and a falling IN signal, and Figure 8 and Figure 9 show these two intervals in more detail. OUT1 rises in the first clock cycle, while OUT2 rises in the second clock cycle. However, their settling times differ: OUT1 settles more slowly than OUT2, possibly because of the parallel connection between rows one and two shown in Figure 6. From this simulation, it is concluded that the design operates without producing clock skew that would affect the overall microprocessor design.
1.4. Design Optimization
Although a design that improves speed and avoids clock skew has been completed, further optimization could produce a smaller leftbuf cell with better performance. Because of the time constraints of designing the complete microprocessor, these optimizations are left for future work. The suggested optimizations for the leftbuf cell are listed below.
a) The connections inside the cell can be rearranged to reduce the overall cell size. For instance, stages 1 and 2 can be combined in one column to reduce the cell width, and the buffers for nReset, Clock and Test can be relocated.
b) The zigzag shape can be replaced with a better shape that provides a larger transistor width in a smaller area.
c) The design can be reworked to improve the settling time of the output signals (e.g. OUT1 responds more slowly than OUT2).
2. Datapath Design
As shown in Figure 10, the design of the datapath was divided into four parts: the ALU (red box), reg (yellow region), IR (green box) and the topcell (blue box), which contains the Flag register and other logic.
Figure 10. Datapath architecture, showing the four parts used for simulation.
Figure 11 shows the complete datapath and the blocks that make it up: reg (reg and reg0), IR (IR0, IR1, IR2 and IR3), the topcell (the set of cells at the upper right) and the ALU. The rightend cells (the small cells to the right of each ALU) and the leftbuf cells are also shown. The final datapath implementation occupies an area of 1,353,125.2 µm² (1113.5 µm × 1215.2 µm), or about 1.35 mm².
Figure 11 Complete Datapath Structure
2.1. ALU (Arithmetic-Logic Unit) design
The ALU design is illustrated in Figure 12, including the OR gate used for the Z flag. The inputs of the ALU are A, B, Cin, Zi, control, A+ (Aplus), S2, S1 and S0. A and B are the operands of the current operation. Cin is the carry input of the add/subtract unit; it is connected to the carry output of the previous bitslice, except in the first bitslice, where it is connected to the control input. Zi is the ripple zero-flag input; it is connected to the Zo output of the previous bitslice, except in the first bitslice, where it is tied to zero. control sets the function of the add/subtract unit: 1 for subtraction and 0 for addition. A+ is the input for the shift-right operation; it is connected to input A of the next bitslice, except in the last bitslice, where it is tied to zero. S2, S1 and S0 select the operation according to Table 3. The outputs of the ALU are Z, Cout and Zo: Z is the result of the selected operation, Cout is the carry output of the add/subtract unit, and Zo is the ripple zero-flag output.
Figure 12. Arithmetic Logic Unit (ALU).
Table 3. ALU operation selection
S2 S1 S0   control   Operation   Syntax
00X        X         LD          Z<=B
010        0         ADD         Z<=A+B+Cin
010        1         SUB         Z<=A+!B+Cin
011        X         SRR         Z<=A+
100        X         XNOR        Z<=!(A^B)
101        X         XOR         Z<=A^B
110        X         OR          Z<=A|B
111        X         AND         Z<=A&B
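To make Table 3 and the bitslice connections concrete, the following Python sketch models one ALU bitslice and ripples sixteen of them. It is a behavioural illustration, not the authors' netlist, and the function names are hypothetical. Modelling subtraction as A + !B + Cin (B inverted when control = 1, with Cin of bitslice 0 tied to control) is an assumption that matches the subtraction results reported for Figure 13, and SRR is treated as a logical shift right with 0 entering the MSB, as described for A+.

```python
def alu_bitslice(a, b, cin, aplus, zi, control, s2, s1, s0):
    """One ALU bitslice; returns (Z, Cout, Zo) following Table 3."""
    b_eff = b ^ control            # B is inverted for subtraction (control = 1)
    total = a + b_eff + cin
    cout = total >> 1              # carry out of the add/subtract unit
    code = (s2, s1, s0)
    if code in ((0, 0, 0), (0, 0, 1)):
        z = b                      # LD (00X)
    elif code == (0, 1, 0):
        z = total & 1              # ADD / SUB
    elif code == (0, 1, 1):
        z = aplus                  # SRR: A of the next bitslice
    elif code == (1, 0, 0):
        z = 1 - (a ^ b)            # XNOR
    elif code == (1, 0, 1):
        z = a ^ b                  # XOR
    elif code == (1, 1, 0):
        z = a | b                  # OR
    else:
        z = a & b                  # AND (111)
    zo = zi | z                    # ripple OR used to build the zero flag
    return z, cout, zo


def alu16(a, b, control, s2, s1, s0):
    """Ripple 16 bitslices; returns (result, carry out, zero flag)."""
    result, carry, zacc = 0, control, 0      # bitslice 0: Cin = control, Zi = 0
    for i in range(16):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        aplus = (a >> (i + 1)) & 1 if i < 15 else 0  # MSB receives 0
        z, carry, zacc = alu_bitslice(ai, bi, carry, aplus, zacc, control, s2, s1, s0)
        result |= z << i
    return result, carry, 1 - zacc           # Z flag = NOR of all result bits


print(alu16(5, 3, 1, 0, 1, 0))  # SUB without borrow: (2, 1, 0)
print(alu16(3, 5, 1, 0, 1, 0))  # SUB with borrow: (65534, 0, 0)
```

As in the simulation, a carry out of 1 after a subtraction means that no borrow occurred.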
Figure 13 shows the simulation results for the applied input signals. When control is 0, the ALU adds A and B: 0 + 0 = 0 with Cout = 0; 0 + 1 = 1 with Cout = 0; 1 + 0 = 1 with Cout = 0; and 1 + 1 = 0 with Cout = 1.
Figure 13. ALU simulation results
When control is 1, the unit subtracts, and Cin is always 1. As shown by the ADDSUB trace in Figure 13, the results of the subtraction A − B are: 0 − 0 = 0 with Cout = 1; 1 − 0 = 1 with Cout = 1; 1 − 1 = 0 with Cout = 1; and 0 − 1 = 1 with a borrow, so Cout = 0. These results are consistent with computing A + !B + Cin, where Cout = 1 indicates that no borrow occurred. The logic operations (AND, OR, etc.) also simulated successfully; for example, Figure 13 shows the AND results 0 AND 0 = 0, 0 AND 1 = 0, 1 AND 0 = 0 and 1 AND 1 = 1. It can therefore be concluded that the ALU works correctly. The final implementation of the ALU is shown in Figure 14.
Figure 14. Final implementation of the ALU
2.2. Bitslice Reg design
In the 16-bit datapath there are two configurations of the bitslice reg, called reg0 and reg. Reg0 is used only for the first bitslice (bitslice0), because the carry input of its adder is tied low, whereas the carry input of the adder in reg is connected to the carry output of the previous reg bitslice. Figure 15 illustrates the structure of the bitslice reg. Its inputs are EN_Addr, EN_Data, LoadPC, LoadR1, LoadR2, LoadR3, LoadR4, LoadR5, LoadR6, SEL_ALU, SEL_Add, SEL_IMN, SEL_S1[7:0], SEL_S2[7:0], Unsign, mux and outalu. The inputs Clock, nReset, SDI and Test are not shown in Figure 15: Clock and nReset are the clock and reset inputs of the registers, and SDI and Test are the register inputs used to perform a scan test. EN_Addr is the control input of a tristate buffer that connects the current memory address to the SYSTEMBUS. Similarly, EN_Data is the control input of the tristate buffer that connects data from the registers to the SYSTEMBUS. LoadPC and LoadR1 to LoadR6 are the load signals of the PC and of the general purpose registers R1 to R6, respectively. Unsign is the output of the Unsign unit in the bitslice IR, described below. SEL_IMN is the control input of a multiplexer that selects either the Unsign input (short immediate addressing) or the output of a general purpose register (register-register addressing). SEL_ALU selects input B of the ALU either from the SYSTEMBUS (data from memory) or from the output of the multiplexer just described. SEL_Add is the control input of a multiplexer that selects the current address either from the output of the PC or from the sum of a general purpose register output and the Unsign input (indexed addressing).
SEL_S1[7:0] and SEL_S2[7:0] are used to select the output of the general purpose registers to be connected to two buses as illustrated in Figure 15. The input mux is the data to be stored in the program counter PC and it is an output of the bitslice IR. Finally outalu is the input of the general purpose registers and it is connected to the ALU output in bitslice ALU. The outputs of the bitslice reg are: A, B, SYSTEMBUS, and SDO. SDO (not shown in Figure 15) is the output of the scan test. A (which is the same as S1) and B are the operands of the ALU and they are connected to the inputs A and B of the bitslice ALU. SYSTEMBUS is the datapath output to be connected to the memory interface system.
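The operand and address multiplexers of the bitslice reg can be summarised as a word-level behavioural sketch. The SEL_ALU and SEL_Add polarities follow the simulation descriptions for Figure 16 and Figure 17; that SEL_IMN = 1 selects Unsign is inferred from SEL_IMN = 0 routing S2 to B. The function names and the 16-bit wrap-around of the address adder are assumptions for illustration.

```python
MASK16 = 0xFFFF


def operand_b(sel_alu, sel_imn, systembus, unsign, s2):
    """ALU input B: memory data when SEL_ALU = 1, otherwise the SEL_IMN multiplexer."""
    if sel_alu:
        return systembus                 # data arriving from memory
    return unsign if sel_imn else s2     # short immediate or register-register operand


def current_address(sel_add, pc, s1, unsign):
    """Address: the PC when SEL_Add = 0, indexed S1 + Unsign when SEL_Add = 1."""
    return (s1 + unsign) & MASK16 if sel_add else pc


print(operand_b(0, 0, systembus=7, unsign=5, s2=3))   # 3 (register on S2)
print(current_address(1, pc=0x0100, s1=0x0020, unsign=4))  # 36 (0x24, indexed)
```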
Figure 15. Bitslice reg architecture
The bitslice reg was simulated in two configurations, shown in Figure 16 and Figure 17.
Figure 16. Bitslice reg simulation with SEL_IMN = 0, SEL_ALU = 0, and SEL_Add = 1.
Figure 17. Bitslice reg simulation with SEL_IMN = 1, SEL_ALU = 1, and SEL_Add = 0.
The registers (PC, R6, R5, R4, R3, R2 and R1) are loaded using LoadPC and LoadR1 to LoadR6. The output of each register is then selected by setting the single bit of SEL_S1 or SEL_S2 that corresponds to it (a one-hot code), as shown in Table 4.
Table 4. Output assignment of each register.
Select bit PC R6 R5 R4 R3 R2 R1 R0
S1/S2 <7> 1 0 0 0 0 0 0 0
S1/S2 <6> 0 1 0 0 0 0 0 0
S1/S2 <5> 0 0 1 0 0 0 0 0
S1/S2 <4> 0 0 0 1 0 0 0 0
S1/S2 <3> 0 0 0 0 1 0 0 0
S1/S2 <2> 0 0 0 0 0 1 0 0
S1/S2 <1> 0 0 0 0 0 0 1 0
S1/S2 <0> 0 0 0 0 0 0 0 1
SEL_S1 /S2[7:0] 128 64 32 16 8 4 2 1
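Table 4 is a one-hot code: bit i of SEL_S1 or SEL_S2 selects one register, and R0 always reads as 0. The Python sketch below encodes that mapping; the register dictionary and function names are hypothetical, and rejecting select values with zero or several bits set is a modelling choice, since the report does not describe what the hardware does in that case.

```python
# Column order of Table 4: bit i of SEL_S1/SEL_S2 selects REGISTERS[i].
REGISTERS = ["R0", "R1", "R2", "R3", "R4", "R5", "R6", "PC"]


def sel_code(name):
    """One-hot select value for a register (e.g. PC -> 128, R6 -> 64)."""
    return 1 << REGISTERS.index(name)


def read_bus(sel, regs):
    """Value driven onto S1 or S2 for a one-hot select code."""
    if sel <= 0 or sel > 0x80 or sel & (sel - 1):
        raise ValueError("SEL_S1/SEL_S2 must have exactly one of bits 0-7 set")
    name = REGISTERS[sel.bit_length() - 1]
    if name == "R0":
        return 0  # R0 is a wire tied low, not a real register
    return regs.get(name, 0)


regs = {"R6": 1, "PC": 1}
print(sel_code("R6"), read_bus(64, regs), read_bus(1, regs))  # 64 1 0
```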
As can be seen from Figure 16, the inputs outalu and mux are held at 1, so that a 1 is loaded into registers R1 to R6 and into the PC, respectively. When R6 is loaded (LoadR6 = 1), the ALU output (outalu = 1) is stored in R6; selecting this register on SEL_S2 (value 64 in decimal, as shown in Table 4) then makes S2 equal to 1. Next, SEL_IMN and SEL_ALU are reset, so the data on S2 is routed to B, and B becomes 1. B is 0 when the register is selected through SEL_S1 instead. The same procedure was applied to test R1 to R5, and the simulation results in Figure 16 confirm their correct operation. In this design, R0 is not a physical register but a wire tied low. Thus, when SEL_S1 or SEL_S2 is set to 1, S1 (A) and S2 show 0. When SEL_S1 is equal to 128, the PC is selected; the ADDRESSBUS, however, always follows the output of the PC. Next, EN_Data is held at 0 while EN_Addr and SEL_Add are set to 1, so that the output of the address adder appears on the SYSTEMBUS. Thus, when Unsign and S1 are both 1, the SYSTEMBUS shows 0, since the single-bit sum 1 + 1 = 0.
Figure 17 shows the case SEL_IMN = 1, SEL_ALU = 1 and SEL_Add = 0. As it illustrates, the SYSTEMBUS signal (with EN_Addr = 1) follows the ADDRESSBUS signal: both are 0 until the PC is selected, and then its value (1) is transmitted to ADDRESSBUS and SYSTEMBUS. In addition, B follows the SYSTEMBUS signal, as expected. Figure 18 and Figure 19 show the final implementations of the bitslices reg0 and reg, respectively.
Figure 18. Final implementation of the bitslice reg0
Figure 19. Final implementation of the bitslice reg
2.3. Bitslice IR design
The bitslice IR architecture is shown in Figure 20. The datapath uses four bitslice IR configurations: IR0, IR1, IR2 and IR3. IR0 is used only for the first bitslice (bitslice0), because the carry input of its adder is tied low, whereas the carry inputs of the adders in IR1, IR2 and IR3 are connected to the carry output of the previous bitslice IR. In addition, one input of the multiplexer circled in red in Figure 20 is tied high only in IR0 and tied low in the remaining bitslices (IR1, IR2 and IR3), so across the 16 bitslices this input forms the constant 1 used to increment the PC. The bitslice IR has the inputs TrisMem, LoadIR, SEL_Offset, SEL_PC, ADDRESSBUS and outalu, as well as Clock, nReset, SDI and Test (not shown in Figure 20), which have the same functions as explained above. TrisMem is the control input of a tristate buffer that connects the SYSTEMBUS to the instruction register (IR) when an instruction arrives from memory. LoadIR is the load signal of the instruction register. SEL_Offset is the control signal of a multiplexer that selects the PC increment: either one (sequential execution) or the output of the sign-extend unit (control transfer instructions). SEL_PC is the control signal of a multiplexer that selects the next content of the PC: either the ALU output or the PC offset calculated by the adder in Figure 20. ADDRESSBUS carries the current PC value from the bitslice reg. Finally, outalu is the ALU output from the bitslice ALU.
Figure 20 Bitslice IR architecture
The outputs of the bitslice IR are Unsign, Opcode and mux. Unsign is the output of the unsigned-extend unit described below. Opcode carries the information (14 bits) about the current instruction that the Control Unit needs to generate all control signals. Finally, mux is the next content of the program counter (PC) and is connected to the bitslice reg. Because of the sign- and unsigned-extend units in the bitslice IR (Figure 20), the remaining bitslices are divided into three configurations: IR1, IR2 and IR3. IR1 has the architecture of Figure 20, IR2 has the sign-extend configuration only (no unsigned extend), and IR3 has no extend at all. This is because the sign extension in the datapath is 12-bit, whereas the unsigned extension is 5-bit. For sign extension, bits 0 to 11 are passed through, while the connections of bits 12 to 15 are removed and these bits are connected to bit 11 (the twelfth bit, which carries the sign), as shown in Figure 21 (a). Unsigned extension works in the same way: bits 0 to 4 are passed through, while the connections from the fifth bitslice onwards are removed and the remaining bits are connected to ground, as shown in Figure 21 (b).
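The two extension schemes can be expressed as bit operations on a 16-bit word. In this sketch the function names are hypothetical, and the input is taken to be the IR contents with the field of interest in the low bits, as Figure 21 implies.

```python
def sign_extend_12(value):
    """Bits 0-11 pass through; bits 12-15 copy bit 11 (the sign bit)."""
    low = value & 0x0FFF
    return low | 0xF000 if low & 0x0800 else low


def zero_extend_5(value):
    """Bits 0-4 pass through; bits 5-15 are tied to ground."""
    return value & 0x001F


print(hex(sign_extend_12(0x0800)))  # 0xf800: negative offset
print(hex(sign_extend_12(0x07FF)))  # 0x7ff: positive offset unchanged
print(hex(zero_extend_5(0xFFFF)))   # 0x1f
```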
Figure 21 (a) Sign extend connection (b) Unsign extend connection
Figure 22 Bitslice IR simulation results.
Figure 22 shows the simulation results for this bitslice. When SEL_PC is 1, mux follows outalu; when SEL_PC is 0, mux follows the output of the full adder. With SEL_Offset = 0 and ADDRESSBUS = 0, the output of the full adder is always 1, so with SEL_PC = 0 the value of mux is 1, as shown in Figure 22. When SEL_Offset is 1, LoadIR is 0 and ADDRESSBUS remains 0, the full-adder input follows the output of the IR register, and the result is 0 (0 + 0 = 0). Next, when LoadIR is 1, the IR register loads a 1 (SYSTEMBUS and TrisMem are both held at 1 throughout), and this value reaches the full-adder input, giving mux = 1. With SEL_PC = 0, SEL_Offset and ADDRESSBUS both at 1, and LoadIR = 1, mux is 0, since 1 + 1 = 0. The final implementations of the bitslices IR0, IR1, IR2 and IR3 are shown in Figure 23 to Figure 26, respectively.
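At word level, the SEL_PC and SEL_Offset multiplexers compute the next PC value, mux. The sketch below assumes that the offset has already passed through the sign-extend unit and that the adder wraps at 16 bits; the function name is hypothetical.

```python
MASK16 = 0xFFFF


def next_pc(sel_pc, sel_offset, addressbus, outalu, offset):
    """Value of mux, the next PC; `offset` is the sign-extended 16-bit offset from the IR."""
    if sel_pc:
        return outalu & MASK16                # PC <- ALU output
    increment = offset if sel_offset else 1   # the 1 comes from the IR0 tie-high input
    return (addressbus + increment) & MASK16  # adder result wraps at 16 bits


print(hex(next_pc(0, 0, 0x0041, 0, 0)))       # 0x42: sequential execution
print(hex(next_pc(0, 1, 0x0041, 0, 0xFFFE)))  # 0x3f: branch back by two
```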
Figure 23 Final implementation of the bitslice IR0
2.4. Topcell design
The topcell contains two parts: the function conversion circuit and the flags management circuit. The function conversion circuit is required because the ALU uses four bits (S2, S1, S0 and control) to select its function, whereas the Control Unit selects the function with three bits (eight ALU functions). Table 5 shows the Control Unit output codes and the corresponding ALU input codes; the function conversion circuit, shown in Figure 27, was designed from this table. The circuit would have been much simpler if the codes F2, F1 and F0 had matched S2, S1 and S0, respectively. However, the mismatch was discovered after the Control Unit had been synthesized and the complete datapath finished, so designing the conversion circuit was more convenient than changing an implementation that had already been tested. Furthermore, the total area of the implementation does not increase, since there is plenty of space for the topcell, as can be seen in Figure 11.
Table 5 Conversion from Control Unit outputs to ALU inputs.
Function   Control Unit outputs (F2 F1 F0)   ALU inputs (S2 S1 S0)   control
LD         000                               000                     X
ADD        001                               010                     0
SUB        010                               010                     1
AND        011                               111                     X
OR         100                               110                     X
XOR        101                               101                     X
XNOR       110                               100                     X
SRR        111                               011                     X
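Table 5 is small enough to express as a lookup table. The sketch below wires control to F1, as the topcell simulation describes, and checks that this choice agrees with the two rows in which control is not a don't-care. It describes only the input/output behaviour; it is not the gate-level circuit of Figure 27, and the names are hypothetical.

```python
# Table 5: F2F1F0 -> (function, S2S1S0, control); None marks a don't-care.
F_TO_ALU = {
    0b000: ("LD", 0b000, None),
    0b001: ("ADD", 0b010, 0),
    0b010: ("SUB", 0b010, 1),
    0b011: ("AND", 0b111, None),
    0b100: ("OR", 0b110, None),
    0b101: ("XOR", 0b101, None),
    0b110: ("XNOR", 0b100, None),
    0b111: ("SRR", 0b011, None),
}


def convert(f):
    """Return (S2S1S0, control) for a Control Unit code, with control wired to F1."""
    _, s, _ = F_TO_ALU[f]
    return s, (f >> 1) & 1


# Wiring control = F1 must agree with every specified (non don't-care) entry.
for f, (name, s, control) in F_TO_ALU.items():
    assert control is None or convert(f)[1] == control, name
print(convert(0b010))  # (2, 1): SUB -> S2S1S0 = 010, control = 1
```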
Figure 28 shows the flags management circuit. The Z flag comes from the datapath and is the result of a NOR function over all the bits of the ALU output. Z is therefore 1 if every bit of the ALU output is 0, and 0 otherwise. The Z signal is connected through an inverter from the datapath to the topcell. The C flag is more complicated, because it depends not only on the carry output but also on A<0> (the LSB of ALU input A) and on whether a logic instruction is executed. Specifically, the C flag is updated in one of three ways:
- if an arithmetic instruction (ADD or SUB)¹ is executed, C equals the carry output of the last add/subtract unit;
- if a right-shift instruction (SRR) is executed, C equals A<0>;
- if a logic instruction (AND, OR, XOR or XNOR) is executed, C is 0.
In all other cases C is a don't-care. Figure 28 shows the implementation of this logic, derived from the codes in Table 5.
¹ Strictly speaking, the C flag is not updated if an ADD instruction is executed and either source register (Rs1 or Rs2) is R0 (see the Programmer's Guide for details). This behaviour is handled by the control unit, not by the topcell circuit.
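The three update rules can be captured in a few lines of Python. This is a hypothetical behavioural sketch, not the Figure 28 netlist, and it rests on these assumptions:
- the function codes follow Table 5;
- the 16-bit width is assumed;
- the don't-care case (LD) is arbitrarily modelled as holding the previous C value;
- the control unit's ADD/R0 exception described in the note above is not modelled.

```python
def next_flags(function, alu_out, carry_out, a0, c_prev, width=16):
    """Return the (Z, C) values latched at the next clock edge (sketch)."""
    # Z: NOR of every ALU output bit, i.e. 1 only when the result is zero
    z = 1 if (alu_out & ((1 << width) - 1)) == 0 else 0
    if function in (0b001, 0b010):                    # ADD, SUB
        c = carry_out                                 # carry out of last add/sub unit
    elif function == 0b111:                           # SRR
        c = a0                                        # LSB of ALU input A
    elif function in (0b011, 0b100, 0b101, 0b110):    # AND, OR, XOR, XNOR
        c = 0
    else:                                             # LD: don't care
        c = c_prev                                    # assumption: keep the old value
    return z, c


print(next_flags(0b001, 0, 1, 0, 0))        # ADD with zero result and carry: (1, 1)
print(next_flags(0b111, 0x1234, 0, 1, 0))   # SRR: C takes A<0>: (0, 1)
```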
Figure 27 Function conversion circuit.
Figure 28 Flags management circuit.
Figure 29 illustrates the simulation results for the topcell circuit. All possible combinations of the inputs F2, F1 and F0 were covered, and the responses of S2, S1 and S0 match Table 5. The control signal is connected directly to Function<1>. The output Flag<0> is the Z flag, and Figure 29 shows that it is updated on each positive clock edge with the inverse of the input Z (Zout in Figure 29). The output Flag<1> is the C flag, and Figure 29 shows that on each positive clock edge it is updated as follows:
- the input C (Cout in Figure 29), if the input vector Function (F2, F1, F0) is 001 (ADD) or 010 (SUB);
- a low level, if Function is 011, 100, 101 or 110 (AND, OR, XOR or XNOR, respectively);
- the input A<0>, if Function is 111 (SRR).
Figure 29 Topcell simulation results.
Figure 30 shows the final implementation of the topcell circuit, and Figure 31 shows a zoomed-in view of the topcell Magic cells.
Figure 30 Final implementation of the topcell.
Figure 31 Zoom in of the topcell magic cells.
Once the datapath was completed, it was simulated with the behavioural control unit using the test programs (see the Programmer's Guide for details), with successful results.
2.5. Execution of Instructions
The basic unit of instruction execution is the machine cycle. Some instructions take one machine cycle, while others take two. A machine cycle is in either the Fetch or the Execute state (binary codes 0 and 1, respectively). Each machine cycle is divided into four sub-states: Address Setup, Address Hold, Data Setup and Data Hold (binary codes 00, 01, 10 and 11, respectively). A Gray-coded counting scheme is used to avoid glitches between sub-state transitions. Figure 32 shows the basic diagram of the designed state machine, with the states, the sub-states and the control signals asserted in each.
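The benefit of Gray coding is that only one state bit changes on each transition. The Python sketch below (names hypothetical) steps through the four sub-states in the reflected Gray order 00, 01, 11, 10 and counts the bits that flip at each step. It does the same for the plain binary order 00, 01, 10, 11 for comparison. The one-bit-change property only holds if the sub-states are visited in Gray order; in binary order the Address Hold to Data Setup step (01 to 10) flips both bits.

```python
def gray(n):
    """Reflected binary Gray code of n."""
    return n ^ (n >> 1)


def flips(a, b):
    """Number of state bits that change between codes a and b."""
    return bin(a ^ b).count("1")


def worst_transition(codes):
    """Largest number of bits flipping on any step, including the wrap-around."""
    return max(flips(codes[i], codes[(i + 1) % len(codes)]) for i in range(len(codes)))


SUBSTATES = ["Address Setup", "Address Hold", "Data Setup", "Data Hold"]
gray_codes = [gray(i) for i in range(len(SUBSTATES))]   # 00, 01, 11, 10
binary_codes = list(range(len(SUBSTATES)))              # 00, 01, 10, 11

for name, code in zip(SUBSTATES, gray_codes):
    print(f"{name:13s} {code:02b}")
print(worst_transition(gray_codes), worst_transition(binary_codes))  # 1 2
```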
Fetch: IF_Add_Setup (ALE, EN, EN_Add) → IF_Add_Hold (ME, EN, EN_Add) → IF_Data_Setup (ME, OE, LoadIR, LoadR, Sel_Offset) → IF_Data_Hold (OE, Flags, Function, LoadPC)
2nd memory access? No: return to IF_Add_Setup. Yes: test ST?
ST? Yes: EX_Add_Setup (ALE, EN, Sel_Add, EN_Add) → EX_Add_Hold (ME, EN, Sel_Add, EN_Add) → EX_Data_Setup (ME, EN, WR, EN_Data, Sel_S2 (RS)) → EX_Data_Hold (EN, WR, EN_Data, Sel_S2 (RS))
ST? No; LD? Yes: EX_Add_Setup (ALE, EN, EN_Add) → EX_Add_Hold (ME, EN, Sel_Add, EN_Add) → EX_Data_Setup (ME, OE, TrisMem, Sel_ALU, Sel_S1) → EX_Data_Hold (OE, LoadR)
LD? No: EX_Add_Setup (ALE, EN, EN_Add) → EX_Add_Hold (ME, EN, EN_Add) → EX_Data_Setup (ME, OE, LoadR, TrisMem, Function, Flags, Sel_ALU) → EX_Data_Hold (OE, LoadPC, LoadR)
Figure 32 ASM flow of Instructions
2.5.1. Instruction Mapping
This section explains how data flows through the components and buses of the datapath for each instruction.
2.5.1.1. Store instruction mapping: ST Rd, [Rs1, const]
This is a single-word instruction. Rd and Rs1 are 3-bit codes that each select one of the addressable registers in the register bank, and const is a 5-bit unsigned value. The instruction takes two machine cycles to execute, so the mode bit (m) is set to 1. In the first machine cycle, the address at which the data is to be stored is calculated. The mapping of this instruction onto the datapath is illustrated in Figure 33. The address is the sum of the value stored in the source register Rs1 (path highlighted green) and the unsigned extension of const (path highlighted blue). The calculated address (path highlighted red) is then latched into the address latch outside the processor. In the second machine cycle, the value stored in Rd is driven onto the data lines of the memory (path highlighted yellow).
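The word layout of the memory instructions can be read off the Verilog test program in Appendix F. There an instruction is assembled as {opcode, mode, Rd, Rs1, 5-bit const}. With 3-bit register fields and 16-bit words, this leaves 4 bits for the opcode. The Python sketch below packs and unpacks that layout. The opcode value in the example (ST_OPCODE) is hypothetical, because the contents of opcodes.v are not reproduced in this report.

```python
def encode_short(opcode, mode, rd, rs1, const5):
    """Pack {opcode[3:0], mode, Rd[2:0], Rs1[2:0], const[4:0]} into a 16-bit word."""
    if not (0 <= opcode < 16 and mode in (0, 1) and 0 <= rd < 8
            and 0 <= rs1 < 8 and 0 <= const5 < 32):
        raise ValueError("field out of range")
    return (opcode << 12) | (mode << 11) | (rd << 8) | (rs1 << 5) | const5


def decode_short(word):
    """Unpack a 16-bit instruction word into its fields."""
    return {
        "opcode": (word >> 12) & 0xF,
        "mode": (word >> 11) & 0x1,
        "rd": (word >> 8) & 0x7,
        "rs1": (word >> 5) & 0x7,
        "const": word & 0x1F,
    }


ST_OPCODE = 0b1001  # hypothetical value; the real one is defined in opcodes.v
word = encode_short(ST_OPCODE, 1, 6, 1, 31)   # ST R6, [R1, 31] with m = 1
print(f"{word:016b}", decode_short(word))
```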
[Datapath diagram: register bank R0 to R6 and PC, instruction register IR (Opcode [13:0] = IR[15:2] sent to the control unit), unsigned and sign extension units, ALU with Z and C flags, System Bus, Data_in from memory and Data_out to memory.]
Figure 33 Store instruction mapping
2.5.1.2. Load instruction mapping: LD Rd, [Rs1, const]
This is a single-word instruction. Rd and Rs1 are 3-bit codes that each select one of the addressable registers in the register bank, and const is a 5-bit unsigned value. The instruction takes two machine cycles to execute, so the mode bit (m) is set to 1. In the first machine cycle, the address from which the data is to be read is calculated. The mapping of this instruction onto the datapath is illustrated in Figure 34. The address is the sum of the value stored in the source register Rs1 (path highlighted green) and the unsigned extension of const (path highlighted blue). The calculated address (path highlighted red) is then latched into the address latch outside the processor. In the second machine cycle, the value read from memory is stored in register Rd (path highlighted yellow). This instruction can also be used to load values into PC during subroutine handling.
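The address calculation shared by LD and ST adds a zero-extended 5-bit constant to Rs1. A minimal Python sketch follows. The function name is hypothetical, and wrapping at 16 bits is an assumption for a 16-bit address path. The first example reproduces LD R1, [R3, 0] from the Section 4 test program, where R3 holds the SWITCHES address 2048.

```python
MASK16 = 0xFFFF


def effective_address(rs1_value, const5):
    """LD/ST address: Rs1 plus the zero-extended 5-bit constant, on 16 bits."""
    if not 0 <= const5 <= 31:
        raise ValueError("const must fit in 5 unsigned bits")
    # Zero extension: the upper 11 bits of the constant are 0, so it is never negative.
    return (rs1_value + const5) & MASK16


print(effective_address(2048, 0))   # 2048 (LD R1, [R3, 0] with R3 = SWITCHES)
print(effective_address(300, 31))   # 331
```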
Figure 34 Load instruction mapping
2.5.1.3. Register to register instruction mapping: ADD Rd, Rs1, Rs2
This is a single-word instruction. Rd, Rs1 and Rs2 are 3-bit codes that each select one of the addressable registers in the register bank. The instruction takes one machine cycle to execute, so the mode bit (m) is set to 0. The mapping of this instruction onto the datapath is illustrated in Figure 35. The ALU inputs are the two registers selected by Rs1 and Rs2 (paths highlighted blue and green, respectively), and the ALU result is stored in register Rd (path highlighted yellow). Note that PC can be used as both a source and a destination register. ADD is used here as an example; SUB and all the logic instructions use the same mapping.
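A 16-bit Python sketch of the register-to-register operations, Rd = Rs1 op Rs2, is shown below.
- The carry of logic operations is 0, following the flag rules in Section 2.4.
- SUB is modelled as A + not(B) + 1, so a carry of 1 means "no borrow". This convention is an assumption.
- That convention is consistent with the Section 4 test program if JNC means "jump if C = 0". JNC is not taken after SUB R5, R1, R4 (no borrow) and is taken after SUBSI R6, R2, 15 (borrow).

The function name is hypothetical, and the example values come from that test program.

```python
MASK16 = 0xFFFF


def alu(op, a, b):
    """Return (result, carry) for a 16-bit register-to-register operation (sketch)."""
    if op == "ADD":
        total = a + b
    elif op == "SUB":
        total = a + (~b & MASK16) + 1   # two's complement: carry 1 means no borrow
    elif op == "AND":
        return a & b, 0
    elif op == "OR":
        return a | b, 0
    elif op == "XOR":
        return a ^ b, 0
    elif op == "XNOR":
        return ~(a ^ b) & MASK16, 0
    else:
        raise ValueError(f"unsupported operation {op}")
    return total & MASK16, total >> 16


print(alu("SUB", 44874, 44802))   # SUB R5, R2, R4 -> (72, 1)
print(alu("ADD", 65521, 65523))   # ADD R4, R6, R1 -> (65508, 1)
```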
Figure 35 Register to register mapping
2.5.1.4. Long immediate instructions mapping: ADD Rd, Rs1, const
This is a two-word instruction. Rd and Rs1 are 3-bit codes that each select one of the addressable registers in the register bank, and const is a 16-bit unsigned value. The instruction takes two machine cycles to execute, so the mode bit (m) is set to 1. The mapping of this instruction onto the datapath is illustrated in Figure 36. The first operand is the register selected by Rs1 (path highlighted green); a second fetch is needed to bring in the 16-bit unsigned constant (path highlighted blue). Once both operands are available, the result calculated by the ALU is stored in Rd (path highlighted yellow). Note that PC can be used as both a source and a destination register. ADD is used here as an example; SUB and all the logic instructions use the same mapping.
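The two-fetch sequence can be sketched in Python as follows. The word layout comes from the Appendix F test program: a first word {opcode, mode, Rd, Rs1, 5'b11111} followed by a separate 16-bit constant word.
- The opcode value, the function name and the memory address used are hypothetical.
- Writing to PC as Rd is not modelled.

The example reproduces ADD R1, R4, 15018 from the Section 4 test program, where R4 = 2120.

```python
MASK16 = 0xFFFF


def run_long_immediate(memory, pc, regs, op):
    """Execute a two-word instruction at address pc and return the next PC (sketch).

    memory: list of 16-bit words; regs: list indexed by register code;
    op: function (a, b) -> result implementing the ALU operation.
    """
    word0 = memory[pc]                     # first fetch: opcode, m, Rd, Rs1
    const16 = memory[(pc + 1) & MASK16]    # second fetch: the 16-bit constant
    rd = (word0 >> 8) & 0x7
    rs1 = (word0 >> 5) & 0x7
    regs[rd] = op(regs[rs1], const16) & MASK16
    return (pc + 2) & MASK16               # both words have been consumed


ADD_OPCODE = 0b0001  # hypothetical opcode value
memory = [0] * 32
memory[11] = (ADD_OPCODE << 12) | (1 << 11) | (1 << 8) | (4 << 5) | 0b11111  # ADD R1, R4, const
memory[12] = 15018
regs = [0] * 8
regs[4] = 2120
pc = run_long_immediate(memory, 11, regs, lambda a, b: a + b)
print(regs[1], pc)  # 17138 13
```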
Figure 36 Long immediate instructions mapping.
2.5.1.5. Short immediate arithmetic instructions: ADDSI Rd, Rs1, const
This is a single-word instruction. Rd and Rs1 are 3-bit codes that each select one of the addressable registers in the register bank, and const is a 5-bit unsigned value. The instruction takes one machine cycle to execute, so the mode bit (m) is set to 0. The mapping of this instruction onto the datapath is illustrated in Figure 37. The first operand is the register selected by Rs1 (path highlighted green), and the second operand is the zero-extended 5-bit constant (path highlighted blue). The result calculated by the ALU is stored in Rd (path highlighted yellow). Note that PC can be used as both a source and a destination register. ADDSI is used here as an example; SUBSI uses the same mapping.
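A minimal Python sketch of the short immediate operations is shown below. The function name is hypothetical, and carry generation is left out. The examples reuse values from the Section 4 test program: R1 = 44853 (the SWITCHES value), ADDSI R2, R1, 21 and SUBSI R1, R1, 31. Note that 31 is the largest constant a 5-bit field can hold.

```python
MASK16 = 0xFFFF


def short_immediate(op, rs1_value, const5):
    """ADDSI / SUBSI: Rs1 op zero-extended 5-bit constant, on 16 bits (sketch)."""
    if not 0 <= const5 <= 31:
        raise ValueError("short immediate must be in the range 0..31")
    if op == "ADDSI":
        return (rs1_value + const5) & MASK16
    if op == "SUBSI":
        return (rs1_value - const5) & MASK16
    raise ValueError(f"unsupported operation {op}")


print(short_immediate("ADDSI", 44853, 21))  # 44874
print(short_immediate("SUBSI", 44853, 31))  # 44822
```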
Figure 37 Short immediate arithmetic instructions mapping.
2.5.1.6. JMP offset and true conditional jumps mapping
These instructions have the format OPCODE offset, where OPCODE is one of the unconditional or conditional branch opcodes and offset is a 12-bit signed value. The mapping of these instructions onto the datapath is illustrated in Figure 38. PC is loaded with the sum (path highlighted yellow) of its current value (path highlighted green) and the sign-extended offset (path highlighted blue). This mapping is used for an unconditional jump, and for a conditional jump whose condition is true.
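The sign extension and target calculation can be sketched in Python as below.
- The label constants come from Appendix F: START1 is 12'hfe6, LOC_ONE and LOC_THREE are 12'd5, LOC_TWO is 12'hff9.
- Counting the words in the Appendix F listing, these offsets are consistent with being added to the address of the jump instruction itself. For example, JE START1 sits at word 26, and START is at word 0.
- That relation is inferred from the listing rather than stated in the report, so pc below means "address of the jump". The function names are hypothetical.

```python
MASK16 = 0xFFFF


def sign_extend12(offset12):
    """Interpret a 12-bit field as a two's-complement value."""
    offset12 &= 0xFFF
    return offset12 - 0x1000 if offset12 & 0x800 else offset12


def jump_target(pc, offset12):
    """New PC for a taken jump: PC plus the sign-extended offset, on 16 bits."""
    return (pc + sign_extend12(offset12)) & MASK16


print(sign_extend12(0xFE6))     # -26 (`START1 in Appendix F)
print(jump_target(26, 0xFE6))   # 0
print(jump_target(28, 5))       # 33
```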
Figure 38 JMP offset and conditional True Jumps mapping
2.5.1.7. False conditional jumps mapping
If the condition of a jump instruction is false, data flows through the datapath as illustrated in Figure 39. PC (path highlighted yellow) is loaded with its incremented value (paths highlighted green and red) so that the next instruction in memory is fetched. This mapping is also used in normal operation to increment the PC and fetch successive instructions from memory (path highlighted blue).
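The choice between the two jump mappings can be summarised in a short Python sketch.
- A true condition takes the PC + offset path (Figure 38); a false condition takes the PC + 1 path (Figure 39).
- The flag conditions assigned to JE, JNE, JC and JNC are assumptions based on the mnemonics used in the test program.
- pc is taken to be the address of the jump instruction.

The two examples follow the Section 4 test program. JE START1 is not taken because the preceding OR leaves Z = 0. JE LOC_1 is taken because the preceding AND leaves Z = 1.

```python
MASK16 = 0xFFFF

# Assumed flag conditions for each jump mnemonic
CONDITIONS = {
    "JMP": lambda z, c: True,
    "JE": lambda z, c: z == 1,
    "JNE": lambda z, c: z == 0,
    "JC": lambda z, c: c == 1,
    "JNC": lambda z, c: c == 0,
}


def sign_extend12(offset12):
    """Interpret a 12-bit field as a two's-complement value."""
    offset12 &= 0xFFF
    return offset12 - 0x1000 if offset12 & 0x800 else offset12


def next_pc(mnemonic, pc, offset12, z, c):
    """Taken: PC + offset (Figure 38 path). Not taken: PC + 1 (Figure 39 path)."""
    if CONDITIONS[mnemonic](z, c):
        return (pc + sign_extend12(offset12)) & MASK16
    return (pc + 1) & MASK16


print(next_pc("JE", 26, 0xFE6, z=0, c=0))  # 27: condition false, PC incremented
print(next_pc("JE", 28, 0x005, z=1, c=0))  # 33: condition true, jump taken
```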
Figure 39 False conditional jumps mapping
3. Control Unit
The control unit is an important part of a microprocessor and can be thought of as a finite state machine (FSM). It directs the flow of data through the datapath by generating the control signals that drive the other parts of the system, such as the general-purpose registers, the arithmetic and logic unit, the instruction register and the buses. The functions performed by the control unit depend strongly on the internal architecture of the microprocessor (described in Section 2 for this design), because it is the control unit that makes that architecture work as intended.
3.1. Description of Control Signals
[2:0] Function: determines the function performed by the ALU, as shown in Table 5.
[2:0] Sel_S1, Sel_S2: select lines for the two sets of tristate buffers that select the ALU inputs (see Section 2.2).
LoadF, LoadPC, LoadIR, LoadR1, LoadR2, LoadR3, LoadR4, LoadR5, LoadR6: load signals for the flags register, the program counter, the instruction register and the general-purpose registers R1 to R6, respectively.
ENB, nME, ALE, RnW, nOE: control the reading and writing of data from and to the memory. Figure 40 illustrates the sequence in which these signals are generated during memory read and write cycles.
En_Addr, En_Data, TrisMem: control lines for three tristate buffers in the datapath (see Sections 2.2 and 2.3 for details).
Sel_Imm, Sel_Offset, Sel_PC, Sel_ALU, Sel_Addr: select lines for various multiplexers in the datapath (see Sections 2.2 and 2.3 for details).
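The control unit drives Sel_S1 and Sel_S2 as 3-bit codes. The bitslice stimulus in Appendix C, however, drives the datapath's SEL_S1 and SEL_S2 as 8-bit one-hot vectors: for example, 64 in the steps that exercise R6 and 128 in those that exercise PC. A 3-to-8 decode therefore has to sit somewhere between the two. The Python sketch below shows that decode. It assumes the register codes R0 = 0 to R6 = 6 and PC = 7 from the parameter list in Appendix F. The names are hypothetical, and where the decoder is physically implemented is not stated here.

```python
REGISTER_CODES = {"R0": 0, "R1": 1, "R2": 2, "R3": 3, "R4": 4, "R5": 5, "R6": 6, "PC": 7}


def one_hot(sel):
    """3-bit select code -> 8-bit one-hot tristate enable vector."""
    if not 0 <= sel < 8:
        raise ValueError("select code must be 3 bits")
    return 1 << sel


for name, code in REGISTER_CODES.items():
    print(f"Sel_S1 = {code:03b} ({name}) -> {one_hot(code):08b}")
```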
Figure 40 Memory Read/Write Operations [ 2 ]
3.2. Synthesis of Control Unit
The behavioural model of the control unit must be synthesised before it can be fabricated. Synthesis can be done using either Magic or L-Edit. L-Edit can place the synthesised cells in multiple rows, which reduces the total area of the design, so L-Edit was used for the synthesis, placement and routing of the behavioural model of the control unit. The sequence of steps required to synthesise a behavioural model with L-Edit on the UNIX platform is described in [ 3 ]. The resulting synthesised control unit is shown in Figure 41. Its final dimensions are 1037.4 µm × 222 µm, giving an area of 230302.8 µm².
Figure 41 Results of the synthesis of the Control Unit
Once synthesis was finished, the control unit was simulated with the behavioural datapath using the test programs (see the Programmer's Guide for details), with satisfactory results. Afterwards, the synthesised control unit was simulated with the extracted datapath, and everything ran correctly.
3.3. Place & Route
First, the connections between the control unit and the datapath were placed and routed. Several relative positions of these blocks were routed to find the configuration that used the least area; the configurations tested are shown in Figure 42. All of them gave similar routing results and similar areas. Configuration c) had the smallest area with no routing problems, so it was chosen for the final routing. This configuration also gives more flexibility for connecting Vdd and Gnd between the two modules.
[Figure 42 panels a) to d): alternative relative placements of the Control and Datapath blocks.]
Figure 42 Multiple Control-Datapath configurations tested.
The module that results from routing the control unit to the datapath is called the CPU core and is illustrated in Figure 43. The minimum area achieved for this module was 1836096.9 µm² (1173 µm × 1565.3 µm). The CPU core was extracted and simulated successfully using the test programs (see the Programmer's Guide for details).
Figure 43 Final CPU core.
Afterwards, the CPU core was placed and routed inside the provided pad ring; the result is shown in Figure 44. The minimum pad-ring dimensions for which place and route completed successfully were 1410 µm × 1681.9 µm, a total area of 2371479 µm². As Figure 44 shows, little area is wasted inside the pad ring because the minimum available width was used, so this is probably close to the smallest area achievable for this design. The complete design was then simulated with the test programs, with satisfactory results.
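The reported areas follow directly from the block dimensions. The Python snippet below recomputes them. It also gives the CPU core area as a fraction of the overall pad-ring footprint, which is one way to quantify the "little area is wasted" remark. That ratio is a derived figure, not one reported above.

```python
import math

# Dimensions reported in this section, in micrometres (width, height)
BLOCKS = {
    "control unit": (1037.4, 222.0),
    "CPU core": (1173.0, 1565.3),
    "pad ring": (1410.0, 1681.9),
}

areas = {name: w * h for name, (w, h) in BLOCKS.items()}
for name, area in areas.items():
    print(f"{name:12s} {area:12.1f} um^2")

core_fraction = areas["CPU core"] / areas["pad ring"]
print(f"CPU core occupies {core_fraction:.1%} of the pad-ring footprint")
```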
Figure 44 Place and route of the Pad-Ring and the CPU core
4. Microprocessor Test
This processor has 16 instructions in its instruction set. Each instruction must be verified to ensure that the architecture and datapath design are correct. The program for testing the processor is written in Verilog. It is a simple program with a known output: it exercises all the instructions and stores the final value in the LEDS. The program was written with the following considerations:
1. The arithmetic and logic instructions have two modes of operation, while the other instructions have only one. Both modes of every arithmetic and logic instruction were tested.
2. All the internal registers were checked in load and store operations.
3. The conditional jump instructions were checked for both true and false conditions.
The test program is shown below, with comments giving the expected contents of the registers. Besides the points above, the program covers cases such as:
- addition and subtraction with and without carry or borrow;
- shifting a register right with an LSB of 0 or 1;
- large operands for the arithmetic and logic operations;
- large constants for the short immediate operations.
SWITCHES = 2048    SWITCHES VALUE = 44853    LEDS = 2560
// comments show the expected result

START   ADD    R3, R0, SWITCHES    // R3 = 2048
        LD     R1, [R3, 0]         // R1 = 44853
        ADDSI  R2, R1, 21          // R2 = 44874
        SUBSI  R1, R1, 31          // R1 = 44822
        AND    R4, R1, R2          // R4 = 44802
        SUB    R5, R2, R4          // R5 = 72
        XOR    R6, R4, R5          // R6 = 44874
        OR     R1, R5, R6          // R1 = 44874
        XNOR   R2, R6, R1          // R2 = 65535
        ADD    R4, R5, R3          // R4 = 2120
        ADD    R1, R4, 15018       // R1 = 17138
        SUB    R3, R2, 34062       // R3 = 31473
        XOR    R2, R6, 61680       // R2 = 24506
        XNOR   R5, R1, 3855        // R5 = 45570
        AND    R6, R3, 43690       // R6 = 10912
        OR     R4, R5, 21845       // R4 = 63319
        SRR    R1, R2              // R1 = 12253
        OR     R1, R3, 34082       // R1 = 65523
        JE     START1              // NOT JUMP
        AND    R2, R4, R0          // R2 = 0
        JE     LOC_1               // JUMP TO LOC_1
LOC_2   ADD    R3, R5, R3          // R3 = 33677
        JC     START2              // NOT JUMP
        ADD    R4, R6, R1          // R4 = 65508
        JC     LOC_3               // JUMP TO LOC_3
LOC_1   SUB    R5, R1, R4          // R5 = 2204
        JNC    START3              // NOT JUMP
        SUBSI  R6, R2, 15          // R6 = 65521
        JNC    LOC_2               // JUMP TO LOC_2
LOC_3   XOR    R0, R3, 33677       // Z = 1
        JNE    START4              // NOT JUMP
        XNOR   R1, R4, R5          // R1 = 2183
        JNE    LAST                // JUMP TO LAST
        ADD    R1, R0, R0          // IGNORE
        ADD    R2, R0, R0          // IGNORE
        ADD    R3, R0, R0          // IGNORE
        ADD    R4, R0, R0          // IGNORE
        ADD    R5, R0, R0          // IGNORE
        ADD    R6, R0, R0          // IGNORE
LAST    ADD    R1, R0, 300         // R1 = 300
        ADD    R2, R0, 301         // R2 = 301
        ADD    R3, R0, 302         // R3 = 302
        ADD    R4, R0, 303         // R4 = 303
        ADD    R5, R0, 304         // R5 = 304
        ADD    R6, R0, 305         // R6 = 305
        ST     R6, R1, 31
        ST     R5, R2, 24
        ST     R4, R3, 18
        ST     R3, R4, 12
        ST     R2, R5, 6
        ST     R1, R6, 1
        LD     R6, R1, 31          // R6 = 300
        LD     R5, R2, 24          // R5 = 301
        LD     R4, R3, 18          // R4 = 302
        LD     R3, R3, 18          // R3 = 302
        LD     R2, R2, 24          // R2 = 301
        LD     R1, R1, 31          // R1 = 300
        ADD    R1, R3, R5          // R1 = 603
        ADD    R3, R0, LEDS        // LOAD R3 WITH LEDS ADDRESS
        ST     R1, R3,             // PUT R1 INTO LEDS
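As a cross-check on the expected values in the comments, the straight-line part of the test program can be replayed with 16-bit Python arithmetic. This covers everything from START to the OR instruction just before the first JE. The replay is a hand translation, not output from the project's assembler or simulator, and the function name is hypothetical.
- LD is modelled as returning the SWITCHES value 44853 given at the top of the listing.
- SRR is modelled as a logical shift right with 0 entering the MSB. The result would be the same if C were shifted in instead, because the preceding OR instruction leaves C = 0.

```python
MASK16 = 0xFFFF


def replay_start_block(switches_value=44853):
    """Replay START ... OR R1, R3, 34082 and return R0..R6 (hand translation)."""
    r = [0] * 7                      # R0..R6; R0 stays 0
    r[3] = (r[0] + 2048) & MASK16    # ADD   R3, R0, SWITCHES
    r[1] = switches_value            # LD    R1, [R3, 0]
    r[2] = (r[1] + 21) & MASK16      # ADDSI R2, R1, 21
    r[1] = (r[1] - 31) & MASK16      # SUBSI R1, R1, 31
    r[4] = r[1] & r[2]               # AND   R4, R1, R2
    r[5] = (r[2] - r[4]) & MASK16    # SUB   R5, R2, R4
    r[6] = r[4] ^ r[5]               # XOR   R6, R4, R5
    r[1] = r[5] | r[6]               # OR    R1, R5, R6
    r[2] = ~(r[6] ^ r[1]) & MASK16   # XNOR  R2, R6, R1
    r[4] = (r[5] + r[3]) & MASK16    # ADD   R4, R5, R3
    r[1] = (r[4] + 15018) & MASK16   # ADD   R1, R4, 15018
    r[3] = (r[2] - 34062) & MASK16   # SUB   R3, R2, 34062
    r[2] = r[6] ^ 61680              # XOR   R2, R6, 61680
    r[5] = ~(r[1] ^ 3855) & MASK16   # XNOR  R5, R1, 3855
    r[6] = r[3] & 43690              # AND   R6, R3, 43690
    r[4] = r[5] | 21845              # OR    R4, R5, 21845
    r[1] = r[2] >> 1                 # SRR   R1, R2 (0 into the MSB)
    r[1] = r[3] | 34082              # OR    R1, R3, 34082
    return r


print(replay_start_block())
# [0, 65523, 24506, 31473, 63319, 45570, 10912]
```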
Figure 45 Behavioural simulation
Figure 46 Extracted simulation
The program was first run on the behavioural model of the microprocessor to verify its functionality. The simulation shows that the program flow follows the Verilog program exactly (see Figure 45). Every time the PC changed, the internal registers were checked for the changes expected from the current instruction. Once the whole sequence had been verified, the program was simulated using the extracted model. Both the behavioural and the extracted simulations show that the value stored in LEDS is 603. These simulations indicate that all the instructions of this microprocessor execute correctly. This is not an exhaustive test, but it could have revealed problems in the design, and the microprocessor passed all the tests performed. More exhaustive testing would have to take further conditions into account. Examples are testing every instruction with all possible combinations of source and destination registers, and testing the execution of every instruction after every other instruction with all possible combinations. A test of this magnitude would take too long to simulate and is beyond the scope of this design.
Summary and Conclusions

All the steps involved in implementing a new microprocessor have been described. The design of the left buffer was presented, and its simulation demonstrated no clock skew. Each block of the datapath was described, together with simulations verifying its behaviour, and the mapping of every instruction onto the designed datapath was presented. The implementation of the design in silicon was then presented: the successful synthesis of the control unit, followed by placement and routing, first with the datapath and then with the provided pad ring. At each stage, simulations were performed to verify the correct functionality of the system. Finally, the entire system was tested, and the results of these tests were satisfactory.
References
[ 1 ] McNally I. Left End Buffer Design [online]. Available from: http://users.ecs.soton.ac.uk/~bim/notes/fcde/left_buffer_07.html [cited 22 May 2007].
[ 2 ] McNally I. VLSI Design Project 2006/2007 - Design Phase Microprocessor Specification [online]. Available from: http://users.ecs.soton.ac.uk/~bim/notes/fcde/upspec_07.html [cited 23 May 2007].
[ 3 ] McNally I. L-Edit Place and Route [online]. Available from: http://users.ecs.soton.ac.uk/~bim/notes/cad/ledit/place_and_route.html [cited 23 May 2007].
[ 4 ] Weste N., Harris D. CMOS VLSI Design: A Circuits and Systems Perspective. Addison Wesley; 2005.
Appendix A Division of Labour Task 1 2 3 4 5 5 7 8 9 10 11 12 13 14 14 Cell library Completion (End of Row Cells) Initial Design Verilog Behavioural Model Multiply Program Magic Datapath Verilog Cross Simulation Control Unit Synthesis Magic Control Unit Final Floorplanning, Placement and Routing Factorial Program Random Program Verilog Final Simulations and Cadence DRC Assembler (if done) Programmer's Guide Documentation Final Report OVERALL EFFORT Signature: 20 20 20 20 100 20 20 20 20 20 20 10 10 50 50 20 25 50 50 20 25 100 100 40 40 30 25 30 25 20 20 50 Percentage Effort on each Task ECSID: hcg206 Sq106 mtmmy106 fad106 hcl106 20 20 50 100 50 50 20 20 20 20 20 20
Appendix B Stimulus File for ALU simulation
=======Stimulus file for ALU ========
`timescale 100ps / 10ps
// stimulus file ALU_sim_stim.v for ALU_sim
// created by ext2vmod 2.5
module ALU_sim_stim;
reg A ;
reg B ;
reg Cin ;
reg I1 ;
reg S0 ;
reg S1 ;
reg S2 ;
reg Zin ;
reg control ;
wire Cout ;
wire Zout ;
ALU_sim instance( .Cout ( Cout ), .Zout ( Zout ), .A ( A ), .B ( B ), .Cin ( Cin ),
  .I1 ( I1 ), .S0 ( S0 ), .S1 ( S1 ), .S2 ( S2 ), .Zin ( Zin ), .control ( control ) );
// stimulus information follows
initial begin
  //ADDER
  A = 0; I1 = 0; B = 0; Cin = 0; S0 = 0; S1 = 0; S2 = 0; Zin = 0; control = 0;
  #1000 A = 0; I1 = 0; B = 1; Cin = 0; S0 = 0; S1 = 0; S2 = 1; Zin = 0; control = 0;
  #1000 A = 1; I1 = 0; B = 0; Cin = 0; S0 = 0; S1 = 1; S2 = 0; Zin = 0; control = 0;
  #1000 A = 1; I1 = 0; B = 1; Cin = 0; S0 = 0; S1 = 1; S2 = 1; Zin = 0; control = 0;
  #1000 A = 0; I1 = 0; B = 0; Cin = 0; S0 = 1; S1 = 0; S2 = 0; Zin = 0; control = 0;
  #1000 A = 0; I1 = 0; B = 1; Cin = 0; S0 = 1; S1 = 0; S2 = 1; Zin = 0; control = 0;
  #1000 A = 1; I1 = 0; B = 0; Cin = 0; S0 = 1; S1 = 1; S2 = 0; Zin = 0; control = 0;
  #1000 A = 1; I1 = 0; B = 1; Cin = 0; S0 = 1; S1 = 1; S2 = 1; Zin = 0; control = 0;
  //SUBTRACTOR
  #1000 A = 0; I1 = 0; B = 0; Cin = 1; S0 = 0; S1 = 0; S2 = 0; Zin = 0; control = 1;
  #1000 A = 0; I1 = 0; B = 1; Cin = 1; S0 = 0; S1 = 0; S2 = 1; Zin = 0; control = 1; I1 = 0;
  #1000 A = 1; I1 = 0; B = 0; Cin = 1; S0 = 0; S1 = 1; S2 = 0; Zin = 0; control = 1;
  #1000 A = 1; I1 = 0; B = 1; Cin = 1; S0 = 0; S1 = 1; S2 = 1; Zin = 0; control = 1;
  #1000 A = 0; I1 = 0; B = 0; Cin = 1; S0 = 1; S1 = 0; S2 = 0; Zin = 0; control = 1;
  #1000 A = 0; I1 = 0; B = 1; Cin = 1; S0 = 1; S1 = 0; S2 = 1; Zin = 0; control = 1;
  #1000 A = 1; I1 = 0; B = 0; Cin = 1; S0 = 1; S1 = 1; S2 = 0; Zin = 0; control = 1;
  #1000 A = 1; I1 = 0; B = 1; Cin = 1; S0 = 1; S1 = 1; S2 = 1; Zin = 0; control = 1;
  #1000 $stop; $finish;
end
// probe information follows
initial $monitor($time, ,"%b", A , ,"%b", B , ,"%b", Cin , ,"%b", I1 , ,"%b", S0 ,
  ,"%b", S1 , ,"%b", S2 , ,"%b", Zin , ,"%b", control , ,"%b", Cout , ,"%b", Zout ,
  ,"%b", instance.ADDSUB , ,"%b", instance.AND , ,"%b", instance.OR ,
  ,"%b", instance.XNOR , ,"%b", instance.XOR , );
//SIMVISION COMMAND: source ALU_sim.sv
endmodule
Appendix C Stimulus File for bitslice reg simulation

========Stimulus file for reg ========
`timescale 100ps / 10ps
// stimulus file reg1_stim.v for reg1
// created by ext2vmod 2.5
module reg1_stim;
reg Clock ;
reg EN_Addr ;
reg EN_Data ;
reg LoadPC ;
reg LoadR1 ;
reg LoadR2 ;
reg LoadR3 ;
reg LoadR4 ;
reg LoadR5 ;
reg LoadR6 ;
reg SDI ;
reg SEL_ALU ;
reg SEL_Add ;
reg SEL_IMN ;
reg [7:0] SEL_S1 ;
reg [7:0] SEL_S2 ;
reg Test ;
reg Unsign ;
reg mux ;
reg nReset ;
reg outalu ;
wire B ;
wire SDO ;
reg1 instance( .B ( B ), .SDO ( SDO ), .Clock ( Clock ), .EN_Addr ( EN_Addr ),
  .EN_Data ( EN_Data ), .LoadPC ( LoadPC ), .LoadR1 ( LoadR1 ), .LoadR2 ( LoadR2 ),
  .LoadR3 ( LoadR3 ), .LoadR4 ( LoadR4 ), .LoadR5 ( LoadR5 ), .LoadR6 ( LoadR6 ),
  .SDI ( SDI ), .SEL_ALU ( SEL_ALU ), .SEL_Add ( SEL_Add ), .SEL_IMN ( SEL_IMN ),
  .SEL_S1 ( SEL_S1 ), .SEL_S2 ( SEL_S2 ), .Test ( Test ), .Unsign ( Unsign ),
  .mux ( mux ), .nReset ( nReset ), .outalu ( outalu ) );
// stimulus information follows
always begin
  Clock = 0; #250 Clock = 1; #500 Clock = 0; #250 Clock = 0; // Clock cycle = 1000 time units (100 ns)
end
initial begin
  nReset = 0; SDI = 0; Test = 0; EN_Addr = 1; EN_Data = 0;
  LoadPC = 0; LoadR1 = 0; LoadR2 = 0; LoadR3 = 0; LoadR4 = 0; LoadR5 = 0; LoadR6 = 0;
  SEL_ALU = 0; SEL_Add = 1; SEL_IMN = 0; SEL_S1 = 1; SEL_S2 = 1; Unsign = 1; mux = 1; outalu = 1;
  #500 nReset = 1;
  #1000 LoadPC = 0; LoadR1 = 0; LoadR2 = 0; LoadR3 = 0; LoadR4 = 0; LoadR5 = 0; LoadR6 = 1; SEL_S1 = 64; SEL_S2 = 1;
  #1000 LoadPC = 0; LoadR1 = 0; LoadR2 = 0; LoadR3 = 0; LoadR4 = 0; LoadR5 = 0; LoadR6 = 1; SEL_S1 = 1; SEL_S2 = 64;
  #1000 LoadPC = 0; LoadR1 = 0; LoadR2 = 0; LoadR3 = 0; LoadR4 = 0; LoadR5 = 1; LoadR6 = 0; SEL_S1 = 32; SEL_S2 = 1;
  #1000 LoadPC = 0; LoadR1 = 0; LoadR2 = 0; LoadR3 = 0; LoadR4 = 0; LoadR5 = 1; LoadR6 = 0; SEL_S1 = 1; SEL_S2 = 32;
  #1000 LoadPC = 0; LoadR1 = 0; LoadR2 = 0; LoadR3 = 0; LoadR4 = 1; LoadR5 = 0; LoadR6 = 0; SEL_S1 = 16; SEL_S2 = 1;
  #1000 LoadPC = 0; LoadR1 = 0; LoadR2 = 0; LoadR3 = 0; LoadR4 = 1; LoadR5 = 0; LoadR6 = 0; SEL_S1 = 1; SEL_S2 = 16;
  #1000 LoadPC = 0; LoadR1 = 0; LoadR2 = 0; LoadR3 = 1; LoadR4 = 0; LoadR5 = 0; LoadR6 = 0; SEL_S1 = 8; SEL_S2 = 1;
  #1000 LoadPC = 0; LoadR1 = 0; LoadR2 = 0; LoadR3 = 1; LoadR4 = 0; LoadR5 = 0; LoadR6 = 0; SEL_S1 = 1; SEL_S2 = 8;
  #1000 LoadPC = 0; LoadR1 = 0; LoadR2 = 1; LoadR3 = 0; LoadR4 = 0; LoadR5 = 0; LoadR6 = 0; SEL_S1 = 4; SEL_S2 = 1;
  #1000 LoadPC = 0; LoadR1 = 0; LoadR2 = 1; LoadR3 = 0; LoadR4 = 0; LoadR5 = 0; LoadR6 = 0; SEL_S1 = 1; SEL_S2 = 4;
  #1000 LoadPC = 0; LoadR1 = 1; LoadR2 = 0; LoadR3 = 0; LoadR4 = 0; LoadR5 = 0; LoadR6 = 0; SEL_S1 = 2; SEL_S2 = 1;
  #1000 LoadPC = 0; LoadR1 = 1; LoadR2 = 0; LoadR3 = 0; LoadR4 = 0; LoadR5 = 0; LoadR6 = 0; SEL_S1 = 1; SEL_S2 = 2;
  #1000 LoadPC = 0; LoadR1 = 0; LoadR2 = 0; LoadR3 = 0; LoadR4 = 0; LoadR5 = 0; LoadR6 = 0; SEL_S1 = 1; SEL_S2 = 1;
  #1000 LoadPC = 0; LoadR1 = 0; LoadR2 = 0; LoadR3 = 0; LoadR4 = 0; LoadR5 = 0; LoadR6 = 0; SEL_S1 = 1; SEL_S2 = 1;
  #1000 LoadPC = 1; LoadR1 = 0; LoadR2 = 0; LoadR3 = 0; LoadR4 = 0; LoadR5 = 0; LoadR6 = 0; SEL_S1 = 128; SEL_S2 = 1;
  #1000 LoadPC = 1; LoadR1 = 0; LoadR2 = 0; LoadR3 = 0; LoadR4 = 0; LoadR5 = 0; LoadR6 = 0; SEL_S1 = 1; SEL_S2 = 128;
  #1000 LoadPC = 0; LoadR1 = 1; LoadR2 = 1; LoadR3 = 0; LoadR4 = 0; LoadR5 = 0; LoadR6 = 0; SEL_S1 = 2; SEL_S2 = 4;
  #1000 $stop; $finish;
end
// probe information follows
initial $monitor($time, ,"%b", Clock , ,"%b", EN_Addr , ,"%b", EN_Data , ,"%b", LoadPC ,
  ,"%b", LoadR1 , ,"%b", LoadR2 , ,"%b", LoadR3 , ,"%b", LoadR4 , ,"%b", LoadR5 ,
  ,"%b", LoadR6 , ,"%b", SDI , ,"%b", SEL_ALU , ,"%b", SEL_Add , ,"%b", SEL_IMN ,
  ,"%b", SEL_S1 , ,"%b", SEL_S2 , ,"%b", Test , ,"%b", Unsign , ,"%b", mux ,
  ,"%b", nReset , ,"%b", outalu ,"%b", B ,"%b", SDO ,"%b", instance.ADDRESSBUS ,
  "%b", instance.S1 ,"%b", instance.S2 , "%b", instance.SYSTEMBUS ,);
//SIMVISION COMMAND: source reg1.sv
endmodule
Appendix D Stimulus File for bitslice IR simulation

========Stimulus file for IR ========
`timescale 100ps / 10ps
// stimulus file IR_stim.v for IR
// created by ext2vmod 2.5
module IR_stim;
reg ADDRESSBUS ;
reg Clock ;
reg Data_in ;
reg LoadIR ;
reg SDI ;
reg SEL_PC ;
reg SEL_offset ;
reg Test ;
reg TrisMem ;
reg nReset ;
reg outalu ;
wire Unsign ;
wire mux ;
IR instance( .Unsign ( Unsign ), .mux ( mux ), .ADDRESSBUS ( ADDRESSBUS ), .Clock ( Clock ),
  .Data_in ( Data_in ), .LoadIR ( LoadIR ), .SDI ( SDI ), .SEL_PC ( SEL_PC ),
  .SEL_offset ( SEL_offset ), .Test ( Test ), .TrisMem ( TrisMem ), .nReset ( nReset ),
  .outalu ( outalu ) );
// stimulus information follows
always begin
  Clock = 0; #250 Clock = 1; #500 Clock = 0; #250 Clock = 0; // Clock cycle = 1000 time units (100 ns)
end
initial begin
  nReset = 0; SDI = 0; Test = 0; ADDRESSBUS = 0; Data_in = 1; LoadIR = 0;
  SEL_PC = 0; SEL_offset = 0; TrisMem = 1; outalu = 0;
  #500 nReset = 1;
  #1000 ADDRESSBUS = 0; LoadIR = 0; SEL_PC = 0; SEL_offset = 0; outalu = 1;
  #1000 ADDRESSBUS = 0; LoadIR = 0; SEL_PC = 0; SEL_offset = 1; outalu = 0;
  #1000 ADDRESSBUS = 0; LoadIR = 0; SEL_PC = 0; SEL_offset = 1; outalu = 1;
  #1000 ADDRESSBUS = 0; LoadIR = 0; SEL_PC = 1; SEL_offset = 0; outalu = 0;
  #1000 ADDRESSBUS = 0; LoadIR = 0; SEL_PC = 1; SEL_offset = 0; outalu = 1;
  #1000 ADDRESSBUS = 0; LoadIR = 0; SEL_PC = 1; SEL_offset = 1; outalu = 0;
  #1000 ADDRESSBUS = 0; LoadIR = 0; SEL_PC = 1; SEL_offset = 1; outalu = 1;
  #1000 ADDRESSBUS = 0; LoadIR = 1; SEL_PC = 0; SEL_offset = 0; outalu = 0;
  #1000 ADDRESSBUS = 0; LoadIR = 1; SEL_PC = 0; SEL_offset = 0; outalu = 1;
  #1000 ADDRESSBUS = 0; LoadIR = 1; SEL_PC = 0; SEL_offset = 1; outalu = 0;
  #1000 ADDRESSBUS = 0; LoadIR = 1; SEL_PC = 0; SEL_offset = 1; outalu = 1;
  #1000 ADDRESSBUS = 0; LoadIR = 1; SEL_PC = 1; SEL_offset = 0; outalu = 0;
  #1000 ADDRESSBUS = 0; LoadIR = 1; SEL_PC = 1; SEL_offset = 0; outalu = 1;
  #1000 ADDRESSBUS = 0; LoadIR = 1; SEL_PC = 1; SEL_offset = 1; outalu = 0;
  #1000 ADDRESSBUS = 0; LoadIR = 1; SEL_PC = 1; SEL_offset = 1; outalu = 1;
  #1000 ADDRESSBUS = 1; Data_in = 1; LoadIR = 0; SEL_PC = 0; SEL_offset = 0; TrisMem = 1; outalu = 0;
  #1000 ADDRESSBUS = 1; LoadIR = 0; SEL_PC = 0; SEL_offset = 0; outalu = 1;
  #1000 ADDRESSBUS = 1; LoadIR = 0; SEL_PC = 0; SEL_offset = 1; outalu = 0;
  #1000 ADDRESSBUS = 1; LoadIR = 0; SEL_PC = 0; SEL_offset = 1; outalu = 1;
  #1000 ADDRESSBUS = 1; LoadIR = 0; SEL_PC = 1; SEL_offset = 0; outalu = 0;
  #1000 ADDRESSBUS = 1; LoadIR = 0; SEL_PC = 1; SEL_offset = 0; outalu = 1;
  #1000 ADDRESSBUS = 1; LoadIR = 0; SEL_PC = 1; SEL_offset = 1; outalu = 0;
  #1000 ADDRESSBUS = 1; LoadIR = 0; SEL_PC = 1; SEL_offset = 1; outalu = 1;
  #1000 ADDRESSBUS = 1; LoadIR = 1; SEL_PC = 0; SEL_offset = 0; outalu = 0;
  #1000 ADDRESSBUS = 1; LoadIR = 1; SEL_PC = 0; SEL_offset = 0; outalu = 1;
  #1000 ADDRESSBUS = 1; LoadIR = 1; SEL_PC = 0; SEL_offset = 1; outalu = 0;
  #1000 ADDRESSBUS = 1; LoadIR = 1; SEL_PC = 0; SEL_offset = 1; outalu = 1;
  #1000 ADDRESSBUS = 1; LoadIR = 1; SEL_PC = 1; SEL_offset = 0; outalu = 0;
  #1000 ADDRESSBUS = 1; LoadIR = 1; SEL_PC = 1; SEL_offset = 0; outalu = 1;
  #1000 ADDRESSBUS = 1; LoadIR = 1; SEL_PC = 1; SEL_offset = 1; outalu = 0;
  #1000 ADDRESSBUS = 1; LoadIR = 1; SEL_PC = 1; SEL_offset = 1; outalu = 1;
  #1000 $stop; $finish;
end
// probe information follows
initial $monitor($time, ,"%b", ADDRESSBUS , ,"%b", Clock , ,"%b", Data_in , ,"%b", LoadIR ,
  ,"%b", SDI , ,"%b", SEL_PC , ,"%b", SEL_offset , ,"%b", Test , ,"%b", TrisMem ,
  ,"%b", nReset , ,"%b", outalu , ,"%b", Unsign , ,"%b", mux , );
//SIMVISION COMMAND: source IR.sv
endmodule
Appendix E Stimulus File for topcell circuit

========Stimulus file for topcell ========
`timescale 100ps / 10ps
// stimulus file topcell_stim.v for topcell
// created by ext2vmod 2.5

module topcell_stim;

reg [0:0] A ;
reg Clock ;
reg C ;
reg [2:0] Function ;
reg LoadF ;
reg SDI ;
reg Test ;
reg Z ;
reg nReset ;
wire [1:0] Flag ;
wire S0 ;
wire S1 ;
wire S2 ;
wire SDO ;

topcell topcell_inst(
  .Flag ( Flag ),
  .S0 ( S0 ),
  .S1 ( S1 ),
  .S2 ( S2 ),
  .SDO ( SDO ),
  .A ( A ),
  .Clock ( Clock ),
  .C ( C ),
  .Function ( Function ),
  .LoadF ( LoadF ),
  .SDI ( SDI ),
  .Test ( Test ),
  .Z ( Z ),
  .nReset ( nReset )
);

// stimulus information follows
always
begin
  Clock = 0; #250 Clock = 1; #500 Clock = 0; #250 Clock = 0; // Clock cycle = 1000 time units (100 ns at the 100 ps timescale)
end

initial
begin
  nReset = 0; SDI = 0; Test = 0; A = 0; C = 0; Function = 0; LoadF = 1; Z = 0;
  #500 nReset = 1;
  #1000 A = 1; C = 1;
  #1000 A = 0; C = 1;
  #1000 A = 1; C = 0;
  #1000 A = 0; C = 0;
  #1000 Function = 1; A = 1; C = 1;
  #1000 Function = 1; A = 0; C = 1;
  #1000 Function = 1; A = 1; C = 0;
  #1000 Function = 1; A = 0; C = 0;
  #1000 Function = 2; A = 1; C = 1;
  #1000 Function = 2; A = 0; C = 1;
  #1000 Function = 2; A = 1; C = 0;
  #1000 Function = 2; A = 0; C = 0;
  #1000 Function = 3; A = 1; C = 1;
  #1000 Function = 3; A = 0; C = 1;
  #1000 Function = 3; A = 1; C = 0;
  #1000 Function = 3; A = 0; C = 0; Z = 1;
  #1000 Function = 4; A = 1; C = 1;
  #1000 Function = 4; A = 0; C = 1;
  #1000 Function = 4; A = 1; C = 0;
  #1000 Function = 4; A = 0; C = 0;
  #1000 Function = 5; A = 1; C = 1;
  #1000 Function = 5; A = 0; C = 1;
  #1000 Function = 5; A = 1; C = 0;
  #1000 Function = 5; A = 0; C = 0;
  #1000 Function = 6; A = 1; C = 1;
  #1000 Function = 6; A = 0; C = 1;
  #1000 Function = 6; A = 1; C = 0;
  #1000 Function = 6; A = 0; C = 0;
  #1000 Function = 7; A = 1; C = 1;
  #1000 Function = 7; A = 0; C = 1;
  #1000 Function = 7; A = 1; C = 0;
  #1000 Function = 7; A = 0; C = 0;
  #1000 $stop;
  $finish;
end

// probe information follows
initial
  $monitor($time, ,"%b", A , ,"%b", Clock , ,"%b", C , ,"%b", Function , ,"%b", LoadF ,
           ,"%b", SDI , ,"%b", Test , ,"%b", Z , ,"%b", nReset , ,"%b", Flag ,
           ,"%b", S0 , ,"%b", S1 , ,"%b", S2 , ,"%b", SDO , );

//SIMVISION COMMAND: source topcell.sv
endmodule
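The topcell stimulus sweeps Function from 0 to 7 and, for each value, applies the same four (A, C) pairs in the order (1,1), (0,1), (1,0), (0,0), one step every 1000 time units after nReset is released; Z is raised once, at the last Function = 3 step, and never cleared. The sketch below expresses the same sweep with loops, assuming the timing above. The module name `topcell_stim_loop` and the loop variables `f` and `i` are hypothetical, and the topcell instance and `$monitor` probe are omitted so the block compiles on its own. It ends with `$finish` only, because `$stop` suspends simulation and may leave some simulators at an interactive prompt.

```verilog
`timescale 100ps / 10ps
// Hypothetical loop-based sketch of the topcell stimulus sweep.
// The topcell instance and $monitor probe of the original file are omitted
// so this compiles stand-alone; add them back to drive the extracted netlist.
module topcell_stim_loop;
  reg [0:0] A;
  reg       Clock, C, LoadF, SDI, Test, Z, nReset;
  reg [2:0] Function;
  integer   f, i;

  // Same clock as the original: 250 low, 500 high, 250 low
  // (1000 time units = 100 ns at the 100 ps timescale)
  always begin
    Clock = 0; #250 Clock = 1; #500 Clock = 0; #250 Clock = 0;
  end

  initial begin
    nReset = 0; SDI = 0; Test = 0; A = 0; C = 0;
    Function = 0; LoadF = 1; Z = 0;
    #500 nReset = 1;
    #1000;
    // For every Function code, apply (A, C) = (1,1), (0,1), (1,0), (0,0)
    for (f = 0; f < 8; f = f + 1)
      for (i = 0; i < 4; i = i + 1) begin
        Function = f;
        A = ~i[0];                    // 1, 0, 1, 0
        C = ~i[1];                    // 1, 1, 0, 0
        if (f == 3 && i == 3) Z = 1;  // the original raises Z once, at this step
        #1000;
      end
    $finish;
  end
endmodule
```

Writing the sweep as nested loops makes the Function and (A, C) ordering explicit and keeps the one-off Z change visible as a single condition, at the cost of no longer matching the ext2vmod output line by line.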
Appendix F Test Program Verilog file

/////////////////////////////////////////////////////////////////////
//
// program module
// This module is used to perform a test on opcodes
//
/////////////////////////////////////////////////////////////////////

`include "opcodes.v"

`ifdef prog_file
  // already defined - do nothing
`else
  `define prog_file "program.hex"
`endif

`timescale 1ns / 100ps

`define FIRST a = 0;
`define NEXT a = a+1;

module program();

reg [15:0] Data_stored [ 0 : 255 ];

parameter R0 = 3'd0, R1 = 3'd1, R2 = 3'd2, R3 = 3'd3, R4 = 3'd4, R5 = 3'd5, R6 = 3'd6, PC = 3'd7, SP = 3'd6;
parameter nmode = 1'b0, mode = 1'b1;

integer a;

initial
begin

//multiply program

// DATA - i/o in i/o module
`define SWITCHES 12'd2048
`define STACK 16'd2047
`define CONST_255 16'd255
`define LEDS 12'd2560
`define LOC_ONE 12'd5
`define LOC_TWO 12'hff9
`define LOC_THREE 12'd5
`define LAST 12'd7
`define START1 12'hfe6
`define START2 12'hfe2
`define START3 12'hfde
`define START4 12'hfd9

`FIRST //00 START
Data_stored[a] = {`ADD,mode,R3,R0,5'b11111}; `NEXT
Data_stored[a] = {`SWITCHES}; `NEXT
Data_stored[a] = {`LD,mode,R1,R3,5'd0}; `NEXT
Data_stored[a] = {`ADDSI,nmode,R2,R1,5'd21}; `NEXT
Data_stored[a] = {`SUBSI,nmode,R1,R1,5'd31}; `NEXT
//05
Data_stored[a] = {`AND,nmode,R4,R1,R2,2'b11}; `NEXT
Data_stored[a] = {`SUB,nmode,R5,R2,R4,2'b11}; `NEXT
Data_stored[a] = {`XOR,nmode,R6,R4,R5,2'b11}; `NEXT
Data_stored[a] = {`OR,nmode,R1,R5,R6,2'b11}; `NEXT
Data_stored[a] = {`XNOR,nmode,R2,R6,R1,2'b11}; `NEXT
//10
Data_stored[a] = {`ADD,nmode,R4,R5,R3,2'b11}; `NEXT
Data_stored[a] = {`ADD,mode,R1,R4,5'b11111}; `NEXT
Data_stored[a] = {16'd15018}; `NEXT
Data_stored[a] = {`SUB,mode,R3,R2,5'b11111}; `NEXT
Data_stored[a] = {16'd34062}; `NEXT
//15
Data_stored[a] = {`XOR,mode,R2,R6,5'b11111}; `NEXT
Data_stored[a] = {16'd61680}; `NEXT
Data_stored[a] = {`XNOR,mode,R5,R1,5'b11111}; `NEXT
Data_stored[a] = {16'd3855}; `NEXT
Data_stored[a] = {`AND,mode,R6,R3,5'b11111}; `NEXT
//20
Data_stored[a] = {16'd43690}; `NEXT
Data_stored[a] = {`OR,mode,R4,R5,5'b11111}; `NEXT
Data_stored[a] = {16'd21845}; `NEXT
Data_stored[a] = {`SRR,nmode,R1,R2,5'b11111}; `NEXT
Data_stored[a] = {`OR,mode,R1,R3,5'b11111}; `NEXT
//25
Data_stored[a] = {16'd34082}; `NEXT
Data_stored[a] = {`JE,`START1}; `NEXT
Data_stored[a] = {`AND,nmode,R2,R4,R0,2'b11}; `NEXT
Data_stored[a] = {`JE,`LOC_ONE}; `NEXT
//LOC_TWO
Data_stored[a] = {`ADD,nmode,R3,R5,R3,2'b11}; `NEXT
//30
Data_stored[a] = {`JC,`START2}; `NEXT
Data_stored[a] = {`ADD,nmode,R4,R6,R1,2'b11}; `NEXT
Data_stored[a] = {`JC,`LOC_THREE}; `NEXT
//LOC_ONE
Data_stored[a] = {`SUB,nmode,R5,R1,R4,2'b11}; `NEXT
Data_stored[a] = {`JNC,`START3}; `NEXT
//35
Data_stored[a] = {`SUBSI,nmode,R6,R2,5'd15}; `NEXT
Data_stored[a] = {`JNC,`LOC_TWO}; `NEXT
//LOC_THREE
Data_stored[a] = {`XOR,mode,R0,R3,5'b11111}; `NEXT
Data_stored[a] = {16'd33677}; `NEXT
Data_stored[a] = {`JNE,`START4}; `NEXT
//40
Data_stored[a] = {`XNOR,nmode,R1,R4,R5,2'b11}; `NEXT
Data_stored[a] = {`JNE,`LAST}; `NEXT
Data_stored[a] = {`ADD,nmode,R1,R0,R0,2'b11}; `NEXT
Data_stored[a] = {`ADD,nmode,R2,R0,R0,2'b11}; `NEXT
Data_stored[a] = {`ADD,nmode,R3,R0,R0,2'b11}; `NEXT
//45
Data_stored[a] = {`ADD,nmode,R4,R0,R0,2'b11}; `NEXT
Data_stored[a] = {`ADD,nmode,R5,R0,R0,2'b11}; `NEXT
Data_stored[a] = {`ADD,nmode,R6,R0,R0,2'b11}; `NEXT
//LAST
Data_stored[a] = {`ADD,mode,R1,R0,5'b11111}; `NEXT
Data_stored[a] = {16'd300}; `NEXT
//50
Data_stored[a] = {`ADD,mode,R2,R0,5'b11111}; `NEXT
Data_stored[a] = {16'd301}; `NEXT
Data_stored[a] = {`ADD,mode,R3,R0,5'b11111}; `NEXT
Data_stored[a] = {16'd302}; `NEXT
Data_stored[a] = {`ADD,mode,R4,R0,5'b11111}; `NEXT
//55
Data_stored[a] = {16'd303}; `NEXT
Data_stored[a] = {`ADD,mode,R5,R0,5'b11111}; `NEXT
Data_stored[a] = {16'd304}; `NEXT
Data_stored[a] = {`ADD,mode,R6,R0,5'b11111}; `NEXT
Data_stored[a] = {16'd305}; `NEXT
//60
Data_stored[a] = {`ST,mode,R1,R1,5'd31}; `NEXT
Data_stored[a] = {`ST,mode,R2,R2,5'd24}; `NEXT
Data_stored[a] = {`ST,mode,R3,R3,5'd18}; `NEXT
Data_stored[a] = {`ST,mode,R4,R4,5'd12}; `NEXT
Data_stored[a] = {`ST,mode,R5,R5,5'd6}; `NEXT
//65
Data_stored[a] = {`ST,mode,R6,R6,5'd1}; `NEXT
Data_stored[a] = {`LD,mode,R6,R1,5'd31}; `NEXT
Data_stored[a] = {`LD,mode,R5,R2,5'd24}; `NEXT
Data_stored[a] = {`LD,mode,R4,R3,5'd18}; `NEXT
Data_stored[a] = {`LD,mode,R3,R3,5'd18}; `NEXT
//70
Data_stored[a] = {`LD,mode,R2,R2,5'd24}; `NEXT
Data_stored[a] = {`LD,mode,R1,R1,5'd31}; `NEXT
Data_stored[a] = {`ADD,nmode,R1,R3,R5,2'b11}; `NEXT
Data_stored[a] = {`ADD,mode,R3,R0,5'b11111}; `NEXT
Data_stored[a] = {`LEDS}; `NEXT
//75
Data_stored[a] = {`ST,mode,R1,R3,5'd0}; `NEXT
Data_stored[a] = {`JMP,12'd0}; `NEXT

$writememh( `prog_file, Data_stored );
end

endmodule


1. Left buffer design

Figure 1 Circuit configuration for testing clock skew

Figure 3 Simulation of final inverter design

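$\frac{2000\,\mathrm{m}}{26.1\,\mathrm{m}} = 76.628$

Since nReset is attached to three inputs inside the D-type cell, the total load is

$\text{TotalLoad} = 76.628 \times 3 \approx 229.885$

Considering 4 and 6 stages, the scale factor for each stage is

$\sqrt[4]{229.885} = 3.894, \qquad \sqrt[6]{229.885} = 2.475$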

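$\frac{2000\,\mathrm{m}}{26.1\,\mathrm{m}} = 76.628$

Since Clock is attached to two inputs inside the D-type cell, the total load is

$\text{TotalLoad} = 76.628 \times 2 = 153.256$

Considering 4 and 6 stages, the scale factor for each stage is

$\sqrt[4]{153.256} = 3.518, \qquad \sqrt[6]{153.256} = 2.313$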

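$\frac{2000\,\mathrm{m}}{18.8\,\mathrm{m}} = 106.383$

Since Test is attached to two inputs inside the D-type cell, the total load is

$\text{TotalLoad} = 106.383 \times 2 = 212.766$

Considering 4 and 6 stages, the scale factor for each stage is

$\sqrt[4]{212.766} = 3.819, \qquad \sqrt[6]{212.766} = 2.443$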
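The three buffer calculations above share one sizing rule. Writing $R$ for the ratio computed at the start of each case, $k$ for the number of D-type inputs driven by the signal and $N$ for the number of inverter stages, the per-stage scale factor $f$ is the value whose $N$-th power equals the total load, so each stage is $f$ times larger than the one before it. The table only collects the figures already derived above.

$$
\text{TotalLoad} = k\,R, \qquad f^{N} = \text{TotalLoad} \;\Longrightarrow\; f = \left(k\,R\right)^{1/N}
$$

$$
\begin{array}{lccccc}
\text{Signal} & k & R & \text{TotalLoad} & f\ (N=4) & f\ (N=6) \\ \hline
\text{nReset} & 3 & 76.628 & 229.885 & 3.894 & 2.475 \\
\text{Clock}  & 2 & 76.628 & 153.256 & 3.518 & 2.313 \\
\text{Test}   & 2 & 106.383 & 212.766 & 3.819 & 2.443
\end{array}
$$

Moving from 4 to 6 stages lowers the required per-stage factor for every signal, for example from 3.894 to 2.475 for nReset.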

Figure 4 The dimension of NMOS (left) and PMOS (right) in micrometers

TotalWidth NMOS 2 19 4 27 5 TotalWidth PMOS 2 19 9 27 Yp 10 Yn

Figure 5 Leftbuf cell

Figure 7 Rise and fall time simulations

Figure 8 Rise time (0-10 nanoseconds)

Figure 9 Fall time (10-20 nanoseconds)

Figure 11 Complete Datapath Structure

ALU (Arithmetic-Logic Unit) design

Figure 12. Arithmetic Logic Unit (ALU).

Table 3. ALU Operation selection

S2 S1 S0    Operation selected
00X         LD
010         ADD
010         SUB
011         SRR
100         XNOR
101         XOR
110         OR
111         AND

Figure 13. ALU simulation results

Figure 14. Final implementation of the ALU

Bitslice Reg design

Figure 15. Bitslice reg architecture

SEL_S1 /S2[7:0] 128 64 32 16 8 4 2 1

Figure 18. Final implementation of the bitslice reg0

Figure 19. Final implementation of the bitslice reg

Figure 20 Bitslice IR architecture

Figure 21 (a) Sign extend connection (b) Unsign extend connection

Figure 22 Bitslice IR simulation results.

Figure 23 Final implementation of the bitslice IR0

Figure 24 Final implementation of the bitslice IR1

Figure 25 Final implementation of the bitslice IR2

Figure 26 Final implementation of the bitslice IR3

Function LD ADD SUB AND OR XOR XNOR SRR

Figure 27 Function conversion circuit.
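Table 3 and the Function list beside Figure 27 together describe a 3-bit Function code being translated into the ALU select lines S2 S1 S0. A hedged behavioural sketch of that translation follows. It assumes that Function values 0 to 7 correspond to LD, ADD, SUB, AND, OR, XOR, XNOR and SRR in the listed order (consistent with the topcell stimulus stepping Function through 0 to 7), realises LD's 00X as 000, and keeps the shared 010 code for ADD and SUB exactly as Table 3 prints it; how the datapath separates addition from subtraction is not modelled. The module name `function_conv_sketch` is hypothetical; the actual circuit is the gate-level design in Figure 27.

```verilog
// Hypothetical behavioural sketch of the function conversion circuit (Figure 27).
// ASSUMPTIONS: Function = 0..7 selects LD, ADD, SUB, AND, OR, XOR, XNOR, SRR in
// the order listed beside Figure 27; S2 S1 S0 codes are copied from Table 3,
// with LD's "00X" realised as 000.
module function_conv_sketch (
  input  wire [2:0] Function,
  output reg        S2,
  output reg        S1,
  output reg        S0
);
  always @* begin
    case (Function)
      3'd0:    {S2, S1, S0} = 3'b000; // LD   (Table 3 lists 00X)
      3'd1:    {S2, S1, S0} = 3'b010; // ADD
      3'd2:    {S2, S1, S0} = 3'b010; // SUB  (same code as ADD in Table 3)
      3'd3:    {S2, S1, S0} = 3'b111; // AND
      3'd4:    {S2, S1, S0} = 3'b110; // OR
      3'd5:    {S2, S1, S0} = 3'b101; // XOR
      3'd6:    {S2, S1, S0} = 3'b100; // XNOR
      3'd7:    {S2, S1, S0} = 3'b011; // SRR
      default: {S2, S1, S0} = 3'bxxx; // unknown (X/Z) Function input
    endcase
  end
endmodule
```

Because ADD and SUB produce the same select code here, any simulation of this sketch cannot tell them apart from S2 S1 S0 alone; the distinction has to come from another control path in the real design.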

Figure 28 Flags management circuit.

Figure 29 Topcell simulation results.

Figure 30 Final implementation of the topcell.

Figure 31 Zoom in of the topcell magic cells.

2nd memory access? No

EX_Data_Hold f EN, WR, EN_Data, Sel_S2 (RS)

EX_Data_Hold f OE, LoadR

OE, LoadPC, LoadR

Figure 32 ASM flow of Instructions

Data_in FROM MEMORY

Figure 33 Store instruction mapping

Data_in FROM MEMORY

Figure 34 Load instruction mapping

Data_in FROM MEMORY

Figure 35 Register to register mapping

Data_in FROM MEMORY