
Latch-based FPGA emulation method for design verification: case study with microprocessor

M. Kim, J. Kong, T. Suh and S.W. Chung

Using latches in a digital design is considered wrong owing to the timing issue. Field-programmable gate array (FPGA) vendors also recommend flip-flops instead of latches in emulation. In this reported work, however, the usefulness and benefit of utilising latches in FPGA emulation for processor design verification is demonstrated. The study shows that a latch-based register file provides the seamless capability of functionality validation, whereas the flip-flop based one requires modification to the original design, potentially harming the completeness of functional verification. Experiment results with Xilinx and Altera devices show marginal differences in terms of emulation performance and area requirement in both approaches. This study reveals that replacing SRAM with latches rather than flip-flops is appealing and preferable in emulation with FPGAs.

Introduction: In digital design, one of the most time-consuming processes is verification. Software-based hardware description language (HDL) simulation is beneficial in a sense that internal signals of interest can be observed. However, it is impractical to validate logic with high complexity using HDL simulation because of intolerable simulation time. To remedy this shortcoming, field-programmable gate array (FPGA) based emulation has been most widely used. It provides the capability of validating the design more than 1000 times faster than the traditional software-based simulation [1]. However, the FPGA-based emulation often requires modification to the original design owing to the restricted internal structures and limited resources in FPGAs. For example, large caches in modern microprocessors do not typically fit into a single FPGA, and they should be split into several FPGAs.

A microprocessor is one of the most complex digital designs including various logics and memories. Its validation requires the exhaustive coverage of different combinations of all the instructions, interrupts and exceptions. Therefore, FPGA-based emulation is typically an inevitable step in the design process. However, some of the logics are not seamlessly translated to the FPGA fabric. One of such logics is the register file since it is often custom-designed with SRAM [2] and the required number of ports varies depending on the instruction set architecture (ISA). A simple dual-issue microprocessor usually requires two write ports and four read ports in the register file [2]. In FPGAs, the memory elements (see Note) support a limited number of ports. For example, the Altera Cyclone II [3] provides only two read ports and one write port in the memory element. Thus, the register file should be converted by using the logic elements and there are two options for implementation: latches or flip-flops. FPGA vendors recommend flip-flops rather than latches, insisting that using latches incurs complicated timing problems [4].

The operational difference between latches and flip-flops has a direct effect on the digital design. A flip-flop is an edge-triggered device enabling a write operation at a rising (or falling) edge of a clock, whereas a latch is a level-triggered one at the high (or low) level of a clock. Therefore, the operation of a latch-based register file is similar to that of the original SRAM-based design. The adoption of the flip-flop-based register file in emulation requires the modification of the original design, potentially affecting the validation correctness. Specifically, it causes the Read-After-Write (RAW) hazard, which does not exist in the original SRAM-based register file. The hazard occurs when the destination register of a write operation is the same as the source register of a subsequent read operation. Owing to the edge-triggered nature of a flip-flop-based register file, the data to be read is not available in the current clock cycle because the write operation occurs at the end of the clock period. Therefore, the hazard should be resolved by adding additional forwarding paths or by stalling the microprocessor. This design change impedes the main purpose of emulation and could harm the completeness of functional verification.

In this Letter, we implement a microprocessor with the latch-based register file for validation using FPGA emulation and compare it with the flip-flop-based one in terms of performance and area. Throughout the Letter, we show the usefulness and benefit of using latches in validation with FPGAs.

Implemented microprocessors: We compare two versions of a microprocessor in emulation: one with a latch-based register file (Pl) and the other with a flip-flop-based register file (Pff). Note that Pff requires special forwarding paths to overcome the RAW hazard explained earlier. The processor is based on ARM9, which has five pipeline stages: Instruction Fetch (IF), Instruction Decode (ID), Execution (EX), Memory Access (MEM), and Write-Back (WB). It is based on ARMv5 instructions except supplementary instructions such as coprocessor, thumb, and load/store multiple instructions.

The register file in Pl consists of 15 latch-based registers and one flip-flop-based register; the 15 registers are general purpose registers and the only register with flip-flops is the program counter (PC). Since latches are level-triggered, the data written in the first half of the clock can be read in the second half of the clock. Thus, the RAW dependency is naturally resolved without any additional forwarding path. Fig. 1 shows an example of the dependency. In the case of the latch-based register file, the result of the first instruction (mov r0, #1) is written back in the register r0 in the first half of clock cycle 4. In the second half of the same clock cycle, the register r0 is read by the fourth instruction (add r4, r0, r5). Therefore, the register file in Pl does not need a forwarding path from the WB stage to in front of the ID/EX pipeline register (dotted arrows in Fig. 1). Note that Pff requires this forwarding path to resolve the hazard.

Fig. 1 Example of forwarding from WB to in front of ID/EX pipeline register
(pipeline diagram: clock cycles 0 to 7 against instruction Nos. 0 to 3; instructions mov r0, #1; mov r1, #1; subs r3, r2, #1; add r4, r0, r5; each passing through IF, ID, EX, MEM and WB; R0 marks the forwarded register)

During the actual implementation of Pl, however, the register file suffered from timing errors caused by glitch. To remove glitch, we utilised an AND gate. Inputs to the AND gate are a phase-shifted clock signal (90° in our study) and the original write enable. Then, the output of the AND gate is connected to the write-enable for each register. As a result, the write enable signal is kept low for one fourth of a clock cycle, ignoring wrong data generated by timing errors, as shown in Fig. 2. Note that the AND gate is located inside the register file and does not affect the original processor design outside the register file. The 90° phase-shifted clock is not specially contrived for the latch-based register file. It was constructed to maintain the same memory (or cache) access latency of one cycle as the original design in the MEM pipeline stage. The read latency of the memory elements in FPGAs is more than one cycle because of its input register (flip-flops).

Fig. 2 Resolving glitch by utilising phase shifted clock
(waveforms: original clock, 0° phase shifted; phase shifted clock, 90° phase shifted; write enable of register before conjugation; write enable of register after conjugation; data, with the wrong data interval marked)

The register file in Pff purely consists of flip-flops and enables a write operation only at the rising (or falling) edge of a clock cycle. As a result, the read and write operation to a register cannot take place in the same clock cycle, resulting in the RAW hazard. There are two options to resolve the RAW hazard: forwarding from the WB stage to in front of the ID/EX pipeline or stall to prevent the execution of the fourth instruction with wrong data. Stalling the processor for one cycle leads to a different execution time of a program compared to the original design with the SRAM-based register file. Furthermore, the stall logic should be added as well. The forwarding option resolves the RAW hazard without affecting the execution cycle time. Nevertheless, the forwarding path is located outside the register file and may cause unexpected side-effects such as functional errors hidden in the extra forwarding path. Thus, Pff requires extra verification process after replacing the register file with the original SRAM-based one and removing forwarding paths.

Analysis and discussion: In this Section, we present experiment results with FPGAs (Altera Cyclone II and Xilinx XC3S500E FPGAs): maximum frequency and area for Pl and Pff. The maximum frequency is obtained by analysing the critical path of Pl and Pff from the synthesis report of the design tools for each FPGA (Altera Quartus II 9.1 Web Edition and Xilinx ISE 12.2). The area is also obtained from the same report.

The maximum clock rates of Pl and Pff are similar on both FPGAs, as shown in Table 1. Cyclone II reports a 5 MHz lower frequency for Pl than that of Pff. XC3S500E reports exactly the same frequency for Pl and Pff. The difference in clock rates is caused by the characteristic of the storage elements (flip-flops or latches) in each FPGA. Cyclone II has configurable storage elements called dedicated logic registers, which are located inside each logic element. However, the dedicated logic registers can only be used as flip-flops. In other words, the latches are implemented by configuring and routing logic elements, consuming more logic elements. On the other hand, XC3S500E can configure the storage elements (called slice flip-flops) as latches. Hence, the implementation of a latch does not require an additional logic element to be configured or routed, compared to the flip-flop implementation. This feature of Cyclone II impacts more significantly on the area. Pl occupies a larger area than Pff by 14.3% on Cyclone II, while Pl utilises only a 0.2% larger area than Pff on XC3S500E.

Table 1: Area and performance of Pl and Pff

FPGA type                       Altera Cyclone II       Altera Cyclone II   Xilinx XC3S500E         Xilinx XC3S500E
Register file type              Flip-flop based (Pff)   Latch based (Pl)    Flip-flop based (Pff)   Latch based (Pl)
Area                            4058 LEs                4639 LEs            2474 slices             2478 slices
Performance (clock frequency)   55 MHz                  50 MHz              35 MHz                  35 MHz

Conclusion: We have demonstrated the usefulness and benefit of utilising latches in emulation with FPGAs. In the processor emulation, the latch-based register file provides the seamless capability of functional validation, whereas the flip-flop-based one requires extra logic in a processor which potentially harms the functional verification. Both approaches do not show the notable differences in terms of emulation speed and area requirement. Our study shows that the latch based approach for the register file is appealing and preferable in functional validation with emulation using FPGAs.

Note: An FPGA usually includes two kinds of elements: memory element and logic element. Memory element can only be configured as memory whereas logic element is able to be configured into many different kinds of combinational or sequential logics.

Acknowledgments: This work was supported in part by the Ministry of Knowledge Economy, Korea, under the Information Technology Research Centre support programme supervised by the National IT Industry Promotion Agency (NIPA-2011-C1090-1121-0010).

© The Institution of Engineering and Technology 2011
19 February 2011
doi: 10.1049/el.2011.0462
ELECTRONICS LETTERS 28th April 2011 Vol. 47 No. 9

M. Kim, J. Kong and S.W. Chung (Division of Computer and Communication Engineering, Korea University, Seoul 136-713, Republic of Korea)
E-mail: swchung@korea.ac.kr
T. Suh (Department of Computer Science Education, College of Education, Korea University, Seoul 136-713, Republic of Korea)

References

1 Nakamura, Y., Hosokawa, K., Kuroda, I., Yoshikawa, K., and Yoshimura, T.: ‘A fast hardware/software co-verification method for system-on-chip by using a C/C++ simulator and FPGA emulator with shared register communication’. Proc. of 41st Annual Design Automation Conf., (DAC’04), San Diego, CA, USA, 2004, pp. 299–304
2 Homayoun, H., Gupta, A., Veidenbaum, A., Sasan, A., Kurdahi, F., and Dutt, N.: ‘RELOCATE: register file local access pattern redistribution mechanism for power and thermal management in out-of-order embedded processor’, Lect. Notes Comput. Sci., 2010, 5952/2010, pp. 216–231
3 Altera Corporation: ‘Cyclone II memory blocks’, Cyclone II Device Handbook, Vol. 1, Chapter 8, February 2008
4 Xilinx: ‘Xilinx design reuse methodology for ASIC and FPGA designers’, Reuse Methodology Manual For System-on-Chip Designs
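
Illustrative sketch (not part of the original Letter): the read-after-write behaviour described for the two register files can be mimicked with a small behavioural model. The class `RegisterFile`, its `cycle` method and the half-cycle ordering are assumptions made purely for illustration; the Letter's actual design is an HDL implementation that is not shown. The model only captures the ordering argument: a latch-based register is transparent during the first half of the cycle, so a read in the second half sees the new value, whereas a flip-flop-based register captures the value only at the clock edge that ends the cycle.

```python
class RegisterFile:
    """Toy 16-entry register file; `kind` selects the storage behaviour.

    This is a hypothetical behavioural model, not the authors' design.
    """

    def __init__(self, kind):
        if kind not in ("latch", "flipflop"):
            raise ValueError(f"unknown register file kind: {kind}")
        self.kind = kind
        self.regs = [0] * 16

    def cycle(self, write=None, read=None):
        """Simulate one clock cycle.

        write: optional (reg, value) pair issued by the WB stage.
        read:  optional register index read by the ID stage in the
               second half of the cycle.
        Returns the value observed by the read, or None if no read.
        """
        if write is not None and self.kind == "latch":
            reg, value = write
            self.regs[reg] = value  # latch is transparent in the first half
        seen = self.regs[read] if read is not None else None
        if write is not None and self.kind == "flipflop":
            reg, value = write
            self.regs[reg] = value  # captured at the edge ending the cycle
        return seen


# Clock cycle 4 of the Fig. 1 example: WB of `mov r0, #1` writes r0
# while ID of `add r4, r0, r5` reads r0.
latch_rf = RegisterFile("latch")
ff_rf = RegisterFile("flipflop")
latch_seen = latch_rf.cycle(write=(0, 1), read=0)
ff_seen = ff_rf.cycle(write=(0, 1), read=0)
ff_seen_next_cycle = ff_rf.cycle(read=0)
print(latch_seen, ff_seen, ff_seen_next_cycle)
```

In this toy model the flip-flop variant returns the stale value in the shared cycle and the updated value one cycle later, which is the gap that the forwarding path or the one-cycle stall has to cover.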

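
Illustrative sketch (not part of the original Letter): the write-enable gating described for the latch-based register file can be examined at quarter-cycle resolution. Several details here are assumptions, since the Letter does not give them: the 90° phase shift is modelled as a delay of one quarter period, the ungated write enable is assumed to be active during the first (clock-high) half of the cycle, and the helper names `clock` and `gated_write_enable` are hypothetical.

```python
QUARTERS_PER_CYCLE = 4


def clock(quarter, phase_quarters=0):
    """Square clock that is high for the first half of each period.

    `phase_quarters` delays the clock by that many quarter-cycles,
    so phase_quarters=1 models the assumed 90-degree phase shift.
    """
    return ((quarter - phase_quarters) % QUARTERS_PER_CYCLE) < 2


def gated_write_enable(write_enable, quarter):
    """AND of the original write enable and the phase-shifted clock."""
    return write_enable and clock(quarter, phase_quarters=1)


# Assumption: the ungated enable follows the clock-high half of the cycle.
raw_enable = [clock(q) for q in range(QUARTERS_PER_CYCLE)]
gated_enable = [
    gated_write_enable(raw_enable[q], q) for q in range(QUARTERS_PER_CYCLE)
]
print(raw_enable)
print(gated_enable)
```

Under these assumptions the gated enable stays low during the first quarter of the cycle, which is the interval in which the Letter says wrong data produced by timing errors is ignored.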
