
Exploiting FPGA-based Techniques for Fault Injection Campaigns on VLSI Circuits

P. Civera, L. Macchiarulo, M. Rebaudengo, M. Sonza Reorda, M. Violante
Politecnico di Torino, Torino, Italy
{civera, luca, reba, sonza, violante}@polito.it

Abstract¹
In this paper we propose an FPGA-based system to speed up Fault Injection campaigns for the evaluation of the fault-tolerance capabilities of VLSI circuits. We describe an environment that relies on FPGA-based emulation of the circuit, together with techniques for emulating the effects of faults and observing the resulting faulty behavior. The proposed approach combines the speed of hardware-based techniques with the flexibility of simulation-based techniques. Experimental results show that significant speed-ups can be achieved with respect to state-of-the-art simulation-based techniques.

1. Introduction
In recent years, there has been a rapid increase in the use of computer-based systems in areas where failures can cost lives and/or money, such as railway traffic control, aircraft flight, and telecommunications. This trend has led to a growing interest in techniques for validating the fault tolerance properties of these systems and for evaluating their reliability. On the other hand, the continuous increase in the integration level of electronic systems is making it increasingly difficult to guarantee an acceptable degree of reliability, due to the occurrence of transient faults (often modeled as soft errors) that can dramatically affect the behavior of a system. As an example, the decrease in the magnitude of the electric charges used to carry and store information is seriously raising the probability that alpha particles and neutrons hitting the circuit introduce transient errors in its behavior (often modeled as Single Event Upsets, or SEUs) [1].

To face the above issues, mechanisms are required to increase the robustness of electronic devices and systems with respect to possible errors occurring during their normal operation. At the same time, designers strongly need effective techniques and methods to debug and verify the correctness of their design and implementation.

Fault Injection [2] has established itself as a viable solution to the above problems. As pointed out in [2], physical Fault Injection (hardware- and software-implemented Fault Injection approaches) is well suited when a prototype of the system is already available, or when the system itself is too large to be modeled and simulated at an acceptable cost. Conversely, simulation-based Fault Injection is very effective in allowing early and detailed analysis of designed systems, since it can be exploited when a prototype is not yet available and allows the analysis of practically any possible fault. Its drawback is the very high CPU time required to simulate the model of the system (provided that the model is available).
¹ This work has been partially supported by ASI (Italian Space Agency). Contact address: Matteo Sonza Reorda, Dipartimento di Automatica e Informatica, Politecnico di Torino, Corso Duca degli Abruzzi 24, I-10129 Torino (Italy), e-mail sonza@polito.it

Proceedings of the 2001 IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems (DFT01) 1063-6722/01 $17.00 2001 IEEE

An intermediate solution, providing most of the benefits of both physical and simulation-based techniques while avoiding most of their disadvantages, has recently become viable thanks to the latest advancements in FPGA technology. Modern FPGA devices can be fruitfully exploited to emulate systems composed of hundreds of thousands of gates at a reasonable cost. FPGA technology can be effectively exploited not only for rapid prototyping and small-volume production, but also to perform Fault Injection experiments. This allows evaluating the fault tolerance properties of a circuit when only a model is available and its prototypical implementation is not yet ready. By resorting to FPGAs we can perform Fault Injection campaigns at the same speed as hardware prototypes but at a negligible fraction of their cost. Moreover, FPGA-based Fault Injection techniques generally support a more accurate analysis of fault behavior than hardware-based ones, since they may allow the injection of a wider set of faults (e.g., specific faults inside the circuit).

Several works have already explored the usage of FPGAs for speeding up Fault Simulation [3] [4] of permanent single stuck-at faults. In [5], the extension of their usage to Fault Injection is proposed, but the approach is based on reprogramming the FPGA once for each fault. The reconfiguration, even when it is only partial, introduces a time overhead that reduces its efficiency. In [6] a new method based on mutant generation on an FPGA circuit has been presented. This method avoids circuit reconfiguration but could introduce a significant area overhead.

In this paper we describe an alternative approach for performing Fault Injection that exploits FPGA devices and is based on an instrumented circuit. Our solution does not require FPGA reconfiguration for each fault experiment, thus attaining a much greater efficiency in terms of elapsed time.
Moreover, our solution not only performs Fault Injection efficiently, but also effectively supports the observation of faulty behavior (e.g., the observation of latent faults). A prototypical version of the proposed Fault Injection environment has been developed, showing that the whole Fault Injection process can be highly automated and easily introduced into existing VLSI design flows.

Experimental results are provided on some benchmark circuits, which allow a preliminary evaluation of the approach in terms of the time required for Fault Injection. When compared with an efficient, in-house, state-of-the-art Fault Injection environment based on gate-level fault simulation techniques [7], our method is faster by a factor of up to 60. On the other hand, if Fault Injection is performed using commercial VHDL simulation tools (such as in [8]), the speed-up factor becomes greater than 4 orders of magnitude. The effectiveness of our method scales well with the circuit size, and the speed-up with respect to simulation-based Fault Injection techniques increases when the circuit becomes larger. The speed-up also increases when the length of the Fault Injection experiments (in terms of the number of input vectors considered) grows, thus showing that the method is particularly suitable for large-scale verification of the fault tolerance properties of VLSI circuits.

The paper is organized as follows. Section 2 describes the proposed Fault Injection system: its overall architecture, the circuit instrumentation technique we developed to support Fault Injection once the circuit is mapped on an FPGA, and the process for performing a Fault Injection campaign. Section 3 reports some experimental results and evaluates the cost of the approach in terms of FPGA area overhead and time requirements. Section 4 draws some conclusions.

2. The Fault Injection System


For the purpose of this paper (but without any loss of generality) we assume that the system to be considered is a VLSI circuit (or part of it), possibly including some fault tolerance mechanism. Moreover, we assume that a gate-level description of the system itself is available.

We also assume that the input stimuli to be used during the Fault Injection campaign are already available, and we deal with neither their generation nor their evaluation. The fault model we target is the single transient bit-flip in the circuit storage elements; however, our method can be easily extended to support multiple bit-flips. The bit-flip fault model matches the characteristics of SEUs, whose effects are increasingly important not only for systems targeted to space, but also for ground applications.

Finally, we assume that fault effects are classified according to the following categories:
- Silent: the output trace of the faulty circuit and its state at the end of the simulation correspond to those of the fault-free circuit.
- Latent: the output trace of the faulty circuit corresponds to the fault-free one, but their states at the end of the simulation do not match. As a consequence, the fault is still active in the circuit and may produce wrong outputs in the following clock cycles.
- Failure: the output trace of the faulty circuit does not match that of the fault-free circuit.
- Detected: the fault is detected by some Error Detection Mechanism (EDM) existing in the system.

The architecture of the whole environment is summarized in Figure 1. A typical Fault Injection environment is composed of three modules:
- Fault List Manager: this module is in charge of generating the list of faults to be injected according to the system, the input data, and any indications given by the designer (e.g., targeting the most critical portions of the system). Moreover, it is in charge of implementing techniques for reducing the fault list size by collapsing equivalent faults, or by removing faults whose behavior is known a priori [9].
- Fault Injection Manager: this module is the core of the Fault Injection environment, and is in charge of orchestrating the selection of a new fault, its injection in the system, and the observation of the resulting behavior.
- Result Analyzer: the task of this module is to analyze the output data produced by the system during each Fault Injection experiment, categorize faults according to their effects, and produce statistical information.
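The classification step carried out by the Result Analyzer, using the four categories listed above, can be illustrated with a minimal Python sketch. The function name classify_fault, the FaultEffect enumeration, and the edm_triggered flag are hypothetical names introduced here for illustration. The choice to give Detected priority over the other categories is also an assumption, since the paper does not state a precedence.

```python
from enum import Enum
from typing import Sequence


class FaultEffect(Enum):
    SILENT = "Silent"
    LATENT = "Latent"
    FAILURE = "Failure"
    DETECTED = "Detected"


def classify_fault(golden_outputs: Sequence[int],
                   faulty_outputs: Sequence[int],
                   golden_state: Sequence[int],
                   faulty_state: Sequence[int],
                   edm_triggered: bool = False) -> FaultEffect:
    """Classify one experiment against the fault-free (golden) run."""
    # Assumption: a fault flagged by an Error Detection Mechanism is
    # reported as Detected regardless of the output trace.
    if edm_triggered:
        return FaultEffect.DETECTED
    # Any mismatch in the output trace is a Failure.
    if list(faulty_outputs) != list(golden_outputs):
        return FaultEffect.FAILURE
    # Same outputs but a different final state: the fault is still active.
    if list(faulty_state) != list(golden_state):
        return FaultEffect.LATENT
    # Same outputs and same final state.
    return FaultEffect.SILENT
```

For example, an experiment whose output trace matches the reference but whose final flip-flop contents differ is reported as Latent.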
[Figure 1 block diagram. Blocks on the Host Computer: Fault List Manager, Fault Injection Manager (Circuit Instrumenter, Fault Injection Master), Result Analyzer. Communication Channel. Blocks on the FPGA board: Fault Injection Interface, Emulated Circuit.]

Figure 1: Fault Injection environment architecture

2.1. FPGA-based Fault Injection Manager

This module is normally the most critical one, since it may require a huge amount of time to perform its task when the size of the circuit, the number of faults to be injected, or the length of the input sequence becomes large. In our approach this module exploits an FPGA board that emulates the gate-level system with and without faults. The FPGA board is driven by a host computer where the other modules and the user interface run. The Fault Injection Manager is implemented as a hardware/software system where the software partition runs on the host, and the hardware partition is located on the FPGA board along with the emulated circuit. To efficiently determine the behavior of the circuit when a fault appears, the FPGA board emulates an instrumented version of the circuit, which allows both the injection of each fault and the observation of the corresponding faulty behavior.

2.2. Circuit instrumentation for supporting Fault Injection

Being interested in injecting transient faults in the circuit flip-flops, we developed a method for easily modifying their value; moreover, the method provides an effective way to trigger the occurrence of faults at the injection time and supports the observation of the faulty behavior. The architecture we devised, reported in Figure 2, is based on adding to the original circuit a register (named Mask Chain) storing the binary information about which flip-flop(s) should be affected by the fault, and ad-hoc logic for performing the Fault Injection. The signal inject controls the Fault Injection: the Fault Injection Master asserts it to force the occurrence of the selected bit-flip(s).

The modules added to the circuit are the following:
- Mask Chain: each bit in FFs is associated with a bit in Mask Chain, which is a register with parallel- and serial-load capabilities (the load and mode signals control the operation of the register). The register operates as follows:
  a. During the experiment initialization phase, it works as a shift register, and loads a bit stream coming over the scan_in signal. The bit stream contains a 1 in the positions corresponding to the flip-flop(s) where the fault must be injected, and 0s elsewhere.
  b. At the injection time (i.e., when the Fault Injection Master asserts the inject signal), each bit in Mask Chain set to 1 produces a bit-flip, i.e., the corresponding bit of the FFs module loads the complement of the value coming from the Combinational Circuitry.
  c. The Mask Chain may load the content of the FFs module. The state of the circuit can then be read through the scan_out signal by operating the module as a shift register. This feature may be exploited at the end of each experiment to better classify fault effects.
- M: the combinational logic in charge of possibly complementing the output of the Combinational Circuitry which is loaded into the FFs module. The behavior of M depends on the contents of the Mask Chain module and on the inject signal.
[Figure 2 block diagram. Blocks: Combinational circuitry, FFs, M, Mask chain, MISR. Signals: PI, PO, load, mode, inject, scan_in, scan_out.]

Figure 2: Instrumented circuit
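The behavior of the Mask Chain and of the M logic can be illustrated with a small software model. The following Python sketch is not the authors' hardware description: the class name InstrumentedFFs, its method names, and the shift direction of the chain are assumptions made for illustration. The M logic is modeled as an XOR between each combinational output bit and the AND of the matching mask bit with inject, which is one straightforward way to obtain the complementing behavior described in the text.

```python
from typing import List


class InstrumentedFFs:
    """Software model of the FFs module plus Mask Chain (illustrative only)."""

    def __init__(self, n_ffs: int) -> None:
        self.ffs: List[int] = [0] * n_ffs
        self.mask: List[int] = [0] * n_ffs

    def shift(self, scan_in: int) -> int:
        """Serial mode: shift one bit in and return the bit on scan_out."""
        scan_out = self.mask[-1]
        self.mask = [scan_in] + self.mask[:-1]
        return scan_out

    def load_mask(self, bits: List[int]) -> None:
        """Initialization phase: shift in the stream so mask[i] == bits[i]."""
        for bit in reversed(bits):
            self.shift(bit)

    def clock(self, comb_out: List[int], inject: int) -> None:
        """One clock edge: M complements the bits selected by the mask
        when inject is asserted, otherwise the FFs load comb_out unchanged."""
        self.ffs = [d ^ (m & inject) for d, m in zip(comb_out, self.mask)]

    def capture(self) -> None:
        """Parallel mode: copy the FFs content into the Mask Chain."""
        self.mask = list(self.ffs)

    def read_state(self) -> List[int]:
        """Shift the captured state out through scan_out."""
        shifted = [self.shift(0) for _ in range(len(self.mask))]
        return shifted[::-1]


# Example: flip flip-flop 1 of a 3-bit register during the second clock cycle.
circuit = InstrumentedFFs(3)
circuit.load_mask([0, 1, 0])
circuit.clock([1, 0, 1], inject=0)      # fault-free cycle
circuit.clock(circuit.ffs, inject=1)    # injection cycle
circuit.capture()
final_state = circuit.read_state()
```

Because the mask is loaded once per experiment and inject is asserted for a single cycle, no reconfiguration of the emulated circuit is needed between faults.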

The Fault Injection system is also in charge of analyzing the faulty behavior, either for pure fault classification, or for more detailed analysis of the fault behavior. For this purpose two features are currently supported by our system:
- Output Trace: the Fault Injection Interface samples the outputs of the circuit at each clock cycle, and sends their values to the host; the experiment can be stopped as soon as an unexpected output value is observed.
- Output Compaction: when the goal is just to classify faults, a more effective solution is to compact the output values generated by the circuit using a Multiple Input Signature Register (MISR). In this case the host receives the signature stored by the MISR at the end of each experiment, and classifies the fault as Failure if it does not match that of the fault-free circuit. This solution is more efficient in terms of Fault Injection speed, since it does not require synchronizing the host and the FPGA board after every clock cycle to retrieve the output response and check it against the corresponding fault-free output.

2.3. The FI process

The Fault Injection process is composed of the following steps:
- The circuit description is instrumented according to the previously described transformations.
- The FPGA board is loaded with the instrumented circuit description.
- The FPGA-based system is exploited to emulate the fault-free circuit, and the output values at each clock cycle are recorded: the obtained output trace is the reference trace we use to classify fault effects.
- The Fault Injection Manager executes the procedure described in Figure 3, which is in charge of initializing the FPGA, feeding the emulated circuit with the input vectors, activating the injection, and observing the system response. If the Output Compaction feature is activated, output observation and fault classification are executed only once at the end of the experiment.
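As an aside on the Output Compaction feature, the following Python sketch shows how a MISR folds a whole output trace into a single signature. The register width (16 bits) and the feedback taps (0x1021) are assumptions chosen for illustration; the paper does not specify the MISR polynomial. The function names misr_step and signature are likewise hypothetical.

```python
from typing import Iterable


def misr_step(state: int, inputs: int, width: int, taps: int) -> int:
    """Advance the MISR by one clock: LFSR shift with feedback,
    then XOR the parallel input word into the register."""
    mask = (1 << width) - 1
    feedback = (state >> (width - 1)) & 1
    state = (state << 1) & mask
    if feedback:
        state ^= taps
    return state ^ (inputs & mask)


def signature(trace: Iterable[int], width: int = 16, taps: int = 0x1021) -> int:
    """Compact a sequence of output words into one signature."""
    state = 0
    for word in trace:
        state = misr_step(state, word, width, taps)
    return state


# The host compares one signature per experiment instead of one word per cycle.
golden = signature([0x3, 0x1, 0x0, 0x2])
faulty = signature([0x3, 0x1, 0x4, 0x2])   # one corrupted output word
is_failure = faulty != golden
```

A general property of signature compaction is that different traces can, with low probability, map to the same signature (aliasing), so a matching signature is strong but not absolute evidence that the output trace is correct.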
for every fault Fj {
    reset the circuit
    load Mask Chain
    for every input vector Vi {
        apply vector Vi at time Ti
        if( Ti == Injection Time (Fj) )
            inject = 1
        else
            inject = 0
        if( Output Trace is active ) {
            observe output
            classify fault effects
        }
        i = i+1
    }
    if( Output Compaction is active ) {
        observe MISR
        classify fault effects
    }
    read circuit FFs
}

Figure 3: Fault Injection procedure

The following phases are executed during each Fault Injection experiment:
1. Initialization: the mask defining the fault to be injected is loaded in the Mask Chain module.
2. Emulation and Injection: the emulated circuit is fed with a stream of input vectors, and the fault is injected at the proper time. If the Output Trace feature is activated, the output at each single clock cycle is recorded and sent to the host.
3. Latent fault analysis: after all the input vectors have been applied, the content of the FFs module is loaded in the Mask Chain module and then sent to the Fault Injection Master through the scan_out signal. If the Output Compaction feature is activated, the information stored in the MISR about the output behavior of the circuit during the whole emulation is also sent to the host.

The method we propose guarantees the accessibility of every memory element in the circuit during Fault Injection and allows a detailed result analysis. Moreover, it allows a time resolution of one clock cycle: the Fault Injection Master can inject faults at any clock cycle by triggering the inject signal at that cycle, and can observe the system behavior at any clock cycle.
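The procedure of Figure 3 and the three phases above can be mimicked in software with the following Python sketch. It is a simplified model, not the authors' implementation: the emulated circuit is replaced by a hypothetical step function (toy_step), the names run_experiment, campaign, and random_fault_list are introduced here for illustration, and the early stop available with Output Trace is not modeled.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple

Fault = Tuple[int, int]  # (flip-flop index, injection cycle)
StepFn = Callable[[List[int], int], Tuple[List[int], int]]


def run_experiment(step: StepFn, n_ffs: int, vectors: List[int],
                   fault: Optional[Fault] = None) -> Tuple[List[int], List[int]]:
    """Apply all vectors, flipping one flip-flop at the injection cycle."""
    state = [0] * n_ffs                       # reset the circuit
    outputs = []
    for t, vec in enumerate(vectors):
        next_state, out = step(state, vec)
        if fault is not None and t == fault[1]:
            next_state[fault[0]] ^= 1         # inject = 1 at this cycle
        state = next_state
        outputs.append(out)
    return outputs, state                     # output trace, final FFs content


def campaign(step: StepFn, n_ffs: int, vectors: List[int],
             faults: List[Fault]) -> Dict[Fault, str]:
    """Run the fault-free reference once, then one experiment per fault."""
    golden_out, golden_state = run_experiment(step, n_ffs, vectors)
    results = {}
    for fault in faults:
        out, state = run_experiment(step, n_ffs, vectors, fault)
        if out != golden_out:
            results[fault] = "Failure"
        elif state != golden_state:
            results[fault] = "Latent"
        else:
            results[fault] = "Silent"
    return results


def random_fault_list(n_ffs: int, n_cycles: int, count: int,
                      seed: int = 0) -> List[Fault]:
    """Pick random fault locations and injection times."""
    rng = random.Random(seed)
    return [(rng.randrange(n_ffs), rng.randrange(n_cycles)) for _ in range(count)]


def toy_step(state: List[int], vec: int) -> Tuple[List[int], int]:
    """Hypothetical 3-flip-flop circuit: FF0 toggles on the input and drives
    the output, FF1 holds its value, FF2 is overwritten by the input."""
    return [state[0] ^ vec, state[1], vec], state[0]


results = campaign(toy_step, 3, [1, 0, 1, 1], [(0, 1), (1, 1), (2, 1)])
```

In this toy circuit the flip in FF0 reaches the output (Failure), the flip in FF1 stays in the state without being observed (Latent), and the flip in FF2 is overwritten at the next cycle (Silent).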

3. Experimental results
In this Section, we describe the experiments we performed in order to assess the effectiveness of our approach. We developed a prototype of the Fault Injection system using the ADM-XRC development board [10], equipped with a Xilinx Virtex V1000 FPGA. The development board has a PCI connector and is inserted in a standard Pentium-class PC, which manages the board as a memory-mapped device.

The main goal of the experiments is to measure the speed-up provided by our approach with respect to traditional simulation-based approaches. In particular, to evaluate the time performance of our approach, we considered some circuits from the ITC'99 benchmark suite [11], and we performed two sets of experiments:
- For each benchmark we injected 100,000 faults using a basic version of our FPGA-based environment where the Output Trace option is activated. We then compared the attained time figures with the ones recorded while using a simulation-based Fault Injection tool based on a modified gate-level parallel fault simulator.
- We repeated the previous set of experiments with the Output Compaction feature activated.

3.4. Basic version

During this experiment the Output Trace option is exploited, while feeding the circuits with 100 random vectors and 100,000 faults. Fault locations were randomly selected among the benchmark flip-flops. Fault Injection times were randomly selected among the 100 clock cycles.

Results of this set of experiments are reported in Table 1. Columns F, L and S report the percentage of Failure, Latent, and Silent faults we recorded during the experiments, respectively; since no Error Detection Mechanism was implemented in the considered circuits, no faults were classified as Detected. The column Time reports the time spent performing the experiments: this figure includes the time for loading the Xilinx configuration, for programming the Mask Chain, for running the experiments, and for classifying fault effects.
During these experiments, the Output Trace feature was activated: therefore, the output values produced by the circuit are sent to the host at each clock cycle, and the host stops each Fault Injection experiment as soon as the fault has been classified. To provide the reader with some reference values, we also performed the same Fault Injection campaigns with an efficient in-house simulation-based Fault Injection tool based on a parallel event-driven fault simulator, following the approach proposed in [7]. The simulator concurrently simulates 32 faulty circuits and stops the simulation of a fault as soon as the fault effects reach the circuit outputs. The last two columns of Table 1 report the results we attained by running this tool on a Sun UltraSparc running at 300 MHz and equipped with 256 MB of RAM, as well as the speed-up attained with our FPGA-based system.

The results of Table 1 show that the basic version of the FPGA-based environment attains speed-ups ranging from about 3.6 to 5.9, pointing out that the effectiveness of the approach becomes visible when very large circuits are considered. Circuit b17 shows a lower speed-up due to its high number of Primary Outputs: this requires a longer time for transmitting the output values produced by the circuit at each emulated clock cycle.
         FPGA-based Fault Injection            Simulation-based
                                               Fault Injection
Bench.   F [%]   S [%]   L [%]   Time [sec]    Time [sec]         Speed-up
b14      57.1    0       42.9    19.31         114.34             5.92
b15      20.6    0       79.4    39.20         201.12             5.13
b17      10.9    1.2     87.9    66.47         240.43             3.62
b18      2.9     97.1    0.0     131.64        739.08             5.61

Table 1: Results with the basic version of the Fault Injection system

We then performed a further group of experiments, aimed at evaluating how the performance of the proposed approach scales with the number of simulated vectors. We considered the benchmark b17 and performed several experiments where the number of vectors ranges from 100 to 10,000 and the number of faults is 1,000. Results are reported in Table 2 and demonstrate that the attained speed-up grows with the number of simulated vectors: this is mainly due to the fact that the time overhead for setting up the system becomes negligible as the length of the simulation becomes larger.
Num. of Vectors [#]   FPGA-based Fault Injection   Simulation-based Fault Injection   Speed-up
                      Time [s]                     Time [s]
100                   1.09                         4.25                               3.89
1,000                 6.97                         64.73                              9.28
10,000                63.03                        669.27                             10.61

Table 2: Fault Injection campaigns on b17

3.5. Output Compaction

As mentioned in Section 2.3, if Output Compaction is activated, the output values of the circuit are compacted using a MISR, whose signature is read only at the end of each Fault Injection experiment. In this way the synchronization between the FPGA board and the host computer becomes looser, allowing a significant gain in system performance.

Table 3 reports the results gathered by exploiting this technique, injecting 100,000 faults on the considered subset of benchmark circuits. For the purpose of the comparison reported in Table 3, the modified software fault simulator has been adopted as a reference, with each fault simulated for the complete length of the experiment (100 vectors).

Bench.   FPGA-based Fault Injection   Simulation-based Fault Injection   Speed-up
         Time [s]                     Time [s]
b14      10.39                        362.47                             34.89
b15      13.88                        392.12                             28.26
b17      13.63                        363.20                             26.66
b18      26.78                        812.37                             30.33

Table 3: Results with the extended version of the Fault Injection system

In this case, our approach shows a higher speed-up with respect to the simulation-based approach, ranging from about 27 to 35. These better results have been obtained by minimizing the data transfer between the host PC and the Xilinx board. Similar results are observed when studying how the performance scales with the number of simulated vectors. As can be seen from Table 4, the method shows a speed-up that increases significantly as the number of considered vectors grows, and exceeds 60 when more than 10,000 vectors are applied to the circuit.
Num. of Vectors [#]   FPGA-based Fault Injection   Simulation-based Fault Injection   Speed-up
                      Time [s]                     Time [s]
100                   0.69                         5.22                               7.60
1,000                 1.75                         74.11                              42.34
10,000                13.33                        776.85                             58.27

Table 4: Fault Injection campaigns on b17

3.6. Comparison with an RT-level VHDL Fault Injection environment

In order to further evaluate the effectiveness of the proposed approach in terms of speed, we compared our FPGA-based approach with an RT-level Fault Injection environment based on a commercial VHDL simulator [8]. The simulator runs on a Sun UltraSparc running at 300 MHz and equipped with 256 MB of RAM. For the purpose of these experiments we exploited the RT-level version of the ITC'99 benchmarks [11]. Table 5 reports the results gathered applying 100 vectors and 100,000 faults on the considered set of benchmark circuits. The experimental results show speed-up figures of up to about 45,000, thus demonstrating the very high efficiency of the proposed approach (no results are reported for the RT-level Fault Injection experiment on the b18 circuit, since the time required to conclude the experiment was too high).
Bench.   FPGA-based Fault Injection   RT-level Fault Injection   Speed-up
         Time [s]                     Time [s]
b14      10.39                        291,170                    28,024
b15      13.88                        434,091                    31,274
b17      13.63                        625,065                    45,859
b18      26.78                        NA                         -

Table 5: FPGA-based vs. RT-level Fault Injection environments.
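The speed-up columns in the tables of this Section are consistent with the ratio between the reference time and the FPGA-based time. As a worked check, the following Python sketch recomputes the speed-ups of Table 1 from the two time columns. The dictionary name TABLE_1_TIMES and the function name speed_up are introduced here for illustration only.

```python
from typing import Dict, Tuple

# (FPGA-based time, simulation-based time) in seconds, copied from Table 1.
TABLE_1_TIMES: Dict[str, Tuple[float, float]] = {
    "b14": (19.31, 114.34),
    "b15": (39.20, 201.12),
    "b17": (66.47, 240.43),
    "b18": (131.64, 739.08),
}


def speed_up(fpga_time_s: float, reference_time_s: float) -> float:
    """Speed-up of the FPGA-based campaign over the reference tool."""
    return reference_time_s / fpga_time_s


table_1_speed_ups = {
    bench: round(speed_up(fpga, sim), 2)
    for bench, (fpga, sim) in TABLE_1_TIMES.items()
}
```

The same ratio applied to the b17 row of Table 5 (625,065 s against 13.63 s) gives a value close to 45,859, which is where the figure of more than 4 orders of magnitude comes from.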

4. Conclusions
This paper presented an environment for performing Fault Injection campaigns that is based on the adoption of an FPGA device for emulating the system under analysis. A major novelty of the proposed approach lies in the circuit instrumentation technique adopted to inject transient faults in the circuit emulated through the FPGA. These transformations allow a significant reduction in the time required to perform the experiments, thanks to the fact that they do not require FPGA reconfiguration.

By exploiting a prototypical version of the described Fault Injection environment, we were able to evaluate its effectiveness with respect to state-of-the-art alternative approaches. Speed-up factors up to about 60 have been observed with respect to gate-level simulation-based Fault Injection techniques, while the speed-up is higher than 4 orders of magnitude if a commercial RT-level VHDL simulator is considered.

Despite these impressive speed-up figures, the reported results are still rather preliminary, and we have already devised several optimizations with respect to the current version of our system: therefore, we expect that both the figures for the area overhead (which limit the size of the largest circuits the method can deal with on a given FPGA) and those for the speed-up can still be significantly improved in future versions of our Fault Injection system. At the same time, work is currently being done to develop a new version of our Fault Injection system, based on an ad hoc board integrating the FPGA device and providing faster access to its input and output signals. In this way, the interaction between the host computer and the emulating FPGA is further reduced, and the latter can emulate the circuit, inject faults, and observe the faulty behavior at the maximum allowed speed.

5. References
[1] M. Nicolaidis, Time Redundancy Based Soft-Error Tolerance to Rescue Nanometer Technologies, IEEE 17th VLSI Test Symposium, April 1999, pp. 86-94
[2] R. K. Iyer and D. Tang, Experimental Analysis of Computer System Dependability, Chapter 5 of Fault-Tolerant Computer System Design, D. K. Pradhan (ed.), Prentice Hall, 1996
[3] S. A. Hwang, J. H. Hong, C. W. Wu, Sequential circuit fault simulation using logic emulation, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 17, No. 8, Aug. 1998, pp. 724-736
[4] K. T. Cheng, S. Y. Huang, W. J. Dai, Fault emulation: A new methodology for fault grading, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 18, No. 10, Oct. 1999, pp. 1487-1495
[5] L. Antoni, R. Leveugle, B. Fehr, Using Run-time Reconfiguration for Fault Injection in Hardware Prototypes, IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems, 2000, pp. 405-413
[6] R. Leveugle, Fault Injection in VHDL Descriptions and Emulation, IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems, 2000, pp. 414-419
[7] H. Cha, E. M. Rudnick, J. H. Patel, R. K. Iyer, G. S. Choi, A gate-level simulation environment for alpha-particle-induced transient faults, IEEE Transactions on Computers, Vol. 45, No. 11, Nov. 1996, pp. 1248-1256
[8] B. Parrotta, M. Rebaudengo, M. Sonza Reorda, M. Violante, New Techniques for Accelerating Fault Injection in VHDL descriptions, IOLTW2000: International On-Line Test Workshop, July 2000, pp. 61-66
[9] A. Benso, M. Rebaudengo, L. Impagliazzo, P. Marmo, Fault-List Collapsing for Fault Injection Experiments, RAMS'98: Annual Reliability and Maintainability Symposium, January 1998, pp. 383-388
[10] ADM-XRC PCI Mezzanine card User Guide Version 1.2, ALPHA DATA parallel systems ltd, http://www.alphadata.co.uk/
[11] F. Corno, M. Sonza Reorda, G. Squillero, RT-Level ITC'99 Benchmarks and First ATPG Results, IEEE Design & Test of Computers, July-August 2000
