Escolar Documentos
Profissional Documentos
Cultura Documentos
Abstract This paper describes a FPGA configuration higher speed than external configuration scrubbers and is able
scrubbing approach for Xilinx 7-Series FPGAs that combines the to repair all SEU upset types. This technique combines two
high-speed internal scrubbing available within these devices with forms of scrubbing: internal scrubbing using the Readback
an external scrubber. The internal scrubbing unit continuously
monitors the frames of the FPGA configuration memory and CRC feature of the 7-Series FPGA families and external
corrects single-bit frame errors and is used to detect multi- scrubbing through JTAG or another configuration interface.
bit frame errors. Multi-bit upsets are repaired by means of a The internal Readback CRC provides high-speed repair for
secondary scrubbing mechanism that is primarily external to the single-bit upsets and high-speed detection for multi-bit upsets.
FPGA fabric. This Xilinx 7-Series hybrid configuration scrubbing The external scrubber provides slower, yet robust repair for
architecture scans 25,636,224 bits of the XC7Z020 device in
several microseconds and detects upsets within 8 ms and then multi-bit upsets. This scrubber was verified with both fault
corrects most multi-cell upsets in under an additional 6 ms. This injection and radiation testing in a high-energy neutron source.
configuration scrubber was validated with configuration fault
injection and neutron radiation testing.
II. C ONFIGURATION S CRUBBING
Index Terms Configuration scrubbing, field programmable
gate arrays, single-event upset (SEU). Configuration scrubbing involves periodically repairing
the contents of the configuration memory that may have
I. I NTRODUCTION
changed through ionizing radiation.1 Configuration scrubbing
the entire frame to correct the upsets and allows the Readback
CRC to continue through the entire configuration memory.
C. CRC-Only Errors
Fig. 5. In-Bounds and Out-Of-Bounds Repair.
The vast majority of the errors are identified and repaired
using the ECCERROR signals of the FRAME_ECCE2. These
errors are repaired relatively quickly single-bit errors are
repaired by the internal Readback CRC and multi-bit errors
which the configuration bit that is being artificially corrupted
within a frame are repaired by scrubbing a single frame
(through the repair process) does not exist in the configura-
with the outer scrubber. Occasionally, the CRCERROR will
tion memory. The SECDED code used by the FRAME_ECCE2
be asserted without a corresponding ECCERROR suggesting
module contains a 13-bit syndrome which can repair any
that something is wrong somewhere within the configuration
single bit within a set of 2(131) 1 or 4095 bits. Since
memory. Although the global CRC is good at detecting errors,
the configuration frame contains only 3,232 configuration bits
it provides no insight on the location of the error. Since the
(101 words), a syndrome may be generated during a multi-
location of the error is not known, the entire bitstream must be
bit odd upset detection that falls outside of these 3,232 bits as
scrubbed (Step 4). This is a very time consuming process that
shown in Figure 5. In this Out-of-Bounds case, no change to
involves writing every frame of the bitstream into the FPGA
the configuration frame actually occurs since the bit does not
exist. Odd Out-of-Bounds multi-bit upsets are relatively easy device (though it is not the same as a full reconfiguration).
This full device scrub is the outer line of defense for errors
to detect since the syndrome generated by the FRAME_ECCE2
and, although costly to perform, is essential for repairing all
module can be checked against the acceptable ranges of the
identifiable errors.
3,232 valid configuration bits (bits 0 through 3,231 of the
frame). If the syndrome is outside of this range, the outer
scrubber can immediately perform a configuration scrub on D. Single-Event Functional Interrupts (SEFI)
the affected frame (Step 3). A common function of some configuration scrubbing imple-
If the syndrome generated by the odd error is In-Bounds, mentations is to identify the presence of single-event func-
then there are three possible classifications of the error: single- tional interrupts or SEFIs. A SEFI is a system-wide fault
bit upsets, multi-bit In-Bounds upset, and masked bit upsets. within the device that prevents the device from operating prop-
If exactly two error signatures are seen in the FIFO, then the erly (i.e., functional interrupt). Known SEFIs for FPGAs
error is either a single-bit error or a multi-bit odd In-Bounds include the power-on reset SEFI, frame address SEFI, and
upset. The first of the error entries in the FIFO is the ECC the SelectMap SEFI. Although the sensitive cross-section of
event indicating that an odd error was identified. The second SEFIs is very small, their impact on the system is significant
error event in the FIFO indicates that the error was repaired. and methods for detecting and responding to their occurence
No more error events are generated since a repair was made are important. The scrubbing architecture described in this
that satisfies the ECC syndrome logic. paper does not detect or respond to SEFIs. It is possible
If the error is a multi-bit odd In-Bounds upset, then the that radiation-induced upsets may occur within this scrubbing
Readback CRC injects an error into the frame (see Figure 2) architecture that cause unrecoverable SEFIs. Future work will
and the next computation of the global CRC will reflect the investigate the sensitive cross section of these SEFIs and
presence of these upsets. The CRCERROR signal indicates mechanisms to respond to them.
to the outer scrubber that the entire frame must be scrubbed.
If the CRC error signal is not asserted, then the error was VI. Z YNQ H YBRID S CRUBBER I MPLEMENTATION
a single-bit upset properly repaired by the Readback CRC
The hybrid scrubbing approach described above was imple-
system and no outer scrubber activity is needed.
mented in the Xilinx ZYNQ programmable SoC that incor-
If more than two identical error signatures are seen in
porates the 7-Series FPGA fabric onto an ARM-based SoC.
the error FIFO, then the error is caused by a masked bit
As described earlier, the inner scrubber is implemented using
upset. Masked bits within a configuration memory are those
the internal Readback CRC mechanism found on all 7-Series
bits that change during normal operation (such as user flip-
FPGA devices. The outer scrubber is implemented with soft-
flops and embedded LUT RAM). Because these bits change
ware running on the ARM processors within the Zynq device
during normal operation, they are not used to compute the
and outer configuration activities are performed using the
error syndrome of a frame (nor do they contribute to the
PCAP configuration port found on all Zynq devices.2 The
global CRC). Although direct upsets to these bits may cause
golden configuration memory used by the outer scrubber
problems to a user circuit, they do not impact the computation
to repair upsets is located within the Zynq global DRAM
of the frame error syndrome. However, it is possible that a
system memory. The PCAP allows the ARM processor cores
multi-bit, odd upset will generate a syndrome that attempts to
to directly write or read configuration data using DMA
repair one of these masked bits. Since such bits cannot be
accesses [8]. The software that comprises the outer scrubber
repaired, the Readback CRC will continuously try to repair
this bit and never complete a full internal scrub. When the 2 There are other outer scrubbing approaches that have been developed not
outer scrubber detects this situation, it performs a scrub on using the Zynq devices but are beyond the scope of this paper.
502 IEEE TRANSACTIONS ON NUCLEAR SCIENCE, VOL. 64, NO. 1, JANUARY 2017
TABLE I TABLE II
P ERFORMANCE M ETRICS FOR H YBRID S CRUBBER LANSCE H YBRID S CRUBBER R ESULTS
readback a relatively small number of times (only 69 times [4] X. Iturbe, M. Azkarate, I. Martinez, J. Perez, and A. Astarloa, A novel
during the 45 hour test). This allowed the scrubber to operate SEU, MBU and SHE handling strategy for Xilinx Virtex-4 FPGAs,
in Proc. Int. Conf. Field Program. Logic Appl. (FPL), Aug. 2009,
with far fewer memory reads and configuration port accesses. pp. 569573.
In this experiment, no global CRC full reconfigurations were [5] U. Legat, A. Biasizzo, and F. Novak, SEU recovery mechanism
required. for SRAM-based FPGAs, IEEE Trans. Nucl. Sci., vol. 59, no. 5,
pp. 25622571, Oct. 2012.
[6] F. Brosser, E. Milh, V. Geijer, and P. Larsson-Edefors, Assessing scrub-
VII. C ONCLUSION bing techniques for Xilinx SRAM-based FPGAs in space applications,
in Proc. Int. Conf. Field-Program. Technol. (FPT), Dec. 2014,
This paper presents a hybrid scrubbing architecture for pp. 296299.
7-Series FPGAs. This architecture harnesses the performance [7] G. Nazar, L. Santos, and L. Carro, Accelerated FPGA repair through
shifted scrubbing, in Proc. 23rd Int. Conf. Field Program. Logic
advantages offered by the Readback CRC mechanism while Appl. (FPL), Sep. 2013, pp. 16. [Online]. Available: http://dx.doi.org/
simultaneously compensating for its limitations with an exter- 10.1109/FPL.2013.6645533
nal readback scrubber to achieve the desirable combination [8] Xilinx, Zynq-7000 AP SoC techinikal reference manual,
Tech. Rep. UG585 v.1.10, 2015.
of high-performance and robustness. Explanations of unique [9] M. Berg et al., Effectiveness of internal versus external SEU scrubbing
upset types that this hybrid scrubbing architecture handles mitigation strategies in a Xilinx FPGA: Design, test, and analysis, IEEE
were also given. This work presented the results of testing Trans. Nucl. Sci., vol. 55, no. 4, pp. 22592266, Aug. 2008.
[10] Xilinx, Soft error mitigation controller, Tech. Rep. PG036 v4.1, 2014.
the hybrid scrubber in a radiation beam which validated its [11] Xilinx, 7 Series FPGAs Configuration User Guide, Tech. Rep. UG470
effectiveness. This hybrid scrubber will be used in several v.1.10, Jun. 2015.
scheduled satellite missions for scrubbing the FPGA logic [12] A. Stoddard, Configuration scrubbing architectures for high-reliability
FPGA systems, M.S. thesis, Brigham Young Univ., Provo, UT, USA,
within the Zynq programmable SoC. Future work will include 2015.
porting this hybrid scrubbing architecture to the recently [13] M. Wirthlin, D. Lee, G. Swift, and H. Quinn, A method and case
released UltraScale and UltraScale+ FPGA families. study on identifying physically adjacent multiple-cell upsets using
28-nm, interleaved and SECDED-protected arrays, IEEE Trans. Nucl.
Sci., vol. 61, no. 6, pp. 30803087, Dec. 2014.
R EFERENCES [14] M. Ebrahimi, P. M. B. Rao, R. Seyyedi, and M. B. Tahoori, Low-
cost multiple bit upset correction in SRAM-based FPGA configuration
[1] P. E. Dodd and L. W. Massengill, Basic mechanisms and modeling of frames, IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 24,
single-event upset in digital microelectronics, IEEE Trans. Nucl. Sci., no. 3, pp. 932943, Mar. 2016.
vol. 50, no. 3, pp. 583602, Jun. 2003. [15] K. Schoenberg, The LANSCE accelerator: A powerful tool for science
[2] I. Herrera-Alzu and M. Lopez-Vallejo, Design techniques for Xilinx and applications, in Proc. IEEE Particle Accel. Conf., Jun. 2007,
Virtex FPGA configuration memory scrubbers, IEEE Trans. Nucl. Sci., p. 120.
vol. 60, no. 1, pp. 376385, Feb. 2013. [16] S. Venkataraman, R. Santos, A. Das, and A. Kumar, A bit-interleaved
[3] J. Heiner, N. Collins, and M. Wirthlin, Fault tolerant ICAP con- embedded Hamming scheme to correct single-bit and multi-bit upsets
troller for high-reliable internal scrubbing, in Proc. IEEE Aerosp. for SRAM-based FPGAs, in Proc. 24th Int. Conf. Field Pro-
Conf., Mar. 2008, pp. 110. [Online]. Available: http://dx.doi.org/10. gram. Logic Appl. (FPL), Sep. 2014, pp. 14. [Online]. Available:
1109/AERO.2008.4526471 http://dx.doi.org/10.1109/FPL.2014.6927385