
DESIGNING AN FPGA-BASED SYSTEM WITH MULTIPLE SHARED-CLOCK MEMORY CONTROLLERS

ABSTRACT
As FPGA designers strive to achieve higher performance while meeting critical timing margins, one consistently vexing performance bottleneck is the memory interface. Today's more advanced FPGAs provide embedded blocks in every I/O that make interface design easier and more reliable. These I/O elements are building blocks that, when combined with surrounding logic, can provide the designer with a complete memory interface controller. Nonetheless, these I/O blocks, along with the extra logic, must be configured, verified, implemented, and properly connected to the rest of the FPGA by the designer in the source RTL code. The memory controller is a digital circuit that manages the flow of data going to and from the main memory. It can be a separate chip or integrated into another chip, such as on the die of a microprocessor; it is also called a Memory Chip Controller (MCC). The complexities and intricacies of creating memory controllers pose a wide assortment of challenges, which for the FPGA designer suggest the need for a new level of integration support from the tools that accompany the FPGA. Integrating all the building blocks, including the memory controller state machine, is essential for the completeness of a design. Controller state machines vary with the memory architecture and system parameters. The goal of this project is to design a system that uses DDR3 SDRAM SODIMM memory and to improve system performance by using multiple memory controllers that share a single clock. Xilinx provides its own memory IP, known as the Memory Interface Generator (MIG). This IP supports a variety of memories (DDR3, DDR2, QDR II+, and RLDRAM II) and is widely used in the FPGA world; Xilinx-based FPGA designs may use one instance of this IP or multiple instances.

INTRODUCTION
As system bandwidths continue to increase, memory technologies have been optimized for higher speeds and performance. The next generation family of Double Data Rate (DDR) SDRAMs is the DDR3 SDRAM. DDR3 SDRAMs offer numerous advantages compared to DDR2. These devices are lower power, operate at higher speeds, offer higher performance (2x the bandwidth), and come in larger densities. DDR3 devices provide a 30% reduction in power consumption compared to DDR2, primarily due to smaller die sizes and the lower supply voltage (1.5V for DDR3 vs. 1.8V for DDR2). DDR3 devices also offer other power conservation modes, such as partial refresh. Another significant advantage of DDR3 is its higher performance/bandwidth compared to DDR2 devices, due to the wider prefetch buffers (8 bits wide for DDR3 compared to 4 bits for DDR2) and the higher operating clock frequencies. However, designing to the DDR3 memory interfaces also becomes more challenging. Implementing a high-speed, high-efficiency DDR3 memory controller in an FPGA is a formidable task. Until recently, only a few high-end FPGAs supported the building blocks needed to interface reliably to high-speed DDR3 memory devices.

The increasing core count on current and future processors is posing critical challenges to the memory subsystem to efficiently handle concurrent memory requests. The current trend is to increase the number of memory channels available to the processor's memory controller. However, there is a diminishing return when increasing the number of memory channels per memory controller. This can be addressed by using multiple memory controllers in the system.

The goal of this project is to design a system which uses DDR3 SDRAM SODIMM memory channels and to enhance the performance of the system by using multiple memory controllers which share a common clock. In addition, this project shows that the performance degradation can be efficiently addressed by increasing the ratio of memory controllers to channels while keeping the number of memory channels constant, and that clock sharing significantly reduces power consumption and the wastage of FPGA resources.

Memories:

DRAM
An advantage of DRAM over other types of memory is its ability to be implemented with fewer circuits per memory cell on the IC (integrated circuit). The DRAM memory cell is based on storing charge on a capacitor. A typical DRAM cell is built with one capacitor and a single FET (field-effect transistor); some early designs used three transistors. A typical SRAM (Static Random Access Memory) memory cell takes six FET devices, resulting in fewer memory cells for the same size of IC. SRAMs are simpler to use, easier to interface to, and have faster data access times than DRAMs.

DRAM's core architecture consists of memory cells organized into a two-dimensional array of rows and columns. Accessing a memory cell requires two steps. First, you address a specific row, and then you address a specific column in the selected row. In other words, an entire row is first read internally in the DRAM IC, and then the column address selects which column of the row is to be read from or written to the DRAM IC's I/O (input/output) pins.
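The two-step addressing above can be sketched as a simple address split. The geometry here (a 14-bit row address and a 10-bit column address) is a hypothetical example; real devices vary.

```python
# Sketch of two-step DRAM addressing: a flat cell address is split into
# a row address (upper bits) and a column address (lower bits).
# ROW_BITS/COL_BITS are illustrative, not any specific part's geometry.
ROW_BITS, COL_BITS = 14, 10

def split_address(addr):
    """Return (row, column) for a flat cell address."""
    row = (addr >> COL_BITS) & ((1 << ROW_BITS) - 1)
    col = addr & ((1 << COL_BITS) - 1)
    return row, col

row, col = split_address(0x2ABCD)
print(row, col)  # 170 973
```

The controller first activates row 170 (opening the whole row internally) and then issues a column access for column 973 within it.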

DRAM reads are destructive, meaning the data in the row of memory cells is destroyed by the read operation. Therefore, the row data must be written back into the same row after the completion of a read or write operation on that row. This operation is called precharge and is the last operation on a row. It must be done before accessing a new row, and it is referred to as closing an open row.
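A toy model of this open-row behavior, counting the activate and precharge operations described above (illustrative only, not any specific device's policy):

```python
# Toy model of a DRAM bank: a row must be activated before it can be
# accessed, and the open row must be precharged (closed) before a
# different row can be opened.
class Bank:
    def __init__(self):
        self.open_row = None
        self.activates = 0
        self.precharges = 0

    def access(self, row):
        if self.open_row != row:
            if self.open_row is not None:
                self.precharges += 1   # close (precharge) the open row
            self.activates += 1        # activate the requested row
            self.open_row = row

bank = Bank()
for row in [5, 5, 5, 9, 9, 5]:
    bank.access(row)
print(bank.activates, bank.precharges)  # 3 2
```

Consecutive accesses to the same row cost nothing extra; each row change costs a precharge plus an activate, which is why sequential access patterns are cheap.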

Analysis of computer memory accesses shows that reads of sequential memory addresses are the most common type of memory access. This is reasonable, since instruction reads are typically more common than data reads or writes, and most instruction reads are sequential in memory until an instruction branch or a jump to a subroutine occurs.

SDRAM
SDRAM, or Synchronous Dynamic Random Access Memory, is a form of semiconductor memory that can run at faster speeds than conventional DRAM, and its use is therefore becoming more widespread. Traditional forms of memory, including DRAM, operate in an asynchronous manner. They react as the control inputs change, and they can only deal with requests one at a time, as the requests are presented to them. SDRAM is able to operate more efficiently. It is synchronised to the clock of the processor and hence to the bus. Because SDRAM has a synchronous interface, it has an internal finite state machine that pipelines incoming instructions. This enables the SDRAM to operate in a more complex fashion than an asynchronous DRAM, and at much higher speeds. As a result, SDRAM is capable of keeping two sets of memory addresses open simultaneously. By transferring data alternately from one set of addresses and then the other, SDRAM cuts down on the delays associated with asynchronous RAM, which must close one address bank before opening the next.

The term pipelining describes the process whereby the SDRAM can accept a new instruction before it has finished processing the previous one. In other words, it can effectively process two instructions at once. For writing, one write command can be immediately followed by another without waiting for the original data to be stored within the SDRAM memory itself. For reading, the requested data appears a fixed number of clock pulses after the read instruction is presented. It is possible to send additional instructions during this delay period, which is termed the latency of the SDRAM.
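The fixed read latency can be sketched as a simple schedule. The CAS latency value here is purely illustrative:

```python
# Sketch of SDRAM read pipelining: data for each read appears a fixed
# number of clocks (the CAS latency) after the command is issued, and
# new commands may be issued while earlier reads are still in flight.
CAS_LATENCY = 3  # illustrative value, not a specific device's timing

def schedule_reads(issue_cycles):
    """Map each read command's issue cycle to its data-return cycle."""
    return {cycle: cycle + CAS_LATENCY for cycle in issue_cycles}

print(schedule_reads([0, 1, 2]))  # {0: 3, 1: 4, 2: 5}
```

Reads issued on cycles 0, 1 and 2 return data on cycles 3, 4 and 5: the pipeline overlaps the latency of one read with the issue of the next.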

SDRAM types and development


Since SDRAM was introduced, it has been developed to make it faster and more effective. As a result there are a number of different types of SDRAM that are available.

SDR SDRAM: This is the basic type of SDRAM that was first introduced. It has now been superseded by the other types below. It is referred to as single data rate SDRAM.

DDR SDRAM: DDR SDRAM gains its name from the fact that it is Double Data Rate SDRAM. This type of SDRAM provides data transfer at twice the speed of the traditional type of SDRAM memory. This is achieved by transferring data twice per cycle.
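The doubling can be seen with a back-of-envelope peak-bandwidth calculation, assuming a hypothetical 64-bit data bus:

```python
# Peak bandwidth in MB/s: transfers per second times bytes per transfer.
# DDR moves data on both clock edges, so transfers/s = 2 x clock rate.
def peak_bandwidth_mb_s(clock_mhz, bus_bits=64, transfers_per_clock=2):
    return clock_mhz * transfers_per_clock * (bus_bits // 8)

print(peak_bandwidth_mb_s(100, transfers_per_clock=1))  # SDR at 100 MHz: 800
print(peak_bandwidth_mb_s(100))                         # DDR at 100 MHz: 1600
```

At the same 100 MHz clock, the double-edged transfer doubles the peak from 800 MB/s to 1600 MB/s.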

DDR2 SDRAM: DDR2 SDRAM can operate the external bus twice as fast as its predecessor and it was first introduced in 2003.

DDR3 SDRAM: DDR3 SDRAM is a further development of the double data rate type of SDRAM. It provides further improvements in overall performance and speed. As a result its use is becoming more widespread.

DDR4 SDRAM: This is a further type of SDRAM being developed and anticipated to be available in 2012.

RLDRAM: Reduced Latency DRAM is the type of memory with the lowest latency for row and column operations.

LPDDR SDRAM: Low Power DDR SDRAM, which is more power efficient than the other DDR versions.

GDDR SDRAM: Graphics DDR. GDDR is a graphics card-specific memory technology.

The various types of SDRAM are now widely used and have taken over from some other types of memory. There are a number of different elements to SDRAM architecture. When using SDRAM it is necessary to have a basic understanding of the SDRAM architecture and as a result the way it operates. SDRAM architecture also greatly impacts the design of the integrated circuit itself. Aspects such as the physical positioning of areas for the memory cells themselves as well as that for the control circuitry are of great importance.

MEMORY MODULES :
Dual inline memory modules (DIMMs) are plug-in memory modules for computers. DIMMs vary in physical size, memory data width, ranks, memory sizes, memory speeds and memory architectures. JEDEC has defined DIMM standards and continues to work on defining new DIMMs based on new memory types and memory architectures. The standard DIMM size is used in desktops, workstations and servers. SO-DIMMs (Small Outline DIMMs) are small-size DIMMs used in laptops and other space-constrained implementations. The butterfly configuration refers to two SO-DIMMs parallel to the computer motherboard with their edge connectors next to each other: think of the two edge connectors as the butterfly body and the SO-DIMMs as the open butterfly wings. Mini-DIMMs (Miniature DIMMs) are smaller than SO-DIMMs and are used in single-board computers. VLP-DIMMs (Very Low Profile DIMMs) are shorter in height and are used in blade servers.

DIMM Memory Size & Speed:
DIMM memory size depends on the size of the memory ICs used and the DIMM configuration. A 512 Mb (megabit) memory IC can be designed in different configurations (see Table 5).

DIMM speed depends on the clock speed supported by the DDR, DDR2 and DDR3 SDRAMs used on the DIMM.
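As an illustration of how module capacity follows from the IC density and organization (the 512 Mb and x8/x4 figures below are worked examples, not values taken from Table 5):

```python
# Module capacity from memory-IC density and organization: a rank needs
# enough ICs to fill the data bus, and each IC contributes its density.
def dimm_capacity_mb(ic_megabits, ic_width_bits, bus_bits=64, ranks=1):
    ics_per_rank = bus_bits // ic_width_bits   # ICs needed to span the bus
    return ranks * ics_per_rank * ic_megabits // 8  # megabits -> megabytes

print(dimm_capacity_mb(512, 8))   # 8 x8 ICs per rank -> 512 MB
print(dimm_capacity_mb(512, 4))   # 16 x4 ICs per rank -> 1024 MB
```

The same 512 Mb die yields different module capacities depending on its organization: narrower ICs mean more devices per rank and therefore a larger module.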

CORE ARCHITECTURE
This section describes the architecture of the 7 series FPGAs memory interface solutions core, providing an overview of the core modules and interfaces.

OVERVIEW:
The 7 series FPGAs memory interface solutions core is shown in the figure below.

User Design
The user design block shown in the figure above is any FPGA design that needs to connect to an external DDR3 SDRAM. The user design connects to the memory controller via the user interface (UI).

AXI4 Slave Interface Block
The AXI4 slave interface maps AXI4 transactions to the UI to provide an industry-standard bus protocol interface to the memory controller.

User Interface Block and User Interface
The UI block presents the UI to the user design block. It provides a simple alternative to the native interface by presenting a flat address space and buffering read and write data.

Memory Controller and Native Interface
The front end of the memory controller (MC) presents the native interface to the UI block. The native interface allows the user design to submit memory read and write requests and provides the mechanism to move data from the user design to the external memory device, and vice versa. The back end of the memory controller connects to the physical interface and handles all the interface requirements to that module. The memory controller also provides a reordering option that reorders received requests to optimize data throughput and latency.

PHY and the Physical Interface

The front end of the PHY connects to the memory controller. The back end of the PHY connects to the external memory device. The PHY handles all memory device signal sequencing and timing.

IDELAYCTRL
An IDELAYCTRL is required in any bank that uses IDELAYs. IDELAYs are associated with the data group, the capture clocks, and the resynchronization (BUFR-rsync) clocks. Any bank/clock region that uses these signals requires an IDELAYCTRL. The IDELAYCTRL reference frequency should be set to 200 MHz. Based on the IODELAY_GROUP attribute that is set, the ISE tool replicates the IDELAYCTRLs for each region where the IDELAY blocks exist. When a user creates a multi-controller design on their own, each MIG output has the component instantiated with the primitive. This violates the rules for IDELAYCTRLs and the usage of the IODELAY_GROUP attribute. IDELAYCTRLs need only one instantiation of the component with the attribute set properly, allowing the tools to replicate it as needed.

AXI4 Slave Interface Block
The AXI4 slave interface block maps AXI4 transactions to the UI to provide an industry-standard bus protocol interface to the memory controller. The AXI4 slave interface is optional in designs provided through the MIG tool and is required with the axi_7series_ddrx memory controller provided in EDK. The RTL is consistent between both tools. The overall design is composed of separate blocks to handle each AXI channel, which allows for independent read and write transactions. Read and write commands to the UI rely on a simple round-robin arbiter to handle simultaneous requests. The address read/address write modules are responsible for chopping the AXI4 burst/wrap requests into smaller memory-sized burst lengths of either four or eight, and also for conveying the smaller burst lengths to the read/write data modules so they can interact with the user interface.

About the AXI Protocol
The AXI protocol is targeted at high-performance, high-frequency system designs and includes a number of features that make it suitable for a high-speed submicron interconnect. The objectives of the latest generation AXI interface are to:

- be suitable for high-bandwidth and low-latency designs
- enable high-frequency operation without using complex bridges
- meet the interface requirements of a wide range of components
- be suitable for memory controllers with high initial access latency
- provide flexibility in the implementation of interconnect architectures

The key features of the AXI protocol are:

- separate address/control and data phases
- support for unaligned data transfers using byte strobes
- burst-based transactions with only the start address issued
- separate read and write data channels to enable low-cost Direct Memory Access (DMA)
- the ability to issue multiple outstanding addresses
- out-of-order transaction completion
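The burst chopping performed by the address modules can be sketched in a simplified model. This sketch ignores the 4 KB boundary splitting and alignment handling that a real address module must also perform:

```python
# Simplified sketch of chopping one AXI4 incrementing burst into
# fixed-length memory commands of eight beats each.
def chop_burst(start_addr, beats, bytes_per_beat=8, mem_bl=8):
    """Split one AXI burst into a list of (address, beats) commands."""
    cmds = []
    addr = start_addr
    while beats > 0:
        n = min(beats, mem_bl)       # at most one memory burst per command
        cmds.append((addr, n))
        addr += n * bytes_per_beat   # advance past the beats just issued
        beats -= n
    return cmds

print(chop_burst(0x1000, 20))
# [(4096, 8), (4160, 8), (4224, 4)]
```

A 20-beat AXI burst becomes two full eight-beat memory bursts plus a four-beat remainder, each carrying its own start address toward the user interface.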

As well as the data transfer protocol, the AXI protocol includes optional extensions that cover signaling for low-power operation.

Architecture
The AXI protocol is burst-based. Every transaction has address and control information on the address channel that describes the nature of the data to be transferred. The data is transferred between master and slave using a write data channel to the slave or a read data channel to the master. In write transactions, in which all the data flows from the master to the slave, the AXI protocol has an additional write response channel to allow the slave to signal to the master the completion of the write transaction. The AXI protocol enables:

- address information to be issued ahead of the actual data transfer
- support for multiple outstanding transactions
- support for out-of-order completion of transactions.

Figure below shows how a read transaction uses the read address and read data channels.
[Figure: the master interface issues address and control on the read address channel; the slave interface returns read data on the read data channel.]

Channel architecture of reads

[Figure: the master interface issues address and control on the write address channel and write data on the write data channel; the slave interface returns a write response on the write response channel.]

Channel architecture of writes

Channel definition

Each of the five independent channels consists of a set of information signals and uses a two-way VALID and READY handshake mechanism. The information source uses the VALID signal to show when valid data or control information is available on the channel. The destination uses the READY signal to show when it can accept the data. Both the read data channel and the write data channel also include a LAST signal to indicate when the transfer of the final data item within a transaction takes place. Read and write transactions each have their own address channel. The appropriate address channel carries all of the required address and control information for a transaction. The AXI protocol supports the following mechanisms:

- variable-length bursts, from 1 to 16 data transfers per burst
- bursts with a transfer size of 8 to 1024 bits
- wrapping, incrementing, and non-incrementing bursts
- atomic operations, using exclusive or locked accesses
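Incrementing and wrapping burst address sequences can be illustrated with a small sketch (the beat count and transfer size here are arbitrary examples):

```python
# Sketch of AXI incrementing vs. wrapping burst address sequences.
# A wrap burst stays inside an aligned window of beats x size bytes,
# wrapping back to the window base when it reaches the upper boundary.
def burst_addresses(start, beats, size, wrap=False):
    window = beats * size
    base = (start // window) * window   # aligned window containing start
    addrs = []
    addr = start
    for _ in range(beats):
        addrs.append(addr)
        addr += size
        if wrap and addr >= base + window:
            addr = base                 # wrap back to the window base
    return addrs

print([hex(a) for a in burst_addresses(0x18, 4, 8)])             # INCR
print([hex(a) for a in burst_addresses(0x18, 4, 8, wrap=True)])  # WRAP
```

Starting at 0x18, the incrementing burst walks upward (0x18, 0x20, 0x28, 0x30), while the wrapping burst stays inside the aligned 32-byte window (0x18, 0x00, 0x08, 0x10), which is useful for critical-word-first cache line fills.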

A typical system consists of a number of master and slave devices connected together through some form of interconnect.

[Figure: several masters (Master 1, Master 2, Master 3) connected through an interconnect to several slaves (Slave 1 through Slave 4), with an AXI interface at each master and slave port.]

Interface and interconnect
The AXI protocol provides a single interface definition for describing interfaces:

- between a master and the interconnect
- between a slave and the interconnect
- between a master and a slave.

The interface definition enables a variety of different interconnect implementations. The interconnect between devices is equivalent to another device with symmetrical master and slave ports to which real master and slave devices can be connected. Most systems use one of three interconnect approaches:

- shared address and data buses
- shared address buses and multiple data buses
- multilayer, with multiple address and data buses.

In most systems, the address channel bandwidth requirement is significantly less than the data channel bandwidth requirement. Such systems can achieve a good balance between system performance and interconnect complexity by using a shared address bus with multiple data buses to enable parallel data transfers.
