Asynchronous FIFO: Simulation using Modelsim

Note: Diagram numbers are continued from the previous post.

The test bench strategy is to generate all corner conditions, such as full and empty. Simulation
waveforms are shown in Figure (11) to Figure (13). These waveforms were generated using the test
bench program provided in the previous article. A read clock of 50 MHz and a write clock
of 10 MHz are generated using initial procedural statements:
initial begin #10 r_clk=0; forever #10 r_clk=~r_clk; end
initial begin #5 w_clk=0; forever #50 w_clk=~w_clk; end
Both clocks have a 50% duty cycle: r_clk is high for 10 ns of its 20 ns period and w_clk is
high for 50 ns of its 100 ns period.
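The relation between the delay values and the clock frequencies can be made explicit with named half-periods. The following is only a minimal sketch: the module name clk_gen_sketch, the parameter names and the 1 ns time unit are assumptions for illustration, and unlike the test bench statements it starts both clocks at time zero.

```verilog
`timescale 1ns/1ps
// Sketch only: clk_gen_sketch, R_HALF and W_HALF are hypothetical names.
module clk_gen_sketch;
  parameter R_HALF = 10; // ns; period = 2*10 ns = 20 ns  -> 50 MHz
  parameter W_HALF = 50; // ns; period = 2*50 ns = 100 ns -> 10 MHz

  reg r_clk, w_clk;

  // Each clock toggles once per half-period, giving a 50% duty cycle.
  initial begin r_clk = 0; forever #R_HALF r_clk = ~r_clk; end
  initial begin w_clk = 0; forever #W_HALF w_clk = ~w_clk; end

  // Stop after a few write clock periods.
  initial #1000 $finish;
endmodule
```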
Other stimuli such as d_in, reset, w_en and r_en are generated using the set of statements below:
initial begin d_in=1;
@(posedge w_en);
repeat(20) @(posedge w_clk) d_in=d_in+2;
repeat(20) @(posedge w_clk) d_in=d_in-1;
end
initial begin reset=1;#30 reset=0;end
initial begin fork #50 w_en=1; #1800 w_en=0; #2500 w_en=1 ; join end
initial begin fork #50 r_en=0; #1850 r_en=1; #2400 r_en=0; #2500 r_en=1; join
end
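In the last two statements ‘fork’ and ‘join’ are used so that every delay is measured from
simulation time zero: the delayed assignments run in parallel rather than one after another.
The reset signal is active for the first 30 ns and is then deasserted. During the first 5 ns
reset is active while neither clock has been driven yet, which tests the asynchronous reset condition.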
Under asynchronous reset, all variables except d_out are initialized to their
default states, including f_empty_flag: r_ptr, w_ptr and ptr_diff are initialized to zero, and all FIFO
status flags take their default values. Since the reset signal is connected to the binary counters,
resetting the counters resets both the read and write pointers, the pointer difference and all status
flags. This can be observed in the simulated waveform shown in Figure (11). The Verilog code has
to be improved to give a complete asynchronous reset, including reset of d_out. When I tried to
add an asynchronous reset to the RTL code of the dual port RAM, the synthesizer did not infer a dual port
RAM. Instead, it inferred a set of registers. This problem has to be sorted out.
Figure (11) Simulation waveform 1
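One possible way to approach the reset problem described above is to keep the memory array free of any reset term and to apply the asynchronous reset only to the output register. The following is only a sketch under that assumption: the module name, port names and parameters are hypothetical and not part of a_fifo5, and whether a given synthesizer still infers a dual port RAM from it has not been verified here and should be checked in the synthesis report.

```verilog
// Sketch only: ram_async_rst_out_sketch and its ports are hypothetical.
module ram_async_rst_out_sketch(d_out, d_in, w_addr, r_addr, w_we, r_re,
                                w_clk, r_clk, reset);
  parameter WIDTH = 8;   // data width
  parameter DEPTH = 16;  // number of words
  parameter AW    = 4;   // address width, DEPTH = 2**AW

  output [WIDTH-1:0] d_out; reg [WIDTH-1:0] d_out;
  input  [WIDTH-1:0] d_in;
  input  [AW-1:0]    w_addr, r_addr;
  input              w_we, r_re, w_clk, r_clk, reset;

  reg [WIDTH-1:0] mem [DEPTH-1:0];

  // Memory array: write port only, no reset term on the array itself.
  always @(posedge w_clk)
    if (w_we) mem[w_addr] <= d_in;

  // Output register: the asynchronous reset clears d_out only.
  always @(posedge r_clk or posedge reset)
    if (reset)     d_out <= 0;
    else if (r_re) d_out <= mem[r_addr];
endmodule
```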

After 1800 ns, w_en is disabled. This interval is chosen so that the FIFO
full condition can be generated (see Figure (11)). When ptr_diff becomes equal to
(f_depth-1), f_full_flag goes high and w_ptr stops counting. Further data on the d_in bus will
overwrite the last location of the FIFO. This is unavoidable since there is no control over the
d_in bus. When the FIFO is half filled (i.e. f_depth/2), f_half_full_flag is asserted, and in the next
w_clk cycle it returns to its normal state. Similarly, when the FIFO reaches the almost full condition,
f_almost_full_flag is asserted. Thus all status flags are activated with zero clock delay. (See
Figure (13))
Signal r_en is then enabled to start the read operation. At this time w_en is disabled so that the empty
condition can be generated. r_en is detected at the next positive edge of r_clk, and hence
data is read with zero r_clk cycle delay (see Figure (12)). When ptr_diff becomes zero,
f_empty_flag is asserted and r_ptr stops incrementing. But on every r_clk edge data is read from the
last location and put on to the d_out bus.
Figure (12) Simulation waveform 2

When both r_en and w_en are enabled, the read clock domain has to wait until data has
been written to the FIFO. Thus the empty flag goes low at the positive edge of w_clk. At the next
positive edge of r_clk the data is read out and put on the d_out bus. Since r_ptr has
incremented, ptr_diff becomes zero and f_empty_flag goes high again. This status remains until the
next positive edge of w_clk. Observe the asynchronous read and write operation in Figure
(13). While r_en is high, f_empty_flag and r_next_en are complements of each other. As soon as data is
written to the FIFO, r_next_en is enabled. The read address is incremented and both read
and write pointers become equal. This makes ptr_diff zero and once again f_empty_flag is
asserted. Thus there is no pessimistic reporting of assertion or removal of FIFO status flags.
Thus the overall performance of the designed FIFO resembles that of the FIFO
IP core provided by Xilinx, although the algorithm and methodology used in the two
designs are entirely different. The IP core uses an acknowledgement signal to confirm read and write
operations, but the proposed design does not have any such mechanism. It is assumed that the data
sending and receiving hardware takes care of the data once the FIFO full and empty conditions
are asserted.
Figure (13) Simulation waveform 3

Asynchronous FIFO: Synthesis using Xilinx ISE and Spartan 3

Synthesizing the design with two different optimization goals changes the usage of
logic cells and the maximum operating frequency of the design. With ‘speed’ as the
optimization goal, the maximum achievable frequency is 113.830 MHz.
----------------------------------------------
Timing Summary:
Speed Grade: -5
Minimum period: 8.785ns (Maximum Frequency: 113.830MHz)
Minimum input arrival time before clock: 4.692ns
Maximum output required time after clock: 12.049ns
Maximum combinational path delay: No path found
---------------------------------------------------------

With ‘area’ as the optimization goal, the maximum achievable frequency is 90.212 MHz.


--------------------------------------------------------
Timing Summary:
Speed Grade: -5
Minimum period: 11.085ns (Maximum Frequency: 90.212MHz)
Minimum input arrival time before clock: 4.574ns
Maximum output required time after clock: 13.375ns
Maximum combinational path delay: No path found
---------------------------------------------------------------------

The difference in the operating frequency can be attributed to the delay in the
adder-subtractor circuit. Dual port distributed RAM is used for the memory. The output data d_out is
registered (the RTL schematic is shown in Figure (14)), which is one of the advantages of this
design.
Figure (14) Registered output

The part of the synthesis report generated by Xilinx ISE shown below lists the inferred hardware,
which remains the same for both optimization goals.

----------------------------------------------------------------------
Synthesizing Unit .
Related source file is a_fifo5.v.
Found 16x8-bit dual-port distributed RAM for signal .
-----------------------------------------------------------------------
| aspect ratio | 16-word x 8-bit | |
| clock | connected to signal | rise |
| write enable | connected to internal node | high |
| address | connected to signal | |
| dual address | connected to signal | |
| data in | connected to signal | |
| data out | not connected | |
| dual data out | connected to internal node | |
| ram_style | Auto | |
-----------------------------------------------------------------------
INFO:Xst:1442 - HDL ADVISOR - The RAM contents appears to be read
asynchronously. A synchronous read would allow you to take advantage of available
block RAM resources, for optimized device usage and improved timings. Please refer
to your documentation for coding guidelines.
Found 8-bit register for signal .
Found 4-bit addsub for signal <$n0003>.
Found 4-bit comparator greater for signal <$n0007> created at line 60.
Found 4-bit comparator less for signal <$n0008> created at line 62.
Found 4-bit adder for signal <$n0009> created at line 64.
Found 4 1-bit 2-to-1 multiplexers.
Summary:
inferred 1 RAM(s).
inferred 8 D-type flip-flop(s).
inferred 2 Adder/Subtracter(s).
inferred 2 Comparator(s).
inferred 4 Multiplexer(s).
Unit synthesized.
-----------------------------------------------------------------
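The 4-bit addsub, the two comparators and the adder listed in the report above appear to come from the always block that computes ptr_diff in a_fifo5.v. Because the pointers are 4 bits wide and f_depth is 16, that three-branch computation gives the same value as a single modulo-16 subtraction. The stand-alone module below is only a sketch that checks this equivalence exhaustively in simulation; ptr_diff_sketch and its internal names are hypothetical, and the effect of such a change on area or timing has not been measured here.

```verilog
// Sketch only: ptr_diff_sketch is a hypothetical stand-alone check,
// not part of the a_fifo5 design.
module ptr_diff_sketch;
  parameter f_depth = 16;

  reg  [3:0] w_ptr, r_ptr;
  reg  [3:0] diff_branch;              // three-branch form, as in a_fifo5
  wire [3:0] diff_wrap = w_ptr - r_ptr; // 4-bit subtraction wraps modulo 16

  always @(*) begin
    if (w_ptr > r_ptr)      diff_branch = w_ptr - r_ptr;
    else if (w_ptr < r_ptr) diff_branch = (f_depth - r_ptr) + w_ptr;
    else                    diff_branch = 0;
  end

  integer i, j, mismatches;
  initial begin
    mismatches = 0;
    // Try every combination of write and read pointer values.
    for (i = 0; i < 16; i = i + 1)
      for (j = 0; j < 16; j = j + 1) begin
        w_ptr = i; r_ptr = j;
        #1; // let the combinational logic settle
        if (diff_wrap !== diff_branch) mismatches = mismatches + 1;
      end
    $display("mismatches = %0d", mismatches); // zero if the two forms agree
    $finish;
  end
endmodule
```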

The low level synthesis report gives the device utilization summary. Device
utilization with ‘speed’ as the optimization goal is as follows:

-----------------------------------------
Device utilization summary:
---------------------------
Selected Device: 3s200ft256-5
Number of Slices: 36 out of 1920 1%
Number of Slice Flip Flops: 20 out of 3840 0%
Number of 4 input LUTs: 50 out of 3840 1%
Number of bonded IOBs: 24 out of 173 13%
Number of GCLKs: 2 out of 8 25%
-------------------------------------------------------------

Device utilization with ‘area’ as optimization goal is as follows:

-----------------------------------------
Device utilization summary:
-----------------------------------------
Selected Device : 3s200ft256-5
Number of Slices: 34 out of 1920 1%
Number of Slice Flip Flops: 16 out of 3840 0%
Number of 4 input LUTs: 47 out of 3840 1%
Number of bonded IOBs: 24 out of 173 13%
Number of GCLKs: 2 out of 8 25%
------------------------------------------------------------

In the case of ‘area’, slices are reduced by 2, slice flip-flops by 4 and 4 input LUTs
by 3. For the present design the read clock runs at 50 MHz. Hence the operating speed
of the design is very important and the optimization goal is kept as ‘speed’. Other than the
package pin constraints, no area or timing constraints are given to the design
during synthesis. Since the design doesn’t consume much of the FPGA's resources, area is
not an important factor. Package pin constraints are given so that the design can be implemented on the
Spartan 3 development board. This is done for the fifo_top.v code, which includes the clock
generation code for the design.
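As a quick check of the figures quoted above, the maximum frequency in each timing summary is the reciprocal of the reported minimum period, and the margin against the 50 MHz read clock follows from its 20 ns period. These are synthesis estimates taken from the timing summaries; figures after place and route are not given in this article and may differ.

```latex
\begin{align*}
f_{\max}^{\text{speed}} &= \frac{1}{8.785\ \text{ns}} \approx 113.83\ \text{MHz}, &
f_{\max}^{\text{area}}  &= \frac{1}{11.085\ \text{ns}} \approx 90.21\ \text{MHz},\\
T_{r\_clk} &= \frac{1}{50\ \text{MHz}} = 20\ \text{ns}, \\
T_{r\_clk} - 8.785\ \text{ns} &= 11.215\ \text{ns}, &
T_{r\_clk} - 11.085\ \text{ns} &= 8.915\ \text{ns}.
\end{align*}
```

Both reported minimum periods are shorter than the 20 ns read clock period; the ‘speed’ goal leaves the larger margin.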

Asynchronous FIFO Design


Asynchronous FIFOs are used as buffers between two asynchronous clock
domains to exchange data safely. Data is written into the FIFO from one clock domain
and read from another clock domain. This requires a memory architecture in which
two memory ports are available: one for the input (write or push) operation and
another for the output (read or pop) operation. Generally FIFOs are used where the write
operation is faster than the read operation. However, even with different speeds and access
types, the average rate of data transfer remains constant. FIFO pointers keep track of
the number of FIFO memory locations read and written, and the corresponding control logic
prevents the FIFO from either underflowing or overflowing. FIFO architectures
inherently face the challenge of synchronizing each pointer with the pointer logic of the other clock
domain so that reads and writes of the FIFO memory locations are controlled safely. A
detailed and careful analysis of the synchronizer circuit along with the pointer logic is required
to understand the synchronization of the two FIFO pointer logic circuits, which are responsible
for accessing the FIFO read and write ports, each controlled independently by a different clock.

Why Synchronization?

It is very important to understand signal stability in multi clock domain designs, since
to a traveling signal the new clock domain appears to be asynchronous. If the signal is
not synchronized to the new clock, the first storage element of the new clock domain may go
to a metastable state, and in the worst case the resolution time can’t be predicted. The metastable value can
propagate through the new clock domain, resulting in functional failure. To prevent
such failures, the setup time and hold time specifications have to be obeyed in the design.
Manufacturers provide statistics on the probability of failure of flip-flops due to metastability
in terms of MTBF (Mean Time Between Failures). Synchronizers are used to
prevent the downstream logic from entering the metastable state in multi clock
domain designs with multibit data values.
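A common form of synchronizer is two flip-flops in series, clocked by the receiving domain. For multibit values such as FIFO pointers, the value is usually converted to Gray code first so that only one bit changes per increment. The following is only a minimal sketch of both ideas: sync2 and bin2gray are hypothetical helper modules with assumed port names, and the a_fifo5 listing in this article does not instantiate them.

```verilog
// Sketch only: sync2 and bin2gray are hypothetical helper modules.

// Two-stage synchronizer: d comes from another clock domain,
// q is safe to use in the clk domain after two clk edges.
module sync2(q, d, clk, rst);
  parameter WIDTH = 1;
  output [WIDTH-1:0] q;
  input  [WIDTH-1:0] d;
  input              clk, rst;

  reg [WIDTH-1:0] meta, q;

  always @(posedge clk or posedge rst)
    if (rst) begin
      meta <= 0;
      q    <= 0;
    end else begin
      meta <= d;    // first stage may go metastable
      q    <= meta; // second stage gives it a clock period to resolve
    end
endmodule

// Binary to Gray code conversion: successive counts differ in one bit.
module bin2gray(gray, bin);
  parameter WIDTH = 4;
  output [WIDTH-1:0] gray;
  input  [WIDTH-1:0] bin;

  assign gray = bin ^ (bin >> 1);
endmodule
```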

Issues in Designing Asynchronous FIFO

It has been mentioned that designing the FIFO pointers for efficient operation of the
FIFO is the key issue in designing a FIFO architecture. Let us look more closely at the FIFO
read and write pointers. On reset, both read and write pointers point to the starting
location of the FIFO. This location is the first location to which data has to be written, and at
the same time it is the first location to be read. Therefore, in general we
can say that the read pointer always points to the word to be read and the write pointer always points
to the next location to which data has to be written.

Now let us examine the data write operation. When both read and write pointers are
pointing to the first location of the FIFO, the empty flag is asserted, indicating that the FIFO is
empty. Now data writing can be performed. Data is written to the location to which the
write pointer is pointing, and after the write operation the write pointer is incremented
to point to the next location to be written. At the same time, the empty flag is deasserted,
which indicates that the FIFO is not empty and some data is available. One notable point
regarding the read pointer is that while the empty flag is active, the data pointed to by the read pointer is
always invalid. When the first data word is written and the empty flag is cleared (i.e. empty flag
inactive), the read pointer logic immediately drives the data from the location to which it was
pointing to the read port of the dual port RAM, ready to be read by the read logic. The
biggest advantage of this implementation of the read logic is that only one clock pulse is
required to read from the read port, since the previous clock cycle has already incremented the read
pointer and driven the data to the read port. This helps in reducing the latency in detecting the
empty and full flag status. The empty status flag can also be asserted in one more
condition. If, after some number n of data write operations, the same number n of reads is
performed, then both pointers are again equal. Hence if the read pointer “catches up” with
the write pointer, the empty flag is asserted.
Now let us examine the FIFO full status. When the write pointer reaches the top of
the FIFO, it is pointing to a location which can be written and which is the last location
to be written. No read operation has been performed yet and the read pointer is still pointing to the first
location. This is one way to generate the FIFO full condition. If the full flag is asserted when the write pointer
reaches the top of the FIFO, it is not the actual FIFO full
condition; it is only ‘almost full’, as there is one location which can still be written.
Similarly, an almost empty condition can exist in the FIFO. Now a write operation causes the
location to be written and the write pointer to be incremented. Since the location was the last one,
the write pointer wraps around to the first location. Now both read and write pointers are equal and
hence the empty flag is asserted instead of the full flag, which is a fatal mistake. Hence
wrap around of the write pointer may indicate a FIFO full condition.
After writing the data to the FIFO (consider the write pointer to be at the top of the FIFO), some data has
been read and the read pointer is somewhere in the middle of the FIFO. One more write operation
causes the write pointer to wrap. Note that even though the write pointer is pointing to the first
location of the FIFO, this is NOT a FIFO full condition, since the read pointer has moved up from
the first location. Further data writing pushes the write pointer up. Imagine that the read pointer
wraps around after some more read operations. The present condition is that both pointers have
wrapped around but there is no FIFO full or FIFO empty condition. Data can be written
to the FIFO or read from the FIFO. This being the situation, how are the
full and empty conditions to be identified and generated? How are the two pointers to be synchronized and compared to
generate the full and empty status? While synchronizing, how can a possible metastable
state and ‘pessimistic reporting’ (i.e. a harmless wrong report; will be discussed later) be avoided?
These are some key issues in designing an asynchronous FIFO.
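One widely used way to tell the full condition from the empty condition when both pointers address the same location is to give each pointer one extra (wrap) bit. The following is only a sketch of that idea and is not the method used by the a_fifo5 code in this series, which compares a pointer difference instead. The module and signal names are hypothetical, and the comparison is shown with plain binary pointers in a single domain; a real asynchronous design would also need the pointers synchronized across clock domains.

```verilog
// Sketch only: full_empty_sketch is a hypothetical illustration.
module full_empty_sketch(full, empty, w_ptr, r_ptr);
  parameter AW = 4;            // address width, depth = 2**AW
  output full, empty;
  input [AW:0] w_ptr, r_ptr;   // AW address bits plus one wrap bit

  // Empty: pointers are identical, including the wrap bit.
  assign empty = (w_ptr == r_ptr);

  // Full: same address, but the write pointer has wrapped once more.
  assign full  = (w_ptr[AW] != r_ptr[AW]) &&
                 (w_ptr[AW-1:0] == r_ptr[AW-1:0]);
endmodule

// Small stimulus for the three situations described in the text.
module full_empty_sketch_tb;
  reg  [4:0] w_ptr, r_ptr;
  wire full, empty;

  full_empty_sketch dut(full, empty, w_ptr, r_ptr);

  initial begin
    w_ptr = 0;  r_ptr = 0;  // after reset: empty
    #1 $display("w=%0d r=%0d empty=%b full=%b", w_ptr, r_ptr, empty, full);
    w_ptr = 16; r_ptr = 0;  // 16 writes, no reads: full
    #1 $display("w=%0d r=%0d empty=%b full=%b", w_ptr, r_ptr, empty, full);
    w_ptr = 18; r_ptr = 5;  // write pointer wrapped, reads in progress
    #1 $display("w=%0d r=%0d empty=%b full=%b", w_ptr, r_ptr, empty, full);
    $finish;
  end
endmodule
```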

Verilog code for asynchronous FIFO


The Verilog code for the asynchronous FIFO is given below. The module “a_fifo5” should be
used for simulation in Modelsim (or any other HDL simulator). The module “fifo_top” is
used to synthesize the design for the Spartan 3 board. Before using this module in ISE you
should add the Digital Clock Manager (DCM) code to your project. This can be done using the
Xilinx ISE wizard.
//==================================================
//a_fifo5.v; verilog code for asynchronous FIFO
//This module describes FIFO
//===================================================
module a_fifo5(d_out,f_full_flag,f_half_full_flag,f_empty_flag,
f_almost_full_flag,f_almost_empty_flag,d_in,r_en,w_en,r_clk,w_clk,reset);
parameter f_width=8; //FIFO width
parameter f_depth=16; //FIFO depth
parameter f_ptr_width=4; //because depth =16;
parameter f_half_full_value=8;
parameter f_almost_full_value=14;
parameter f_almost_empty_value=2;
output [f_width-1:0] d_out; reg [f_width-1:0] d_out; //outputs
output
f_full_flag,f_half_full_flag,f_almost_full_flag,f_empty_flag,f_almost_empty_flag;
input [f_width-1:0] d_in;
input r_en,w_en,r_clk,w_clk;
input reset;
//internal registers,wires
wire [f_ptr_width-1:0] r_ptr,w_ptr;
reg r_next_en,w_next_en;
reg [f_ptr_width-1:0] ptr_diff;
reg [f_width-1:0] f_memory[f_depth-1:0];
assign f_full_flag=(ptr_diff==(f_depth-1)); //assign FIFO status
assign f_empty_flag=(ptr_diff==0);
assign f_half_full_flag=(ptr_diff==f_half_full_value);
assign f_almost_full_flag=(ptr_diff==f_almost_full_value);
assign f_almost_empty_flag=(ptr_diff==f_almost_empty_value);
//---------------------------------------------------------
always @(posedge w_clk) //write to memory
begin
if(w_en) begin
if(!f_full_flag)
f_memory[w_ptr]<=d_in; end
end
//---------------------------------------------------------
always @(posedge r_clk) //read from memory
begin
if(reset)
d_out<=0; //f_memory[r_ptr];
else if(r_en) begin
if(!f_empty_flag)
d_out<=f_memory[r_ptr]; end
else d_out<=0;
end
//---------------------------------------------------------
always @(*) //ptr_diff changes as read or write clock change
begin
if(w_ptr>r_ptr)
ptr_diff=w_ptr-r_ptr;
else if(w_ptr<r_ptr)
begin
ptr_diff=((f_depth-r_ptr)+w_ptr);
end
else ptr_diff=0;
end
//---------------------------------------------------------
always @(*) //after empty flag activated fifo read counter should not increment;
begin if(r_en && (!f_empty_flag))
r_next_en=1;
else r_next_en=0;
end
//--------------------------------------------------------
always @(*) //after full flag activated fifo write counter should not increment;
begin if(w_en && (!f_full_flag))
w_next_en=1;
else w_next_en=0;
end
//---------------------------------------------------------
b_counter //instantiate address counters
r_b_counter(.c_out(r_ptr),.c_reset(reset),.c_clk(r_clk),.en(r_next_en));
b_counter w_b_counter(.c_out(w_ptr),.c_reset(reset),.c_clk(w_clk),.en(w_next_en));
endmodule
//==============================================================
//b_counter.v; 4 bit binary up counter with asynchronous reset
//==============================================================
module b_counter(c_out,c_reset,c_clk,en);
parameter c_width=4; //counter width
output [c_width-1:0] c_out; reg [c_width-1:0] c_out;
input c_reset,c_clk,en;
always @(posedge c_clk or posedge c_reset)
if (c_reset)
c_out <= 0;
else if(en)
c_out <= c_out + 1;
endmodule
//===========================================================

//===========================================================
//fifo_top.v; top level verilog code of FIFO
//To be used with Xilinx ISE-simulation and synthesis
//For functional simulation this module is not necessary
//============================================================
module fifo_top(x,y,z,d_out,f_full_flag,f_half_full_flag,f_empty_flag,
f_almost_full_flag,f_almost_empty_flag,d_in,r_en,w_en,CLKIN_IN,RST_IN,reset);
parameter f_width=8;
parameter f_depth=16;
parameter f_ptr_width=4;
parameter f_half_full_value=8;
parameter f_almost_full_value=14;
parameter f_almost_empty_value=2;
output [f_width-1:0] d_out; //reg [f_width-1:0] d_out; //outputs
output
f_full_flag,f_half_full_flag,f_almost_full_flag,f_empty_flag,f_almost_empty_flag;
output x,y,z;
input [f_width-1:0] d_in;
input r_en,w_en,CLKIN_IN,RST_IN;
input reset;
a_fifo5 a_fifo55(d_out,f_full_flag,f_half_full_flag,f_empty_flag,
f_almost_full_flag,f_almost_empty_flag,d_in,r_en,w_en,CLK0_OUT,CLKDV_OUT,
reset); //instantiate fifo
dcm_fifo
dcm_fifo1(CLKIN_IN,RST_IN,CLKDV_OUT,CLKFX_OUT,CLKIN_IBUFG_OUT,
CLK0_OUT, LOCKED_OUT); //instantiate DCM
assign x=CLKIN_IBUFG_OUT; //simply to avoid error
assign y=LOCKED_OUT;
assign z=CLKFX_OUT;
endmodule

Verilog Test Bench for Asynchronous FIFO


Here is the verilog test bench for the asynchronous FIFO code already published.
Simulation results of the asynchronous FIFO will be discussed in coming articles.
//===========================================================
//ta_fifo5.v; test bench for the module a_fifo5
//===========================================================
module ta_fifo5();

parameter f_width=8;
parameter f_depth=16;

wire [f_width-1:0] d_out;


wire f_full_flag,f_half_full_flag,f_almost_full_flag,f_empty_flag,f_almost_empty_flag;
reg [f_width-1:0] d_in;
reg r_en,w_en,r_clk,w_clk;
reg reset;
wire [3:0] r_ptr,w_ptr,ptr_diff;

assign r_ptr=ta_fifo5.r_ptr;
assign w_ptr=ta_fifo5.w_ptr;
assign ptr_diff=ta_fifo5.ptr_diff;
assign r_next_en=ta_fifo5.r_next_en;
assign w_next_en=ta_fifo5.w_next_en;

a_fifo5 ta_fifo5(d_out,f_full_flag,f_half_full_flag,f_empty_flag,
f_almost_full_flag,f_almost_empty_flag,d_in,r_en,w_en,r_clk,w_clk,reset);

initial #5000 $stop;


initial begin #10 r_clk=0; forever #10 r_clk=~r_clk; end
initial begin #5 w_clk=0; forever #50 w_clk=~w_clk; end
//initial begin #10 r_clk=0; forever #50 r_clk=~r_clk; end //test for synchronous operation
//initial begin #10 w_clk=0; forever #50 w_clk=~w_clk; end //test for synchronous operation

initial begin d_in=1;


@(posedge w_en);
repeat(20) @(posedge w_clk) d_in=d_in+2;
repeat(20) @(posedge w_clk) d_in=d_in-1;
end

initial begin reset=1;#30 reset=0;end


initial begin fork #50 w_en=1; #1800 w_en=0; #2500 w_en=1 ; join end
initial begin fork #50 r_en=0; #1850 r_en=1; #2400 r_en=0; #2500 r_en=1; join end
endmodule
//======================================================
// DO file for Modelsim simulator
//This may not be applicable to other simulators
//======================================================
vlog a_fifo5.v
vsim a_fifo5
add wave *
force -freeze sim:/a_fifo5/reset 1 0
run
force -freeze sim:/a_fifo5/d_in 00000001 0
force -freeze sim:/a_fifo5/r_en 0 0
force -freeze sim:/a_fifo5/w_en 1 0
force -freeze sim:/a_fifo5/reset 0 0
run
force -freeze sim:/a_fifo5/r_clk 1 0, 0 {10 ns} -r 20
force -freeze sim:/a_fifo5/w_clk 1 0, 0 {50 ns} -r 100
run
force -freeze sim:/a_fifo5/d_in 00000011 0
run
force -freeze sim:/a_fifo5/d_in 00000111 0
run
force -freeze sim:/a_fifo5/d_in 00001111 0
run
force -freeze sim:/a_fifo5/d_in 00011111 0
run
force -freeze sim:/a_fifo5/d_in 00111111 0
run
force -freeze sim:/a_fifo5/d_in 01111111 0
run
force -freeze sim:/a_fifo5/d_in 11111111 0
run
force -freeze sim:/a_fifo5/d_in 11111110 0
run
force -freeze sim:/a_fifo5/d_in 11111100 0
run
force -freeze sim:/a_fifo5/d_in 11111000 0
run
force -freeze sim:/a_fifo5/d_in 11110000 0
run
force -freeze sim:/a_fifo5/d_in 11100000 0
run
force -freeze sim:/a_fifo5/d_in 11000000 0
run
force -freeze sim:/a_fifo5/d_in 10000000 0
run
force -freeze sim:/a_fifo5/d_in 00000000 0
run
force -freeze sim:/a_fifo5/d_in 00000001 0
run
force -freeze sim:/a_fifo5/d_in 00000010 0
run
force -freeze sim:/a_fifo5/w_en 0 0
force -freeze sim:/a_fifo5/r_en 1 0
force -freeze sim:/a_fifo5/d_in 00000011 0
run
force -freeze sim:/a_fifo5/d_in 00000100 0
run
force -freeze sim:/a_fifo5/d_in 00000101 0
run
force -freeze sim:/a_fifo5/d_in 00000110 0
run
force -freeze sim:/a_fifo5/d_in 00000111 0
run
force -freeze sim:/a_fifo5/d_in 8'd8 0
run
force -freeze sim:/a_fifo5/w_en 1 0
force -freeze sim:/a_fifo5/d_in 00001001 0
run
force -freeze sim:/a_fifo5/d_in 00001010 0
run
force -freeze sim:/a_fifo5/d_in 00001011 0
run
force -freeze sim:/a_fifo5/d_in 00001100 0
run
force -freeze sim:/a_fifo5/d_in 00001101 0
run
run
//============================================
