Cell-based IC Design, Implementation and Verification
(cmhuang@cic.org.tw)

Jun. 2004
C. M. Huang / CIC-CBDC / 07.2004 2
Day-1
Design Flow Overview
Verilog at a Glance
RTL Simulation
Lab: Using Verilog-XL
Code Coverage Analysis
RTL Synthesis
Logic Synthesis
Lab: Using VN Cover
Lab: Using HDL and Design Compiler
Day-2
Gate-level Delay Calculation
Static Timing Analysis
Power Analysis & Optimization
DFT & ATPG
Lab: Gate-level Simulation
Lab: Using Power Compiler
Lab: Using DFT Compiler
Automatic Physical Design
Physical Verification
Formal Equivalence Checking
Lab: Using Apollo
Lab: Using Conformal LEC
Algorithm, Architecture, and Chip
We are here...
How to Realize an Architecture?
General Design Process
Design: Specify and capture the idea in some formal representation
Implementation: Refine the design through all phases
Verification: Verify the correctness of the design and the implementation
[Figure: Idea, Design, Implementation, Verification, Verified Chip Layout]
Typical Design Considerations
Functionality
Area
Timing
Power
Testability
Reliability
Design
Possible Design Domains
Behavioral Domain:
if A = 0 then
Z = 1;
else
Z = 0;
Structural Domain: [Figure: schematic with input A and output Z]
Physical Domain: [Figure: layout with terminals A and Z]
Possible Design Levels
Level                    Behavioral Domain   Structural Domain          Physical Domain
Architecture Level       Systems             Processor, memory, switch  Physical partitions
Algorithm Level          Algorithms          Hardware modules           Clusters
Register Transfer Level  Register transfers  ALUs, MUXs, REGs           Floorplan
Logic Level              Logic               Gates, FFs                 Modules
Circuit Level            Transfer functions  Transistors                Layout
Revised from: Silicon Compilation
Productivity vs. Predictability
[Figure: productivity and predictability across the TR, Gate, RTL, and Behavioral levels]
Current Practice: HDL @ RTL
module TEST(CLK, A, B, C, E);
input CLK, A, B, C;
output E;
reg E, D;
always @(posedge CLK)
E <= D | C;
always @(negedge CLK)
D <= A & B;
endmodule
CIC Supported
IEEE Std. 1364-1995, Verilog
IEEE Std. 1076, VHDL
Implementation
Typical RTL Design Flow
[Figure: HDL, Logic Synthesis, Netlist, Test Synthesis, Netlist, Physical Synthesis, Layout; supported by Full-custom Design, Module Generation, and the Cell Library]
Full-custom Design
Performance at transistor level
- Utilize layout editing tools
- Virtuoso (Cadence)
- Laker (SpringSoft)
Very expensive in design cost and design time
- 10-20 gates per week
Used for:
- Analog
- Leaf cells - libraries, memory cells
- Datapath in high performance designs
Module Generation
Parameterized generators of actual physical layout and various models
Typically used for:
- Memories
- Programmable Logic Arrays
- Register Files
Occasionally used for:
- Multipliers
- General-purpose datapath
- High performance datapath
CIC Supported
Compiler libraries: TSMC 0.35LG (Avant!), TSMC 0.25RF (Artisan), TSMC 0.18RF (Artisan), UMC 0.18RF (VST)
Compiler types:
- Asynchronous RAM
- Asynchronous Two-Port RAM
- Asynchronous Dual-Port RAM
- Synchronous ROM
- Synchronous RAM
- Synchronous Dual-Port RAM
- Register File
- Dual-Port Register File
- Two-Port Register File
Logic Synthesis
Translate HDL source code into a netlist
- Technology independent optimization (logic minimization)
- Bind the netlist with user specified cell libraries (technology mapping)
- Technology dependent optimization
CIC supported synthesis tools:
- HDL/Design Compiler (Synopsys)
- Ambit BuildGates (Cadence)
Test Synthesis
CIC supported test synthesis tools:
- DFT Compiler (Synopsys)
- TetraMAX (Synopsys)
- TurboDFT (SynTest)
[Figure: chip with RAM, DSP core, ASIC logic, S/P, and DMA blocks, annotated with JTAG 1149.1, BIST, partial scan, full scan, and fault simulation for asynchronous interfaces]
Physical Synthesis
Transform a gate-level netlist into a physical representation
- Place circuit components
- Route wires
- Transform into a mask
CIC supported P&R tools:
- Silicon Ensemble (Cadence)
- Apollo (Synopsys)
- SOC Encounter (Cadence)
- Blast Fusion (Magma)
Standard Cell Library
Two types of cell:
- Core cells
- I/O cells (pads)
Contains for each cell:
- Functional information
- Timing information
- Power information
- Layout (abstract) information
Wire-load models
Libraries: TSMC 0.35LG (Avant!), TSMC 0.25RF (Artisan), TSMC 0.18RF (Artisan), UMC 0.18RF (VST)
Function         Tool
Logic Synthesis  Design Compiler (Synopsys), Ambit (Cadence)
Test Synthesis   DFT Compiler (Synopsys), TurboScan (SynTest)
Simulation       Verilog-XL (Cadence), NC-Verilog (Cadence), VCS (Synopsys), ModelSim (Mentor Graphics), VITAL
Timing Analysis  PrimeTime (Synopsys)
Power Analysis   Power Compiler (Synopsys)
P&R              Silicon Ensemble (Cadence), Apollo (Avant!/Synopsys)
Verification
Types of Verification
Design Verification
- Is what I specified what I wanted?
Implementation Verification
- Is what I implemented what I specified?
Approaches to Design Verification
Simulation
- Application of simulation stimulus to a model of the circuit
Emulation
- Implement a version of the circuit on an emulator
Rapid Prototyping
- Create a prototype of the actual hardware
Formal Verification
- Model checking - prove properties relative to a model
- Theorem proving - prove properties of a circuit
Software Simulation
[Figure: simulation driver (vectors), simulation model (HDL), simulation monitor (yes/no)]
CIC supported Verilog simulators:
- Verilog-XL, NC-Verilog (Cadence)
- VCS (Synopsys)
- ModelSim (Mentor Graphics)
CIC supported VHDL simulators:
- ModelSim (Mentor Graphics)
- NC-VHDL (Cadence)
Rapid Prototyping
Debug Environment (LA)
Aptix System Explorer MP4
Aptix System Explorer Development Software
Aptix MP4CF SOC Verification Platform
Agilent 16702B LA
[Figure: the RTL design flow (HDL, Logic Synthesis, Netlist, Test Synthesis, Netlist, Physical Synthesis, Layout, Cell Library) annotated with the verification steps Simulation, STA, FEC, LVS, and DRC/ERC]
Dynamic Analysis Techniques
- Simulation (functionality, timing, power)
Static Analysis Techniques
- Static Timing Analysis (STA)
- Formal Equivalence Checking (FEC)
Physical Verification
- Design Rule Checking (DRC)
- Electrical Rule Checking (ERC)
- Layout Versus Schematic (LVS)
Advantages of gate-level simulation
- Verifies timing and functionality simultaneously
- Approach well understood by designers
Disadvantages of gate-level simulation
- Computationally intensive - only 1 - 10 clock cycles of a 100K gate design per CPU second
- Incomplete - results are only as good as your vector set - easy to overlook incorrect timing/behavior
[Figure: simulation driver (vectors) and simulation monitor (yes/no and speed), applied to an example circuit with signals a, b, s, d, clk, q]
HDL Debugging
CIC supported HDL Debugging Tool:
- Debussy (SpringSoft)
CIC supported HDL Coding Style Checking Tool:
- nLint (SpringSoft)
- LEDA (Synopsys)
CIC supported Code Coverage Analysis Tool:
- VN Check (TransEDA)
[Figure: register-to-register paths through combinational logic, all clocked by clk]
- Determine the fastest permissible clock speed (e.g. 100 MHz) by determining the delay (including set-up and hold time) of the longest path from register to register (e.g. 10 ns).
- Largely eliminates the need for gate-level simulation to verify the delay of the circuit.
Static Analysis
CIC supported Static Timing Analysis Tool:
- PrimeTime (Synopsys)
CIC supported Power Analysis Tool:
- PrimePower (Synopsys)
Given two single-output circuits A and B:
"Are A and B equivalent?" can be posed as: "Is there a test for F s-a-0?"
If F s-a-0 is redundant, A ≡ B; otherwise the test vector produces different outputs for A and B.
[Figure: circuits A and B driven by the inputs x1, x2, x3, x4, with the s-a-0 fault at F]
FEC
CIC supported FEC Tool:
- LEC (Verplex)
- Formality (Synopsys)
Physical Layout Verification
CIC supported Layout Verification Tools:
- Dracula (Cadence)
- Diva-Assura (Cadence)
- Calibre (Mentor Graphics)
- Hercules (Synopsys)
CIC supported RC Extraction Tools:
- Calibre xRC (Mentor Graphics)
- Star-RCXT (Synopsys)
- Fire & Ice (Cadence)
Shared Computing Server
Sun Fire 6800
- 20 (4+16) UltraSPARC III CPUs
- 40 GB memory
- 2.0 TB storage
Support for:
- Parallel Extraction
- Distributed Synthesis
- Distributed Simulation
- Parallel Simulation
- Distributed P&R
- Distributed Characterization
Design Service Automation
Verilog at a Glance
What is Verilog HDL?
The Verilog Hardware Description Language is designed for describing a hardware design or part of a design.
Verilog is both a behavioral and a structural language.
Verilog models can be developed for different levels of abstraction:
- Algorithmic: a model that implements a design algorithm in high-level language constructs
- RTL (Register Transfer Level): a model that describes the flow of data between registers and how a design processes that data
- Gate-level: a model that describes the logic gates and the connections between logic gates in a design
- Switch-level: a model that describes the transistors and storage nodes in a device and the connections between them
The Behavioral Description Model
The Verilog HDL behavioral description model is structured and procedural like the C programming language.
The behavioral description model constructs are for algorithmic and RTL models.
The behavioral description model provides the following capabilities:
- Structured procedures for sequential or concurrent (parallel) execution (simulation)
- Explicit control of the time of procedure activation, specified both by delay expressions and by value changes called event expressions
- Explicit named events to trigger the enabling and disabling of actions in other procedures
- Procedure constructs for conditional, if-else, case, and looping operations
- Procedures called tasks that can have parameters and non-zero time duration
- Procedures called functions that allow the definition of new operators
- Arithmetic, logical, bit-wise, and reduction operators for expressions
The Structural Description Model
The structural description model constructs are for gate-level and switch-level models.
The structural description model provides the following capabilities:
- A complete set of combinational primitives
- Primitives for bi-directional pass and resistive devices
- The ability to model dynamic MOS models with charge sharing and charge decay
Past, Present, and Future
Verilog was developed at a time when designers were looking for tools to combine different levels of simulation.
In the early 1980s, there were switch-level simulators, gate-level simulators, and functional simulators, but no simple means to combine them.
Verilog was created by Phil Moorby in 1983-4 at Gateway Design Automation, and the first simulator was written a year later.
Verilog borrowed much from existing languages:
- The concurrency aspects may be seen in both Modula and Simula
- The syntax is deliberately close to that of C
- The methods for combining different levels of abstraction owe much to Hilo
In 1989, Gateway Design Automation was acquired by Cadence Design Systems.
In 1990, Cadence decided to open the language to the public, and thus OVI (Open Verilog International) was formed.
In 1993, an IEEE working group was established under the Design Automation Sub-Committee to produce the IEEE Verilog 1364.
In December 1995, the final draft of Verilog was approved and the result is known as IEEE Std. 1364-1995.
In March 2000, the final draft of Verilog-2000 was completed, and the final IEEE balloting process started.
Verilog-2000 was expected to be ratified in Q3-2000 as IEEE Std. 1364-2000; the standard was ultimately approved in 2001 as IEEE Std. 1364-2001.
Three task forces:
- Behavioral Task Force: RTL and behavioral modeling enhancements
- ASIC Task Force: ASIC and FPGA library modeling enhancements
- PLI Task Force: PLI enhancements
The major enhancements of Verilog-2000 are:
- Higher level, abstract system level modeling
- Intellectual Property (IP) modeling capabilities
- Greater timing accuracy for very deep sub-micron
Basic Building Block: Modules
Modules are the basic building blocks in the design hierarchy.
Descriptions of circuits are placed inside modules.
Modules can represent:
- A physical block such as a discrete component or a standard cell
- A logic block such as the ALU portion of a CPU design
- The complete system
Every module description starts with the keyword module, followed by a name, and ends with the keyword endmodule.
Modules define a new scope (level of hierarchy) in Verilog.
Communication Interface: Module Ports
Modules communicate with the outside world through ports.
Module ports are equivalent to the pins in hardware.
Module ports are listed in parentheses after the module name.
Module port types can be: input, output or inout (bidirectional).
module DFF (d, clk, clr_, q, q_);
output q, q_;
input d, clk, clr_;
endmodule
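The port list above only declares the interface. As a minimal sketch, assuming that clr_ is an active-low asynchronous clear (the slide does not say how clr_ behaves), the body of the module could be filled in as follows:

```verilog
// Hypothetical completion of the DFF shell. The asynchronous,
// active-low behavior of clr_ is an assumption, not taken from the slide.
module DFF (d, clk, clr_, q, q_);
  output q, q_;
  input  d, clk, clr_;
  reg    q;              // q is assigned in a procedure, so it is a reg

  assign q_ = ~q;        // q_ is a net driven by a continuous assignment

  always @(posedge clk or negedge clr_)
    if (!clr_)
      q <= 1'b0;         // clear takes priority over the clock
    else
      q <= d;            // otherwise capture d on the rising clock edge
endmodule
```

The output q is declared as a reg because it is updated inside an always block, while q_ stays a net because it is driven continuously.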
What's Inside the Module?
Behavioral Descriptions
module ...(...);
. . .
always @(. . .)
begin
. . .
end
always @(. . .)
begin
. . .
end
initial @(. . .)
begin
. . .
end
endmodule
Structural Descriptions
module ...(...);
. . .
not (a, b);
and (c, d, e);
xor (x, y, z);
endmodule
Behavioral Descriptions
module ...(...);
. . .
assign out = a & b;
assign q = ~p;
endmodule
Mixed Descriptions
module ...(...);
. . .
and (a, b, c);
assign out = a & b;
always @(. . .)
begin
. . .
end
initial @(. . .)
begin
. . .
end
endmodule
Put Things in the Right Place
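module ...(...);
. . .
sum = a + b; // misplaced: a procedural assignment must be inside an initial or always block
always @(. . .)
begin
. . .
end
initial @(. . .)
begin
. . .
end
endmodule
module ...(...);
. . .
always @(. . .)
begin
. . .
and (a, b, c); // misplaced: a gate instance must be at module level, outside any procedure
. . .
end
endmodule
module ...(...);
. . .
while (x < y) // misplaced: a loop statement must be inside an initial or always block
begin
. . .
end
always @(. . .)
begin
. . .
end
initial @(. . .)
begin
. . .
end
endmodule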
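For contrast, the following sketch puts each of the three constructs in a legal position. The module name, port names, and widths are placeholders chosen for illustration and do not come from the slides.

```verilog
// Sketch only: right_place and all signal names are hypothetical.
module right_place (a, b, clk, sum, c);
  input        clk;
  input  [3:0] a, b;
  output [4:0] sum;
  output       c;
  reg    [4:0] sum;
  integer      x, y;

  and (c, a[0], b[0]);     // gate instance: at module level

  always @(posedge clk)
    sum = a + b;           // procedural assignment: inside an always block

  initial begin
    x = 0; y = 4;
    while (x < y)          // loop statement: inside an initial block
      x = x + 1;
  end
endmodule
```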
4-Value Logic System
The Verilog HDL value set consists of four logic values:
0 - represents a logic zero, or a false condition
1 - represents a logic one, or a true condition
x - represents an unknown logic value
z - represents a high-impedance state
When the z value is present at the input of a gate, or when it is encountered in an expression, the effect is usually the same as an x value.
The "unknown logic value" is not the same as "don't care". It represents a situation where the value of a node cannot be predicted.
In real hardware, this node will most likely be at either 1 or 0.
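A small simulation sketch can make the treatment of x and z in expressions concrete. The module name four_value is made up for this illustration.

```verilog
// Sketch: how x and z behave when they appear in an expression.
module four_value;
  reg [3:0] v;

  initial begin
    v = 4'b10xz;
    $display("v = %b", v);                    // v = 10xz
    $display("1 & x = %b", 1'b1 & 1'bx);      // x: the result depends on the unknown bit
    $display("0 & x = %b", 1'b0 & 1'bx);      // 0: a 0 input forces the AND result
    $display("1 & z = %b", 1'b1 & 1'bz);      // x: z is treated like x here
  end
endmodule
```

Note that an unknown input does not always produce an unknown output: a controlling value such as 0 on an AND still determines the result.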
Integer Constant Numbers
Integer constants can be sized or unsized. Unsized integers shall be at least 32 bits wide.
Integer constants can be represented in binary (b or B), octal (o or O), decimal (d or D), or hexadecimal (h or H) format. The default radix is decimal.
Three possible representations are:
<size>'<base><value>   8'b1010_0010 = 8'hA2
'<base><value>         'H 83a
<value>                626
The "?" character is interpreted as "z" (high impedance).
4'b1?0? => 4'b1z0z
When <size> is less than <value>, the upper bits are truncated.
2'b1101 => 2'b01, 4'habcd => 4'hd
When <size> is greater than <value>, and the MSB of <value> is 0 or 1, zeros are extended to <size> bits.
4'b01 => 4'b0001, 16'h0 => 16'h0000
4'b11 => 4'b0011, 16'h1 => 16'h0001
When <size> is greater than <value>, and the MSB of <value> is an x, x is extended to <size> bits.
4'bx1 => 4'bxxx1, 16'hx => 16'hxxxx
When <size> is greater than <value>, and the MSB of <value> is a z, z is extended to <size> bits.
4'bz1 => 4'bzzz1, 16'hz => 16'hzzzz
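The truncation and extension rules above can be checked with a few $display calls. This is only a sketch; the module name const_sizes is hypothetical, and a simulator may also print a warning for the literal that is too wide for its size.

```verilog
// Sketch: printing sized constants to observe truncation and extension.
module const_sizes;
  initial begin
    $display("%b", 2'b1101);        // 01   (upper bits truncated)
    $display("%b", 4'b01);          // 0001 (zero-extended)
    $display("%b", 4'bx1);          // xxx1 (x-extended)
    $display("%b", 4'bz1);          // zzz1 (z-extended)
    $display("%b", 4'b1?0?);        // 1z0z ("?" reads as z)
    $display("%h", 8'b1010_0010);   // a2
  end
endmodule
```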
Data Types in Verilog
There are two main groups of data types in Verilog: the
register data types and the net data types.
These two groups differ in the way that they are assigned
and hold values.
They also represent different hardware structures.
Nets
The net data types shall represent physical connections between structural entities, such as gates.
A net shall not store a value. Instead, its value shall be determined by the values of its drivers, such as a continuous assignment, a module, or a gate.
If no driver is connected to a net, its value shall be high-impedance (z).
[Figure: gate-level circuit whose connections a, b, sel, sel_, c, d, e, out are nets]
Registers
A register is an abstraction of a data storage element.
A register shall store a value from one assignment to the next.
An assignment statement in a procedure acts as a trigger that changes the value in the data storage element.
The default initialization value for a register data type shall be the unknown value x.
[Figure: circuit with signals a, b, sel, sel_, c, d, e, out, with the registers marked]
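The difference between the two groups shows up directly in simulation. The sketch below, with made-up names, prints an undriven net, a driven net, and a register before and after the register is assigned.

```verilog
// Sketch: default values of nets and registers, and value retention.
module net_vs_reg;
  wire undriven;            // no driver: reads z
  wire driven;
  reg  r;                   // not yet assigned: reads x

  assign driven = r;        // the net simply follows its driver

  initial begin
    #1 $display("%b %b %b", undriven, driven, r);   // z x x
    r = 1'b1;
    #1 $display("%b %b %b", undriven, driven, r);   // z 1 1
    #1 $display("%b %b %b", undriven, driven, r);   // z 1 1 (r holds its value)
  end
endmodule
```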
Types of Nets
There are several distinct types of nets in Verilog:
Net Type          Functionality
wire, tri         For standard interconnection wires (default)
wor, trior        For multiple drivers that are Wire-ORed
wand, triand      For multiple drivers that are Wire-ANDed
trireg            For nets with capacitive storage
tri0, tri1        For nets that pull up or down when not driven
supply0, supply1  For power and ground nets
Nets that are not declared explicitly default to single-bit nets of type wire.
The default net type can be overridden by using the compiler directive:
- `default_nettype <nettype>
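As an illustration of the wired net types, the following sketch drives a wand and a wor net from the same two sources. All names are hypothetical.

```verilog
// Sketch: two drivers on a wand net and on a wor net.
module wired_nets;
  wand wa;                 // multiple drivers are ANDed
  wor  wo;                 // multiple drivers are ORed
  reg  p, q;

  assign wa = p;
  assign wa = q;
  assign wo = p;
  assign wo = q;

  initial begin
    p = 1'b1; q = 1'b0;
    #1 $display("wand=%b wor=%b", wa, wo);   // wand=0 wor=1
  end
endmodule
```

With a plain wire in place of wand or wor, the same conflicting drivers would resolve to x.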
Declarations of Nets
[Figure: buf2 with input a and output b, built from two inverters connected through the internal net t]
module buf2(a, b);
output b;
input a;
endmodule
Explicit declarations
module buf2(a, b);
output b;
input a;
wire a, b, t;
not (b, t);
not (t, a);
endmodule
Implicit declarations
module buf2(a, b);
output b;
input a;
not (b, t);
not (t, a);
endmodule
Types of Registers
The register class consists of four data types:
Register Type   Functionality
reg             Unsigned integer variable of varying bit width.
integer         Signed integer variable, 32 bits wide; arithmetic operations produce 2's-complement results.
real, realtime  Signed floating-point variable, double precision.
time            Unsigned integer variable, at least 64 bits wide.
A reg is often associated with hardware.
integer, real (realtime), and time are typically used for manipulating quantities that are not regarded as hardware.
reg a; // a scalar register
reg [3:0] v; // a 4-bit vector register
integer a[1:64]; // an array of 64 integer values
time chng_hist[1:100]; // an array of 100 time values
real float; // a register to store real value
Vectors
A net or reg declaration without a <range> specification is one bit wide; that is, it is a scalar.
Multiple-bit net or reg data types are declared by specifying a <range>, and are known as vectors.
The range is specified as follows:
[ <msb_expr> : <lsb_expr> ]
Both <msb_expr> and <lsb_expr> are non-negative constant expressions.
<lsb_expr> can be greater than <msb_expr>, if desired.
reg [7:0] rv1; // an 8-bit vector register
reg [0:7] rv2; // an 8-bit vector register, rv1 != rv2
wire [0:15] nv1; // a 16-bit vector net
wire [15:0] nv2; // a 16-bit vector net, nv1 != nv2
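The two range orders differ only in how the bits are indexed. A minimal sketch, reusing the names rv1 and rv2 from the declarations above inside a made-up module, shows which index addresses the most significant bit in each case:

```verilog
// Sketch: the leftmost index in the range is always the MSB.
module vec_order;
  reg [7:0] rv1;   // rv1[7] is the MSB
  reg [0:7] rv2;   // rv2[0] is the MSB

  initial begin
    rv1 = 8'b1000_0000;
    rv2 = 8'b1000_0000;
    $display("%b %b", rv1[7], rv2[0]);       // 1 1
    $display("%b %b", rv1[7:4], rv2[0:3]);   // 1000 1000
  end
endmodule
```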
Relationships Between Module Ports and Data Types
An input port can be driven by a net or a register, but it can only drive a net.
An output port can be driven by a net or a register, but it can only drive a net.
An inout port can only be driven by a net, and it can only drive a net.
[Figure: DUT port connections. input: net or register outside, net inside. output: net or register inside, net outside. inout: net on both sides.]
module top;
wire y;
reg a, b;
DUT U0 (y, a, b);
initial begin
a = 0; b = 0;
#5 a = 1;
end
endmodule
module DUT (Y, A, B);
output Y;
input A, B;
wire Y, A, B;
and (Y, A, B);
endmodule
Examples
module x1(a);
input a;
reg a;
endmodule
"x1.v", 3: Error! Incompatible declaration, (a) defined as input at line 2
module x2(a);
inout a;
reg a;
endmodule
"x2.v", 3: Error! Incompatible declaration, (a) defined as inout at line 2
Built-in Logical Gates and Switches
Multi-input Gates        and, nand, or, nor, xor, xnor
Multi-output Gates       buf, not
Tri-state Gates          bufif0, bufif1, notif0, notif1
Pull Gates               pulldown, pullup
Unidirectional Switches  cmos, nmos, pmos, rcmos, rnmos, rpmos
Bidirectional Switches   rtran, rtranif0, rtranif1, tran, tranif0, tranif1
Gate and Switch Declaration
A gate or switch instance declaration shall have the following specifications:
- The keyword that names the type of gate or switch primitive
- The terminal connection list
- An optional output drive strength
- An optional propagation delay
- An optional instance name
- An optional range for an array of instances
and (out, in1, in2);
and (strong1, weak0) #2.6 u[2:0] (out, in1, in2);
Specify the Number of I/O Ports
The number of pins for a primitive gate is defined by the
number of nets connected to it, not by the gate type.
The output and bidirectional terminals always come first in
the terminal list, followed by the input terminals.
and (out, in1, in2); // 2-input AND gate
or (out, in1, in2, in3); // 3-input OR gate
xnor(out, in1, in2, in3, in4);// 4-input XNOR gate
not (out1, in); // 1-output NOT gate
not (out1, out2, in); // 2-output NOT gate
buf (out1, out2, out3, in); // 3-output BUF gate
Truth tables for the multiple-output gates (i = input, o = output):
buf   i  0 1 X Z
      o  0 1 X X
not   i  0 1 X Z
      o  1 0 X X
Tri-state Gates
Tri-state gates have only three pins: output, input, and enable.
When tri-state gates are disabled, their outputs are at high impedance.
X represents a complete unknown: the value can be logic 1, 0, or Z, and the strength is unknown.
L represents a partial unknown: the value is logic 0, but the strength is unknown, and it can even be Z.
H represents a partial unknown: the value is logic 1, but the strength is unknown, and it can even be Z.
In each table, rows are the data value and columns are the enable value.
bufif1(out,data,enable)
data \ enable  0  1  X  Z
0              Z  0  L  L
1              Z  1  H  H
X              Z  X  X  X
Z              Z  X  X  X
bufif0(out,data,enable)
data \ enable  0  1  X  Z
0              0  Z  L  L
1              1  Z  H  H
X              X  Z  X  X
Z              X  Z  X  X
notif1(out,data,enable)
data \ enable  0  1  X  Z
0              Z  1  H  H
1              Z  0  L  L
X              Z  X  X  X
Z              Z  X  X  X
notif0(out,data,enable)
data \ enable  0  1  X  Z
0              1  Z  H  H
1              0  Z  L  L
X              X  Z  X  X
Z              X  Z  X  X
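A typical use of tri-state gates is a shared bus. The sketch below, with hypothetical names, connects two bufif1 drivers to one net and enables them one at a time.

```verilog
// Sketch: two tri-state drivers sharing a single bus net.
module tri_bus;
  wire bus;
  reg  d0, d1, en0, en1;

  bufif1 b0 (bus, d0, en0);   // drives bus when en0 is 1
  bufif1 b1 (bus, d1, en1);   // drives bus when en1 is 1

  initial begin
    d0 = 1'b0; d1 = 1'b1;
    en0 = 1'b0; en1 = 1'b0;
    #1 $display("%b", bus);   // z: both drivers are disabled
    en0 = 1'b1;
    #1 $display("%b", bus);   // 0: b0 drives the bus
    en0 = 1'b0; en1 = 1'b1;
    #1 $display("%b", bus);   // 1: b1 drives the bus
  end
endmodule
```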
Specify the Propagation Delay
The gate delays specify the (ideal) signal propagation delay from any gate input to the gate output.
Up to three values per output, representing rise, fall, and turn-off delays, can be specified.
The default delay shall be zero when no delay specification is given.
For a three-delay specification:
- The first delay refers to the transition to the 1 value (rise delay)
- The second delay refers to the transition to the 0 value (fall delay)
- The third delay refers to the transition to the high-impedance value (turn-off delay)
and #(10) a1 (out, in1, in2);
and #(10,12) a2 (out, in1, in2);
bufif1 #(10,12,11) b3 (out, in, ctrl);
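To see the rise and fall delays take effect, a two-delay gate can be stimulated and monitored. This is a sketch with made-up names; the times in the comments follow from the delay values of 10 and 12.

```verilog
// Sketch: observing separate rise and fall delays on an AND gate.
module gate_delay;
  reg  in1, in2;
  wire out;

  and #(10, 12) a2 (out, in1, in2);   // rise delay = 10, fall delay = 12

  initial begin
    $monitor("t=%0d out=%b", $time, out);
    in1 = 1'b1; in2 = 1'b1;   // out goes to 1 at t = 10
    #50 in2 = 1'b0;           // out goes to 0 at t = 62
    #50 $finish;
  end
endmodule
```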
min/typ/max Delays
You can specify minimum, typical, and maximum values for
each delay.
nand #(1.0:1.2:1.5, 2.3:3.5:4.7) n1(out, in1, in2);
bufif1 #(5:7:9, 8:10:12, 15:18:21) (io1, io2, dir);
By default, Verilog simulators use typical values.
While simulating, you can use only one of these values,
specified by the following command line options:
+mindelays, +typdelays, +maxdelays
verilog source.v +maxdelays
Behavioral Modeling
Behavioral modeling enables you to describe the system at a high level of abstraction. At this level of abstraction, implementation is not as important as the overall functionality of the system.
High-level programming language constructs are available in Verilog for behavioral modeling. These include wait, while, if-else, case, and forever.
Behavioral modeling in Verilog is described by specifying a set of concurrently active procedural blocks in a high-level programming language that together describe the operation of the system.
The Goal Is ...
Describe how/when the nets/registers will be updated.
How to Update Nets / Registers?
A = (B + C) >> 3;
Assignment statement: the whole line, either a procedural assignment or a continuous assignment
Net / Register: A
Expression: (B + C) >> 3
Operator: + and >>
When to Update Registers?
.....
reg [7:0] A;
always @(posedge clk) // procedural block with timing control
begin
if (enable_ == 1'b0) // procedural control statement
for (i=0; i<=7; i=i+1)
#2.5 A[i] = A[i] & B[7-i]; // timing control and procedural assignment
end
.....
Continuous Assignments
Continuous assignments drive values onto nets, both vector and scalar.
What does "drive values onto nets" mean for hardware circuitry? Can a logical value be "stored" into a net?
A continuous assignment decides the driver of a net.
Explicit declaration
wire w;
assign w = a ^ 1'b0;
Implicit declaration
wire w = a ^ 1'b0;
In each assignment, w is the LHS (left-hand side) and a ^ 1'b0 is the RHS (right-hand side).
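One way to read "drive" is that the net is recomputed whenever an operand on the right-hand side changes, with no statement being executed for the net itself. The following sketch, using hypothetical names, shows this:

```verilog
// Sketch: a continuously assigned net follows its operands.
module cont_assign;
  reg  a, b;
  wire w;

  assign w = a ^ b;           // re-evaluated whenever a or b changes

  initial begin
    a = 0; b = 0;
    #1 $display("%b", w);     // 0
    a = 1;
    #1 $display("%b", w);     // 1 (only a was assigned, w followed)
    b = 1;
    #1 $display("%b", w);     // 0
  end
endmodule
```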
Rules for LHS
The LHS of a continuous assignment can be a
- scalar net
- vector net
- part-select of a vector net
- concatenation of nets
Scalar net:
wire a = 1'b1;
tri b = 1'b0;
Vector net (drives b[7] down to b[0]):
wire [7:0] b = 8'hF0;
Part-select of a vector net (drives c[5] down to c[2] only):
wire [7:0] c;
assign c[5:2] = 4'hC;
Concatenation of nets (drives d[1:0] and e[1:0]):
wire [1:0] d, e;
assign {d,e} = 4'hA;
Rules for RHS
The RHS of a continuous assignment can be a
- constant value
- expression (composed of nets or registers or both) that returns a value
- function that returns a value
wire [7:0] w;
wire [3:0] a, b;
wire [7:0] c;
reg [7:0] r;
assign w = ({a,b} & c) | r;
wire [7:0] w;
assign w = f(...);
Question
How would you describe the following two circuits?
[Figure: two circuits with signals a, b, t, and w]
Expressions
An expression is a construct that combines operands with operators to produce a result that is a function of the values of the operands and the semantic meaning of the operator.
Any legal operand, such as a net bit-select, without any operator is considered an expression.
An operand can be one of the following:
- constant number
- net, net bit-select, net part-select
- register, register bit-select, register part-select
- memory element
- a call to a user-defined function or system-defined function that returns any of the above
Examples: A + B, A, A[5:2]
Operator Precedence
Type of Operators        Symbols
Concatenate & replicate  { } {{ }}        (highest precedence)
Unary                    + - ! ~
Arithmetic               * / %
                         + -
Shift                    << >> <<< >>>
Relational               > < >= <=
Equality                 == != === !==
Binary bit-wise          & ~&
                         ^ ^~ ~^
                         | ~|
Binary logical           && ||
Conditional              ? :              (lowest precedence)
Operators for Real Operands
Type of Operators  Symbols
Unary              + - !
Arithmetic         * /
                   + -
Relational         > < >= <=
Equality           == !=
Binary logical     && ||
Conditional        ? :
Unary: -A ~A &A
Binary: A + B A & B A | B
Sizing and Signing
Verilog automatically resizes values in an expression according to the sizes of the variables in the expression.
Verilog automatically truncates or extends the right-hand-side value in an assignment to fit the left-hand-side variable.
Verilog automatically performs a 2's complement when a negative value is assigned to an unsigned variable such as a reg.
module SignSize;
reg [3:0] a, b;
reg [15:0] c;
initial begin
a = -1; // a = 1111
b = 8; c = 8; // b = c = 1000
#10 b = b + a; // b = 10111 => 0111
#10 c = c + a; // c = 10111
end
endmodule
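The same effect can be isolated by assigning one sum to targets of different widths. In this sketch (names are hypothetical) the addition is evaluated at the width of the left-hand side, so the carry survives only in the wider target.

```verilog
// Sketch: the assignment target's width decides whether the carry is kept.
module size_context;
  reg [3:0] a, b;
  reg [4:0] wide;
  reg [3:0] narrow;

  initial begin
    a = 4'd9; b = 4'd8;
    wide   = a + b;   // evaluated at 5 bits: 17 = 10001
    narrow = a + b;   // evaluated at 4 bits: carry lost, 0001
    $display("%b %b", wide, narrow);   // 10001 0001
  end
endmodule
```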
Arithmetic Operators
+ add
- subtract
* multiply
/ divide
% modulus
An assignment of a negative result to a reg or other unsigned variable uses the 2's complement.
If any bit of any operand is unknown or high impedance, then the entire result value is unknown (x).
Integer division truncates any fractional part.
A modulus operation always returns a result with the sign of the first operand.
module arith;
parameter five = 5;
integer ans, int;
reg [3:0] rega, regb, num;
initial begin
rega = 3;
regb = 4'b1010;
int = -3;
end
initial begin #10
ans = five * int; // ans = -15
ans = (int + 5) / 2; // ans = 1
ans = five / int; // ans = -1
num = rega + regb; // num = 1101
num = rega + 1; // num = 0100
num = int; // num = 1101
num = regb % rega; // num = 1
end
endmodule
Question
What is the simulation result of the following code?
module t6;

reg [3:0] r1, r2;
integer i;

initial
begin
r1 = -1;
r2 = -2;
i = r1 + r2;
$display("i = %d\n", i);
end
endmodule
Bit-wise Operators (Binary)
~ not
& and       ~& nand
| or        ~| nor
^ xor
~^ ^~ xnor
Bit-wise binary operators perform bit-wise manipulations on two operands.
They compare each bit in one operand with its corresponding bit in the other operand to calculate each bit of the result.
Unknown bits in an operand do not necessarily lead to unknown bits in the result.
module bitwise;
reg [3:0] rega, regb, regc, num;
initial
begin
rega = 4'b 1001;
regb = 4'b 1010;
regc = 4'b 11x0;
end
initial
begin
#10
num = rega & 0; // 0000
num = rega & regb; // 1000
num = rega | regb; // 1011
num = regb & regc; // 10x0
num = regb | regc; // 1110
end
endmodule
Logical Operators (Binary)
! not
&& and
|| or
Logical binary operators operate on logic values. If an operand contains all zeros, it is false (logic 0). If it contains any ones, it is true (logic 1). If it is unknown (contains only zeros and/or unknown bits), its logic value is ambiguous.
The result of a logical operation is always 1'b0, 1'b1, or 1'bx.
module logical;
parameter five = 5;
reg ans;
reg [3:0] rega, regb, regc;
initial
begin
rega = 4'b 0011;
regb = 4'b 10xz;
regc = 4'b 0z0x;
end
initial
begin
#10
ans = rega && 0; // ans = 0
ans = rega || 0; // ans = 1
ans = rega && five; // ans = 1
ans = regb && rega; // ans = 1
ans = regc || 0; // ans = X
end
endmodule
Logical vs. Bit-wise Negation
! logical
~ bit-wise
The logical negation will return 1'b0, 1'b1, or 1'bx.
Bit-wise negation returns a value with the same number of bits as the operand.
module negation;
reg [3:0] rega, regb;
reg [3:0] bit;
reg log;
initial
begin
rega = 4'b 1011;
regb = 4'b 0000;
end
initial
begin
#10
bit = ~rega; // 0100
bit = ~regb; // 1111
log = !rega; // 0
log = !regb; // 1
end
endmodule
Reduction Operators (Unary)
& and       ~& nand
| or        ~| nor
^ xor
~^ ^~ xnor
Unary reduction operators operate on all bits of a single operand to produce a single-bit result.
The result is always 1'b1, 1'b0, or 1'bx.
module reduction;
reg val;
reg [3:0] rega, regb;
initial
begin
rega = 4'b 0100;
regb = 4'b 1111;
end
initial begin #10
val = &rega; // 0
val = |rega; // 1
val = &regb; // 1
val = |regb; // 1
val = ^rega; // 1
val = ^regb; // 0
val = ~|rega; // 0
val = ~&rega; // 1
val = ^rega && &regb; // 1
end
endmodule
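A common application of the XOR reduction is parity generation. The sketch below uses made-up names and also shows that a single unknown bit makes the reduction result unknown.

```verilog
// Sketch: XOR reduction as a parity bit.
module parity_demo;
  reg  [7:0] data;
  wire       parity;

  assign parity = ^data;   // 1 when data holds an odd number of 1s

  initial begin
    data = 8'b1011_0001;             // four 1s
    #1 $display("%b", parity);       // 0
    data = 8'b1011_0011;             // five 1s
    #1 $display("%b", parity);       // 1
    data = 8'b1011_00x1;             // one unknown bit
    #1 $display("%b", parity);       // x
  end
endmodule
```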
Shift Operators
>> shift right
<< shift left
Shift operators perform left or right bit shifts on the first operand.
The second operand is treated as unsigned.
If the second operand has unknown or high impedance bits, the result is unknown.
In an assignment, if the result of the RHS is:
- Of greater bit-width than that of the LHS, its MSBs are truncated.
- Of smaller bit-width than that of the LHS, it is zero-extended.
module shift;
reg [9:0] num;
reg [7:0] rega;
initial
rega = 8'b 0000_1100;
initial
begin
#10
num = rega << 5; // 01_1000_0000
num = rega >> 3; // 00_0000_0001
end
endmodule
Relational Operators
> greater than
< less than
>= greater than or equal
<= less than or equal
The result is always 1'b1, 1'b0, or 1'bx.
Relational operators have lower precedence than arithmetic operators.
a < size - 1
a < (size - 1)
size - (1 < a)
size - 1 < a
module relations;
reg [3:0] rega, regb, regc;
reg val;
initial
begin
rega = 4'b 0011;
regb = 4'b 1010;
regc = 4'b 0x10;
end
initial
begin
#10
val = regc > rega; // x
val = regb < rega; // 0
val = regb >= rega; // 1
val = regb > regc; // x (regc contains an unknown bit)
end
endmodule
Note: a < size - 1 and a < (size - 1) are the same expression, while size - (1 < a) and size - 1 < a are different expressions.
Equality Operators
== is the equality (logical equality) operator; === is the identity (case equality) operator.
 ==  0 1 x z          ===  0 1 x z
  0  1 0 x x            0  1 0 0 0
  1  0 1 x x            1  0 1 0 0
  x  x x x x            x  0 0 1 0
  z  x x x x            z  0 0 0 1
The difference between the logical and case equalities is the handling of the X and Z
values.
With the logical equality operator, an X in either of the operands is logically unknown.
2'b0x == 2'b1x => 0 (false)
2'b1x == 2'b1x => x (unknown)
With the case equality operator, the result can still evaluate to true or false when X or Z
values are present in the operands.
2'b0x === 2'b1x => 0 (false)
2'b1x === 2'b1x => 1 (true)
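The practical consequence shows up in if statements, where an unknown condition is treated as false. A small sketch, assuming a hypothetical module name equality_demo, that prints the result of both operators and shows which branch an if takes:

```verilog
module equality_demo;
  reg [1:0] a, b;

  initial begin
    a = 2'b1x; b = 2'b1x;
    $display("%b ==  %b -> %b", a, b, a == b);    // ambiguous: x
    $display("%b === %b -> %b", a, b, a === b);   // identical bit patterns: 1

    // An x condition is not "true", so the else branch runs here.
    if (a == b) $display("if (a == b): taken");
    else        $display("if (a == b): not taken, x counts as false");

    a = 2'b0x;                                    // MSBs now differ
    $display("%b ==  %b -> %b", a, b, a == b);    // definitely unequal: 0
    $display("%b === %b -> %b", a, b, a === b);   // 0
  end
endmodule
```

In a test bench, === and !== are usually the safer choice for checking DUT outputs, since they do not silently pass when the output is unknown.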
Concatenation Operator
{ } concatenation
Concatenation operator allows you to
select bits from different vectors and
join them into a new vector.
Used for bit reorganization and vector
construction.
You must use sized quantities in
concatenation. If you do not, an error
message will be displayed.
A[3:0] = {3'b011, 'b0}; // illegal
module concatenation;
reg [7:0] rega, regb, regc, regd;
reg [7:0] new;
initial
begin
rega = 8'b 0000_0011;
regb = 8'b 0000_0100;
regc = 8'b 0001_1000;
regd = 8'b 1110_0000;
end
initial
begin
#10
new = {regc[4:3], regd[7:5],
regb[2], rega[1:0]};
// new = 8'b 1111_1111
end
endmodule
Replication Operator
{{ }} replication
Replication allows you to reproduce the
variable or sized value inside the inner
{ }.
Specify a positive integer number of
repetitions between the two leading ' { '
characters.
You must use sized quantities in
replication. If you do not, an error
message will be displayed.
A[7:0] = {4{'b10}}; // illegal
B[7:0] = {2{5}}; // illegal
module replicate;
reg [3:0] rega;
reg [1:0] regb, regc;
reg [7:0] bus;
initial begin
rega = 4'b 1001;
regb = 2'b 11;
regc = 2'b 00;
end
initial begin #10
bus = {4{regb}};
// bus = 1111_1111
bus = { {2{regb}}, {2{regc}} };
// bus = 1111_0000
bus = { {4{rega[3]}}, rega };
// bus = 1111_1001
// sign-extension
end
endmodule
Conditional Operator
The syntax of the conditional operator is:
<LHS> = <condition> ? <true_expression> : <false_expression>
This can be read as: "if condition is TRUE, then LHS = true_expression,
else LHS = false_expression".
Each conditional operator must have all three RHS arguments. If one
is missing, an error message will be displayed.
If the condition is unknown, and the true_expression and
false_expression are not equal, the output is unknown.
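For vector operands, the unknown-condition rule applies bit by bit: where the two expressions agree the bit keeps its value, and only the differing bits become x. A minimal sketch of this behavior (cond_x_demo and its signals are illustrative names):

```verilog
module cond_x_demo;
  reg        sel;
  reg  [3:0] a, b;
  wire [3:0] y = sel ? a : b;

  initial begin
    a = 4'b1010;
    b = 4'b1000;
    sel = 1'b1; #1 $display("sel=%b y=%b", sel, y);  // y = a = 1010
    sel = 1'b0; #1 $display("sel=%b y=%b", sel, y);  // y = b = 1000
    // a and b differ only in bit 1, so only that bit becomes unknown
    sel = 1'bx; #1 $display("sel=%b y=%b", sel, y);  // y = 10x0
  end
endmodule
```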
Conditional Operator Examples
module likebufif1(in, en, out);
input in, en;
output out;
assign out = (en == 1) ? in : 'bz;
endmodule
module mux41(a, b, c, d, sel, out);
input a, b, c, d;
input [1:0] sel; // 2-bit select, compared with 2'b00 to 2'b10 below
output out;
assign out = (sel == 2'b00) ? a :
(sel == 2'b01) ? b :
(sel == 2'b10) ? c :
d;
endmodule
[Figure: tri-state buffer (in, en, out) and 4-to-1 multiplexer (a, b, c, d, 2-bit sel, out)]
Structured Procedures
The initial and always constructs are enabled at the beginning of a simulation.
The initial construct shall execute only once and its activity shall cease when the
statement has finished.
The always construct shall execute repeatedly. Its activity shall cease only when the
simulation is terminated.
There shall be no implied order of execution between initial and always constructs.
The initial constructs need not be scheduled and executed before the always
constructs.
[Figure: an initial block, whose statements run once, beside an always block, whose statements repeat; timing controls pace the statements in both]
Procedure Block Statements
The procedure block statements are a means of grouping two or more
statements together so that they act syntactically like a single
statement.
There are two types of procedure blocks in the Verilog HDL:
- Sequential block, also called begin-end block
- Parallel block (concurrent block), also called fork-join block
The sequential block shall be delimited by the keywords begin and
end. The procedure statements in a sequential block shall be executed
sequentially in the given order.
The parallel block shall be delimited by the keywords fork and join.
The procedure statements in a parallel block shall be executed
concurrently.
Sequential Blocks
A sequential block shall have
the following characteristics:
- Statements shall be executed in
sequence, one after another
- Delay values for each statement
shall be treated relative to the
simulation time of the execution of
the previous statement
- Control shall pass out of the block
after the last statement executes
A sequential block enables the
following two assignments to have a
deterministic result:
begin
Areg = Breg;
Creg = Areg;
end
Timing control can be used in a
sequential block to separate the two
assignments in time:
begin
Areg = Breg;
@(posedge clock) Creg = Areg;
end
Parallel Blocks
A parallel block shall have the
following characteristics:
- Statements shall be executed
concurrently
- Delay values for each statement
shall be considered relative to the
simulation time of entering the
block
- Delay control can be used to
provide time-ordering for
assignments
- Control shall pass out of the block
after the last time-ordered
statement executes
The following three code fragments describe the
same waveform by using sequential
and parallel blocks:
begin
#50 r = 'h35;
#50 r = 'hE2;
#50 r = 'h00;
#50 r = 'hF7;
end
fork
#50 r = 'h35;
#100 r = 'hE2;
#150 r = 'h00;
#200 r = 'hF7;
join
fork
#150 r = 'h00;
#200 r = 'hF7;
#50 r = 'h35;
#100 r = 'hE2;
join
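The equivalence of the sequential and parallel forms can be checked in simulation. The sketch below (block_demo, r_seq, and r_par are illustrative names) drives one register from a begin-end block and another from a fork-join block, then samples both midway between updates:

```verilog
module block_demo;
  reg [7:0] r_seq, r_par;

  // Sequential block: each delay is relative to the previous statement.
  initial begin
    #50 r_seq = 'h35;
    #50 r_seq = 'hE2;
    #50 r_seq = 'h00;
    #50 r_seq = 'hF7;
  end

  // Parallel block: each delay is relative to the time the block is entered.
  initial fork
    #50  r_par = 'h35;
    #100 r_par = 'hE2;
    #150 r_par = 'h00;
    #200 r_par = 'hF7;
  join

  // Sample at 75, 125, 175, 225 to stay clear of the update instants.
  initial begin
    #75;
    repeat (4) begin
      if (r_seq !== r_par) $display("%0d: mismatch", $time);
      else                 $display("%0d: r = %h in both", $time, r_seq);
      #50;
    end
  end
endmodule
```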
Procedural Timing Controls
In Verilog, actions are scheduled in the future through the
use of delay controls.
A general principle of the Verilog HDL is that where you do
not see a timing control, simulation time does not advance -
if you specify no timing delays, the simulation completes at
time zero.
The Verilog HDL provides two types of timing control:
- Delay control
- Event expression
Delay Control
A procedure statement following the
delay control shall be delayed in its
execution with respect to the
procedural statement preceding the
delay control by the specified delay.
If the delay expression evaluates to an
unknown or high-impedance value, it
shall be interpreted as zero delay.
If the delay expression evaluates to a
negative value, it shall be interpreted
as a 2s complement unsigned integer
of the same size as a time variable.
The following example delays the
execution of the assignment by 10 time
units:
#10 rega = regb;
The next three examples provide an
expression following the number sign
(#):
#d rega = regb;
#((d+e)/2) rega = regb;
#regr regr = regr + 1;
Event Control
If the expression evaluates to more than a 1-bit result, the
edge transition shall be detected on the least significant bit of
the result.
The change of value in any of the operands without a change
in the value of the least significant bit of the expression result
shall not be detected as an edge.
@r rega = regb;
@(posedge clock) rega = regb;
Level-Sensitive Event Control
The execution of a procedure statement can also be delayed until a
condition becomes true. This is accomplished using the wait statement,
which is a special form of event control.
The nature of the wait statement is level-sensitive, as opposed to basic
event control (specified by the @ character), which is edge-sensitive.
The wait statement shall evaluate a condition, and if it is false, the
procedural statements following the wait statement shall remain
blocked until that condition becomes true before continuing.
begin
wait (!enable)
#10 out = a + b;
end
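The difference between level-sensitive and edge-sensitive control is easiest to see when the condition is already true. In this sketch (wait_demo is an illustrative name), enable is low from time 0, so the wait passes immediately at time 1, whereas the @(negedge enable) has to wait for the falling edge at time 10:

```verilog
module wait_demo;
  reg enable;

  initial begin
    enable = 0;        // already low at time 0
    #5 enable = 1;
    #5 enable = 0;     // falling edge at time 10
  end

  // Level-sensitive: the condition is already true, so no blocking occurs.
  initial begin
    #1 wait (!enable)
      $display("%0d: wait (!enable) passed", $time);
  end

  // Edge-sensitive: blocks until the next falling edge of enable.
  initial begin
    #1 @(negedge enable)
      $display("%0d: @(negedge enable) passed", $time);
  end
endmodule
```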
Intra-Assignment Timing Controls
The delay and event control constructs previously described precede a
statement and delay its execution. The intra-assignment delay and
event controls are contained within an assignment statement and
modify the flow of activity in a slightly different way.
Encountering an intra-assignment delay or event control delays the
assignment just as a regular delay or event control does, but the
right-hand side expression is evaluated before the delay, instead of after the
delay.
This allows data swap and data shift operations to be described
without the need for temporary variables.
#10 a = b + c; // 1. wait 10 time units, 2. evaluate b + c, 3. assign to a
a = #10 b + c; // 1. evaluate b + c, 2. wait 10 time units, 3. assign to a
Intra-Assignment Timing Controls
The following table illustrates the philosophy of intra-assignment timing
controls by showing the code that could accomplish the same timing
effect without using intra-assignment.
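The table itself is not reproduced in this copy. As a hedged sketch of the kind of equivalence it describes (intra_demo and temp are illustrative names, and the delay of 5 is arbitrary), the code below first swaps two registers with intra-assignment delays and then repeats "a = #5 b;" in its expanded form with an explicit temporary:

```verilog
module intra_demo;
  reg [7:0] a, b, temp;

  initial begin
    a = 8'd1;
    b = 8'd2;

    // With intra-assignment delay: both right-hand sides are sampled
    // on entry to the fork, so the two values swap 5 time units later.
    fork
      a = #5 b;
      b = #5 a;
    join
    $display("%0d: a=%0d b=%0d", $time, a, b);   // 5: a=2 b=1

    // Same timing effect as "a = #5 b;" without intra-assignment:
    begin
      temp = b;       // evaluate the right-hand side now
      #5 a = temp;    // assign after the delay
    end
    $display("%0d: a=%0d b=%0d", $time, a, b);   // 10: a=1 b=1
  end
endmodule
```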
Procedural Assignments
Assignments made inside procedural blocks are called procedural
assignments.
Procedural assignments are for updating reg, integer, time, and
memory variables.
The right-hand side of a procedural assignment can be any expression
that evaluates to a value. However, part-selects on the right-hand side
must have constant indices. The left-hand side indicates the variable
that receives the assignment from the right-hand side.
The Verilog HDL contains two types of procedural assignment
statements:
- Blocking procedural assignment statements
- Non-blocking procedural assignment statements
Procedure Assignments
Procedure assignments update the value of register variables under the control of the
procedure flow constructs that surround them.
Blocking procedure assignments
- <lvalue> = <timing_control> <expression>
Non-blocking procedure assignments
- <lvalue> <= <timing_control> <expression>
// common setup
initial begin
a = 0; b = 1; c = 0;
end
always c = #5 ~c;
// blocking
always @(posedge c)
begin
a = b; // 1
b = a; // 1
end
// non-blocking
always @(posedge c)
begin
a <= b; // 1
b <= a; // 0
end
Behavioral Control Statements
Conditional Statements
- One-way
- Two-way
- Multi-way
Looping Statements
- forever loop
- repeat loop
- while loop
- for loop
The conditional statement (or if-else statement) is used to make a decision as to
whether a statement is executed or not.
Formally, the syntax is as follows:
<statement>
::= if ( <expression> ) <statement_or_null>
||= if ( <expression> ) <statement_or_null>
else <statement_or_null>
<statement_or_null>
::= <statement>
||= ;
The <expression> is evaluated; if it is true (that is, has a non-zero known value), the
first statement executes. If it is false (has a zero value or the value is x or z), the first
statement does not execute. If there is an else statement and <expression> is false,
the else statement executes.
Multi-way Decision Statements
There are two statements that you can
use to specify one or more actions to
be taken based on specified conditions:
if-else-if and case.
The sequence of if statements known
as an if-else-if construct is the most
general way of writing a multi-way
decision. The syntax is shown as
follows:
if (<expression>)
<statement>
else if (<expression>)
<statement>
else if (<expression>)
<statement>
else
<statement>
The expressions are evaluated in
order; if any expression is true, the
statement associated with it is
executed, and this terminates the
whole chain. Each statement is either
a single statement or a block of
statements.
The last else part of the if-else-if
construct handles the default case
where none of the other conditions
was satisfied. Sometimes there is no
explicit action for the default; in that
case the trailing else can be omitted or
it can be used for error checking to
catch an impossible condition.
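[Figure: flowchart of the if-else-if chain; each expression is tested in turn, with T and F exits, until one is true]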
case Statements
The case statement is a special multi-way
decision statement that tests whether an
expression matches one of several other
expressions, and branches accordingly.
For example, the case statement is useful
for describing the decoding of a
microprocessor instruction.
The syntax of the case statement is as
follows. The default statement is optional.
Use of multiple default statements in one
case statement is illegal syntax.
<statement>
::= case ( <expression> ) <case_item>+ endcase
||= casez ( <expression> ) <case_item>+ endcase
||= casex ( <expression> ) <case_item>+ endcase
<case_item>
::= <expression> <,<expression>>* : <statement_or_null>
||= default : <statement_or_null>
||= default <statement_or_null>
reg [15:0] rega;
reg [9:0] result;
...
case (rega)
16'd0: result = 10'b0111111111;
16'd1: result = 10'b1011111111;
16'd2: result = 10'b1101111111;
16'd3: result = 10'b1110111111;
16'd4: result = 10'b1111011111;
16'd5: result = 10'b1111101111;
16'd6: result = 10'b1111110111;
16'd7: result = 10'b1111111011;
16'd8: result = 10'b1111111101;
16'd9: result = 10'b1111111110;
default result = 'bx;
endcase
Example for casez Statement
The following is an example of the casez statement. It demonstrates
an instruction decoder, where values of the most significant bits select
which task should be called. If the most significant bit of ir is a 1,
then the task instruction1 is called, regardless of the values of the
other bits of ir.
reg [7:0] ir;
...
casez (ir)
8'b1???????: instruction1(ir);
8'b01??????: instruction2(ir);
8'b00010???: instruction3(ir);
8'b000001??: instruction4(ir);
endcase
Example for casex Statement
The following is an example of the casex statement. It demonstrates
an extreme case of how don't-care conditions can be dynamically
controlled during simulation. In this case, if r = 8'b01100110, then the
task stat2 is called.
reg [7:0] r, mask;
...
mask = 8'bx0x0x0x0;
casex (r ^ mask)
8'b001100xx: stat1;
8'b1100xx00: stat2;
8'b00xx0011: stat3;
8'bxx001100: stat4;
endcase
  01100110      (r)
^ x0x0x0x0      (mask)
  --------
  x1x0x1x0      (r ^ mask)

r ^ mask    x1x0x1x0   x1x0x1x0   x1x0x1x0
case item   1100xx00   xx001100   001100xx
            --------   --------   --------
compare     -1-1---1   ---1-1-1   -0-0-0--
(1 = bits match, 0 = bits differ, - = don't care)
stat2 is the first item in the list with no differing bit, so it is selected.
Looping Statements
There are four types of looping statements. They provide a means of
controlling the execution of a statement zero, one, or more times.
- forever continuously executes a statement.
- repeat executes a statement a fixed number of times.
- while executes a statement until an expression becomes false. If the expression
starts out false, the statement is not executed at all.
- for controls execution of its associated statement(s) by a three-step process, as
follows:
1. executes an assignment normally used to initialize a variable that controls the
number of loops executed
2. evaluates an expression; if the result is zero, the for loop exits, and if it is not zero,
the for loop executes its associated statement(s) and then performs step 3
3. executes an assignment normally used to modify the value of the loop-control
variable, then repeats step 2
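The four loop types can be seen side by side in a short sketch. The module name loop_demo, the signal names, and the chosen constants are illustrative assumptions, not taken from the course labs:

```verilog
module loop_demo;
  reg        clk;
  reg  [7:0] data;
  integer    i, ones;

  // forever: free-running clock with a period of 10 time units
  initial begin
    clk = 0;
    forever #5 clk = ~clk;
  end

  initial begin
    data = 8'b0000_0101;

    // repeat: shift left a fixed number of times (result 0001_0100)
    repeat (2) data = data << 1;

    // for: initialize, test, update; counts the 1 bits in data
    ones = 0;
    for (i = 0; i < 8; i = i + 1)
      if (data[i]) ones = ones + 1;

    // while: loops until the condition is false; finds the lowest set bit
    i = 0;
    while (data[i] == 1'b0) i = i + 1;

    $display("data=%b ones=%0d lowest set bit=%0d", data, ones, i);
    #20 $finish;   // needed because the forever loop never ends by itself
  end
endmodule
```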
RTL Simulation
Test Bench Structure
A simple style test bench applies vectors to the design under
test with the output being manually verified.
A sophisticated test bench is a self-checking program where
the results are automatically verified.
[Figure: test bench block diagram with a test pattern file, a waveform generator supplying stimulus vectors to the DUT, and a compare-results block that checks the output vectors against reference vectors (Pass?) and writes a result file]
Using Concurrent Blocks
fork ... join blocks are common in test files. Their parallel
nature lets you specify time in an absolute fashion and
execute complex procedural constructs such as loops or
tasks in parallel.
module inline_tb;
reg [7:0] data_bus;
// instance of DUT
initial
fork
data_bus = 8'b00;
#10 data_bus = 8'h45;
#20 repeat (10) #10 data_bus = data_bus + 1;
#25 repeat (5) #20 data_bus = data_bus << 1;
#140 data_bus = 8'h0f;
join
endmodule
The two repeat loops above start at different times and
execute concurrently. Applying this particular set of stimulus
would be very difficult to do in a single begin ... end block.
Time | data_bus
--------------------
0 | 8'b0000_0000
10 | 8'b0100_0101
30 | 8'b0100_0110
40 | 8'b0100_0111
45 | 8'b1000_1110
50 | 8'b1000_1111
60 | 8'b1001_0000
65 | 8'b0010_0000
70 | 8'b0010_0001
80 | 8'b0010_0010
85 | 8'b0100_0100
90 | 8'b0100_0101
100 | 8'b0100_0110
105 | 8'b1000_1100
110 | 8'b1000_1101
120 | 8'b1000_1110
125 | 8'b0001_1100
140 | 8'b0000_1111
Applying Stimulus
There are many ways to generate and apply stimulus to your
design. Some common techniques include:
- In line stimulus, applied from an initial block
- Stimulus applied from a loop or always block
- Stimulus applied from an array of vectors or integers
- Stimulus that is recorded during one simulation and played back in
another simulation
In Line Stimulus
In line stimulus has the following
characteristics:
- Variables can be listed only when their
values change
- Complex timing relationships are easy
to define
- A test bench can become very large
for complex tests
module inline_tb;
reg [7:0] data_bus, addr;
wire [7:0] results;
DUT u1(results, data_bus, addr);
initial
fork
data_bus = 8'h00;
addr = 8'h3f;
#10 data_bus = 8'h45;
#15 addr = 8'hf0;
#40 data_bus = 8'h0f;
#60 $finish;
join
endmodule
Stimulus From Loops
Stimulus applied from a loop has the
following characteristics:
- The same set of stimulus variables
are modified in every iteration
- Timing relationships are regular in
nature
- Code is compact
module loop_tb;
reg clk;
reg [7:0] stimulus;
wire [7:0] results;
integer i;
DUT u1(results, stimulus);
always begin // clock generation
#10 clk = 1; // the half-period of 10 time units is an assumed value;
#10 clk = 0; // without delays this loop would never let time advance
end
initial begin
for (i = 0; i < 256; i = i + 1)
@(negedge clk) stimulus = i;
#20 $finish;
end
endmodule
Stimulus From Arrays
Stimulus applied from an array has the
following characteristics:
- The same set of stimulus variables
are modified in every iteration
- Stimulus can be read into an array
directly from a file
module array_tb;
reg [7:0] stimulus, stim_array[0:15]; // stimulus is the register driven below
wire [7:0] results; // width assumed to match the other examples
integer i;
DUT u1(results, stimulus);
initial begin
// load array with values
#20 stimulus = stim_array[0];
for (i = 14; i > 1; i = i - 1)
#50 stimulus = stim_array[i];
#30 $finish;
end
endmodule
A Multiplier Example
Combinational Shift-Add Multiplier
[Figure: shift-add multiplier datapath. For each multiplier bit B[k], a MUX selects either A << k (B[k] = 1) or 0 (B[k] = 0); an adder sums the selected partial products of A[n-1:0] into P[2n-1:0].]
How to implement the shift operation?
1. Use the Verilog shift operator:
A'[n+k-1:0] = A[n-1:0] << k;
2. Use the Verilog concatenation operator:
A'[n+k-1:0] = {A[n-1:0], {k{1'b0}}};
[Figure: shift by concatenation for n = 6 and k = 3. The n bits A[5:0] become A'[8:3], and the k bits A'[2:0] are filled with zeros.]
Combinational Shift-Add Multiplier
How to implement the MUXs?
1. Full multiplexing:
PP_k[n+k-1:0] = B[k] ? A'[n+k-1:0] : (n+k)'b0;
2. Partial multiplexing:
PP_k[n+k-1:0] = {(B[k] ? A'[n+k-1:k] : n'b0), {k{1'b0}}};
[Figure: full multiplexing uses an (n+k)-bit MUX, controlled by B[k], between A'[n+k-1:0] and (n+k)'b0; partial multiplexing uses an n-bit MUX between A'[n+k-1:k] and n'b0, with k zero bits appended.]
Combinational Shift-Add Multiplier
Merge the shift operation with the MUX:
PP_k[n+k-1:0] = {(B[k] ? A[n-1:0] : n'b0), {k{1'b0}}};
[Figure: an n-bit MUX, controlled by B[k], selects A[n-1:0] or n'b0; k zero bits are appended to form PP_k[n+k-1:0].]
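To make the merged shift-and-MUX expression concrete, here is a scaled-down 4-bit sketch with an exhaustive self-check. The names cmult4 and cmult4_tb are hypothetical; the structure simply applies the formula above for k = 0 to 3 and sums the partial products with a small adder tree.

```verilog
module cmult4(A, B, P);
  parameter N = 4;
  output [2*N-1:0] P;
  input  [N-1:0]   A, B;

  // PP_k = {(B[k] ? A : 0), k zero bits}
  wire [N-1:0] PP_0 =  B[0] ? A : 4'b0;            // 4 bits
  wire [N:0]   PP_1 = {(B[1] ? A : 4'b0), 1'b0};   // 5 bits
  wire [N+1:0] PP_2 = {(B[2] ? A : 4'b0), 2'b0};   // 6 bits
  wire [N+2:0] PP_3 = {(B[3] ? A : 4'b0), 3'b0};   // 7 bits

  // Two-level adder tree
  wire [N+1:0] PP_0_1 = PP_0 + PP_1;               // 6 bits
  wire [N+3:0] PP_2_3 = PP_2 + PP_3;               // 8 bits
  assign P = PP_0_1 + PP_2_3;                      // 8 bits
endmodule

module cmult4_tb;
  reg  [3:0] A, B;
  wire [7:0] P;
  integer i, j, errors;

  cmult4 u0(A, B, P);

  initial begin
    errors = 0;
    // 4-bit operands are small enough to test all 256 combinations
    for (i = 0; i < 16; i = i + 1)
      for (j = 0; j < 16; j = j + 1) begin
        A = i; B = j;
        #1;
        if (P !== A * B) begin
          $display("%0d * %0d: got %0d", A, B, P);
          errors = errors + 1;
        end
      end
    $display("mismatches: %0d/256", errors);
    $finish;
  end
endmodule
```

An exhaustive check is only practical at this size; the 32-bit design needs the sampled test vectors described in the validation plan.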
Combinational Shift-Add Multiplier
How to implement the adder?
1. Multi-operand adder:
P[2n-1:0] = PP_0[n-1:0] +
PP_1[n:0] +
PP_2[n+1:0] +
PP_3[n+2:0] +
...
PP_30[n+29:0] +
PP_31[n+30:0];
Combinational Shift-Add Multiplier
How to implement the adder?
2. Tree adder:
PP_0_1[n+1:0] = PP_0[n-1:0] + PP_1[n:0];
PP_2_3[n+3:0] = PP_2[n+1:0] + PP_3[n+2:0];
PP_4_5[n+5:0] = PP_4[n+3:0] + PP_5[n+4:0];
PP_6_7[n+7:0] = PP_6[n+5:0] + PP_7[n+6:0];
...
PP_0_3[n+3:0] = PP_0_1[n+1:0] + PP_2_3[n+3:0];
PP_4_7[n+7:0] = PP_4_5[n+5:0] + PP_6_7[n+7:0];
...
PP_0_7[n+7:0] = PP_0_3[n+3:0] + PP_4_7[n+7:0];
...
PP_0_15[n+15:0] = PP_0_7[n+7:0] + PP_8_15[n+15:0];
...
PP_0_31[n+31:0] = PP_0_15[n+15:0] + PP_16_31[n+31:0];
[Figure: binary tree of 31 two-input adders reducing the 32 partial products to a single sum]
The Complete Design
[Figure: the 32 partial-product MUXs, each with a 0 input, feeding the adder tree]
Only the source lines shown on the slides are reproduced here; "..." marks lines that are not shown.
module cmult32(A, B, P);
parameter N = 32;

output [2*N-1:0] P;
input [N-1:0] A, B;

// Partial products

wire [N-1:0] PP_0 = B[ 0] ? A : 32'b0; // 32 bits
wire [N:0] PP_1 = {(B[ 1] ? A : 32'b0), 1'b0}; // 33 bits
wire [N+1:0] PP_2 = {(B[ 2] ? A : 32'b0), 2'b0}; // 34 bits
...
wire [N+9:0] PP_10 = {(B[10] ? A : 32'b0), 10'b0}; // 42 bits
...

// Level 1 summation

wire [N+1 :0] PP_0_1 = PP_0 + PP_1; // 34 bits
...
wire [N+11:0] PP_10_11 = PP_10 + PP_11; // 44 bits
...

wire [N+3 :0] PP_0_3 = PP_0_1 + PP_2_3; // 36 bits
...
wire [N+11:0] PP_8_11 = PP_8_9 + PP_10_11; // 44 bits
...

wire [N+31:0] P = PP_0_15 + PP_16_31; // 64 bits

endmodule
Validation Plan
A complete test is impossible
- Need to apply $2^{32}\,(A_{n-1}) \times 2^{32}\,(B_{n-1}) \times 2^{32}\,(A_{n}) \times 2^{32}\,(B_{n}) = 2^{128}$ patterns!
Test vectors (including input patterns and expected output patterns)
should be generated randomly.
Test vectors should be stored in a separate file (off-line test vector
generation) which can be used by various testbenches.
The testbench should read in test vectors and generate the corresponding
input patterns and expected output patterns.
The testbench should apply input patterns to the DUT and compare the
output of the DUT with the expected output pattern automatically.
The testbench should report the mismatched results and the total error
count when all test vectors have been applied.
Off-line Test Vector Generator
`define NUMBER_OF_PATTERN 1000

module pgen;
reg [31:0] a, b;
reg [63:0] p;
integer i, sf;

initial
begin
sf = $fopen("mult32.dat");
for (i=0; i<`NUMBER_OF_PATTERN; i = i + 1)
begin
a = $random; // first input
b = $random; // second input
p = a * b; // expected output
$fdisplay(sf, "%b_%b_%b", a, b, p);
end
$fclose(sf);
end

endmodule
Generated Test Vector File
00010010000101010011010100100100_11000000100010010101111010000001_0000110110011001100110111110000101010110011101101111111100100100
10000100100001001101011000001001_10110001111100000101011001100011_0101110000011100010000010011110011000000101101011100101101111011
00000110101110010111101100001101_01000110110111111001100110001101_0000000111011100100101110100000001001101000001101000101100101001
10110010110000101000010001100101_10001001001101110101001000010010_0101111111010000101110011111001101011110110010001010100100011010
00000000111100111110001100000001_00000110110101111100110100001101_0000000000000110100001001110100100000101000000010101010000001101
00111011001000111111000101110110_00011110100011011100110100111101_0000011100001110111110001000000110110001111010100000011100011110
01110110110101000101011111101101_01000110001011011111011110001100_0010000010010011011001100100011000011111100111101100000010011100
01111100111111011110100111111001_11100011001101110010010011000110_0110111011110000000100101111100101110000110010001111101010010110
11100010111101111000010011000101_11010101000100111101001010101010_1011110011101001100001001001110010000110111001101100010011010010
01110010101011111111011111100101_10111011110100100111001001110111_0101010000100100110100011011010101001101000010100011010101110011
10001001001100101101011000010010_01000111111011001101101110001111_0010011010001100000010011110011101110111000111101111101000001110
01111001001100000110100111110010_11100111011101101001011011001110_0110110110010010110100110100111000010110100101010000110010111100
...
Each line holds three fields separated by underscores: first input value, second input value, expected output value.
The Structure of Testbench
1. Read in the test vectors
2. Apply the input patterns
3. Compare the results
[Figure: each test vector is split into the two input values, which drive the DUT, and the expected product, which is compared (==) with the DUT output to decide Pass?]
The Complete Testbench
`define CYCLE_TIME 5
`define NUMBER_OF_PATTERN 1000

module test;
parameter N = 32;

reg [N-1:0] A, B;
wire [2*N-1:0] P;

reg [127:0] pattern[`NUMBER_OF_PATTERN-1:0];
reg [127:0] t;
reg [2*N-1:0] Prod;
integer ErrorCount, i;

cmult32 u0(A, B, P); // DUT

initial
begin
$readmemb("error32.dat", pattern); // read test vectors into memory
ErrorCount = 0;
for (i=0; i<`NUMBER_OF_PATTERN; i=i+1)
begin
t = pattern[i]; // ith test vector
A = t[127:96]; // first input
B = t[95:64]; // second input
Prod = t[63:0]; // expected output
#8;
if (P !== Prod) // compare the results (!== also flags x or z bits in P)
begin
$display($time,,"%d * %d = %d != %d", A, B, Prod, P);
ErrorCount = ErrorCount + 1;
end
#2;
end
$display("Failed pattern: %0d/%0d\n", ErrorCount, `NUMBER_OF_PATTERN);
$finish;
end

endmodule
Code Coverage Analysis
Design Verification Methods
Given the HDL descriptions of a hardware design, how do we
verify its correctness?
Target of verification
- Functionality correctness
- Timing correctness
Verification strategies
- Dynamic analysis: simulation
- Static analysis: STA
- Formal method: compare with the golden design
[Figure: chart relating the dynamic, static, and formal strategies to the functionality and timing targets]
About Simulation
Advantage: verifies both functional correctness
and timing correctness.
Disadvantage: verification quality depends on the quality of the
simulation patterns.
If 1000 patterns are fed in but exercise only 25% of the circuit, is that helpful
for verification?
Two approaches:
- How many patterns must be applied to cover the whole circuit? HARD!
- What portion of the circuit is covered after applying a given set of patterns?
EASIER!
Code Execution Sequence
[Figure: flowchart in which A and B lead to decision C; the T and F outcomes of C lead to D, E and to F, G]
Possible execution paths
- A B C D E
- A B C F G
When A is executed, B will be
executed as well (basic block).
To exercise all the possible
execution paths, all possible values of
C must appear during
simulation.
Code Coverage as a Testing Method
Code coverage with respect to HDL is a measure of how well
a model of a circuit is tested, determined by how well a test
exercises the design.
There is a hierarchy of coverage criteria:
- Branch coverage is superior to statement coverage
- Path coverage is better than branch coverage
- There is no way that path coverage can be 100% without branch
coverage also being 100%
- Consider only statement and branch coverage until they reach
100%, and then rerun to check the higher level criteria such as
path coverage.
Goal of Coverage Analysis
Monitor and evaluate the functional simulation of a design to
determine which portions of the design have not been tested.
Identify and locate the untested areas of the design, so that additional
tests can be created to target those portions of the design.
Used to improve the test quality.
Used to identify untested, untestable or redundant logic.
Analyzing the HDL Source Code
During this phase the coverage analysis tool inspects the
HDL source code to determine where monitor points such as
probes should be inserted in order to collect the maximum
amount of information about simulation activity in the design.
It is crucial that the source code is not altered in any way, so
this process (known as instrumenting) must be non-intrusive
and is normally carried out by making copies of the original
source files and instrumenting those files.
Different types of probe are used depending on the type of
coverage measurements selected by the user.
module d1(a, b, q);
input a;
input b;
output q;
reg a_, b_;
reg q;
always @ (a or b)
begin
a_ = ~a;
b_ = ~b;
q = a_ & b_;
end
endmodule
module d1(a, b, q) ;
reg [0:4] verisure_countAi;
integer verisure_countS_0;
integer verisure4;
time verisure3;
initial begin
verisure4 = $verisure_probe0("/vnavigator_files/d1_m.control",
"d1", verisure3);
if (verisure3 > 0) #verisure3 ;
verisure_countS_0 = 0;
verisure_countAi = 0;
end
input a;
input b;
output q;
reg a_, b_;
reg q;
Collecting Coverage Data from the Simulation
Most coverage analysis tools automatically invoke the
appropriate logic simulator and run a normal simulation to
collect information about activity in the design.
The information collected from the various probes that were
embedded in the source files is used to build a series of
history files for each design unit or module in the design.
The information in the history file defines what actually
happened during the simulation.
format: 3
run: test
start: 1069947065
elapsed: 0
2
/work/lab/trans/vnavigator_files/test_m.pp
/work/lab/trans/vnavigator_files/test_m.control
test
test
3
5
0 2 1
1 1 1
2 2 1
3 1 1
4 2 1
0
0
0
0
Presenting the Results to the User
Most coverage analysis tools enable the results to be
displayed graphically on screen as well as generating
textual printouts.
Hierarchical views, color-coding and filtering techniques are
all used to enable a user to quickly navigate to the problem
areas.
test (test):
Metric Instance Only Instance and sub-components
---------- --------------------- ---------------------------
Statement 8/8 (0) 100 % 11/11 (0) 100 %
Branch ---/--- (-) --- % ---/--- (-) --- %
Condition ---/--- (-) --- % ---/--- (-) --- %
Triggering ---/--- (-) --- % ---/--- (-) --- %
Toggle 3/3 (0) 100 % 8/8 (0) 100 %
Trace ---/--- (-) --- % ---/--- (-) --- %
Path ---/--- (-) --- % ---/--- (-) --- %
Excluded ---/--- (-) --- % ---/--- (-) --- %
State ---/--- (-) --- % ---/--- (-) --- %
Arc ---/--- (-) --- % ---/--- (-) --- %
FSM path ---/--- (-) --- % ---/--- (-) --- %
Coverage Analysis Result
Instance Coverage, Module Coverage
- Statement
- Decision
- Expression
State Machine Coverage
- State Visitation
- State Transition
- Paired State Visitation
Coverage Analysis Measurements
Statement coverage
Branch coverage
Condition and expression coverage
Path coverage
Toggle coverage
Signal-tracing coverage
Statement Coverage
Statement coverage provides a measure of the number of
executable statements in your HDL that are executed when
the test is run.
Executable statements are those that correspond to a
definite action at run time, as opposed to HDL that
represents comments, compiler directives or non-executable
declarations.
Statement Coverage
In Verilog 1364-1995 code, statement coverage is performed
on the following executable statements:
- Continuous assignments
- Procedural assignments (blocking and non-blocking)
- Looping statements (forever, repeat, while, for)
- Timing control statements (delay, event, wait)
- Subroutine enable statements (task, system task)
- Interrupt statements (disable)
- Patching statements (force, release)
Branch Coverage
Branch coverage (also known as decision coverage) is
defined as the percentage given by the number of program
branches that are taken as the test exercises the model,
divided by the total number of possible branches in the
design.
Branch Coverage
A program branch is a possible outcome of a conditional
statement.
- An if statement has two possible branches, one of which is taken
if the condition evaluates to TRUE, the other if it is FALSE.
- A case statement has a number of branches equal to the
number of choices within the case block.
Branch Coverage
The following example shows how branch coverage is
superior to statement coverage:
if (b == a)
  c = 1;
d = c;
Imagine a test bench that forced b to always equal a.
- The statement coverage would be 100%.
- The branch coverage would be only 50%, since the FALSE branch
(corresponding to a not equal to b) would not have been taken.
Branch Coverage
Branch coverage checks the following Verilog 1995
constructs:
- if, else if, else blocks
- case, casex, casez blocks
- ternary (? :) conditional operators
Condition Coverage
Condition coverage is present in many software code
coverage tools and determines the extent to which
expressions have been tested.
For example, the if statement
if ((a == 1) || (b == 1))
could be TRUE because a = 1, because b = 1, or because
both = 1.
Condition coverage checks the combinations and identifies
those that have been tested and those that have not.
Condition Coverage
Similarly, in the continuous assignment statement
assign a = b || (c && d);
condition coverage checks and reports the combinations of
values that have been used to assign values to a, and those
that have not.
The use of condition coverage is the only way of ensuring
that a test can detect errors such as
- the use of the wrong logical operator (e.g. && instead of ||), or
- the incorrect placement of brackets, e.g. !(a)||(b) differs
from !(a || b).
Condition Coverage
For Verilog 1995, condition coverage checks all pre-defined
logical operators, i.e. !,|| and &&.
It covers nested expressions within any if or else if
expression containing a logical operator.
Coverage of the bitwise operators (& | ^ ^~ ~^)
is provided if the operands are one of the following:
- A reference to a net or a register in its complete form, e.g. b, where
the net or register is a single bit
- A bit-select operand, e.g. a[1]
- An expression where the operator is not one of + - * / % << >>
Path Coverage
A program path is defined as a sequence of executable
statements executed in a particular order.
Paths can be viewed as combinations of sequential branches,
hence full branch coverage is a precondition for full path
coverage.
Path Coverage
Consider the code fragment:
if (a == 1)
operand = data;
else
operand = 0.0;
if (b == 1)
result = 1.0/operand;
else
result = operand;
Path Coverage
100% branch coverage can be achieved by using the data
sets (a=1, b=1) and (a=0, b=0).
Path coverage, however, would be only 50%, since the paths
forced by (a=0, b=1) and (a=1, b=0) have not
been taken.
Path coverage would thus show that the path leading to the
divide by zero error, which occurs for (a=0, b=1), is
untested. Branch coverage would not detect this.
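The gap between the two metrics widens quickly as code grows. As a rough count, assuming a block of k sequential two-way if/else statements whose conditions are independent (an illustrative assumption, not a statement about how any particular tool counts paths), the number of branches grows linearly while the number of paths grows exponentially:

$$ N_{\text{branch}} = 2k, \qquad N_{\text{path}} = 2^{k} $$

For the two-statement fragment discussed here, k = 2 gives 4 branches and 4 paths. The data sets (a=1, b=1) and (a=0, b=0) take 4 of 4 branches but only 2 of 4 paths, which is the 50% path coverage quoted above.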
Infeasible Paths
Note that certain paths may be infeasible.
For instance, the source shown below forms an infeasible
path. The path on which the first IF statement evaluates to
TRUE and the second evaluates to FALSE is impossible and
therefore 100% path coverage cannot be achieved.
if (a==1)
unconditional HDL
if (a==1)
unconditional HDL
Toggle Coverage
Checks that each bit in the registers and nets of a module
changes polarity (i.e. 'toggles') and is not stuck at one
particular level.
Toggle coverage is a very useful coverage measurement as
it shows the amount of activity within the design and helps to
pinpoint areas that have not been adequately verified by the
testbench.
Toggle Coverage
Count Bit Transition
***0*** aa[0] posedge
***0*** aa[0] negedge
1 aa[1] posedge
1 aa[1] negedge
1 aa[2] posedge
***0*** aa[2] negedge
***0*** aa[3] posedge
1 aa[3] negedge
Summary
Toggle coverage is : 25%
Number of toggles executed : 1 (aa[1])
Number of toggles considered: 4
Signal Tracing Coverage
Checks that variables (i.e. nets and registers) and
combinations of variables take a range of values.
Signal trace coverage information
Signal name Lowest value Highest value
Done 0 1
LSB 0 1
Signal value combinations
Count Done LSB
3 0 0
1 1 0
3 0 1
0 1 1
Instance vs. Module Coverage
Module Type ABC
Instance A1
Line 1
Line 2
Line 3
Line 4 X
Line 5 X
Line 6 X
Module Type ABC
Instance A2
Line 1 X
Line 2 X
Line 3 X
Line 4 X
Line 5
Line 6
Instance Coverage of A1 = 3/6 = 50%
Instance Coverage of A2 = 4/6 = 66%
Total Instance Coverage = (3+4)/(6+6) = 58%
Total Module Coverage = 100%
Finite State Machine Coverage Analysis
FSM Metrics
State
This metric identifies all the possible states in a FSM, reports
on the proportion that were actually exercised during
simulation and identifies the states that were not attained.
Arc
An arc is a transition between two 'adjacent' states. The arc
coverage metric reports on those arcs traversed during
simulation, expressing these as a proportion of all possible
arcs, and identifies any arcs that were not traversed.
FSM Metrics
Path
The state and arc metrics are the traditional methods of
measuring coverage in FSMs and it is usual to aim for over
95% coverage. However, it is possible to achieve 100% state
and arc coverage without exercising all possible sequences
of arcs and states. In other words, arc and state coverage do
not necessarily measure the extent to which the total control
functionality of the FSM has been tested.
A path is a combination of adjacent arcs - so a path
represents a valid sequence of states.
FSM Example - 1
clock in out
----- -- ---
1 01 01
2 01 10
----- -- ---
[Figure: coverage criteria matrix over states ST0 to ST3, with entries marked D and R]
[Figure: state transition diagram of a four-state FSM (ST0, ST1, ST2, ST3). Arc labels (in/out): 10,11/00; 01/01; 01/10; 01/11; 01,11/00; 10/01; 10/11; 00,11/00; 00/10; 00/01; 00/11; 10/10; 11/00]
FSM Example - 2
clock in out
----- -- ---
1 01 01
2 01 10
3 01 11
4 01 00
5 01 01
6 00 00
7 00 11
8 10 11
9 00 10
10 11 00
----- -- ---
[Figure: state transition diagram of a four-state FSM (ST0, ST1, ST2, ST3). Arc labels (in/out): 10,11/00; 01/01; 01/10; 01/11; 01,11/00; 10/01; 10/11; 00,11/00; 00/10; 00/01; 00/11; 10/10; 11/00]
FSM Example - 3
clock in out
----- -- ---
1 01 01
2 01 10
3 01 11
4 01 00
5 00 11
6 00 10
7 00 01
8 00 00
----- -- ---
[Figure: state transition diagram of a four-state FSM (ST0, ST1, ST2, ST3). Arc labels (in/out): 10,11/00; 01/01; 01/10; 01/11; 01,11/00; 10/01; 10/11; 00,11/00; 00/10; 00/01; 00/11; 10/10; 11/00]
[Figure: coverage criteria matrix over states ST0 to ST3, with entries marked D and R]
References
J. Bergeron, Writing Testbenches: Functional Verification of
HDL Models, Second Edition, Kluwer Academic Publishers,
2003
L. Bening and H. Foster, Principles of Verifiable RTL Design:
A Functional Coding Style Supporting Verification Processes
in Verilog, Kluwer Academic Publishers, 2000
D. Dempster and M. Stuart, Verification Methodology Manual:
Techniques for Verifying HDL Designs, Second Edition,
Teamwork International, 2001
F. Nekoogar, Timing Verification of Application-Specific
Integrated Circuits (ASICs), Prentice Hall PTR, 1999
Synthesis and Optimization
HDL Synthesis and Optimization
-- VHDL                  // Verilog
if (A='1') then          if (A==1)
  Y <= C and D;            Y = C & D;
elsif (B='1') then       else if (B==1)
  Y <= C or D;             Y = C | D;
else                     else
  Y <= C;                  Y = C;
end if;
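[Figure: Translation (RTL synthesis) turns the code into a generic circuit with inputs A, B, C, D and output Y; Mapping & Optimization (logic synthesis) then turns that circuit into an optimized gate-level netlist]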
HDL Synthesis
Translate HDL code into a functionally equivalent, technology-dependent
gate-level netlist that preserves the design requirements
RTL synthesis
- HDL code parsing
- Synthetic library component instantiation
- RTL optimization
- Technology-independent representation translation
Logic synthesis
- Logic optimization
- Technology mapping & optimization
RTL Synthesis Problems
How to make the code synthesizable?
- Unsupported language constructs
- Unsupported semantics
How to keep the design consistent before and after synthesis?
- Combinational before, sequential after
- 8 latches before, 12 latches after
How to make the code optimal for synthesis?
- Resource sharing
Logic Synthesis Problems
How to explore the design space and select the best
implementation?
Area
Delay
Tasks for RTL Synthesis
Re-model the code to meet the synthesizer's
requirements (synthesizable code)
Re-simulate to confirm that the original and re-modeled
code behave consistently
Analyze and elaborate the code
Check the result and make sure that no
unexpected resources have been generated
Tasks for Logic Synthesis
Describe the design rule constraints
Describe the optimization constraints
Compile the design
Check the synthesis reports to make sure the result meets
the timing, area and other constraints
Export the design to a gate-level netlist format
Re-simulate to confirm that the RTL and
gate-level models are consistent
Using Synthetic Libraries
module add32(a, b, cin, sum, cout);
.
.
always @(a or b or cin)
begin
  /* synopsys resource r0:
     ops = A1,
     map_to_module = DW01_add,
     implementation = cla; */
  temp = ({1'b0, a, cin} + // synopsys label A1
          {1'b0, b, 1'b1}) >> 1;
end
assign {cout, sum} = temp[33:1];
endmodule
Some Notes About Synthesis
Synthesis is not suitable for all kinds of design, e.g. design
with special topological requirements
Separate structural logic from random logic
Isolate finite state machine from other logic
Maintain a reasonable gate count per module (250 to
5,000 gates per module)
RTL Synthesis and Optimization
RTL Synthesis
RTL Description Model
Synthesizer supported language constructs
Translation of language constructs
Unsynthesizable code examples
High-level Generic Optimizations
- Resource Sharing
- Tree Height Reduction
- Implicit Constant Propagation
- Common Sub-Expression Elimination
- Extraction of Sum-of-Products Logic
What Are You Coding for?
Controller
(FSM)
Datapath
Register
Mux
Functional
Unit
RTL Description Model (1/3)
Describe the Functional Units (Operations)
A = B + C * D;
Z = (X >> 2) & Y;
Describe the Muxs (or Priority Encoders)
q = (sel == 1'b0) ? i1 : i2;
if (sel == 1'b0)
  q = i1;
else
  q = i2;
Describe the Registers (or Latches)
begin
if (!reset)
  q = 1'b0;
else
q = d;
end
always @(!enable)
begin
q = d;
end
RHS is not specified in the sensitivity list
Describe the FSMs (State Registers + Random Logic)
always @(posedge CLK or negedge RESET)
if (!RESET)
STATE = ST0;
else
case (STATE)
ST0: begin Y = 1; STATE = ST1; end
ST1: begin
Y = 2;
if (CONTROL)
STATE = ST2;
else
STATE = ST3;
end
endcase
[Figure: block diagrams of a Moore machine and a Mealy machine, each built from next-state logic, a state register, and output logic]
Behavioral Description Model (1/2)
Describe the Datapaths (Easily, Effectively)
- Sequential evaluation model, iteration time is ignored
- No sequential element access constraints
Matrix-Vector Multiplication
for (i=1; i<n; i=i+1)
begin
C[i] = 8'b0;
for (j=1; j<m; j=j+1)
C[i] = C[i] + A[i][j] * B[j];
end
#ADD = ? #MUL = ?
#MEM PORT = ?
#CLK = ?
Cannot Uniquely Define an Architecture!
Describe the FSMs (Intuitively)
- Implicit description model, no explicit state register
always
begin
@(posedge clk)
total = data;
@(posedge clk)
total = total + data;
@(posedge clk)
total = total + data;
end
Cannot Uniquely Define an Architecture!
Make the Code Synthesizable?
HDL Code Interpretation Order
- Identify registers and latches; clock domains are determined
- Identify muxes and priority encoders
- Identify functional units
A = B / C;
X = Y % Z;
- Combinational feedback loop detection
Supported Net Types
Signal nets
- wire, tri
Wired nets
- wand, wor, triand, trior
- trireg
- tri0, tri1
Supply nets
- supply0, supply1
module MyWand(a, b, c);
input a, b;
output c;
wand c;
assign c = a;
assign c = b;
endmodule
Supported Declarations
Registers
- reg
Memories
- array of register variables
Integers (scalar only)
- integer
Time (64-bit)
- time
Real numbers
- real
Parameters
- parameter
Supported Built-in Primitives
Logic gates
- and, nand, or, nor, xor, xnor
- buf, not
- bufif0, bufif1, notif1, notif0
MOS Switches
- nmos, pmos, rnmos, rpmos
- cmos, rcmos
Bi-directional Pass Switches
- tran, tranif1, tranif0
- rtran, rtranif1, rtranif0
Sources
- pullup, pulldown
Supported Operators
Arithmetic operators
- +, -, *, /, %
Relational operators
- <, >, <=, >=
Equality operators
- ==, !=, ===, !==
Logical operators
- &&, ||, !
Bit-wise operators
- ~, &, |, ^, ~^, ^~
parameter size = 8;
wire [3:0] a,b,c,d,e;
assign c = size + 2;
assign d = a + 1;
assign e = a + b;
Supported Operators
Reduction operators
- &, |, ^, ~&, ~|, ~^, ^~
Shift operators (by constant or variable)
- <<, >>
Conditional operator
- ?:
Concatenations
- {, }
Supported Behavioral Statements
Non-blocking procedural assignments
begin...end sequential blocks
if...else statements
case/casex/casez statements
for loops
while loops
forever loops
disable statements
Constructs Translation
Sequential constructs
Conditional assignments
Procedural assignments
Full case & parallel case
Expressions with parentheses
Sequential Constructs
for (i=0; i<8; i=i+1)
example[i] = a[i] & b[7-i];
Although many Verilog constructs appear sequential in nature, they
describe combinational circuitry
example[0] = a[0] & b[7];
example[1] = a[1] & b[6];
example[2] = a[2] & b[5];
example[3] = a[3] & b[4];
example[4] = a[4] & b[3];
example[5] = a[5] & b[2];
example[6] = a[6] & b[1];
example[7] = a[7] & b[0];
Loop unrolling
Conditional Assignments
x = b;
if (y)
  x = x + a;
(no latch synthesized for x)

if (y)
  x = b + a;
else
  x = b;
(no latch synthesized for x)

if (y)
  x = b;
z = x;
(a latch will be synthesized for x)

A variable is conditionally assigned if there is a path that does not
explicitly assign a value to that variable
[Figure: synthesized circuits for the fragments above: an adder with a multiplexer selected by y, and a latch on x enabled by y that drives z]
MUX or Latch
always @(en or a)
begin
  if (en)
    y = a;
  else
    y = 0;
end
A multiplexer will be synthesized

always @(en or a)
begin
  if (en)
    y = a;
end
A latch will be synthesized
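One common way to keep an incomplete if from inferring a latch is to assign a default value before it, so that every path through the block assigns y. The sketch below is illustrative only: the module name mux_no_latch and the default value 1'b0 are assumptions, not taken from the slides.

```verilog
// Hypothetical module: y is assigned on every path through the block,
// so the synthesizer has no reason to hold the previous value in a latch.
module mux_no_latch (en, a, y);
  input  en, a;
  output y;
  reg    y;

  always @(en or a)
  begin
    y = 1'b0;      // default assignment covers the en == 0 path
    if (en)
      y = a;       // overrides the default when enabled
  end
endmodule
```

This describes the same function as the if/else form: y follows a when en is 1 and is 0 otherwise.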
Non-blocking (RTL) procedure assignments
[Figure: the circuits synthesized from the two fragments below, with signals data, clk, reg_c, and reg_d]
begin
reg_c = data;
reg_d = reg_c;
end
begin
reg_c <= data;
reg_d <= reg_c;
end
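The two begin...end fragments can be placed in complete modules to make the difference visible. This sketch assumes both sit in an always block triggered on the rising edge of clk (the slide does not show the block header); the module names blocking_regs and nonblocking_regs are hypothetical.

```verilog
// Blocking: reg_c is updated first, so reg_d sees the new value.
// Both registers end up loading data on the same clock edge.
module blocking_regs (clk, data, reg_c, reg_d);
  input  clk, data;
  output reg_c, reg_d;
  reg    reg_c, reg_d;

  always @(posedge clk)
  begin
    reg_c = data;
    reg_d = reg_c;   // reads the value assigned on the line above
  end
endmodule

// Non-blocking: both right-hand sides are sampled before any update,
// so reg_d receives the previous reg_c (a two-stage shift register).
module nonblocking_regs (clk, data, reg_c, reg_d);
  input  clk, data;
  output reg_c, reg_d;
  reg    reg_c, reg_d;

  always @(posedge clk)
  begin
    reg_c <= data;
    reg_d <= reg_c;  // reads reg_c as it was before this clock edge
  end
endmodule
```

With the non-blocking form the result does not depend on the order of the two statements, which is why it is the usual choice for clocked registers.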
Full Case and Parallel Case
               Full               Not Full
Parallel       Multiplexer        Latch, Multiplexer
Not Parallel   Priority Encoder   Latch, Priority Encoder
(overlapped)

Full & Parallel:
input [1:0] a;
always (....)
case (a)
  2'b11: b = w;
  2'b10: b = x;
  2'b01: b = y;
  2'b00: b = z;
endcase

Not Full & Parallel:
input [1:0] a;
always (....)
case (a)
  2'b11: b = w;
  2'b10: b = x;
endcase
Expressions with Parentheses
module AddTree(Y1, Y2, A, B, C, D);
input [7:0] A, B, C, D;
output [9:0] Y1, Y2;
reg [9:0] Y1, Y2;
always @(A or B or C or D)
begin
Y1 <= A + B + C + D;
Y2 <= (A + B) + (C + D);
end
endmodule
[Figure: Y1 is built as a chain of three adders, ((A + B) + C) + D; Y2 is built as a balanced tree, (A + B) + (C + D)]
Combinational Feedback Loop Example
A while loop creates a conditional branch that must be broken by an
@(posedge clk) or @(negedge clk)
to prevent a combinational feedback loop
always
while (x < y)
x = x + z;
always
begin
@(posedge clk)
while (x < y)
begin
@(posedge clk)
x = x + z;
end
end
Non-static Loops
module NonStaticLoop(A,B,R,Y);
input [7:0] A, B;
input [2:0] R;
output [7:0] Y;
reg [7:0] Y;
integer N;
always @(A)
begin
Y = 8'b0;
for (N=0; N<R; N=N+1)
Y[N] = A[N] & B[N];
end
endmodule
R is non-static
Disagree Simulation Results
Incomplete event specification
Non-local reference within a function
Order dependency of concurrent statements
Code with delays
Comparisons to X or Z
Incomplete Event Specification
always @(a or b)
begin
f = a & b & c;
end
always @(a or b or c)
begin
f = a & b & c;
end
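The mismatch can be reproduced in simulation with a small test bench. This is a sketch under stated assumptions: the module name tb_event_list and the signal names f_incomplete and f_complete are hypothetical, and the stimulus is chosen so that only c changes in the second step.

```verilog
// Hypothetical test bench: the same AND function with and without c
// in the event list. Synthesis builds a 3-input AND gate in both cases.
module tb_event_list;
  reg a, b, c;
  reg f_incomplete, f_complete;

  // c is missing from the event list: a change on c alone is not seen.
  always @(a or b)
    f_incomplete = a & b & c;

  // Complete event list: any input change re-evaluates the expression.
  always @(a or b or c)
    f_complete = a & b & c;

  initial
  begin
    #1 a = 1; b = 1; c = 0;   // both blocks evaluate to 0
    #1 c = 1;                 // only c changes
    #1 $display("f_incomplete=%b f_complete=%b", f_incomplete, f_complete);
  end
endmodule
```

The run prints f_incomplete=0 f_complete=1: the first block never re-evaluates when only c changes, whereas the synthesized AND gate would drive 1 in both cases.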
Non-local References Within a Function
function byte_compare;
input [15:0] vector1, vector2;
input [7:0] length;
begin
if (byte_sel)
// compare the upper byte
else
// compare the lower byte
end
endfunction
Order Dependency of Concurrent Statements
always @(posedge Clock)
begin: CONCURRENT_1
Y1 = A;
end
begin: CONCURRENT_2
if (Y1 == 1)
Y2 = B;
else
Y2 = 0;
end
begin: ALL_IN_ONE
if (Y1 == 1)
Y2 = B;
else
Y2 = 0;
Y1 <= A;
end
always
begin
if (A == 1'bX)
B = 0;
else
B = 1;
end
Warning: Comparisons to a don't
care are treated as always being
false in routine test2 line 10 in
file test2.v. This may cause
simulation to disagree with
synthesis. (HDL-170)
Resource Sharing
Without resource sharing, each Verilog operation is built
with separate circuitry, e.g. every + with noncomputable
operands causes a new adder to be built
Resource sharing restrictions
Control flow conflicts
Data flow conflicts
Automatic Resource Sharing
always @(A or B or C or ADD_B)
begin
if (ADD_B)
Z = B + A;
else
Z = A + C;
end
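[Figure: with resource sharing, a multiplexer controlled by ADD_B selects B or C ahead of a single adder (add_0) that produces Z; without it, two adders (add_0 and add_1) feed a multiplexer controlled by ADD_B]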
Resource Sharing Restrictions
Operations can be shared only if they lie in the same always block
always @(A1 or B1 or C1 or D1 or COND_1)
begin
if (COND_1)
Z1 = A1 + B1;
else
Z1 = C1 + D1;
end
always @(A2 or B2 or C2 or D2 or COND_2)
begin
if (COND_2)
Z2 = A2 + B2;
else
Z2 = C2 + D2;
end
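Allowed sharing (operations numbered in source order:
1 = A1 + B1, 2 = C1 + D1, 3 = A2 + B2, 4 = C2 + D2)
     1  2  3  4
1    -  Y  N  N
2    Y  -  N  N
3    N  N  -  Y
4    N  N  Y  -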
Control Flow Conflicts
Two operations can be shared only if no execution path from the start
of the block to the end of the block reaches both operations
Z1 = A + B;
if (COND_1)
Z2 = C + D;
else begin
Z2 = E + F;
if (COND_2)
Z3 = G + H;
else
Z3 = I + J;
end
if (!COND_1)
Z4 = K + L;
else
Z4 = M + N;
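Allowed sharing (operations numbered in source order:
1 = A + B, 2 = C + D, 3 = E + F, 4 = G + H, 5 = I + J, 6 = K + L, 7 = M + N)
     1  2  3  4  5  6  7
1    -  N  N  N  N  N  N
2    N  -  Y  Y  Y  N  N
3    N  Y  -  N  N  N  N
4    N  Y  N  -  Y  N  N
5    N  Y  N  Y  -  N  N
6    N  N  N  N  N  -  Y
7    N  N  N  N  N  Y  -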
Data Flow Conflicts
Operations cannot be shared if doing so causes a combinational
feedback loop
always @(A or B or C or D or E or F or Z or ADD_B)
begin
if (ADD_B)
begin
TEMP_1 = A + B;
Z = TEMP_1 + C;
end
else
begin
TEMP_2 = D + E;
Z = TEMP_2 + F;
end
end
[Figure: shared adders R1 and R2, with a combinational feedback loop formed through TEMP_1 and TEMP_2]
Tree Height Reduction (THR)
Minimizes the delay of complex arithmetic expressions.
Tree Height Reduction (THR) is a timing independent
optimization technique for reducing the height of an
arithmetic expression tree by balancing its subtrees.
A tree is balanced when the heights of its left
and right subtrees do not differ by more than one.
The height of the tree is equal to the number of steps needed
to compute the expression, so the smaller the height of the
expression tree, the smaller the delay in computing the
expression.
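As a worked count, assume an expression that sums N operands using two-input adders (an illustrative assumption; the same reasoning applies to other associative operators). A left-to-right chain and a balanced tree then have heights

$$ h_{\text{chain}} = N - 1, \qquad h_{\text{balanced}} = \lceil \log_2 N \rceil $$

For the four-operand sum A + B + C + D this gives 3 adder delays for the chain ((A + B) + C) + D against 2 for the balanced form (A + B) + (C + D). Both forms use the same number of adders, N - 1 = 3; only the depth changes.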
THR Disabled and Enabled
Implicit Constant Propagation (ICP)
Reduces area and delay by identifying variables in the RTL
design that can be implemented as constants in the
synthesized design.
Common Sub-Expression Elimination (CSE)
Removes redundant arithmetic expressions from the RTL
description to minimize the hardware components required to
implement those expressions.
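A hand-written before-and-after pair shows the effect. This is a sketch only: the module names cse_before and cse_after and the intermediate wire T are hypothetical, and a synthesis tool would perform the equivalent rewrite internally rather than on the source text.

```verilog
// Before: A + B is written twice, which can cost four adders.
module cse_before (A, B, C, D, Y1, Y2);
  input  [7:0] A, B, C, D;
  output [9:0] Y1, Y2;

  assign Y1 = A + B + C;
  assign Y2 = A + B + D;
endmodule

// After: the common term A + B is computed once and reused,
// so three adders are enough.
module cse_after (A, B, C, D, Y1, Y2);
  input  [7:0] A, B, C, D;
  output [9:0] Y1, Y2;
  wire   [8:0] T;        // 9 bits hold the full 8-bit + 8-bit sum

  assign T  = A + B;
  assign Y1 = T + C;
  assign Y2 = T + D;
endmodule
```

The two modules compute the same outputs; the intermediate width of 9 bits is chosen so that no carry is lost in the shared term.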
Extraction of Sum-of-Products Logic
Reduces area by using specialized logic optimization
techniques on constant case statements.
Logic Synthesis and Optimization
Define the Design Environment
Define the Operating Conditions
- Operating temperature variation
- Supply voltage variation
- Process variation
Define Wire Load Models
Modeling the System Interface
Determine & Specify Operating Conditions
read my_lib.db
report_lib my_lib
****************************************
Report : library
Library: my_lib
Version: 1999.05
Date : Mon Jan 4 10:56:49 1999
****************************************
...
Operating Conditions:
Name Library Process Temp Volt Interconnect Model
---------------------------------------------------------------------------
WCCOM my_lib 1.50 70.00 4.75 worst_case_tree
WCIND my_lib 1.50 85.00 4.75 worst_case_tree
WCMIL my_lib 1.50 125.00 4.50 worst_case_tree
set_operating_conditions WCCOM -lib my_lib
Specify Wire Load Models
Wire Loading Model:
Name : 05x05
Location : my_lib
Resistance : 0
Capacitance : 1
Area : 0
Slope : 0.186
Fanout Length Points Average Cap Std Deviation
------------------------------------------------------------------------
1 0.39
auto_wire_load_selection = false
set_wire_load_model 05x05
Input Driving
Output Load
Operating Conditions(PVT)
Wire Load Model
Define Design Constraints
Define design rule constraints
- Set maximum transition time with each output pin of a cell
- Set maximum fanout load for a net
- Set maximum capacitance with each output pin of a cell
Define design optimization constraints
- Timing constraints
- Area constraints
Setting Timing Constraints
Define the clocks
- Define the period and waveform for the clock
- Create a virtual clock
- Specify clock network delay
Specify the I/O timing requirements relative to the clocks
set_input_delay 20 -clock CLK DATA_IN
set_output_delay 15 -clock CLK DATA_OUT
Specify the combinational path delay requirements
- For purely combinational delays that are not bounded by a clock
period, set the maximum and minimum delays for the specified
paths
Specify the timing exceptions
- Define timing relationships that override the default single-cycle
timing relationship for one or more timing paths
- Specifying false paths
- Specifying minimum and maximum delay requirements
- Asynchronous paths
- Multi-cycle paths
Optimizing the Design
Logic-level optimization
- Structuring
- Flattening
Gate-level optimization
- Mapping
- Delay optimization
- Design rule fixing
- Area optimization
Flattening & Structuring
Flattening
out = t1 t2
t1 = a + b(c + f)
t2 = d + e
becomes
out = ad + bcd + bdf + ae + bce + bef
Structuring
f0 = ab + ac
f1 = b + c + d
f2 = b'c'e
becomes
f0 = a t0
f1 = d + t0
f2 = t0'e
t0 = b + c
Design Optimization
Logic optimization
- Flattening
- Structuring
Gate-level optimization
- Initial sequential optimization
- Combinational optimization
- Final sequential optimization
- Localized adjusting
Logic level: technology-independent optimization
Gate level: technology-dependent optimization
The operating conditions
The wire load models
The system interface
- Define drive characteristics for input ports
- Define input & output port loads
- Define fanout loads on output ports
Design Constraints
Design rule constraints
- Attributes defined in the technology library
- Maximum transition (slew) time
- Maximum fanout load
- Maximum capacitance
Design optimization constraints
- Explicit constraints
- Timing constraints
- Area constraints
Technology Mapping Example
Cell    Transistors   Gate equivalent
OR      6             1.5
AND     6             (not given)
NAND    4             1
INV     2             (not given)
OAI23   8             2
Before optimization (OR and AND cells): 4 cells, 24 transistors, 6 equivalent gates
After optimization (NAND, INV, and OAI23 cells): 3 cells, 14 transistors, 3.5 equivalent gates
[Figure: schematics of the circuit with inputs A to E and output Y before and after optimization]
Gate-level Delay Calculation
Tasks for Gate-level Simulation
For whole chip simulation, add I/O pads and buffers
Add time scale statement
Perform delay calculation
Add back-annotation statement to the test bench
Perform simulation
Gate-level Simulation Delay Models
Linear model
Non-linear equation model
Non-linear table model
[Figure: characterization tables with axes labeled LD and SL for delay or output slope, and DS and CS for setup and hold]
Delay or output slope with variation of input slope and output load
Setup and hold with variation of data slope and clock slope
Courtesy Avant!
Delay Calculation
Input: gate-level netlist, wire RC information, gate-level cell delay
model
Output: delay & timing checks information
Must be done before gate-level simulation
Most of the delay calculators have the pre-layout wire RC estimation
capability
Combined or separated delay
calculators
Need delay back-annotation if
separated delay calculator is used
How to Obtain the Delays
(0.201:0.272:0.325, 0.146:0.215:0.276)
(0.187:0.253:0.301, 0.165:0.238:0.295)
(0.212:0.280:0.321, 0.195:0.265:0.327)
(0.182:0.248:0.295, 0.160:0.233:0.289)
Abstract Timing Model
[Figure: abstract timing model of a cell with input i and output z, shown once for circuit simulation and once for logic simulation]
Definition of Delay and Slew
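[Figure: input and output waveforms. The delays t_PLHR and t_PHLF are measured between the 50% points of the input and output; the rise time t_r and fall time t_f are measured between the 10% and 90% points]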
Notations for Delay
t_PLHR, t_PHLR, t_PLHF, t_PHLF
Threshold Voltages
[Figure: input and output waveforms marking V_TH1, V_TH, and V_TH2, with input slew, output slew, and delay]
Threshold voltages are chosen by the library developers and can be arbitrary
Typical values:
V_TH1 = V_DD * 10%
V_TH2 = V_DD * 90%
V_TH = V_DD * 50%
Analytic Delay Model
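$$ t_{PLHF} = \frac{C_L}{\beta_p V_{DD}(1-p)} \left[ \frac{2p}{1-p} + \ln\left( \frac{2(1-p) - v_O}{v_O} \right) \right] $$
$$ t_{PHLR} = \frac{C_L}{\beta_n V_{DD}(1-n)} \left[ \frac{2n}{1-n} + \ln\left( \frac{2(1-n) - v_O}{v_O} \right) \right] $$
where
$$ n = \frac{V_{tn}}{V_{DD}}, \qquad p = \frac{|V_{tp}|}{V_{DD}}, \qquad v_O = \frac{V_{out}}{V_{DD}} $$
C_L is the load capacitance
Elmasry, 1981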
Simulation Result - t_PHLR
[Figure: simulated waveforms with the delay measured at the 50% points; i_slew = 1 ns, c_load = 0.1 to 0.5 pF]
Delay vs. Loading
[Figure: in01d1 output delay time (0 to 2.0 ns) versus load (22 fF to 880 fF), one curve for each input slew: 0.05, 0.25, 0.5, 1.0, 2.0, and 5.0 ns]
Linear Delay Model
$$ t_P = A_0 + A_c \cdot C_{load} $$
where
A_0 = pin-to-pin intrinsic delay
A_c = load coefficient
C_load = cell output loading
Two characterization points (c_1, t_1) and (c_2, t_2) fix the line:
$$ t_1 = A_0 + A_c c_1 $$
$$ t_2 = A_0 + A_c c_2 $$
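For the linear model, delay = A_0 + A_c * load, the two coefficients follow directly from two characterized points. As a worked step, with (c_1, t_1) and (c_2, t_2) the measured load and delay pairs, subtracting the two equations eliminates A_0:

$$ A_c = \frac{t_2 - t_1}{c_2 - c_1}, \qquad A_0 = t_1 - A_c c_1 $$

The delay for any other load then follows by substituting that load into the same line. This holds only as far as the cell really behaves linearly between and beyond the two points, which is the limitation the non-linear models address.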
Modified Analytic Delay Model
$$ t_{PLHF} = \frac{C_L}{\beta_p V_{DD}(1-p)} \left[ \frac{2p}{1-p} + \ln\left( \frac{2(1-p) - v_O}{v_O} \right) \right] + \frac{t_{input\,fall}}{6}(1 + 2p) $$
$$ t_{PHLR} = \frac{C_L}{\beta_n V_{DD}(1-n)} \left[ \frac{2n}{1-n} + \ln\left( \frac{2(1-n) - v_O}{v_O} \right) \right] + \frac{t_{input\,rise}}{6}(1 + 2n) $$
Hedenstierna and Jeppson, 1987
The Impact of Input Slew Rate
[Figure: input and output waveforms for two input slews, with the delay measured at the 50% points]
i_slew    c_load    delay
0.1 ns    0.1 pF    0.18 ns
1 ns      0.1 pF    0.30 ns
Delay vs. Input Slew Rate
[Figure: in01d1 output delay time (0 to 2.0 ns) versus input slew (0.05 to 5.0 ns), one curve for each load: 22, 44, 110, 220, 440, and 880 fF]
Delay, Loading and Input Slew Rate
Explicit Delay Formula Method
Path Delay or Output Slew =
Intercept + L*Load + S*Slew + LNL*LN(Load+.01) + LNS*LN(Slew+.01)
+ LS*Load*Slew + V*Vcc + T*Temp + P*Process
+ LNV*LN(Vcc) + LNT*LN(Temp+56) + LNP*LN(Process+10)
+ LV*Load*Vcc + LT*Load*Temp + LP*Load*Process
+ SV*Slew*Vcc + ST*Slew*Temp + SP*Slew*Process
+ VT*Vcc*Temp + VP*Vcc*Process + TP*Temp*Process
630 circuit simulations per path and input pin direction for
characterization
25% accuracy for inverter
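2-Dimensional Delay Table Method
[Figure: grid of characterization points over input slew and output load]
Table Interpolation
[Figure: the value at the requested point (S_req, C_req) is interpolated from the surrounding characterization points]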
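One way to carry out the interpolation is bilinear interpolation between the four table entries that surround the requested point. The slides do not say which scheme a given delay calculator uses, so this is an assumption, and the symbols S_1, S_2, C_1, C_2, and D_ij are introduced here for illustration. With S_1 <= S_req <= S_2 and C_1 <= C_req <= C_2 the neighboring table axes, and D_ij the tabulated value at (S_i, C_j):

$$ u = \frac{S_{req} - S_1}{S_2 - S_1}, \qquad w = \frac{C_{req} - C_1}{C_2 - C_1} $$
$$ D(S_{req}, C_{req}) \approx (1-u)(1-w)\,D_{11} + u(1-w)\,D_{21} + (1-u)\,w\,D_{12} + u\,w\,D_{22} $$

At a table corner (u and w each 0 or 1) the formula returns the tabulated value exactly, so the interpolated surface passes through every characterization point.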
Simulation Result for t_f
[Figure: simulated output waveforms with the fall time measured between the 90% and 10% points; i_slew = 1 ns, c_load = 0.1 to 0.5 pF]
Output Slew Rate vs. Loading
[Figure: in01d1 output slew time (0 to 3.5 ns) versus load (22 fF to 880 fF), one curve for each input slew: 0.05, 0.25, 0.5, 1.0, 2.0, and 5.0 ns]
[Figure: input and output waveforms for two input slews, with the output slew measured between the 90% and 10% points]
i_slew    c_load    o_slew
0.1 ns    0.1 pF    0.22 ns
1 ns      0.1 pF    0.36 ns
Output Slew vs. Input Slew Rate
[Figure: in01d1 output slew time (0 to 3.5 ns) versus input slew (0.05 to 5.0 ns), one curve for each load: 22, 44, 110, 220, 440, and 880 fF]
Delay/Slew Characterization
[Figure: characterization of delay and slew]
Comparisons Between Models
Calculation Time:
#Cell Linear ISM Table
------------------------------------
Case1 97 2.8 19.9 14.5
Case2 1250 19.6 512.6 41.2
Case3 2226 30.2 724.2 56.4
Model Size (Core + IO):
Linear ISM Table
-----------------------
801K 3076K >10000K(8x8)
Dynamic Input Capacitance
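How to Measure the Input Cap.
[Figure: a voltage source v(t) with rise time t_r and fall time t_f drives the input of the cell under test; the input current i(t) is measured]
$$ C_{IN} = \operatorname{average}\left( \frac{i(t)}{dv(t)/dt} \right) $$
$$ C_{IN,AVERAGE} = \frac{C_{IN,RISE} \cdot t_r + C_{IN,FALL} \cdot t_f}{t_r + t_f} $$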
Configuration vs. Delay
Practical Input Slew
Setup and Hold Time
[Figure: Data, Clock, and Q waveforms showing t_SETUP, t_HOLD, and t_P-Clock-Q]
Determine the Min. Setup Time
Bisection Method
Synthesis Model
cell(AND2) {
area : 2 ;
pin(A) {
direction : input ;
capacitance : 1.3 ;
}
pin(B) {
direction : input ;
capacitance : 1.3 ;
}
pin(Z) {
direction : output ;
function : "A * B" ;
timing() {
intrinsic_rise : 0.58 ;
intrinsic_fall : 0.69 ;
rise_resistance : 0.1378 ;
fall_resistance : 0.0465 ;
related_pin : "A B" ;
}}}
Simulation Model
module mx21d1 (z, i0, i1, s);
input i0, i1, s;
output z;
not G3 (N3, s);
and G4 (N4, i0, N3),
    G5 (N5, s, i1),
    G6 (N6, i0, i1);
or G7 (z, N4, N5, N6);
specify
  specparam
    InCap$i0 = 0.028,
    InCap$i1 = 0.027,
    InCap$s = 0.042,
    R_Ramp$i0$z = 0.722:1.110:1.787,
    F_Ramp$i0$z = 0.806:1.240:1.996,
    R_Ramp$i1$z = 0.722:1.110:1.787,
    F_Ramp$i1$z = 0.806:1.240:1.996,
    R_Ramp$s$z = 0.722:1.110:1.787,
    F_Ramp$s$z = 0.806:1.240:1.996;
  // pin-to-pin path delays: (rise, fall), each as min:typ:max
  (i0=>z)=(0.215:0.330:0.531, 0.195:0.300:0.483);
  (i1=>z)=(0.208:0.320:0.515, 0.202:0.310:0.499);
  (s=>z) =(0.221:0.340:0.547, 0.208:0.320:0.515);
endspecify
endmodule
Results of Delay Calculation
Setup & Hold Time Violation
[Figure: Clock waveform with a normal signal and a violated signal.]
Interconnection Parasitic RC
[Figure: capacitance (pF, 0 to 2.5) versus fanout (1 to 15), comparing the foundry default with the actual model of each region. Courtesy Avant!]
Courtesy IBM Journal of R&D, Vol. 39
Static Timing Analysis Overview
Static Timing Analysis (STA) is a method of validating the
timing performance of a design by checking all possible
paths for timing violations.
To check a design for violations, an STA tool:
- Breaks the design down into a set of timing paths
- Calculates the signal propagation delay along each path
- Checks for violations of timing constraints inside the design and at the input/output interface
Another way to perform timing analysis is to use dynamic
simulation, which determines the full behavior of the circuit
for a given set of input stimulus vectors.
Compared with dynamic simulation, static timing analysis is
much faster because it is not necessary to simulate the
logical operation of the circuit.
It is also more thorough because it checks all timing paths
(completeness), not just the logical conditions that are
sensitized by a particular set of test vectors.
Timing Paths
The first step performed by STA tools for timing analysis is to
break the design down into a set of timing paths.
Each path has a startpoint and an endpoint.
The startpoint of a path is a clock pin of a sequential element, or possibly an input port of the design.
The endpoint of a path is a data input pin of a sequential element, or possibly an output port of the design.
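Breaking a design into timing paths amounts to enumerating every route from a startpoint to an endpoint. A minimal Python sketch follows; the netlist, the instance names (U1, U2, U3, FF1, FF2), and the port names are hypothetical, and real STA tools avoid explicit enumeration on large designs.

```python
def timing_paths(fanout, startpoints, endpoints):
    """Enumerate every startpoint-to-endpoint path in a combinational graph.

    fanout maps each node to the nodes it drives. The graph is assumed to be
    acyclic, which holds once paths are cut at sequential elements.
    """
    paths = []

    def walk(node, trail):
        trail = trail + [node]
        if node in endpoints:
            paths.append(trail)
            return
        for successor in fanout.get(node, []):
            walk(successor, trail)

    for start in startpoints:
        walk(start, [])
    return paths


# Hypothetical design: one input port, one launching flip-flop clock pin,
# three combinational cells, one capturing flip-flop data pin, one output port.
fanout = {
    "in1": ["U1"],
    "FF1/CK": ["U1", "U2"],
    "U1": ["U3"],
    "U2": ["U3", "out1"],
    "U3": ["FF2/D"],
}
paths = timing_paths(fanout,
                     startpoints=["in1", "FF1/CK"],
                     endpoints={"FF2/D", "out1"})
```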
Multiple Paths Through Combinational Logic
Other Path Types
Delay Calculation
After breaking down a design into a set of timing paths, the
STA tool calculates the delay along each path.
The total delay of a path is the sum of all cell and net delays
in the path.
Before layout, the chip topography is unknown, so STA tools
must estimate the net delays using wire load models.
After layout, an external tool can accurately determine the
delays and write them to a Standard Delay Format (SDF) file.
STA can read the SDF file and back-annotate the design
with the delay information for layout-accurate timing analysis.
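The path-delay sum, and the setup check it feeds, can be sketched in Python. All numbers and names here are hypothetical; in practice the cell delays come from the library and the net delays from a wire load model or an SDF file. The slack formula assumes an ideal clock with no skew.

```python
def path_delay(stages):
    """Total path delay: the sum of every cell delay and net delay on the path."""
    return sum(cell_delay + net_delay for cell_delay, net_delay in stages)


def setup_slack(clock_period, arrival_time, setup_time):
    """Setup slack for a flip-flop to flip-flop path with an ideal clock.

    Data must arrive setup_time before the next clock edge, so the required
    time is clock_period - setup_time. Negative slack means a violation.
    """
    return (clock_period - setup_time) - arrival_time


# Hypothetical path in ns: (cell delay, delay of the net the cell drives).
stages = [
    (0.35, 0.05),   # launching flip-flop, clock-to-Q
    (0.93, 0.08),   # first combinational cell
    (0.81, 0.04),   # second combinational cell
]
arrival = path_delay(stages)
slack = setup_slack(clock_period=3.0, arrival_time=arrival, setup_time=0.2)
```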
Setup and Hold Checking for FFs
Static Timing Analysis Flow
Static Timing Analysis Summary
Static analysis, without test patterns
Longest path analysis
Shortest path analysis
Hard to distinguish between effective paths and false paths
[Figure: combinational logic between storage elements.]
A False Path Example
[Figure: false path example.]
Power Analysis and Optimization
Gate-level Power Analysis Overview
Power analysis is performed to determine the power
consumption of the chip based on the activity.
PrimePower is event based, so for every event it determines
the supply current and leakage current dissipated given the
states and dynamic conditions.
During power analysis, PrimePower reports the following types of power:
- Switching power
- Internal power
- Leakage power
Static Power
Static power is the power dissipated by a gate when it is not
switching, that is, when it is inactive or static.
Static power is dissipated in several ways. The largest
percentage of static power results from source-to-drain
subthreshold leakage, which is caused by reduced threshold
voltages that prevent the gate from completely turning off.
Static power is also dissipated when current leaks between
the diffusion layers and the substrate.
For this reason, static power is often called leakage power.
Dynamic Power
Dynamic power is the power dissipated when the circuit is active.
A circuit is active anytime the voltage on a net changes due to some
stimulus applied to the circuit.
Because the voltage on an input net can change without necessarily
resulting in a logic transition on the output, dynamic power can be
dissipated even when an output net doesn't change its logic state.
The dynamic power of a circuit is composed of two kinds of power:
- Switching power
- Internal power
Switching Power
The switching power of a driving cell is the power dissipated by the
charging and discharging of the load capacitance at the output of the
cell.
The total load capacitance at the output of a driving cell is the sum of
the net and gate capacitances on the driving output.
Because such charging and discharging are the result of the logic
transitions at the output of the cell, switching power increases as logic
transitions increase. Therefore, the switching power of a cell is a
function of both the total load capacitance at the cell output and the
rate of logic transitions.
Switching power comprises a large percentage of the power
dissipation of an active CMOS circuit.
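The dependence on load capacitance and transition rate can be written as a formula. The standard first-order CMOS relation is given here as an assumption, since the slides do not state it; the symbols $C_{load}$, $V_{DD}$, and $TR$ are introduced for illustration:

```latex
P_{switch} = \frac{1}{2}\, C_{load}\, V_{DD}^{2}\, TR
```

Here $C_{load}$ is the total load capacitance at the cell output (net plus gate capacitances), $V_{DD}$ is the supply voltage, and $TR$ is the toggle rate in transitions per unit time. For a net with activity factor $\alpha$ clocked at frequency $f$, $TR = \alpha f$.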
Internal Power
Internal power is any power dissipated within the boundary of
a cell.
During switching, a circuit dissipates internal power by the
charging or discharging of any existing capacitances internal
to the cell.
Internal power includes power dissipated by a momentary
short circuit between the P and N transistors of a gate, called
short-circuit power.
For circuits with fast transition times, short-circuit power can
be small.
However, for circuits with slow transition times, short-circuit
power can account for more than 50% of the total power dissipated by
the gate.
Short-circuit power is affected by the dimensions of the
transistors and the load capacitance at the gate's output.
In most simple library cells, internal power is due mostly to
short-circuit power. For this reason, the terms internal power
and short-circuit power are often considered synonymous.
Calculating Leakage Power
PrimePower analysis computes the total leakage power of a
design by summing the leakage power of the design's library
cells.
Leakage power can sometimes depend on the logical
condition of the cell. Such leakage power is state-dependent
leakage power.
For designs that are active most of the time, leakage power
is often less than 1% of the total power. However, for
designs that are usually idle, modeling leakage power is
important. Leakage power also increases dramatically when
low-Vt cells are used.
Calculating Dynamic Power
Dynamic power is the power dissipated when a circuit is
active.
PrimePower calculates the dynamic switching energy for
every event.
For every event, the switching power is calculated by
summing the switching energy and dividing it by a small time
interval.
The average switching power is derived by summing the total
dynamic energy and dividing by the total simulation time.
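The two calculations described above, power per small time interval and average power over the whole run, can be sketched in Python. The event list and the 2 ns interval are hypothetical. Times are integers in ns and energies are in pJ, so pJ/ns gives mW directly.

```python
def power_profile(events, interval_ns, total_ns):
    """Power per interval: sum the event energies in each interval, divide by its length.

    events is a list of (time_ns, energy_pj) with 0 <= time_ns < total_ns.
    Returns one value in mW per interval (pJ / ns = mW).
    """
    n_bins = -(-total_ns // interval_ns)          # ceiling division
    energy = [0.0] * n_bins
    for time_ns, energy_pj in events:
        energy[time_ns // interval_ns] += energy_pj
    return [e / interval_ns for e in energy]


def average_power(events, total_ns):
    """Average power in mW: total dynamic energy divided by total simulation time."""
    return sum(energy_pj for _, energy_pj in events) / total_ns


# Hypothetical switching events: (time in ns, energy in pJ).
events = [(0, 2.0), (1, 3.0), (1, 1.0), (3, 4.0)]
profile = power_profile(events, interval_ns=2, total_ns=4)
average = average_power(events, total_ns=4)
```

The per-interval profile exposes peaks that the single average value hides.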
Glitch Power
Glitch power is defined in PrimePower as follows. If two toggles
on a pin are very close to each other, and the time interval between
them is less than the rise and fall transition time of the pin,
the two toggles are treated as a glitch.
PrimePower uses the sum of the rise and fall transition times and
the time interval between the two toggles to scale the glitch power.
Z States
Transitions through the Z state are assumed to consume no
power.
PrimePower tracks the Z state, and if the value after the Z state
differs from the value before it, the sequence is counted as one
full transition.
For example, 0 -> Z -> 1 or 1 -> Z -> 0 is counted as 1
transition.
However, 1 -> Z -> 1 or 0 -> Z -> 0 is not a transition and
consumes no power.
X States
Power is consumed by transitions to and from the X state.
By default, every time a net transitions to or from the X state,
it is counted as 1/2 of a power transition.
For example: 0 -> X = 1/2 transition
0 -> X -> 0 or 0 -> X -> 1 = 1 transition
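The Z-state and X-state counting rules can be combined into one small Python sketch. This illustrates the rules as stated on the slides and is not PrimePower's implementation; how sequences that mix X and Z are treated is an assumption (Z is simply skipped before counting).

```python
def count_transitions(values):
    """Count power transitions in a sequence of net values '0', '1', 'X', 'Z'.

    Rules taken from the slides:
      - passing through Z costs nothing; only a change between the values
        before and after the Z state counts
      - every transition to or from X counts as half a transition
      - a plain 0 <-> 1 change counts as one transition
    Assumption (not from the slides): Z is dropped first, so X -> Z -> X
    counts as zero.
    """
    driven = [v for v in values if v != "Z"]
    total = 0.0
    for previous, current in zip(driven, driven[1:]):
        if previous == current:
            continue
        total += 0.5 if "X" in (previous, current) else 1.0
    return total


examples = {seq: count_transitions(seq)
            for seq in ("0Z1", "1Z0", "1Z1", "0Z0", "0X", "0X0", "0X1")}
```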
PrimePower Simulation Flow
First, you run an HDL simulation to produce files that contain
switching activity as a function of time.
Next, you run PrimePower to build a detailed power profile of
the design based on the circuit connectivity, the switching
activity, net capacitance, and the cell-level power behavior
data in the Synopsys .db library. PrimePower calculates the
dynamic and static power behavior of a circuit at the cell
level and reports power consumption at the chip, block, and
cell levels.
Design for Testability and Automatic Test Pattern Generation
Why Perform Manufacturing Testing?
Functional testing verifies that your circuit performs as it is intended to
perform. For example, assume you have designed an adder circuit.
Functional testing verifies that this circuit performs the addition function
and computes the correct results over the range of values tested.
Manufacturing testing verifies that your circuit does not have
manufacturing defects by focusing on circuit structure rather than
functional behavior. Manufacturing defects include problems such as:
- Power or ground shorts
- Open interconnect on the die caused by dust particles
- Short-circuited source or drain on the transistor caused by metal spike-through
Typically, development teams perform both functional and
manufacturing testing of devices.
Stuck-At Fault Models
When a manufacturing defect occurs, the physical defect has
a logical effect on the circuit behavior.
An open connection can appear to float either high or low,
depending on the technology.
A signal shorted to power appears to be permanently high.
A signal shorted to ground appears to be permanently low.
Many manufacturing defects can be represented using the
industry-standard stuck-at fault model.
The stuck-at-0 model represents a signal that is permanently
low regardless of the other signals that normally control the
node.
The stuck-at-1 model represents a signal that is permanently
high regardless of the other signals that normally control the
node.
Example: a two-input AND gate with stuck-at-0 fault on
output pin
Detecting Stuck-At Faults
The node of a stuck-at fault must be controllable and
observable for the fault to be detected.
A node is controllable if you can drive it to a specified logic
value by setting the primary inputs to specific values. A
primary input is an input that can be directly controlled in the
test environment.
A node is observable if you can predict the response on it
and propagate the fault effect to the primary outputs where
you can measure the response. A primary output is an output
that can be directly observed in the test environment.
To detect a stuck-at fault on a target node:
- Control the target node to the opposite of the stuck-at value by
applying data at the primary inputs.
- Make the node's fault effect observable by controlling the value at
all other nodes affecting the output response, so the targeted node
is the active (controlling) node.
The set of logic 0s and 1s applied to the primary inputs of a
design is called the input stimulus.
The resulting values at the primary outputs, assuming a fault-
free design, are called the expected response.
The actual values measured at the primary outputs are
called the output response.
If the output response does not match the expected
response for a given input stimulus, the input stimulus has
detected the fault.
For the two-input AND gate, apply a logic 1 at both inputs.
The expected response for this input stimulus is logic 1, but
the output response is logic 0.
The input stimulus (1,1) detects the stuck-at-0 fault.
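The AND-gate example can be checked exhaustively with a few lines of Python. The function and parameter names are illustrative only.

```python
def and2(a, b, output_stuck_at=None):
    """Two-input AND gate; output_stuck_at forces the output to 0 or 1 if given."""
    z = a & b
    return z if output_stuck_at is None else output_stuck_at


# An input stimulus detects the fault when the faulty output response
# differs from the expected (fault-free) response.
detecting_stimuli = [
    (a, b)
    for a in (0, 1)
    for b in (0, 1)
    if and2(a, b) != and2(a, b, output_stuck_at=0)
]
```

Only one of the four possible input combinations exposes the fault, because the other three already produce a fault-free 0 at the output.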
This method of determining the input stimulus to detect a
fault uses the single stuck-at fault model.
The single stuck-at fault model assumes that only one node
is faulty and that all other nodes in the circuit are good.
The single stuck-at fault model greatly reduces the
complexity of fault modeling and is technology independent,
enabling the use of algorithmic pattern generation techniques.
To detect the fault, control the output of cell G2 to logic 1 (the
opposite of the faulty value) by applying a logic 0 value at
primary input C.
To ensure that the fault effect is observable at primary output
Z, control the other nodes in the circuit so that the response
value at primary output Z depends only on the output of cell
G2, as follows:
- Apply a logic 1 at primary input D so the output of cell G3 depends
only on the output of cell G2. The output of cell G2 is the controlling
input of cell G3.
- Apply logic 0s at primary inputs A and B so the output of cell G4
depends only on the output of cell G2.
Given the input stimulus of A = 0, B = 0, C = 0, and D = 1, a
fault-free circuit produces a logic 1 at output port Z.
If the output of cell G2 is stuck-at-0, the value at output port
Z is a logic 0 instead.
Thus, this input stimulus detects a stuck-at-0 fault on the
output of cell G2.
This set of input stimulus and expected response values is
called a test vector.
Following the process previously described, you can
generate test vectors to detect stuck-at-1 and stuck-at-0
faults for each node in the design.
Determining Fault Coverage
One definition of the testability of a design is the extent to
which the design can be tested for the presence of
manufacturing defects, as represented by the single stuck-at
fault model.
Using this definition, the metric used to measure testability is
fault coverage, which is defined as the number of detected faults
divided by the number of detectable faults.
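Written as a formula, assuming the usual detected-over-detectable definition, where the detectable faults are all modeled faults minus those proven undetectable (the symbols $N_{detected}$, $N_{total}$, and $N_{undetectable}$ and the worked numbers are illustrative, not from the slides):

```latex
\text{fault coverage} = \frac{N_{detected}}{N_{total} - N_{undetectable}}
\qquad\text{e.g.}\qquad
\frac{8}{16 - 0} = 50\%
```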
An Undetectable Fault
[Figure: network N2 with cells U0, U1, U2, U3, inputs A, B, C, D, and output Y; a 1/0 fault effect is marked.]
Some circuits have inherently undetectable faults.
One reason, shown here, is reconvergent fanout.
No pattern can be devised to detect fault U2/Z SA0.
For larger combinational designs and sequential designs, it is not
feasible to analyze the fault coverage results for existing functional test
vectors or to manually generate test vectors to achieve high fault
coverage results.
Fault simulation tools determine the fault coverage of a set of test
vectors.
Automatic test pattern generation (ATPG) tools generate
manufacturing test vectors.
Both of these automated tools require models for all logic elements in
your design to correctly calculate the expected response.
Fault Simulation
Fault simulation determines the fault coverage of a set of test
vectors.
It can be thought of as performing many logic simulations
concurrently: one that represents the fault-free circuit (the
good machine), and many that represent the circuits
containing single stuck-at faults (the faulty machines).
Fault simulation detects a fault each time the output
response of the faulty machine differs from the output
response of the good machine for a given vector.
Fault simulation determines all faults detected by a test
vector. Fault simulating the test vector generated earlier to detect
the stuck-at-0 fault on the output of G2 shows
that this vector also detects the following single stuck-at
faults:
- Stuck-at-1 on all pins of G1 (and ports A and B)
- Stuck-at-1 on the input of G2 (and port C)
- Stuck-at-0 on the inputs of G3 (and port D)
- Stuck-at-1 on the output of G3
- Stuck-at-1 on the inputs of G4
- Stuck-at-0 on the output of G4 (and port Z)
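A small serial fault simulator in Python reproduces this list. The schematic is not part of the extracted text, so the gate types are an assumption inferred from the stimulus and the detected faults: G1 = OR(A, B), G2 = NOT(C), G3 = NAND(G2, D), G4 = NOR(G1, G3) driving Z. Faults are modeled per net, which matches the pin faults because this assumed circuit has no fanout branches.

```python
NETS = ["A", "B", "C", "D", "G1", "G2", "G3", "Z"]


def simulate(vector, fault=None):
    """Evaluate the assumed circuit; fault is (net_name, stuck_value) or None."""
    def net(name, value):
        # A faulty net ignores its driver and holds the stuck value.
        return fault[1] if fault is not None and fault[0] == name else value

    a = net("A", vector["A"])
    b = net("B", vector["B"])
    c = net("C", vector["C"])
    d = net("D", vector["D"])
    g1 = net("G1", a | b)            # assumed OR
    g2 = net("G2", 1 - c)            # assumed inverter
    g3 = net("G3", 1 - (g2 & d))     # assumed NAND
    return net("Z", 1 - (g1 | g3))   # assumed NOR, drives port Z


def fault_simulate(vector):
    """Return the set of single stuck-at faults this vector detects."""
    good = simulate(vector)          # the good machine
    return {
        (name, stuck)
        for name in NETS
        for stuck in (0, 1)
        if simulate(vector, (name, stuck)) != good   # a faulty machine differs
    }


vector = {"A": 0, "B": 0, "C": 0, "D": 1}
detected = fault_simulate(vector)
coverage = len(detected) / (2 * len(NETS))
```

Under these assumptions, this one vector detects half of the 16 net-level faults, which is why fault simulating each generated vector reduces the number of faults left to target.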
You can generate manufacturing test vectors by manually
generating test vectors, then fault simulating them to
determine the fault coverage.
For large or complex designs, this process is time consuming
and often does not result in high fault coverage results.
Automatic Test Pattern Generation
ATPG generates test patterns and provides fault coverage
statistics for the generated pattern set.
ATPG for combinational circuits is well understood; it is
usually possible to generate test vectors that provide high
fault coverage for combinational designs. Combinational
ATPG tools use both random and deterministic techniques to
generate test patterns for stuck-at faults on cell pins.
During random pattern generation, the tool assigns input stimulus in a
pseudo-random manner, then fault simulates the generated vector to
determine which faults are detected. As the number of faults detected
by successive random patterns decreases, ATPG shifts to a
deterministic technique.
During deterministic pattern generation, the tool uses a pattern
generation algorithm based on path-sensitivity concepts to generate a
test vector that detects a specific fault in the design. After generating a
vector, the tool fault simulates the vector to determine the complete set
of faults detected by the vector. Test-pattern generation continues until
all faults have either been detected or have been identified as
undetectable by this algorithm.
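The two-phase flow (random patterns first, then one targeted fault at a time) can be sketched in Python. Several things are assumptions: the four-input circuit (OR, inverter, NAND, NOR) is hypothetical, the deterministic phase uses exhaustive search as a stand-in for a path-sensitization algorithm, and the stopping rule for the random phase (three consecutive patterns that detect nothing new) is arbitrary.

```python
import itertools
import random

NETS = ["A", "B", "C", "D", "G1", "G2", "G3", "Z"]


def simulate(vec, fault=None):
    """Hypothetical circuit: Z = NOR(OR(A, B), NAND(NOT(C), D))."""
    def net(name, value):
        return fault[1] if fault is not None and fault[0] == name else value

    a, b, c, d = (net(n, v) for n, v in zip("ABCD", vec))
    g1 = net("G1", a | b)
    g2 = net("G2", 1 - c)
    g3 = net("G3", 1 - (g2 & d))
    return net("Z", 1 - (g1 | g3))


def detected_by(vec, faults):
    """Fault simulate one vector against a set of faults."""
    good = simulate(vec)
    return {f for f in faults if simulate(vec, f) != good}


def atpg(max_idle=3, seed=0):
    rng = random.Random(seed)
    remaining = {(n, v) for n in NETS for v in (0, 1)}
    patterns, undetectable = [], set()

    # Phase 1: pseudo-random patterns, kept only if they detect something new.
    idle = 0
    while remaining and idle < max_idle:
        vec = tuple(rng.randint(0, 1) for _ in range(4))
        hit = detected_by(vec, remaining)
        if hit:
            patterns.append(vec)
            remaining -= hit
            idle = 0
        else:
            idle += 1

    # Phase 2: target one remaining fault at a time (exhaustive search here).
    while remaining:
        target = min(remaining)
        for vec in itertools.product((0, 1), repeat=4):
            if detected_by(vec, {target}):
                patterns.append(vec)
                remaining -= detected_by(vec, remaining)   # fault dropping
                break
        else:
            undetectable.add(target)
            remaining.discard(target)
    return patterns, undetectable


patterns, undetectable = atpg()
```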
Because of the effects of memory and timing, ATPG for sequential
circuits is much more difficult than for combinational circuits. It is often
not possible to generate high fault coverage test vectors for complex
sequential designs, even when using sequential ATPG.
Sequential ATPG tools use deterministic pattern generation algorithms
based on extended applications of the path-sensitivity concepts.
Structured DFT techniques, such as internal scan, simplify the test-
pattern generation task for complex sequential designs, resulting in
higher fault coverage and reduced testing costs.
So How Do We Test Sequential Circuits?
[Figure: a pipeline with a fault under test. Annotations: "Test for SA0 fault here"; "Need to set input pins to specific values so that nets within pipeline can be set to values which test for a fault"; "Need to observe results at the output of the design."]
Each fault tested requires a predictable means for both controlling the input
and observing the results downstream from the fault.
Utilize each available flip-flop by pre-loading it with a bit of data to
control logic gates upstream and/or observe logic gates downstream.
What Is Internal Scan?
Internal scan design is the most popular DFT technique and has the
greatest potential for high fault coverage results.
This technique simplifies the pattern generation problem by dividing
complex sequential designs into fully isolated combinational blocks
(full-scan design) or semi-isolated combinational blocks (partial-scan
design).
Internal scan modifies existing sequential elements in the design to
support a serial shift capability in addition to their normal functions.
This serial shift capability enhances internal node controllability and
observability with a minimum of additional I/O pins.
D flip-flop modified to support internal scan
The modified sequential cells are chained together to form one or more
large shift registers.
These shift registers are called scan chains or scan paths.
The sequential cells connected in a scan chain are scan controllable
and scan observable.
A sequential cell is scan controllable when it can be set to a known
state by serially shifting in specific logic values.
ATPG tools consider scan controllable cells pseudo-primary inputs of
the design.
A sequential cell is scan observable when its state can be observed by
serially shifting out data.
ATPG tools consider scan observable cells pseudo-primary outputs of
the design.
Adding scan circuitry to a design usually has the following
effects:
- Design size and power increase slightly because scan cells are
usually larger than the nonscan cells they replace, and the nets
used for the scan signals occupy additional area.
- Design performance (speed) decreases marginally because of
changes in the electrical characteristics of the scan cells that
replace the nonscan cells.
- Global test signals that drive many sequential elements might
require buffering to prevent electrical design-rule violations.
For scan designs, ATPG tools generate input stimulus for the
primary inputs and pseudo-primary inputs and expected
responses for the primary outputs and pseudo-primary
outputs. The set of input stimulus and output response that
includes primary inputs, primary outputs, pseudo-primary
inputs, and pseudo-primary outputs is called a test pattern or
scan pattern. A test pattern represents many test vectors
because:
- The pseudo-primary-input data must be serialized to be applied at
the input of the scan chain
- The pseudo-primary-output data must be serialized to be measured
at the output of the scan chain
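The serialization can be made concrete with a small Python model of a scan chain. This is a behavioral sketch only: the class, the three-cell chain length, and the combinational function used in the capture step are hypothetical.

```python
class ScanChain:
    """Behavioral model of a scan chain: scan-in feeds cell 0, the last cell is scan-out."""

    def __init__(self, length):
        self.state = [0] * length

    def shift(self, scan_in):
        """One scan clock: every cell takes its neighbor's value; returns the scan-out bit."""
        scan_out = self.state[-1]
        self.state = [scan_in] + self.state[:-1]
        return scan_out

    def load(self, values):
        """Serially shift in values so that values[i] ends up in cell i."""
        for bit in reversed(values):
            self.shift(bit)

    def capture(self, next_state):
        """One functional clock: the cells capture the combinational logic outputs."""
        self.state = next_state(self.state)

    def unload(self):
        """Serially shift out the captured state, returned in cell order."""
        out = [self.shift(0) for _ in range(len(self.state))]
        return out[::-1]


# Hypothetical combinational logic between the scan cells.
def logic(s):
    return [s[0] ^ s[1], s[1] & s[2], s[0] | s[2]]


chain = ScanChain(3)
chain.load([1, 1, 0])          # 3 shift clocks: pseudo-primary inputs
loaded = list(chain.state)
chain.capture(logic)           # 1 capture clock
response = chain.unload()      # 3 shift clocks: pseudo-primary outputs
```

One scan pattern here costs seven clock cycles, which is the sense in which a test pattern represents many test vectors.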
Full-Scan Design
In the full-scan design technique, all sequential cells in your
design are modified to perform a serial shift function.
Partial-Scan Design
In the partial-scan design technique, the scan chains contain
some, but not all, of the sequential cells in your design.
Applying Scan Patterns
Cell-based Physical Layout
I/O Pads
Hard Macros
Standard Cells
- Flattening
- Collect to soft blocks
- Collect to groups & regions
[Figure: floorplan with Macro-1, Macro-2, and Macro-3.]
Physical Design Flow
Floorplanning
- Partitioning, I/O placement, macro placement, group & region creation
Preroute nets
- Power rings, power straps, global signals
Place standard cells
- Placement & optimization, scan chain optimization
Clock tree synthesis
Route standard cells
- Global route, track assignment, detail route, routing optimization
Physical Floorplanning
Soft macros, hard macros, IPs, and floorplan groups
Block placement
Interactive floorplan adjustment & edit
- Move, mirror, rotate, align
- Change size, aspect ratio
- Reserve routing layers
- Add/delete/modify/move/align pins/bus pins
Pin assignment
Early analysis: timing, congestion,
Courtesy Avant!
Flattening or Hierarchical
Assume a design contains 10,000 cells
Flattened design
- P&R iterations: at least 1
- Memory & computing time: large & long
- Die size: relatively small
Hierarchical design
- Partitioned into 5 soft blocks, each containing 2,000 cells
- P&R iterations: at least 6
- Memory & computing time: relatively small & short
- Die size: relatively large
- Design reuse potential
Regions & Groups
Geometrical constraints
- Region: location constraint
- Group: proximity constraint
- One cell can belong to one region only
- One cell can belong to multiple groups
Power Frame
Route power rings
Route power straps
Connect macros & pads to rings & straps
[Figure: floorplan with Macro-1, Macro-2, and Macro-3.]
Placement Drivers
Congestion-driven: minimizes wire length & congestion in the design
Capacitance/length-driven: optimizes the placement for capacitance considerations
Timing-driven: meets the timing requirements
Power-driven: reduces dynamic power consumption by minimizing the length of nets that have high switching frequency
Rail-driven: reduces the IR drop along the standard cell P/G rails
Heat-driven: minimizes hot spots by reducing the density of cells with high power consumption
Clock-driven: reduces clock net skew
Example: Rail-driven Placement
Minimize IR drop along standard cell row
Courtesy Avant!
Evaluating Placement Result
Courtesy Avant!
Clock Tree Synthesis
Minimize clock skew, but increase delay
Multi-level clock tree synthesis
Fish-bone clock tree synthesis
Routing
Route standard cell pins to P/G
Zero-skew routing for clock nets
Global route: assigns nets to GR cells
Track assignment: assigns nets to wire tracks within GR cells; it can make long routes straight and reduce the number of vias
Detail route: uses the general pathways suggested by global routing and track assignment to place paths and vias in order to route the nets
Package Selection
DIP
PGA
CLCC
CQFP
Package pin count
Package die cavity
Package connector type
Unused pins can be left unconnected
Exporting the Design
Translate design data into GDSII or another format
The GDSII file is used for post-layout verification
An error-free GDSII file can be sent to the foundry for fabrication
Before exporting the design, make sure your layer number definitions are consistent with the foundry definitions; otherwise, use a layer mapping table to correct them
Post-layout Verifications
For geometrical verification
- DRC, ERC
For topological verification
- LVS
For functional & performance verification
- Circuit extraction, RC extraction
- Circuit-level post-sim / STA
- Gate-level post-sim / STA
- Power analysis: static & dynamic
DRC & ERC
Minimum width
Minimum spacing
Minimum enclosure
Short
Open
Black box
Whole chip
LVS
Extract netlist from layout
Perform device merging
Compare netlists
[Figure: device merging example; pin labels I, A, B, C, O.]
Parasitic RC Extraction
Courtesy IBM Journal of R&D, Vol. 39
Make sure the design is LVS error-free
Device extraction
RC extraction
RC reduction
Name mapping for back-annotation
Power Analysis
Static power analysis
- Full-chip power distribution verification
- Current density for electromigration diagnosis
Dynamic power simulation
- IR drop simulation
- Detailed electrical simultaneous switching simulation
Functional Verification
FUNCTIONAL VERIFICATION involves proving that an IC
design works properly relative to specification, protocol, and
functionality.
It does not necessarily guarantee correct operation with
respect to timing, power, noise, routability, and so forth.
Other tools and methodologies address these important
concerns.
Functional Verification Can Certify...
Multipliers and adders give the correct answer based on a
particular standard
A register-transfer-level (RTL) and gate-level model are
equivalent
A bus protocol (such as PCI, AMBA, or RapidIO) is implemented
according to a standard
CPUs properly implement their instruction set architectures (ISAs)
Digital signal processors correctly calculate their intended
functions
Distributed-memory designs maintain cache coherency.
Three main ways to functionally verify designs:
- Simulation
- Emulation
- Formal verification
Two basic qualities to verify:
- Equivalence of models, such as whether the gate model
implements the RTL correctly
- Properties of models, such as whether a design maintains cache
coherency or whether multipliers give the correct result
Why Formal Verification
Today's design methodology requires regression testing at
several points in the design process.
Currently, traditional simulation tools, such as event-driven
and cycle-based simulators, handle this regression testing.
However, as designs become larger and more complex and
require more simulation vectors, regression testing with
traditional simulation tools becomes a bottleneck in the
design flow.
The bottleneck is caused by these factors:
- Large numbers of simulation vectors are needed to provide
confidence that the design meets the required specifications.
- Logic simulators must process more events for each stimulus
vector because of increased design size and complexity.
- More vectors and larger design sizes cause increased memory
swapping, slowing down performance.
Formal verification significantly reduces the amount of time and labor
required to verify design behaviors.
As designs become more complex, verification through simulation
alone becomes more time consuming and less likely to ensure
comprehensive coverage.
Additionally, coverage analysis tools are inherently unable to verify
functional correctness; therefore, they may promote false security in
the thoroughness of the design verification.
Formal verification, which relies instead on mathematical techniques to
prove or disprove design functionality, is a viable alternative to
extensive simulation.
Equivalence checkers are the most common type of formal
verification tool.
They use formal mathematical techniques to verify logic
functions by comparing input and output conditions and
matching one design iteration with the next. This allows you
to determine if one design is functionally equivalent to
another.
Additionally, equivalence checkers allow you to verify the
consistency of a design as it is transformed from one level of
abstraction to another, such as when a Register Transfer
Level (RTL) design is synthesized into gates.
Formal Property Checking
Property checkers are more sophisticated than equivalence
checkers in that they verify functional intent.
These tools use formal mathematical techniques to
determine if a design obeys behavioral characteristics
defined by the user, typically in the form of assertions or
properties.
ASIC Design Flow Introduces Changes
Synthesis
- Signal name transformations
- Assumptions about don't-care conditions
Test logic insertion
Clock tree insertion
Place and route
- Different logical and physical hierarchies
Timing optimization
Manual ECO
Verification of Two RTL Designs
When you make an architectural change to an RTL design,
use FEC tools to verify that you did not change how the
design functions.
In this situation, you are verifying an RTL implementation
against an original RTL reference design.
Situations where this type of regression testing becomes
necessary include:
- Adding clock gating circuitry for power reduction
- Restructuring critical paths
- Reorganizing logic for area reduction
Verification of RTL and Gate-level Design
Verification of an RTL design against a gate-level design can
occur at several points in the design methodology:
- For example, it is important to verify the gate-level implementation
that results from synthesis against the golden RTL design for
functional equivalence. This gate-level design becomes the golden
design used in verifying subsequent implementations.
- Another example is when you make minor functional changes in
the gate-level netlist and you simultaneously update the RTL
source without using synthesis. In this case, you can use FEC tools
to verify that the changes made in the RTL source match the
current implementation.
Verification of Two Gate-level Designs
Any time you produce a new gate-level implementation of the
design by making nonfunctional changes, you can use FEC
tools to verify the functional equivalence of the design.
These are typical changes that result in a new
implementation but do not affect functionality:
- Adding test logic (scan circuitry) to a design
- Reordering a scan chain in a design
- Inserting a clock tree into a design
- Adding I/O pads to the netlist
- Performing design layout
- Performing flattening and cell sizing
Formal Equivalence Checking Examples
Design Equivalence (Verification Modes)
Design consistency
For every input pattern for which the reference design defines a 1
or 0 response, the implementation design gives the same response.
If there is a don't-care (X) condition in the reference, verification still
passes if there is a 0 or a 1 at the equivalent point in the
implementation.
Design equality
Includes design consistency with additional requirements. The
functions of the implementation and reference designs must be
defined for exactly the same set of input patterns. If there is a
don't-care (X) condition in the reference, verification passes only when
there is an X at the equivalent point in the implementation.
Design Consistency
Consistency is not symmetric: one design (design B) can be
consistent with a second design (design A) while design A is
not consistent with design B.
For example, design A might have a don't-care condition that
is implemented as a 0 or 1 in design B.
Design Consistency
If you run a verification with design A as the reference and design B as the
implementation, verification passes (X in the reference versus 1 in the
implementation).
However, if you run a verification with design B as the reference and design A as the
implementation, verification fails (1 in the reference versus X in the implementation).
module A(d, q);
  input [1:0] d;
  output q;
  reg q;
  always @ (d)
    case (d)
      2'b00: q = 1'b0;
      2'b01: q = 1'b1;
      2'b10: q = 1'b1;
      2'b11: q = 1'bX; // don't care
    endcase
endmodule

module B(d, q);
  input [1:0] d;
  output q;
  reg q;
  always @ (d)
    case (d)
      2'b00: q = 1'b0;
      2'b01: q = 1'b1;
      2'b10: q = 1'b1;
      2'b11: q = 1'b1; // the don't care of A resolved to 1
    endcase
endmodule
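The two verification modes can be checked by hand on modules A and B. The following is a minimal sketch in Python, assuming each module is modeled as a truth table and the don't care is represented by the string "X"; the helper names `is_consistent` and `is_equal` are hypothetical and do not correspond to any FEC tool command.

```python
from itertools import product

X = "X"  # don't-care marker (a modeling assumption of this sketch)

def module_a(d1, d0):
    """Truth table of module A: q is a don't care when d == 2'b11."""
    return {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): X}[(d1, d0)]

def module_b(d1, d0):
    """Truth table of module B: q is 1 when d == 2'b11."""
    return {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 1}[(d1, d0)]

def is_consistent(reference, implementation, n_inputs=2):
    """The implementation must match wherever the reference defines 0 or 1."""
    for bits in product((0, 1), repeat=n_inputs):
        ref = reference(*bits)
        if ref != X and implementation(*bits) != ref:
            return False
    return True

def is_equal(reference, implementation, n_inputs=2):
    """Stricter mode: an X in the reference requires an X in the implementation."""
    return all(reference(*bits) == implementation(*bits)
               for bits in product((0, 1), repeat=n_inputs))

print(is_consistent(module_a, module_b))  # True: A as reference, B as implementation
print(is_consistent(module_b, module_a))  # False: B as reference, A as implementation
print(is_equal(module_a, module_b))       # False: X versus 1 at d == 2'b11
```

The asymmetry comes entirely from the `ref != X` test: only the reference is allowed to leave a response undefined.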
Equivalence Checking Mechanics
D. Brand, "Verification of Large Synthesized Designs," Proc. Int'l Conf. Computer-Aided
Design (ICCAD 93), IEEE CS Press, Los Alamitos, Calif., 1993, pp. 534-537.
"Miter"
Equivalence Proof Techniques
Given two combinational circuits, $F_{spec}(X)$ and $F_{impl}(X)$, where
$X$ represents an input vector defined on input variables
$(x_1, x_2, \ldots, x_n)$, we can establish equivalence between the two
circuits by proving that the following equation is unsatisfiable, that is,
that no value of $X$ solves it:
$$F_{spec}(X) \oplus F_{impl}(X) = 1$$
Approaches developed to solve this problem include
techniques based on binary decision diagrams (BDDs),
ATPG, and simulation.
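For very small circuits the XOR check can be carried out by brute force. The sketch below is only an illustration: the three functions and the helper name `miter_witness` are made up for the example, and real tools avoid exhaustive enumeration because it grows as 2^n in the number of inputs.

```python
from itertools import product

def f_spec(a, b, c):
    """Reference function: (a AND b) OR (a AND c)."""
    return (a & b) | (a & c)

def f_impl(a, b, c):
    """Factored implementation: a AND (b OR c)."""
    return a & (b | c)

def f_bad(a, b, c):
    """Deliberately wrong implementation: a OR (b AND c)."""
    return a | (b & c)

def miter_witness(spec, impl, n_inputs):
    """Return an input vector where spec XOR impl is 1, or None if there is none."""
    for x in product((0, 1), repeat=n_inputs):
        if spec(*x) ^ impl(*x):
            return x
    return None

print(miter_witness(f_spec, f_impl, 3))  # None: the XOR is never 1, so equivalent
print(miter_witness(f_spec, f_bad, 3))   # (0, 1, 1): a distinguishing vector
```

A returned vector is a witness that can be replayed in simulation; `None` means the equation has no solution and the two functions are equivalent.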
BDD-based Techniques
R. E. Bryant, "Graph-based Algorithms for Boolean Function
Manipulation," IEEE Trans. Computers, vol. 35, no. 8, Aug.
1986, pp. 677-691.
First construct a BDD for each function, then determine
whether the two functions share the same BDD representation
(canonical representations are isomorphic for equivalent
functions).
Alternatively, construct the BDD for the function
$F_{spec}(X) \oplus F_{impl}(X)$: either it equals 0, in which case the
two functions are equal, or the final BDD represents the set of all
witnesses that distinguish the two functions.
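Both BDD checks can be tried on a toy example. The following sketch assumes a very small reduced ordered BDD manager written for illustration only: the class `BDD` and its methods are hypothetical, and `build` enumerates every input assignment, which production BDD packages never do.

```python
class BDD:
    """Minimal reduced ordered BDD manager (illustrative sketch only)."""

    def __init__(self, n_vars):
        self.n_vars = n_vars
        self.unique = {}  # (var, low, high) -> node id; ids 0 and 1 are the terminals

    def mk(self, var, low, high):
        """Create or reuse a node, applying both reduction rules."""
        if low == high:                 # redundant test: skip the node
            return low
        key = (var, low, high)
        if key not in self.unique:      # share isomorphic subgraphs
            self.unique[key] = len(self.unique) + 2
        return self.unique[key]

    def build(self, func, var=0, assignment=()):
        """Shannon expansion in the fixed order x0, x1, ... (exponential time)."""
        if var == self.n_vars:
            return int(bool(func(*assignment)))
        low = self.build(func, var + 1, assignment + (0,))
        high = self.build(func, var + 1, assignment + (1,))
        return self.mk(var, low, high)

def f_spec(a, b, c):
    return (a & b) | (a & c)

def f_impl(a, b, c):
    return a & (b | c)

def f_bad(a, b, c):
    return a | (b & c)

bdd = BDD(3)
spec, impl, bad = bdd.build(f_spec), bdd.build(f_impl), bdd.build(f_bad)
xor_good = bdd.build(lambda a, b, c: f_spec(a, b, c) ^ f_impl(a, b, c))
xor_bad = bdd.build(lambda a, b, c: f_spec(a, b, c) ^ f_bad(a, b, c))

print(spec == impl)   # True: same root node, so same function
print(spec == bad)    # False
print(xor_good == 0)  # True: the XOR BDD collapses to the 0 terminal
print(xor_bad == 0)   # False: the XOR BDD encodes the distinguishing vectors
```

Because every node is created through `mk`, two functions built in the same manager with the same variable order are equivalent exactly when their root ids are equal.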
BDDs for Basic Functions
Build BDDs for Boolean Functions
Evaluate Function from Its BDD
Ordered BDDs
Reduced BDDs
Results on BDDs
Given an ordering, the reduced BDD for a function is
unique, i.e., the ROBDD is canonical.
AND, OR, and complement are inexpensive.
Tautology checking takes constant time.
The size of the BDD is exponential in the number of
variables in the worst case.
A good ordering is not trivial to find.
A BDD is not a format for circuit implementation.
http://www.enee.umd.edu/class/enee644.S2003/notes/2slides/BDD1.pdf
BDD-based Techniques
Many practical functions can be represented with BDDs.
However, a BDD's size is sensitive to the ordering of its
supporting variables.
Poor variable ordering can make memory size explode. In
the worst case, BDD size can grow exponentially with
respect to the number of inputs for certain functions.
Nevertheless, designers can control the BDD size for many
functions by exploiting the circuit structure during BDD
variable ordering. Other methods include iterative
improvement techniques based on BDD variable swapping.
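The effect of variable ordering can be seen on a six-input function. The sketch below is an assumption-laden illustration: `robdd_size` is a hypothetical helper that counts internal ROBDD nodes by exhaustive Shannon expansion, and the function x0·x1 + x2·x3 + x4·x5 is the usual textbook example, not taken from these slides.

```python
def robdd_size(func, order):
    """Count internal nodes of the ROBDD of func under the given variable order."""
    n = len(order)
    unique = {}  # (var, low, high) -> node id; ids 0 and 1 are the terminals

    def build(level, values):
        if level == n:
            return int(bool(func(*values)))
        var = order[level]
        children = []
        for bit in (0, 1):
            values[var] = bit                 # cofactor with respect to var
            children.append(build(level + 1, values))
        low, high = children
        if low == high:                       # redundant test: no node needed
            return low
        return unique.setdefault((var, low, high), len(unique) + 2)

    build(0, [0] * n)
    return len(unique)

def f(x0, x1, x2, x3, x4, x5):
    return (x0 & x1) | (x2 & x3) | (x4 & x5)

print(robdd_size(f, [0, 1, 2, 3, 4, 5]))  # 6: paired variables are adjacent
print(robdd_size(f, [0, 2, 4, 1, 3, 5]))  # 14: paired variables are far apart
```

Keeping the variables of each product term adjacent follows the circuit structure; separating them forces the BDD to remember every partial assignment of x0, x2, and x4.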
ATPG-based Techniques
J. P. Roth, "Hardware Verification," IEEE Trans. Computers,
vol. 26, no. 12, Dec. 1977, pp. 1292-1294.
Check equivalence between two combinational circuits,
$F_{spec}(X)$ and $F_{impl}(X)$, by running ATPG on a stuck-at-0 fault
applied to the output of the XOR of the two functions,
$F_{spec}(X) \oplus F_{impl}(X)$.
- If the stuck-at-0 fault is untestable, then the two circuits are
  equivalent.
- However, if the ATPG justification step identifies a witness for the
  stuck-at-0 fault, then the two circuits are unequal, and the ATPG
  distinguishing vector can be used for debugging.
BDD- vs. ATPG-based Approaches
The complexity of BDD-based techniques resides in the
space domain, where they can suffer from memory
explosion. The complexity of ATPG-based techniques
resides in the time domain, where they can time out.
BDD- vs. ATPG-based Approaches
BDD-based techniques identify all witnesses for the case
when $F_{spec}(X)$ does not equal $F_{impl}(X)$, whereas ATPG-based
techniques find only a single witness.
Hence, if the two functions are suspected to be unequal,
then applying ATPG is generally more efficient than using
BDDs.
When there is very little structural similarity between the two
circuits, BDD-based techniques are generally more efficient
at proving equivalence.
Simulation-based Techniques
Although proving equivalence using simulation is incomplete
for large designs, some equivalence checkers use simulation
to exhaustively verify trivial functions.
Additionally, simulation-based techniques can be combined
with BDD- and ATPG-based approaches to create a powerful
proof engine.
Simulation-based Techniques
J. Burch and V. Singhal, "Tight Integration of Combinational
Verification Methods," Proc. Int'l Conf. Computer-Aided
Design (ICCAD 98), IEEE CS Press, Los Alamitos, Calif.,
1998, pp. 570-576.
Combine simulation and BDD-based techniques with
Boolean satisfiability (SAT) techniques. (SAT algorithms
search a conjunctive normal form (CNF) Boolean formula for
assignments that satisfy each of its conjunctive clauses.)
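To make the SAT formulation concrete, the sketch below encodes the XOR of two small circuits as CNF and searches it with a tiny DPLL procedure. Everything here is an assumption made for illustration: the gate helpers, the variable numbering, and the `dpll` routine are hypothetical and far simpler than the engine described in the cited paper.

```python
def and_gate(z, a, b):
    """CNF clauses for z = a AND b (literals are signed integers)."""
    return [[-z, a], [-z, b], [z, -a, -b]]

def or_gate(z, a, b):
    """CNF clauses for z = a OR b."""
    return [[z, -a], [z, -b], [-z, a, b]]

def xor_gate(z, a, b):
    """CNF clauses for z = a XOR b."""
    return [[-z, a, b], [-z, -a, -b], [z, -a, b], [z, a, -b]]

def dpll(clauses, assignment=None):
    """Return a satisfying assignment {var: bool}, or None if unsatisfiable."""
    assignment = dict(assignment or {})
    simplified = []
    for clause in clauses:
        if any(assignment.get(abs(lit)) == (lit > 0) for lit in clause):
            continue                       # clause already satisfied
        rest = [lit for lit in clause if abs(lit) not in assignment]
        if not rest:
            return None                    # every literal is false: conflict
        simplified.append(rest)
    if not simplified:
        return assignment
    # Branch on a unit clause if one exists, otherwise on the first literal.
    lit = next((c[0] for c in simplified if len(c) == 1), simplified[0][0])
    for value in (lit > 0, not (lit > 0)):
        result = dpll(simplified, {**assignment, abs(lit): value})
        if result is not None:
            return result
    return None

# Inputs a=1, b=2, c=3. The spec (a AND b) OR (a AND c) drives variable 6.
spec = and_gate(4, 1, 2) + and_gate(5, 1, 3) + or_gate(6, 4, 5)
good = or_gate(7, 2, 3) + and_gate(8, 1, 7)   # a AND (b OR c) on variable 8
bad = and_gate(7, 2, 3) + or_gate(8, 1, 7)    # a OR (b AND c) on variable 8
miter = xor_gate(9, 6, 8) + [[9]]             # assert that the outputs differ

print(dpll(spec + good + miter))              # None: unsatisfiable, so equivalent
model = dpll(spec + bad + miter)
witness = tuple(int(model.get(v, False)) for v in (1, 2, 3))
print(witness)                                # an input vector on which they differ
```

An input variable missing from the returned model may take either value, which is why the witness defaults such variables to 0.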
Reducing Complexity
By exploiting the structural similarity between two designs,
we can partition the combinational equivalence-checking
problem into a set of smaller, simpler problems.
Simulation is one technique used to identify structural
similarity between two designs.
For example, consider simulating a large set of random
vectors. While calculating a signature on each internal signal
(based on its response to the input vector), we can identify a
set of candidate equivalent signal pairs between the two
designs. These equivalent pairs are called cutpoints.
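The signature idea can be sketched as follows, assuming a netlist is a topologically ordered list of two-input gates. The netlists, signal names, and the helper `candidate_cutpoints` are hypothetical; matching signatures only nominate candidates, which a proof engine must still confirm.

```python
import random
from operator import and_, or_, xor

def simulate(netlist, inputs, vectors):
    """Return {signal: signature}; a signature is the tuple of simulated values."""
    signatures = {name: tuple(v[name] for v in vectors) for name in inputs}
    for out, (op, a, b) in netlist:          # netlist is in topological order
        signatures[out] = tuple(op(x, y)
                                for x, y in zip(signatures[a], signatures[b]))
    return signatures

def candidate_cutpoints(spec, impl, inputs, n_vectors=64, seed=0):
    """Pair internal signals of spec and impl whose signatures are identical."""
    rng = random.Random(seed)
    vectors = [{name: rng.randint(0, 1) for name in inputs}
               for _ in range(n_vectors)]
    spec_sig = simulate(spec, inputs, vectors)
    impl_sig = simulate(impl, inputs, vectors)
    by_signature = {}
    for out, _ in spec:
        by_signature.setdefault(spec_sig[out], []).append(out)
    return [(match, out)
            for out, _ in impl
            for match in by_signature.get(impl_sig[out], [])]

INPUTS = ["a", "b", "c", "d"]
SPEC = [("n1", (and_, "a", "b")), ("n2", (or_, "c", "d")), ("y", (xor, "n1", "n2"))]
IMPL = [("m1", (and_, "b", "a")), ("m2", (or_, "d", "c")), ("y", (xor, "m2", "m1"))]

pairs = candidate_cutpoints(SPEC, IMPL, INPUTS)
print(pairs)
```

Truly equivalent signals always share a signature, so they are never missed; signals that merely coincide on the sampled vectors can also appear, which is why the pairs are only candidates.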
Compare (Key) Points
A compare point is a design object used as a combinational
logic endpoint during verification.
A compare point can be an output port, register, latch, black
box input pin, or net driven by multiple drivers.
FEC tools use the following design objects to automatically
create compare points:
- Primary outputs
- Sequential elements
- Black box input pins
- Nets driven by multiple drivers, where at least one driver is a port
  or black box
Compare Points
FEC tools verify a compare point by comparing the logic
cone from an implementation compare point against a logic
cone for a matching compare point from the reference design.
Logic Cones
A logic cone consists of all logic that funnels down to, and
drives, a key point.
A logic cone can have any number of inputs, but only one
output.
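Extracting a logic cone amounts to a backward traversal from the key point that stops at primary inputs and register outputs. The sketch below assumes a netlist stored as a dictionary from each signal to the signals that drive it; the netlist, signal names, and the helper `logic_cone` are hypothetical.

```python
def logic_cone(netlist, key_point, boundaries):
    """Return (gates, inputs) in the transitive fan-in of key_point."""
    cone, inputs, stack = set(), set(), [key_point]
    while stack:
        signal = stack.pop()
        for driver in netlist.get(signal, []):
            if driver in boundaries or driver not in netlist:
                inputs.add(driver)        # stop at ports and register outputs
            elif driver not in cone:
                cone.add(driver)          # internal gate output: keep tracing
                stack.append(driver)
    return cone, inputs

# signal -> fan-in signals; q0 and q1 are register outputs, q1_d is a register input
NETLIST = {
    "q1_d": ["n1", "n2"],
    "n1": ["a", "q0"],
    "n2": ["b", "n3"],
    "n3": ["c", "q0"],
    "out": ["n3", "q1"],
}
BOUNDARIES = {"a", "b", "c", "q0", "q1"}

print(logic_cone(NETLIST, "q1_d", BOUNDARIES))
print(logic_cone(NETLIST, "out", BOUNDARIES))
```

Gate `n3` belongs to both cones: cones may overlap, but each one still has exactly one output.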
Constructing Compare Points
Combinational Design Changes
Internal scan insertions
Boundary scan insertions
Clock tree buffers
Internal Scan Insertions
Internal scan insertion is a technique used to make it easier
to set and observe the state of registers internal to a design.
During scan insertion, scan flops replace flip-flops. The scan
flops are then connected into a long shift register.
The additional logic added during scan insertion means that
the combinational function has changed.
Before verification, determine which pins disable the scan circuitry,
then use them to disable the inserted scan logic.
Boundary Scan Insertions
Boundary scan is similar to internal scan in that it involves
the addition of logic to a design.
This added logic makes it possible to set and observe the
logic values at the primary inputs and outputs (the
boundaries) of a chip.
Boundary scan is also referred to as the IEEE 1149.1 Std.
specification.
Designs with boundary-scan registers inserted require
setup attention because:
- The logic cones at the primary outputs differ.
- The boundary-scan design has extra state-holding elements.
Boundary scan must be disabled in your design in the
following cases:
- If the design contains an optional asynchronous TAP reset pin
  (such as TRSTZ or TRSTN)
- If the design contains only the four mandatory TAP pins (TMS,
  TCK, TDI, and TDO)
Clock Tree Buffers
Without user intervention, a verification of top, which
instantiates blocka, will succeed.
Without user intervention, a module-level verification of
blocka will fail because:
- The pre-buffer blocka has one clock port.
  In the pre-buffer design, the clock pin of ff3 is clk.
- The post-buffer blocka has three clock ports.
  In the post-buffer design, the clock pin of ff3 is clk3.
Sequential Design Changes
Re-encoding an FSM
Pipeline re-timing
Gating the clock
Adding asynchronous bypass circuitry to registers
Pushing inversions across registers
Duplicating or merging registers
Re-encoding an FSM
The architecture for a finite state machine (FSM) consists of
a set of flip-flops for holding the state vector, and a
combinational logic network that produces the next state
vector and the output vector.
Before verifying a re-encoded FSM in the implementation
against its counterpart in the reference design, you must
define the FSM state vectors and the state names with
their respective encodings, so that FEC tools can match the
states of the two designs.
The number of states must be the same; the number of state
bits can be different.
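The role of the state names can be illustrated with a three-state machine that is binary encoded in the reference and one-hot encoded in the implementation. This is a sketch under assumed names: the states, the `go` input, and the helper `fsm_equivalent` are hypothetical, and only the next-state logic is compared.

```python
REF_ENCODING = {"IDLE": 0b00, "RUN": 0b01, "DONE": 0b10}      # 2 state bits
IMPL_ENCODING = {"IDLE": 0b001, "RUN": 0b010, "DONE": 0b100}  # 3 state bits, one-hot

def ref_next(state, go):
    """Next-state logic of the binary-encoded reference."""
    if state == 0b00:
        return 0b01 if go else 0b00
    if state == 0b01:
        return 0b10
    return 0b00

def impl_next(state, go):
    """Next-state logic of the one-hot implementation, written bit by bit."""
    idle, run, done = state & 1, (state >> 1) & 1, (state >> 2) & 1
    next_idle = (idle & (1 - go)) | done
    next_run = idle & go
    next_done = run
    return next_idle | (next_run << 1) | (next_done << 2)

def fsm_equivalent(ref_enc, impl_enc):
    """Compare next states by name for every named state and input value."""
    ref_names = {code: name for name, code in ref_enc.items()}
    impl_names = {code: name for name, code in impl_enc.items()}
    for name in ref_enc:
        for go in (0, 1):
            ref_state = ref_names[ref_next(ref_enc[name], go)]
            impl_state = impl_names[impl_next(impl_enc[name], go)]
            if ref_state != impl_state:
                return False
    return True

print(fsm_equivalent(REF_ENCODING, IMPL_ENCODING))  # True
wrong = {"IDLE": 0b001, "RUN": 0b100, "DONE": 0b010}
print(fsm_equivalent(REF_ENCODING, wrong))          # False: the names are mismatched
```

The comparison is possible only because both encodings are tied to the same state names; the state bits themselves (two versus three) have no one-to-one correspondence.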
Pipeline Re-timing
Re-timing a design involves moving logic across register
boundaries.
It is done to meet timing and area constraints.
Re-timing can occur during synthesis or by making manual
changes to a design.
Pipeline Re-timing
Although the sequential behavior of the design is not
changed, the register-to-register behavior is.
Individual logic cones will fail.
New logic cones may appear in one design that do not
occur in the other.
The number of pipeline stages must be the same in both
designs to be verified.
The number of registers can be different.
Gating the Clock
Clock gating can be used to implement load enable signals
in synchronous registers. It results in more power-efficient
circuits than multiplexer-based solutions.
In its simplest form, clock gating is the addition of logic in a
register's clock path that disables the clock when the register
output is not changing.
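At the cycle level, a register with a multiplexer-based load enable and a register with a gated clock hold the same values. The sketch below is an idealized model made for illustration: the function names are hypothetical, the enable is assumed glitch-free, and power is not modeled.

```python
import random

def mux_enable_register(d_seq, en_seq, q0=0):
    """Clock always toggles; a mux feeds q back to the D input when en is 0."""
    q, trace = q0, []
    for d, en in zip(d_seq, en_seq):
        q = d if en else q
        trace.append(q)
    return trace

def gated_clock_register(d_seq, en_seq, q0=0):
    """The register sees a clock edge only in cycles where en is 1."""
    q, trace = q0, []
    for d, en in zip(d_seq, en_seq):
        clock_edge = bool(en)   # the gate blocks the edge when en is 0
        if clock_edge:
            q = d
        trace.append(q)
    return trace

rng = random.Random(1)
d_seq = [rng.randint(0, 1) for _ in range(100)]
en_seq = [rng.randint(0, 1) for _ in range(100)]

print(mux_enable_register([1, 0, 0], [1, 0, 1]))   # [1, 1, 0]
print(gated_clock_register([1, 0, 0], [1, 0, 1]))  # [1, 1, 0]
print(mux_enable_register(d_seq, en_seq) == gated_clock_register(d_seq, en_seq))  # True
```

The traces agree, yet the clock-path logic of the two registers is structurally different, which is what a cone-by-cone comparison sees.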
Gating the Clock
There are two clock-gating styles that are widely used in
designs: combinational clock gating and latch-based clock
gating. Both techniques often use a single AND or a single
OR gate to eliminate unwanted transitions on the clock signal.
Clock gating results in the following two failing points:
- A compare point is created for the clock-gating latch. This compare
  point does not have a matching point in the other design, causing it
  to fail.
- The logic feeding the clock input of the register bank changes.
  Thus, the compare points created at the register bank fail.
Combinational Clock Gating
If glitches occur on the signal load_en, invalid data can
be loaded into the register. Hence, these two circuits are
functionally nonequivalent.
Latch-based Clock Gating
Adding Asynchronous Bypass Circuitry
A sequential cell where some of the asynchronous inputs
have combinational paths to the outputs is said to have an
asynchronous bypass.
Asynchronous bypass logic can result from:
- Mapping from one technology library to another.
- Verilog simulation libraries. The Verilog module instantiates logic
  creating a combinational path that directly affects the output of a
  sequential user-defined primitive (UDP).
- Modeling a flip-flop with RTL code. The RTL has an explicit
  asynchronous path defined or the RTL specifies that both Q and
  QN have the same value when Clear and Preset are both active.
Asynchronous Bypass Failing Point
Pushing Inversions Across Registers
Inversion push means moving an inversion across register
boundaries.
Inversion pushing can cause two failing points.
Duplicating or Merging Registers
Additional registers are sometimes inserted for additional
drive strength or speed.
Duplicate registers are often merged in order to optimize a
design.
FEC Process Flow
Preparing the designs and specifying constraints and parameters
Mapping
Comparing key points
Diagnosing functional differences
Formal Equivalence Checking Tools
Verplex (Cadence) Conformal LEC
Synopsys Formality
Mentor Graphics FormalPro
FEC Limitations
FEC tools check logic function, not timing. To test timing, use
static timing analysis.
Some FEC tools cannot compare designs that have
different state encodings.
Comparisons between arithmetic operations and their gate-
level implementations can encounter problems. For example,
a comparison between a 64-bit A + B * C arithmetic
expression and its gate-level version will result in a timeout
condition if the synthesis has introduced optimizations that
merge the logic for the ADD and Multiply operations.
