
Reducing the Hardware Complexity of a
Parallel Prefix Adder

Abstract
Currently, parallel prefix adders (PPAs) are considered effective combinational circuits for
performing the binary addition of two multi-bit numbers. These adders are widely used in
arithmetic-logic units, which are parts of modern processors such as microprocessors and digital
signal processors. This paper deals with the Kogge-Stone adder, which is one of the fastest PPAs.
In a schematic implementation, this adder has a large hardware complexity. Therefore, in this
work, a modified PPA has been developed to reduce its hardware complexity.
Keywords- parallel prefix adder (PPA); Kogge-Stone adder; modified parallel prefix adder; the
number of logic gates
Chapter 1
Introduction
Addition is a prefix problem, which means that each result bit depends on all input bits of
equal or lower significance. A carry signal must be propagated from each bit position to all higher bit
positions. Carry-propagate adders perform this propagation immediately. The required
carry propagation from the least to the most significant bit results in a considerable circuit delay,
which is a function of the word length of the input operands. The most efficient way to speed up
addition is to avoid carry propagation and save the carries for later processing. This allows
two or more numbers to be added in a very short time, but yields the result in a redundant
(carry-save) number representation.
Carry-save adders, the most commonly used redundant arithmetic adders, play an
important role in the efficient implementation of multi-operand addition circuits. They are very
fast due to the absence of any carry-propagation paths and their structure is very simple, but the
potential for further optimization is minimal. The same holds for signed-digit adders, which use a
slightly different redundant number representation. The addition results, however, usually have
to be converted into an irredundant integer representation in order to be processed further. This
conversion is done using a carry-propagate adder.
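The following is a minimal illustration of the redundant representation just described. It is a sketch that assumes non-negative Python integers stand in for bit vectors, and the function names are hypothetical. One carry-save (3:2) stage reduces three operands to a sum word and a carry word without any carry travelling between bit positions. A single carry-propagate addition then produces the ordinary result.

```python
def carry_save_add(a: int, b: int, c: int) -> tuple[int, int]:
    """One carry-save (3:2) stage applied bitwise to non-negative integers.

    Every bit position is an independent full adder, so no carry travels
    between positions; the carries are saved in a separate word instead.
    """
    sum_word = a ^ b ^ c                                # per-position sum bit
    carry_word = ((a & b) | (a & c) | (b & c)) << 1     # per-position carry, weight 2
    return sum_word, carry_word


def resolve(sum_word: int, carry_word: int) -> int:
    """Carry-propagate addition converting the redundant pair to an integer."""
    return sum_word + carry_word


if __name__ == "__main__":
    s, c = carry_save_add(13, 7, 9)
    print(bin(s), bin(c), resolve(s, c))   # prints: 0b11 0b11010 29
```

In a multi-operand adder, such stages are chained so that only the very last step pays for carry propagation.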
In a prefix problem, every output depends on all inputs of equal or lower significance, and every
input influences all outputs of equal or higher significance. Due to the associativity of the
prefix operator, the individual operations can be grouped and evaluated in any order, as long as the
order of the operands is preserved. In particular, sequences of operations can be grouped in order to
solve the prefix problem partially and in parallel for groups of input bits, resulting in group
variables. At higher levels, sequences of group variables can again be evaluated, yielding levels of
intermediate group variables, where the group variable $G^{l}_{i:k}$ denotes the prefix result of bits
i to k at level l. The group variables of the last level m must cover all bits from i to 0
($G^{m}_{i:0}$) and therefore represent the results of the prefix problem.
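The following Python sketch makes the associativity argument concrete for binary addition. It assumes the usual carry prefix operator on (generate, propagate) pairs, written with the G/H notation used later in this report: an upper group (G, H) absorbs the adjacent lower group (G', H') as (G + H·G', H·H'). Folding the bits one at a time and evaluating them as a balanced tree give the same group result. This is exactly the freedom that parallel prefix structures exploit. All names are illustrative.

```python
from functools import reduce


def prefix_op(upper, lower):
    """Carry prefix operator on (G, H) pairs: the upper group absorbs the lower one."""
    g_hi, h_hi = upper
    g_lo, h_lo = lower
    return (g_hi | (h_hi & g_lo), h_hi & h_lo)


def gh_pairs(a, b, n):
    """Bit-level (g_i, h_i) pairs of two n-bit operands, least significant bit first."""
    return [((a >> i) & (b >> i) & 1, ((a >> i) ^ (b >> i)) & 1) for i in range(n)]


def serial_group(pairs):
    """Evaluate (G_{n-1:0}, H_{n-1:0}) one bit at a time: n - 1 sequential steps."""
    return reduce(lambda acc, p: prefix_op(p, acc), pairs[1:], pairs[0])


def tree_group(pairs):
    """Evaluate the same group as a balanced tree: about log2(n) levels."""
    if len(pairs) == 1:
        return pairs[0]
    mid = len(pairs) // 2
    return prefix_op(tree_group(pairs[mid:]), tree_group(pairs[:mid]))


# Associativity over all single-bit pairs is what permits the regrouping.
bits = [(g, h) for g in (0, 1) for h in (0, 1)]
assert all(prefix_op(x, prefix_op(y, z)) == prefix_op(prefix_op(x, y), z)
           for x in bits for y in bits for z in bits)

pairs = gh_pairs(200, 100, 8)
print(serial_group(pairs), tree_group(pairs))   # (1, 0) (1, 0): 200 + 100 carries out of bit 7
```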
Various serial and parallel algorithms exist for solving prefix problems, depending on how the bits
are grouped. When mapped onto a logic network, they result in very different size and delay.
Prefix algorithms are commonly visualized by 16-bit examples using a graph representation. In such
graphs, black nodes perform the binary associative prefix operation on their two inputs, while
white nodes represent feed-through nodes with no logic.
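The size/delay difference between prefix structures can be checked mechanically. This Python sketch uses a network representation and function names made up for illustration. A prefix network is a list of levels, and each level is a list of (column, source) nodes. Every node merges the group held by its column with the group held by the source column. The evaluator checks that each column i ends up with the group i:0 and reports the number of nodes (size) and levels (delay). The serial chain, Sklansky and Kogge-Stone constructions follow their usual textbook descriptions.

```python
def combine(upper, lower):
    """Merge group (i, j) with a lower group (j2, k); the groups must touch or overlap."""
    (hi_u, lo_u), (hi_l, lo_l) = upper, lower
    if not lo_l <= lo_u <= hi_l + 1:
        raise ValueError(f"cannot merge {upper} with {lower}")
    return (hi_u, lo_l)


def evaluate(n, levels):
    """Apply the levels in order; return (number of nodes, number of levels)."""
    groups = [(i, i) for i in range(n)]          # column i starts with group i:i
    for level in levels:
        previous = list(groups)                   # nodes of one level work in parallel
        for col, src in level:
            groups[col] = combine(previous[col], previous[src])
    if any(groups[i] != (i, 0) for i in range(n)):
        raise ValueError("network does not produce every prefix i:0")
    return sum(len(level) for level in levels), len(levels)


def serial_chain(n):
    """Ripple-style prefix: one node per level, n - 1 levels."""
    return [[(i, i - 1)] for i in range(1, n)]


def sklansky(n):
    """Divide and conquer: the upper half of each block merges with its lower half."""
    levels, half = [], 1
    while half < n:
        levels.append([(i, base + half - 1)
                       for base in range(0, n, 2 * half)
                       for i in range(base + half, min(base + 2 * half, n))])
        half *= 2
    return levels


def kogge_stone(n):
    """Every column i >= d merges with column i - d, for d = 1, 2, 4, ..."""
    levels, d = [], 1
    while d < n:
        levels.append([(i, i - d) for i in range(d, n)])
        d *= 2
    return levels


if __name__ == "__main__":
    for name, build in [("serial", serial_chain), ("Sklansky", sklansky),
                        ("Kogge-Stone", kogge_stone)]:
        size, depth = evaluate(16, build(16))
        print(f"{name:12s} nodes={size:3d} levels={depth}")
```

For n = 16, this reports the following sizes and depths:

- serial chain: 15 nodes on 15 levels;
- Sklansky: 32 nodes on 4 levels;
- Kogge-Stone: 49 nodes on 4 levels.

Equal depth can therefore come with very different size.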
The hardware implementation of binary addition is a fundamental architectural
component of processors such as microprocessors and digital signal processors, as well as of mobile
devices and other hardware applications. In these systems, adders play an important role in the
arithmetic logic unit (ALU), where they are used to perform the basic arithmetic operations, such as
addition, subtraction, multiplication and division. Therefore, an effective adder implementation is
necessary to increase the performance of the ALU and, consequently, of the processor as a whole.
Currently, the parallel prefix adder (PPA) is considered an effective adder for the addition
of two multi-bit numbers. The circuit complexity and the speed of a PPA are important parameters for
an efficient hardware implementation, and therefore, in recent years, various types of
PPA with different parameter characteristics have been developed.
In this paper, the Kogge-Stone adder [3], one of the best-known fast PPAs, is investigated. The
Kogge-Stone adder is widely used and has minimum delay when performing binary addition. However, in
terms of hardware costs, its schematic implementation requires a large number of logic gates and has
a high Quine complexity. Therefore, in the present work, a modified parallel prefix adder is developed
to reduce this hardware complexity. The two adders are then compared by the following
parameters: the number of logic gates, the Quine complexity, and the delay obtained by
simulation in the Quartus II CAD environment for the Altera EP2C15AF484C6 FPGA. A
general architecture is proposed for the schematic implementation of various PPAs, and
formulas are derived for computing the hardware characteristics of the presented adders as
functions of the bit width of the input operands.
Chapter 2
Literature Survey
Geeta Rani, Sachin Kumar. “Delay Analysis of Parallel-Prefix Adders”. International Journal of
Science and Research (IJSR), ISSN: 2319-7064, Impact Factor (2012): 3.358, Vol. 3, Issue 6,
June 2014, p. 2339.
This paper is a survey of the various parallel-prefix adders. The survey covers the various
aspects of parallel-prefix adders and their specifications. Kogge-Stone, Han-Carlson and
Knowles adders require a large amount of parallel wiring for wide adders, and packing the
wires close together increases the coupling capacitance on each wire. The Sklansky architecture
becomes slow due to its high fan-out. When interconnect is considered, Han-Carlson becomes
attractive, as it requires only half the number of columns. Individually, Kogge-Stone has the
fewest logic levels, but its P and G signals are hard to route. Brent-Kung is rated the poorest of
these architectures. Ladner-Fischer has slightly more logic levels and a high fan-out. Han-Carlson
has more logic levels but fewer cells. S. Knowles' adder has many cells and wires and moderate
fan-out. Sklansky has the fewest logic levels and the highest fan-out. If wire capacitance is
neglected, the Kogge-Stone adder is the best of these adders.
Sunil.M, Ankith.R.D, Manjunatha.G.D and Premananda.B.S. “Design and Implementation of Faster
Parallel Prefix Kogge Stone Adder”. International Journal of Electrical and Electronic
Engineering & Telecommunications, ISSN 2319-2518, Vol. 3, No. 1, January 2014, p. 116.
In tree adders, carries are generated in parallel, and fast computation is obtained at the expense of
increased area and power. The main advantage of the design is that the carry tree reduces the
number of logic levels by essentially generating the carries in parallel. Parallel-prefix
tree adders are more favorable in terms of speed than other adders due to the $O(\log_2 N)$ delay
through the carry path.
Gray cells are required to compute the generate bits in the final stage and thus cannot be removed.
Black cells are the only cells in tree adders that can be redundant. Redundant cells can be removed
in different ways, but not all changes give the desired results, so every change has to be
evaluated with respect to speed. Analyzing from the last stage gives a much better understanding of
the redundant cells: the last stage contains no redundant cells, since it contains only gray cells,
and hence none of them can be removed.
Athira.T.S, Divya.R, Karthik.M, Manikandan.A. “Design of Kogge-Stone for Fast Addition”.
Proceedings of the 34th IRF International Conference, 26 February 2017, Bengaluru, India.
ISBN: 978-93-86291-639, pp. 27-28.
Adders use combinations of logic gates to combine binary values and obtain the sum. Adders
are subdivided according to their ability to accept and combine the digits. Parallel-prefix
adders perform parallel addition, which is important in microprocessors, DSPs, mobile devices
and other high-speed applications. By reducing logic complexity and delay, parallel-prefix adders
improve performance in terms of delay and power. Therefore, parallel-prefix adders are suitable
elements for high-speed arithmetic circuits. The major problem of binary addition with a
ripple-carry adder (RCA) is the carry chain.
Increasing the input operand width increases the carry-chain length. The worst case occurs
when the carry travels the longest possible path, from the least significant bit (LSB)
to the most significant bit (MSB). To reduce the delay of the RCA, that is, to compute the
carries in advance, the carry look-ahead adder (CLA) is considered. This type of adder is based
on two signals called propagate and generate. Increasing the adder bit width increases the
complexity of the carry logic, so wide CLAs become complex to design. The parallel prefix approach
computes the carries in advance with lower delay and complexity.
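The carry-chain argument can be illustrated with a behavioral Python model of a ripple-carry adder. This is a sketch, not a gate-level netlist, and the function name is hypothetical. Besides the sum and carry-out, the model reports the longest distance, in bit positions, that a single carry travels from where it is generated. Adding 1 to an all-ones operand is the worst case: there, the carry crosses the whole word.

```python
def ripple_carry_add(a: int, b: int, n: int, cin: int = 0) -> tuple[int, int, int]:
    """Add two n-bit numbers position by position, as a ripple-carry adder does.

    Returns (sum, carry_out, longest_chain). longest_chain is the largest number of
    consecutive positions through which one carry travelled before being absorbed.
    """
    total, carry = 0, cin
    chain = longest = 0
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        g, p = ai & bi, ai ^ bi          # generate and propagate of position i
        total |= (p ^ carry) << i        # sum bit of the full adder
        if g:
            chain = 1                    # a new carry starts at this position
        elif p and carry:
            chain += 1                   # the incoming carry ripples one position further
        else:
            chain = 0                    # no carry leaves this position
        carry = g | (p & carry)          # carry into position i + 1
        longest = max(longest, chain)
    return total, carry, longest


print(ripple_carry_add(0xFFFF, 0x0001, 16))   # (0, 1, 16): worst case, full-length chain
print(ripple_carry_add(0xFFFF, 0xFFFF, 16))   # (65534, 1, 1): every carry is regenerated
```

The first call shows why the delay of an RCA grows with the word length: the carry must pass through all 16 positions in sequence.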
CH. Sudha Rani, CH. Ramesh. “Design and Implementation of High Performance Parallel Prefix
Adders”. International Journal of Innovative Research in Computer and Communication Engineering,
Vol. 2, Issue 9, September 2014, p. 5900.
Parallel-prefix tree adders are more favorable in terms of speed than other adders due to the
$O(\log_2 N)$ delay through the carry path. The prominent parallel prefix tree adders are
Kogge-Stone, Brent-Kung, Han-Carlson, and Sklansky. The literature indicates that the Kogge-Stone
adder is the fastest of these adders. Ordered from slowest to fastest in terms of worst-case delay,
the adders rank as ripple-carry, carry-look-ahead, carry-select and Kogge-Stone. This is due to the
reduced number of stages. The Kogge-Stone adder implementation is the most straightforward, and it
also has one of the shortest critical paths of all tree adders. The drawback of the Kogge-Stone
implementation is the large area consumed and the more complex routing of interconnects. A parallel
prefix adder (PPA) is equivalent to a CLA; the two differ in the way their carry generation block
is implemented. The parallel prefix carry look-ahead adder has been known for decades as a means of
accelerating n-bit addition in VLSI technology. It is widely considered the fastest adder and is
used for high-performance arithmetic circuits in industry. A three-step process is generally
involved in the construction of a parallel prefix adder. The first step creates the generate and
propagate signals for the input operand bits. The second step generates the carry signals. In the
final step, the sum bits of the adder are computed from the propagate signals of the operand bits
and the preceding-stage carry bit using an XOR gate.
Vibhuti Dave. “High-Speed Multi Operand Addition Utilizing Flag Bits”. Thesis submitted in
partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer
Engineering, Chicago, Illinois, May 2007, pp. 38-39.
Prefix adder architectures capable of three-operand addition for cell-based design, and their
synthesis, have been designed and investigated in the thesis. Binary adders capable of constant
addition have also been presented and their performance investigated. The design is made possible by
the generation of a new set of intermediate outputs called “flag” bits. The research items and
the results of this work can be summarized as follows:
• Qualitative and quantitative comparisons of carry-skip, carry-select, and prefix adders
for cell-based designs have been carried out.
• The thesis presents an algorithm to compute an intermediate set of outputs, called flag
bits, within a regular adder to make it capable of handling three operands at a time.
• This design can be used as a replacement for carry-save adders, with the possibility of
the third operand being a constant or a variable binary number.
• The proposed technique has performance comparable to the conventional multi-operand
adder.
• It eliminates the need for dedicated adder units to perform the operation, since the
new logic is incorporated within a regular adder.
• The Kogge-Stone adder has favorable performance in terms of speed, with the trade-off
being high power and area consumption.
• The adder will be implemented in practical application designs such as decimal arithmetic
and multiplier units to study the impact and the performance gain that can be projected.
• The hardware will be optimized by gate sizing in order to achieve better performance results.
Chapter 3
Existing System
A parallel prefix adder (PPA) is a multi-bit carry-propagate adder used for the parallel
addition of two multi-bit numbers. PPAs extend the generate and propagate logic of the carry
look-ahead adder to perform addition even faster. A general architecture is analyzed as the basic
schematic structure of the various PPAs. It consists of three stages (Figure 1): the pre-processing
stage, the prefix computation stage and the final processing stage. Each stage is considered in more
detail below.
At the pre-processing stage, the carry-generate signal gi and the carry-propagate signal hi are
computed for each pair of input operand bits Ai and Bi. These signals are described by the
following logical equations, where · denotes AND, + denotes OR and ⊕ denotes XOR:
$$g_i = A_i \cdot B_i, \qquad i = 0, 1, 2, \ldots, n-1 \qquad (1.1)$$
$$h_i = A_i \oplus B_i, \qquad i = 0, 1, 2, \ldots, n-1 \qquad (1.2)$$
At the prefix computation stage, the group carry-generate signals Gi:k and group carry-propagate
signals Hi:k are calculated for each bit. Each calculation combines a block spanning bits i:j with
the adjacent lower block spanning bits j−1:k, starting from Gi:i = gi and Hi:i = hi:
$$G_{i:k} = G_{i:j} + H_{i:j} \cdot G_{j-1:k} \qquad (1.3)$$
$$H_{i:k} = H_{i:j} \cdot H_{j-1:k} \qquad (1.4)$$
The final processing stage forms the output carries and sum values for each individual operand
bit. Pi and Si are defined by the following equations, respectively:
$$P_i = G_{i:0} \qquad (1.5)$$
$$S_i = h_i \oplus P_{i-1} \qquad (1.6)$$
where $P_{-1} = P_{in} = 0$ is the carry-in. In the framework of this paper, basic schematic
nodes are used to make the architecture easier to follow when constructing the presented adders.
Figure 2 shows the basic schematic nodes: a black cell, a gray cell, a white cell and a circle.
These schematic nodes implement the logical equations at all stages of the architecture. The
number of logic gates and the Quine complexity of each schematic node are computed to estimate
hardware costs. In this work, the Quine complexity is defined as the total number of inputs of all
the logic gates used in the schematic nodes. The number of logic gates used in them can also be
counted.
Figure 2 shows that a black cell contains 3 logic gates and has a Quine complexity of 6, and a gray
cell has 2 logic gates and a Quine complexity of 4. These two cells work at the prefix computation
stage. The black cell evaluates both Gi:k and Hi:k by equations (1.3) and (1.4). The gray cell, with
only two gates, evaluates Gi:k by equation (1.3). Following these equations, a black or gray cell
receives inputs from the upper part of a block spanning bits i:j and from the lower part spanning
bits j−1:k. These cells are then combined to form a tree of generate and propagate signals for the
entire block spanning bits i:k. The main challenge is therefore to compute all the group generate
signals G0:0, G1:0, G2:0, G3:0, ..., Gn−1:0 rapidly. These signals, along with the propagate
signals H0:0, H1:0, H2:0, H3:0, ..., Hn−1:0, are called prefixes. The network of these schematic
nodes (black cells and gray cells) is a prefix tree. The white cell consists of 2 logic gates and
has a Quine complexity of 4. It computes the gi and hi signals of the input operand bits Ai and Bi
with equations (1.1) and (1.2) at the pre-processing stage. At the final stage, the circle, which
consists of one logic gate and has a Quine complexity of 2, produces the result of the binary
addition with equation (1.6).
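The Python sketch below makes the gate and Quine-complexity figures concrete by writing each schematic node as a list of two-input gates. It assumes the AND/OR/XOR structure implied by the logical equations of the three stages; the exact gate arrangement drawn in Figure 2 is not reproduced here. The gate count is the length of the list, and the Quine complexity is the total number of gate inputs. This reproduces the values 3/6, 2/4, 2/4 and 1/2 quoted above. The dictionary layout and signal names are illustrative.

```python
# Each schematic node as a list of two-input gates: (output, operation, inputs).
CELLS = {
    "black": [("t", "AND", ("H_upper", "G_lower")),
              ("G", "OR", ("G_upper", "t")),          # group generate, (1.3)
              ("H", "AND", ("H_upper", "H_lower"))],  # group propagate, (1.4)
    "gray": [("t", "AND", ("H_upper", "G_lower")),
             ("G", "OR", ("G_upper", "t"))],          # group generate only
    "white": [("g", "AND", ("A", "B")),               # (1.1)
              ("h", "XOR", ("A", "B"))],              # (1.2)
    "circle": [("S", "XOR", ("h", "P_prev"))],        # (1.6)
}
OPS = {"AND": lambda x, y: x & y, "OR": lambda x, y: x | y, "XOR": lambda x, y: x ^ y}


def cell_cost(cell):
    """(number of logic gates, Quine complexity = total number of gate inputs)."""
    gates = CELLS[cell]
    return len(gates), sum(len(inputs) for _, _, inputs in gates)


def simulate(cell, **signals):
    """Evaluate the gates of a cell in order and return all signal values."""
    for out, op, (x, y) in CELLS[cell]:
        signals[out] = OPS[op](signals[x], signals[y])
    return signals


for name in CELLS:
    print(name, cell_cost(name))   # black (3, 6), gray (2, 4), white (2, 4), circle (1, 2)
```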
The Kogge-Stone adder is a parallel-prefix form of carry look-ahead adder with minimum delay. It was
developed by Peter M. Kogge and Harold S. Stone, who published it in 1973. This adder is widely used
in high-performance applications. The scheme of a 16-bit Kogge-Stone adder is shown in Figure 3.
This adder computes the gi and hi signals at the pre-processing stage. Then, at the first level
(l = 1) of the prefix tree, the 2-bit group signals Gi:k and Hi:k are all computed at the same time.
At the second level (l = 2), the 4-bit group signals are calculated from the 2-bit results of
level 1. Therefore, the actual carry-out of the 4th bit (G3:0) is available after level 2. At the
third level (l = 3), the carry-out of the 8th bit is computed from the 4-bit results. The same
method is applied at the fourth level (l = 4) to obtain the carry-out of the 16th bit, and so on.
All other carries are computed in parallel as well. Finally, at the final processing stage, the sums
are computed from the carry signals produced by the prefix tree.
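The following is a behavioral Python sketch of the n-bit Kogge-Stone adder described above. It assumes a zero carry-in (P_{-1} = 0) and models signals as 0/1 integers rather than gates. It follows the three stages:

- g_i and h_i from equations (1.1) and (1.2);
- log2 n prefix levels (for n a power of two), in which, at distance d = 1, 2, 4, ..., every column i ≥ d merges its group with that of column i − d using (1.3) and (1.4);
- the sums from (1.5) and (1.6).

The function name is hypothetical.

```python
def kogge_stone_add(a: int, b: int, n: int = 16) -> tuple[int, int]:
    """Return (sum mod 2**n, carry_out) computed with a Kogge-Stone prefix tree."""
    # Pre-processing stage (white cells): g_i = A_i AND B_i, h_i = A_i XOR B_i
    g = [((a >> i) & (b >> i)) & 1 for i in range(n)]
    h = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]

    # Prefix computation stage: G[i], H[i] hold the group signals of column i
    G, H = g[:], h[:]
    d = 1
    while d < n:                        # the distance doubles at every level
        G_prev, H_prev = G[:], H[:]     # all nodes of one level work in parallel
        for i in range(d, n):
            G[i] = G_prev[i] | (H_prev[i] & G_prev[i - d])   # (1.3)
            H[i] = H_prev[i] & H_prev[i - d]                 # (1.4)
        d *= 2

    # Final processing stage (circles): P_i = G_{i:0}, S_i = h_i XOR P_{i-1}
    carries = [0] + G[:-1]              # carries[i] = P_{i-1}, with P_{-1} = 0
    s = sum((h[i] ^ carries[i]) << i for i in range(n))
    return s, G[n - 1]


print(kogge_stone_add(12345, 54321))    # (1130, 1): 66666 = 65536 + 1130
```

Copying the lists at each level mirrors the hardware: every node of a level reads only the results of the previous level.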
In the prefix tree, the number of levels is l = log2 n and the number of schematic nodes
(black cells + gray cells) is k = n(log2 n) − n + 1. The Quine complexity QK.S and the
number of logic gates CK.S of the Kogge-Stone adder are given by the following equations:
CK.S = 3n(log2 n) − n + 4
QK.S = 6n(log2 n) − 2n + 8
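The closed-form costs can be cross-checked by counting cells. This Python sketch walks through the Kogge-Stone prefix tree level by level, with column i merging with column i − d for d = 1, 2, 4, and so on. It classifies a node as a gray cell when its output group reaches bit 0 and as a black cell otherwise. It then adds n white cells and n circles and weights each cell with the gate and Quine costs listed for Figure 2. The gray/black rule is an assumption about Figure 3. With it, the counts come to n − 1 gray cells and n·log2 n − 2n + 2 black cells, and they reproduce CK.S and QK.S exactly for the powers of two tested.

```python
# (logic gates, Quine complexity) of each schematic node, as given for Figure 2
CELL_COST = {"black": (3, 6), "gray": (2, 4), "white": (2, 4), "circle": (1, 2)}


def kogge_stone_cells(n: int) -> dict[str, int]:
    """Count the schematic nodes of an n-bit Kogge-Stone adder (n a power of two)."""
    low = list(range(n))           # column i currently holds the group G_{i:low[i]}
    counts = {"black": 0, "gray": 0, "white": n, "circle": n}
    d = 1
    while d < n:
        previous = low[:]
        for i in range(d, n):
            low[i] = previous[i - d]           # merge with the group of column i - d
            # Assumption: a node whose group reaches bit 0 needs only G (gray cell)
            counts["gray" if low[i] == 0 else "black"] += 1
        d *= 2
    return counts


def kogge_stone_cost(n: int) -> tuple[int, int]:
    """Return (C_K.S, Q_K.S) obtained by summing the per-cell costs."""
    counts = kogge_stone_cells(n)
    gates = sum(CELL_COST[cell][0] * k for cell, k in counts.items())
    quine = sum(CELL_COST[cell][1] * k for cell, k in counts.items())
    return gates, quine


for n in (4, 8, 16, 32, 64):
    log_n = n.bit_length() - 1     # exact log2 for powers of two
    closed_form = (3 * n * log_n - n + 4, 6 * n * log_n - 2 * n + 8)
    assert kogge_stone_cost(n) == closed_form
    print(n, kogge_stone_cost(n))  # e.g. 16 (180, 360), 32 (452, 904)
```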
Figure 4 shows the scheme of a 32-bit Kogge-Stone adder, obtained by increasing the bit width of the
input operands beyond 16 bits.
Chapter 4
Proposed Method
In the framework of this paper, a modified parallel prefix adder is developed to reduce the
hardware complexity of the Kogge-Stone adder. Figure 5 shows the scheme of a 16-bit modified
parallel prefix adder.
The first level of the prefix tree of this adder is constructed in the same way as in the
Kogge-Stone adder. The main structural difference begins at the second level of the prefix tree.
At the second level, groups of two schematic nodes are formed; at the third level, groups of four
schematic nodes; at the fourth level, groups of eight schematic nodes; and so on. This adder first
computes the gi and hi signals at the pre-processing stage. Then, at the first level of the prefix
tree, the 2-bit group signals Gi:k and Hi:k (pairs of columns) are computed at the same time. After
that, the Gi:k and Hi:k signals are computed for blocks of 4, then for blocks of 8, then 16, and so
on until the final Gi:k signal for every column is known. Finally, at the last stage, this adder
computes the sums from the generate signals obtained at the prefix computation stage. The prefix
tree has log2 n levels, and the number of schematic nodes is
The number of logic gates CModified PPA and the Quine complexity QModified PPA of this
modified adder are calculated using the following equations:
The scheme of a 32-bit modified parallel prefix adder, obtained by increasing the bit width of the
input operands beyond 16 bits, is shown in Figure 6.
Chapter 5
Xilinx and Verilog HDL
HISTORY OF VERILOG
Verilog was started in 1984 by Gateway Design Automation Inc. as a proprietary hardware
modeling language. It is rumored that the original language was designed by taking features from
the most popular HDL of the time, called HiLo, as well as from traditional computer languages such
as C. At that time, Verilog was not standardized, and the language changed in almost all the
revisions that came out between 1984 and 1990.
The Verilog simulator was first used in 1985 and was extended substantially through 1987. The
implementation was the Verilog simulator sold by Gateway. The first major extension of Verilog was
Verilog-XL, which added a few features and implemented the famous "XL algorithm", a very efficient
method for gate-level simulation.
Around 1990, Cadence Design Systems, whose primary product at that time included a thin-film process
simulator, decided to acquire Gateway Design Automation together with its other products. Cadence
thus became the owner of the Verilog language and continued to market Verilog as both a language
and a simulator. At the same time, Synopsys was promoting the top-down design methodology using
Verilog. This was a powerful combination.
In 1990, Cadence organized Open Verilog International (OVI), and in 1991 gave it the
documentation for the Verilog Hardware Description Language. This was the event which "opened"
the language.
BASIC CONCEPTS
Hardware Description Language
Two things distinguish an HDL from a linear language like “C”:
Concurrency:
• The ability to do several things simultaneously, i.e., different code blocks can run
concurrently.
Timing:
• The ability to represent the passing of time and to sequence events accordingly.
VERILOG Introduction
• Verilog HDL is a Hardware Description Language (HDL).
• A Hardware Description Language is a language used to describe a digital system; a digital
system can be described at several levels.
• An HDL can describe the layout of the wires, resistors and transistors on an Integrated
Circuit (IC) chip, i.e., the switch level.
• It can describe the logic gates and flip-flops in a digital system, i.e., the gate level.
• An even higher level describes the registers and the transfers of vectors of information
between registers. This is called the Register Transfer Level (RTL).
• Verilog supports all of these levels.
• A powerful feature of the Verilog HDL is that the same language can be used for describing,
testing and debugging a system.
VERILOG Features
• Strong background:
Supported by OVI, and standardized in 1995 as IEEE Std 1364.
• Industrial support:
Fast simulation and effective synthesis (EE Times reported 85% usage in ASIC foundries).
• Universal:
Allows the entire process to be carried out in one design environment (including analysis
and verification).
• Extensibility:
The Verilog PLI (Programming Language Interface) allows Verilog capabilities to be extended.
Design Flow
The typical design flow is shown in the figure.
Figure: VLSI Design Flow
Design Specification
• Specifications are written first: the requirements and needs of the project.
• Describe the functionality and overall architecture of the digital circuit to be designed.
• Specification tools: word processors like Word, KWriter and AbiWord; for drawing waveforms,
tools like WaveFormer, TestBencher or Word.
RTL Description
• Conversion of the specification into code using CAD tools.
Coding Styles:
• Gate Level Modeling
• Data Flow Modeling
• Behavioral Modeling
• RTL Coding Editors: Vim, Emacs, conTEXT, HDL TurboWriter
Functional Verification & Testing
• Comparing the code with the specifications.
• Testing the code with the corresponding inputs and outputs.
• If testing fails, check the RTL description again.
• Simulation tools: ModelSim, VCS, Verilog-XL, Xilinx.
Logic Synthesis
• Conversion of the RTL description into gate-level netlist form.
• Description of the circuit in terms of gates and connections.
• Synthesis tools: Design Compiler, FPGA Compiler, Synplify Pro, Leonardo Spectrum,
Altera and Xilinx.
Logical Verification and Testing
• Functional checking of the HDL code by simulation and synthesis. If it fails, check the RTL
description.
Floor Planning, Automatic Place and Route
• Construction of the design from the corresponding gate-level netlist.
• Arrangement of the netlist blocks on the chip.
• Place & Route: for FPGAs, use the FPGA vendor's P&R tool. ASICs require expensive P&R tools
like Apollo; students can use LASI or Magic.
Physical Layout
• Physical design is the process of transforming a circuit description into the physical layout,
which describes the position of cells and the routes of the interconnections between them.
Layout Verification
• Verifying the physical layout structure.
• If any change is needed, revisit floor planning, automatic place and route, and the RTL
description.
Implementation
• Final stage of the design process.
• Implementation of the code and RTL in the form of an IC.
Comments
Comments can be inserted in the code for readability and documentation. There
are two ways to introduce comments:
• Single-line comments begin with the token // and end with a carriage
return.
• Multi-line comments begin with the token /* and end with the token */.
Identifiers and Keywords
Identifiers are names used to give an object, such as a register, a function or a module, a
name so that it can be referenced from other places in a description.
Keywords are reserved to define the language constructs.
• Identifiers must begin with an alphabetic character or the underscore character (a-z A-Z _).
• Identifiers may contain alphabetic characters, numeric characters, the underscore, and the
dollar sign (a-z A-Z 0-9 _ $).
• Identifiers can be up to 1024 characters long.
• Keywords are in lowercase.
Examples of keywords
• always
• begin
• end
MODULES
A module in Verilog contains exotic factors as proven in check. A module definition endlessly
starts offevolved with the important thing phrase module. The module title, port record, port
declarations, and no longer obligatory parameters have got to come first in a module definition.
Port list and port declarations are present supplied that the module has any ports to engage with
the outside atmosphere. The five add-ons inside a module are;
• variable declarations,
• dataflow statements
• instantiation of cut back modules
• behavioral blocks
• duties or aspects.
These add-ons can be in any order and at any function within the module definition.
The endmodule assertion have to continuously come final in a module definition. All
components apart from module, module title, and endmodule are non-obligatory and may also be
mixed and matched as per design wants. Verilog makes it possible for a few modules to be
outlined in a single file. The modules will also be outlined in any order inside the file.
Example Module Structure:

module<module name>(<module_terminals_list>);
…..
<module internals>
….
Endmodule
Instances
A module provides a template from which you can create actual objects. When a module is invoked, Verilog creates a unique object from the template. Each object has its own name, variables, parameters and I/O interface. The process of creating objects from a module template is called instantiation, and the objects are called instances. In the illustration below, the top-level block creates four instances from the T flip-flop (T_FF) template. Each T_FF instantiates a D_FF and an inverter gate. Each instance must be given a unique name.
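The ripple carry counter described above can be sketched in Verilog as follows. This is a minimal sketch assuming a negative-edge-triggered D_FF with an asynchronous, active-high reset; the port order and instance names are illustrative and are not taken from the original figure.

```verilog
// D flip-flop: asynchronous active-high reset, triggered on the falling clock edge
module D_FF (output reg q, input d, input clk, input reset);
  always @(posedge reset or negedge clk)
    if (reset) q <= 1'b0;
    else       q <= d;
endmodule

// T flip-flop built from one D_FF instance and one inverter primitive
module T_FF (output q, input clk, input reset);
  wire d;
  D_FF dff0 (q, d, clk, reset);
  not  n1   (d, q);           // feed back the inverted output, so q toggles
endmodule

// Top-level block: four T_FF instances, each with a unique instance name
module ripple_carry_counter (output [3:0] q, input clk, input reset);
  T_FF tff0 (q[0], clk,  reset);
  T_FF tff1 (q[1], q[0], reset); // each stage is clocked by the previous output
  T_FF tff2 (q[2], q[1], reset);
  T_FF tff3 (q[3], q[2], reset);
endmodule
```

Because each stage toggles on the falling edge of the stage before it, the 4-bit value on q increases by one on every falling edge of clk.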
PORTS
Ports provide the interface through which a module can communicate with its environment. For example, the input/output pins of an IC chip are its ports. The environment can interact with the module only through its ports. The internals of the module are not visible to the environment. This gives the designer very powerful flexibility: the internals of the module can be changed without affecting the environment as long as the interface is not modified. Ports are also referred to as terminals.
Port Declaration
All ports in the list of ports must be declared in the module. Ports can be declared as follows:
Verilog Keyword    Type of Port
input              Input port
output             Output port
inout              Bidirectional port
Each port in the port list is defined as input, output, or inout, based on the direction of the port
signal.
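The sketch below declares one port of each type in the style just described: the port list comes first, and each port is then declared as input, output, or inout. The module name bus_xcvr and its signals are hypothetical.

```verilog
// Hypothetical 4-bit bus transceiver with one port of each direction
module bus_xcvr (data_in, data_out, bus, oe);
  input  [3:0] data_in;   // value to place on the shared bus
  input        oe;        // output enable
  output [3:0] data_out;  // value currently present on the bus
  inout  [3:0] bus;       // bidirectional shared bus

  // Drive the bus only when enabled; otherwise release it (high impedance)
  assign bus      = oe ? data_in : 4'bz;
  assign data_out = bus;
endmodule
```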
Port Connection Rules

You can visualize a port as consisting of two units, one unit that is internal to the module and another that is external to the module. The internal and external units are connected. There are rules governing port connections when modules are instantiated within other modules. The Verilog simulator complains if any port connection rules are violated.
Inputs:
• Internally must be of net data type (e.g. wire)
• Externally the inputs may be connected to a reg or net data type
Outputs:
• Internally may be of net or reg data type
• Externally must be connected to a net data type
Inouts:
• Internally must be of net data type (tri recommended)
• Externally must be connected to a net data type (tri recommended)
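The input and output rules can be illustrated with a hypothetical register module reg4 and a parent module that instantiates it. This is a sketch under the Verilog-2001 rules listed above; SystemVerilog relaxes some of these restrictions.

```verilog
// Hypothetical 4-bit register used to illustrate the connection rules
module reg4 (q, d, clk);
  input  [3:0] d;     // input: a net inside the module
  input        clk;
  output [3:0] q;
  reg    [3:0] q;     // output: may be a reg inside the module
  always @(posedge clk)
    q <= d;
endmodule

// Parent module that instantiates reg4
module port_rules_demo;
  reg  [3:0] d_src;   // input driven from outside by a reg: allowed
  reg        clk;
  wire [3:0] q_out;   // output connected outside to a net: required
  // Declaring q_out as a reg instead would violate the Verilog-2001 output rule.
  reg4 u_reg (.q(q_out), .d(d_src), .clk(clk));

  initial begin
    clk   = 1'b0;
    d_src = 4'b1001;
    #1 clk = 1'b1;    // q_out takes 4'b1001 on this rising edge
  end
endmodule
```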
MODELING CONCEPTS
Verilog is both a behavioral and a structural language. The internals of each module can be defined at four levels of abstraction, depending on the needs of the design. The module behaves identically with the external environment irrespective of the level of abstraction at which the module is described. The internals of the module are hidden from the environment. Therefore, the level of abstraction used to describe a module can be changed without any change in the environment. The levels are defined below.
• Behavioral or algorithmic level

This is the highest level of abstraction provided by Verilog HDL. A module can be implemented in terms of the desired design algorithm without concern for the hardware implementation details. Designing at this level is very similar to C programming.
• Dataflow level
At this level, the module is designed by specifying the data flow. The designer is aware of how data flows between hardware registers and how the data is processed in the design.
• Gate level
The module is implemented in terms of logic gates and interconnections between these gates. Design at this level is similar to describing a design in terms of a gate-level logic diagram.
• Switch level
This is the lowest level of abstraction provided by Verilog. A module can be implemented in terms of switches, storage nodes, and the interconnections between them. Design at this level requires knowledge of switch-level implementation details. Verilog allows the designer to mix and match all four levels of abstraction in a design. In the digital design community, the term register transfer level (RTL) is frequently used for a Verilog description that uses a combination of behavioral and dataflow constructs and is acceptable to logic synthesis tools. If a design contains four modules, Verilog allows each of the modules to be written at a different level of abstraction. As the design matures, most modules are replaced with gate-level implementations.
Normally, the higher the level of abstraction, the more flexible and technology-independent the design. As one goes lower toward switch-level design, the design becomes technology-dependent and inflexible. A small modification can cause a significant number of changes in the design. Consider the analogy with C programming and assembly language programming. It is easier to program in a higher-level language such as C, and the program can be easily ported to any machine. However, if the design is done at the assembly level, the program is specific to that machine and cannot be easily ported to another machine.
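To show that the level of abstraction is internal to a module, the sketch below describes the same one-bit full adder at the behavioral, dataflow, and gate levels; all three present an identical port interface to their environment. The module names are hypothetical, and the switch level is omitted.

```verilog
// Behavioral level: state the algorithm (add three bits)
module fa_behav (output reg s, output reg co, input a, input b, input ci);
  always @(*)
    {co, s} = a + b + ci;   // 2-bit result: carry and sum
endmodule

// Dataflow level: continuous assignments of the Boolean equations
module fa_dflow (output s, output co, input a, input b, input ci);
  assign s  = a ^ b ^ ci;
  assign co = (a & b) | (ci & (a ^ b));
endmodule

// Gate level: built-in primitives connected by explicit wires
module fa_gate (output s, output co, input a, input b, input ci);
  wire t1, t2, t3;
  xor x1 (t1, a, b);
  xor x2 (s, t1, ci);
  and a1 (t2, a, b);
  and a2 (t3, t1, ci);
  or  o1 (co, t2, t3);
endmodule
```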
GATE LEVEL MODELING
Verilog has built-in primitives such as gates, transmission gates, and switches. These are rarely used in design (RTL coding), but they are used in the post-synthesis world for modeling ASIC/FPGA cells; these cells are then used for gate-level simulation. Also, the output netlist format from the synthesis tool, which is imported into the place-and-route tool, may also be in Verilog gate-level primitives.
Gate Types
A logic circuit can be designed by using logic gates. Verilog supports basic logic gates as predefined primitives. These primitives are instantiated like modules except that they are predefined in Verilog and do not need a module definition. Any circuit can be designed by using basic gates. There are two classes of basic gates: and/or gates and buf/not gates.
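Because this work is about parallel prefix adders, a natural gate-level exercise is the prefix cell. The sketch below builds the textbook black and gray prefix cells from and/or primitives and wires them into a 4-bit Kogge-Stone adder with no carry-in. It assumes the usual definitions g_i = a_i & b_i and p_i = a_i ^ b_i; it is not the modified adder proposed in this work, and all module and signal names are hypothetical.

```verilog
// Black cell: G = G_hi | (P_hi & G_lo), P = P_hi & P_lo
module black_cell (output g_out, output p_out,
                   input g_hi, input p_hi, input g_lo, input p_lo);
  wire t;
  and a1 (t, p_hi, g_lo);
  or  o1 (g_out, g_hi, t);
  and a2 (p_out, p_hi, p_lo);
endmodule

// Gray cell: only the group generate G = G_hi | (P_hi & G_lo) is produced
module gray_cell (output g_out, input g_hi, input p_hi, input g_lo);
  wire t;
  and a1 (t, p_hi, g_lo);
  or  o1 (g_out, g_hi, t);
endmodule

// 4-bit Kogge-Stone adder without carry-in: log2(4) = 2 prefix stages
module ks_adder4 (output [3:0] sum, output cout,
                  input [3:0] a, input [3:0] b);
  wire [3:0] g, p;                    // bit generate and propagate
  wire g1_0, g2_1, p2_1, g3_2, p3_2;  // stage 1 group signals
  wire g2_0, g3_0;                    // stage 2 group signals

  // Pre-processing: one AND and one XOR primitive per bit
  and ag0 (g[0], a[0], b[0]);  xor xp0 (p[0], a[0], b[0]);
  and ag1 (g[1], a[1], b[1]);  xor xp1 (p[1], a[1], b[1]);
  and ag2 (g[2], a[2], b[2]);  xor xp2 (p[2], a[2], b[2]);
  and ag3 (g[3], a[3], b[3]);  xor xp3 (p[3], a[3], b[3]);

  // Stage 1: combine each bit with its neighbour at distance 1
  gray_cell  s1_1 (g1_0, g[1], p[1], g[0]);
  black_cell s1_2 (g2_1, p2_1, g[2], p[2], g[1], p[1]);
  black_cell s1_3 (g3_2, p3_2, g[3], p[3], g[2], p[2]);

  // Stage 2: combine with the group at distance 2
  gray_cell  s2_2 (g2_0, g2_1, p2_1, g[0]);
  gray_cell  s2_3 (g3_0, g3_2, p3_2, g1_0);

  // Post-processing: sum_i = p_i ^ (carry into bit i)
  buf bs0 (sum[0], p[0]);
  xor xs1 (sum[1], p[1], g[0]);
  xor xs2 (sum[2], p[2], g1_0);
  xor xs3 (sum[3], p[3], g2_0);
  buf bco (cout, g3_0);
endmodule
```

Each stage doubles the span of the combined groups, which is why an N-bit Kogge-Stone tree needs log2(N) stages.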
BEHAVIORAL AND RTL MODELING
Verilog provides designers the ability to describe design functionality in an algorithmic manner. In other words, the designer describes the behavior of the circuit. Thus, behavioral modeling represents the circuit at a very high level of abstraction. Design at this level resembles C programming more than it resembles digital circuit design. Behavioral Verilog constructs are similar to C language constructs in many ways. Verilog is rich in behavioral constructs that provide the designer with a great amount of flexibility.
Operators
Verilog provides many different types of operators. The operator types are:
• Arithmetic Operators
• Relational Operators
• Bit-wise Operators
• Logical Operators
• Reduction Operators
• Shift Operators
• Concatenation Operator
• Replication Operator
• Conditional Operator
• Equality Operator
Arithmetic Operators
• These perform arithmetic operations. The + and - can be used as either unary (-z) or binary (x-y) operators.
• Binary: +, -, *, /, % (the modulus operator)
• Unary: +, - (used to specify the sign)
• Integer division truncates any fractional part
• The result of a modulus operation takes the sign of the first operand
• If any operand bit value is the unknown value x, then the entire result value is x
• Register data types are used as unsigned values (negative numbers are stored in two's complement form)
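The following sketch exercises these arithmetic rules inside an initial block, with the expected values noted in the comments. The module and variable names are hypothetical.

```verilog
// Hypothetical demo module for the arithmetic operator rules
module arith_demo;
  integer   q1, q2, r1, r2;
  reg [3:0] neg_one;
  reg [3:0] with_x;
  initial begin
    q1 = 7 / 2;              // 3: the fractional part is truncated
    q2 = -7 / 2;             // -3: truncation is toward zero
    r1 = -7 % 2;             // -1: the sign follows the first operand
    r2 = 7 % -2;             // 1
    neg_one = -4'd1;         // 4'b1111: two's complement stored in an unsigned reg
    with_x  = 4'b10x1 + 4'd1; // 4'bxxxx: one x bit makes the whole result x
  end
endmodule
```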
Relational Operators
Relational operators compare two operands and return a single bit, 1 or 0. These operators synthesize into comparators. Wire and reg variables are treated as positive (unsigned) values, so (-3'b001) == 3'b111 and (-3'd001) > 3'd110, whereas for integers -1 < 6.
• The result is a scalar value (example: a < b)
• 0 if the relation is false (a is greater than or equal to b)
• 1 if the relation is true (a is smaller than b)
• x if any of the operands has unknown x bits (if a or b contains x)
Note: If any operand is x or z, the result is x, and an x result used as a test condition (for example, in an if statement) is treated as false (0).
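The sketch below evaluates the relational examples from the text, an unknown result, and the way that unknown result behaves as an if condition. The module and variable names are hypothetical.

```verilog
// Hypothetical demo module for the relational operators
module rel_demo;
  reg     r1, r2, r3, r4;
  integer taken;
  initial begin
    r1 = ((-3'b001) == 3'b111); // 1: -1 wraps to 3'b111 in 3 bits
    r2 = ((-3'd001) > 3'd110);  // 1: compared as unsigned values, 7 > 6
    r3 = (-1 < 6);              // 1: integers are signed
    r4 = (3'b1x0 < 3'd5);       // x: an operand contains an unknown bit
    taken = 0;
    if (3'b1x0 < 3'd5)          // an x condition is treated as false
      taken = 1;
  end
endmodule
```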
Bit-wise Operators
Bitwise operators perform a bit-by-bit operation on two operands. They take each bit in one operand and perform the operation with the corresponding bit in the other operand. If one operand is shorter than the other, it is extended on the left side with zeroes to match the length of the longer operand.
• Computations involving unknown bits behave as follows:
• -> ~x = x
• -> 0&x = 0
• -> 1&x = x&x = x
• -> 1|x = 1
• -> 0|x = x|x = x
• -> 0^x = 1^x = x^x = x
• -> 0^~x = 1^~x = x^~x = x
• When operands are of unequal bit length, the shorter operand is zero-filled in the most
significant bit positions.
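A short sketch of these bitwise rules, including zero extension of the shorter operand and the handling of x bits, follows. The module and variable names are hypothetical.

```verilog
// Hypothetical demo module for the bitwise operators
module bitwise_demo;
  reg [3:0] a;
  reg [1:0] b;
  reg [3:0] and_r, or_r, xor_r, not_r, and_x, or_x;
  initial begin
    a = 4'b1010;
    b = 2'b11;                  // zero-extended to 4'b0011 in the expressions below
    and_r = a & b;              // 4'b0010
    or_r  = a | b;              // 4'b1011
    xor_r = a ^ b;              // 4'b1001
    not_r = ~a;                 // 4'b0101
    and_x = 4'b0xx1 & 4'b0011;  // 4'b00x1: x&0 = 0, x&1 = x
    or_x  = 4'b00xx | 4'b0101;  // 4'b01x1: x|0 = x, x|1 = 1
  end
endmodule
```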
Logical Operators
Logical operators return a single bit, 1 or 0. They are the same as bitwise operators only for single-bit operands. They can work on expressions, integers or groups of bits, and treat all values that are nonzero as "1". Logical operators are typically used in conditional (if ... else) statements since they work with expressions.
• Expressions connected by && and || are evaluated from left to right
• Evaluation stops as soon as the result is known
• The result is a scalar value:
• -> 0 if the relation is false
• -> 1 if the relation is true
• -> x if any of the operands has x (unknown) bits
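The sketch below contrasts the logical operators with the bitwise AND on multi-bit operands. The module and variable names are hypothetical.

```verilog
// Hypothetical demo module for the logical operators
module logical_demo;
  reg [3:0] a, b, bw;
  reg       r_and, r_or, r_not, r_x;
  initial begin
    a = 4'b0100;
    b = 4'b0011;              // both nonzero, so both count as true
    r_and = a && b;           // 1: logical AND of two true values
    bw    = a & b;            // 4'b0000: the bitwise AND has no common 1 bits
    r_or  = a || 4'b0000;     // 1
    r_not = !a;               // 0: a is nonzero
    r_x   = 4'b000x && 1'b1;  // x: the first operand may be zero or nonzero
  end
endmodule
```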

Reduction Operators
Reduction operators operate on all the bits of an operand vector and return a single-bit value. These are the unary (one-argument) form of the bitwise operators.
• Reduction operators are unary.
• They perform a bitwise operation on a single operand to produce a single-bit result.
• The reduction NAND and NOR operators work like reduction AND and OR respectively, but with their outputs negated.
• -> Unknown bits are treated as described before
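As a sketch, here are the reduction operators applied to one 8-bit vector; reduction XOR computes the parity, which is a common use. The module and variable names are hypothetical.

```verilog
// Hypothetical demo module for the reduction operators
module reduction_demo;
  reg [7:0] v;
  reg r_and, r_or, r_xor, r_nand, r_nor, r_xnor;
  initial begin
    v = 8'b1011_0001;   // four 1 bits
    r_and  = &v;        // 0: not all bits are 1
    r_or   = |v;        // 1: at least one bit is 1
    r_xor  = ^v;        // 0: an even number of 1 bits (even parity)
    r_nand = ~&v;       // 1
    r_nor  = ~|v;       // 0
    r_xnor = ~^v;       // 1
  end
endmodule
```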
Shift Operators
Shift operators shift the first operand by the number of bits specified by the second operand. Vacated positions are filled with zeros for both left and right shifts (there is no sign extension with the logical shift operators << and >>).
• The left operand is shifted by the number of bit positions given by the right operand.
• The vacated bit positions are filled with zeroes.
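A minimal sketch of the logical shift operators on an 8-bit value, with a hypothetical module name:

```verilog
// Hypothetical demo module for the logical shift operators
module shift_demo;
  reg [7:0] v, l2, r3;
  initial begin
    v  = 8'b1001_0110;
    l2 = v << 2;   // 8'b0101_1000: zeros enter on the right
    r3 = v >> 3;   // 8'b0001_0010: zeros enter on the left; the MSB is not copied
  end
endmodule
```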
Concatenation Operator
The concatenation operator combines two or more operands to form a larger vector.
• Concatenations are expressed using the brace characters { and }, with commas separating the expressions within.
• -> Example: {a, b[3:0], c, 4'b1001} // if a and c are 8-bit numbers, the result has 24 bits
• Unsized constant numbers are not allowed in concatenations.
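The sketch below evaluates the concatenation example above with assumed 8-bit values for a, b, and c, and also shows the replication operator {n{...}} from the operator list. The module name and the values are hypothetical.

```verilog
// Hypothetical demo module; a, b, and c take assumed 8-bit values
module concat_demo;
  reg [7:0]  a, b, c;
  reg [23:0] y;
  reg [7:0]  rep;
  initial begin
    a = 8'hAB;
    b = 8'hCD;
    c = 8'hEF;
    y   = {a, b[3:0], c, 4'b1001}; // 8 + 4 + 8 + 4 = 24 bits: 24'hABDEF9
    rep = {2{4'b1010}};            // replication: 8'b1010_1010
  end
endmodule
```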
Operator Precedence
Procedural Blocks
Verilog behavioral code is inside procedural blocks, but there is an exception: some behavioral code also exists outside procedural blocks. We will see this in detail as we progress. There are two types of procedural blocks in Verilog:
• initial: initial blocks execute only once, starting at time zero.
• always: always blocks loop to execute over and over again; in other words, as the name suggests, an always block executes always.
In an always block, when the trigger event occurs, the code inside begin and end is executed; then the always block again waits for the next triggering event. This process of waiting and executing on events is repeated until simulation stops.
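Both kinds of procedural block appear together in the sketch below: the initial block runs once from time zero and generates a few clock cycles, while the always block wakes on every rising edge, runs its body, and waits again. The module and signal names are hypothetical.

```verilog
// Hypothetical demo module for initial and always blocks
module proc_demo;
  reg       clk;
  reg [3:0] edges;

  // initial: runs once, starting at time zero
  initial begin
    clk   = 1'b0;
    edges = 4'd0;
    repeat (6) #5 clk = ~clk;  // six toggles produce three rising edges
  end

  // always: wait for the trigger, run the body, then wait again
  always @(posedge clk) begin
    edges <= edges + 1;
  end
endmodule
```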
Xilinx Verilog HDL Tutorial

Getting started
If you wish to work on this tutorial and the laboratory at home, you should download and install Xilinx and ModelSim. Both tools have free student versions. Please complete Appendices B, C, and D, in that order, before continuing with this tutorial. In addition, if you wish to buy your own Spartan3 board, you can do so at Digilent's website. Digilent offers academic pricing. Please note that you must download and install the Digilent Adept software. The software contains the drivers that the board requires and also provides the interface to program the board.
1. Introduction
Xilinx Tools is a suite of software tools used for the design of digital circuits implemented using a Xilinx Field Programmable Gate Array (FPGA) or Complex Programmable Logic Device (CPLD). The design procedure consists of (a) design entry, (b) synthesis and implementation of the design, (c) functional simulation, and (d) testing and verification. Digital designs can be entered in various ways using the above CAD tools: using a schematic entry tool, using a hardware description language (HDL) such as Verilog or VHDL, or using a combination of both. In this lab we will only use the design flow that involves the use of Verilog HDL.
The CAD tools enable you to design combinational and sequential circuits starting with Verilog HDL design specifications. The steps of this design procedure are listed below:
1. Create Verilog design input file(s) using a template-driven editor.
2. Compile and implement the Verilog design file(s).
3. Create the test vectors and simulate the design (functional simulation) without using a PLD (FPGA or CPLD).
4. Assign input/output pins to implement the design on a target device.
5. Download the bitstream to an FPGA or CPLD device.
6. Test the design on the FPGA/CPLD device.
A Verilog input file in the Xilinx software environment consists of the following segments:
Header: module name, list of input and output ports.
Declarations: input and output ports, registers and wires.
Logic Descriptions: equations, state machines and logic functions.
End: endmodule
All your designs for this lab must be specified in the above Verilog input format. Note
that the state diagram segment does not exist for combinational logic designs.
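A minimal sketch of a Verilog input file laid out in these four segments follows, using a two-input OR gate as an assumed example; the module name o_gate and its port names are illustrative.

```verilog
// Header: module name and list of ports
module o_gate (a, b, z);
  // Declarations: input and output ports (plus any registers and wires)
  input  a, b;
  output z;
  // Logic description: a continuous assignment for the OR function
  assign z = a | b;
// End
endmodule
```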
2. Programmable Logic Device: FPGA
In this lab, digital designs will be implemented on the Basys2 board, which has a Xilinx Spartan3E XC3S250E FPGA in a CP132 package. This FPGA part belongs to the Spartan family of FPGAs. These devices are available in a variety of packages. We will be using devices packaged in a 132-pin package with the following part number: XC3S250E-CP132. This FPGA is a device with about 250K system gates. Detailed information on this device is available on the Xilinx website.
3. Creating a New Project
Xilinx tools can be started by clicking on the Project Navigator icon on the Windows desktop. This should open the Project Navigator window on your screen. This window shows the last accessed project.
Xilinx Project Navigator window (snapshot from Xilinx ISE software)
3.1 Opening a project
Select File->New Project to create a new project. This will bring up a new project window on the desktop. Fill in the necessary entries as follows:
New Project Initiation window (snapshot from Xilinx ISE software)
Project name: Write the name of your new project.
Project location: The directory where you want to store the new project. (Note: do not specify the project location as a folder on the desktop or a folder in the Xilinx\bin directory. Your H: drive is the best place to put it. The project location path must not have any spaces in it; for example, C:\Nivash\TA\new lab\sample exercises\o_gate is not to be used.)
Leave the top-level module type as HDL.
Example: If the project name were "o_gate", enter "o_gate" as the project name and then click "Next".
Clicking on NEXT should bring up the following window:
Device and Design Flow of Project (snapshot from Xilinx ISE software)
For each of the properties given below, click on the ‘value’ area and select from the list
of values that appear.
• Device Family: the family of the FPGA/CPLD used. In this laboratory we will be using the Spartan3E FPGAs.
• Device: the number of the actual device. For this lab you may enter XC3S250E (this can be found on the attached prototyping board).
• Package: the type of package with the number of pins. The Spartan FPGA used in this lab is packaged in a CP132 package.
• Speed Grade: the speed grade is "-4".
• Synthesis Tool: XST [VHDL/Verilog]
• Simulator: the tool used to simulate and verify the functionality of the design. The ModelSim simulator is integrated with Xilinx ISE, so select "Modelsim-XE Verilog" as the simulator; alternatively, the Xilinx ISE Simulator can be used.
• Then click on Next to save the entries.
All project files such as schematics, netlists, Verilog files, VHDL files, etc., will be stored in a subdirectory with the project name. A project can have only one top-level HDL source file (or schematic). Modules can be added to the project to create a modular, hierarchical design.
To open an existing project in Xilinx Tools, select File->Open Project to show the list of projects on the machine. Choose the project you want and click OK.
Clicking on NEXT on the above window brings up the following window:
Create New Source window (snapshot from Xilinx ISE software)
If creating a new source file, click on NEW SOURCE.
3.2 Creating a Verilog HDL input file for a combinational logic design
In this lab we will enter a design using a structural or RTL description in Verilog HDL. You can create a Verilog HDL input file (.v file) using the HDL Editor available in the Xilinx ISE tools (or any text editor).
In the previous window, click on NEW SOURCE.
A window pops up as shown in Figure 4. (Note: the "Add to project" option is selected by default. If you do not select it, you will have to add the new source file to the project manually.)
Creating Verilog-HDL source file (snapshot from Xilinx ISE software)
Select Verilog Module and, in the "File Name:" area, enter the name of the Verilog source file you are going to create. Also make sure that the option Add to project is selected so that the source need not be added to the project again. Then click on Next to accept the entries. This pops up the following window.
Define Verilog Source window (snapshot from Xilinx ISE software)
In the Port Name column, enter the names of all input and output pins and specify the direction of each. A vector/bus can be defined by entering the appropriate bit numbers in the MSB/LSB columns. Then click on Next> to get a window showing all the new source information. If any changes are to be made, just click on <Back to go back and make changes. If everything is acceptable, click on Finish > Next > Next > Finish to continue.
New Project Information window (snapshot from Xilinx ISE software)

Once you click on Finish, the source file will be displayed in the Sources window in the Project Navigator.
If a source has to be removed, just right-click on the source file in the Sources in Project window in the Project Navigator and select Remove. Then select Project -> Delete Implementation Data from the Project Navigator menu bar to remove any related files.
3.3 Editing the Verilog source file
The source file will now be displayed in the Project Navigator window. The source file window can be used as a text editor to make any necessary changes to the source file. All the input/output pins will be displayed. Save your Verilog program periodically by selecting File->Save from the menu. You can also edit Verilog programs in any text editor and add them to the project directory using "Add Copy Source".
Verilog source code editor window in the Project Navigator (from Xilinx ISE software)
Adding logic in the generated Verilog source code template:
A brief Verilog tutorial is available in Appendix-A. Hence, the language syntax and the construction of logic equations can be looked up in Appendix-A.
The generated Verilog source code template shows the module name, the list of ports, and the declarations (input/output) for each port. Combinational logic code can be added to the Verilog code after the declarations and before the endmodule line.
For example, an output z of an OR gate with inputs a and b can be described as assign z = a | b;
Remember that the names are case sensitive.

Other constructs for modeling the logic function: a given logic function can be modeled in many ways in Verilog. Here is another example in which the logic function is implemented as a truth table using a case statement:
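For the OR function, the case-statement style could look like the sketch below. The module name o_gate_case is hypothetical, and the default branch only matters when an input is x or z.

```verilog
// Hypothetical module: the OR function written as a truth table
module o_gate_case (output reg z, input a, input b);
  always @(a or b) begin
    case ({a, b})
      2'b00:   z = 1'b0;
      2'b01:   z = 1'b1;
      2'b10:   z = 1'b1;
      2'b11:   z = 1'b1;
      default: z = 1'bx;  // reached only when an input is x or z
    endcase
  end
endmodule
```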
4. Synthesis and Implementation of the Design
The design needs to be synthesized and implemented before it can be checked for correctness by running functional simulation or downloaded onto the prototyping board. With the top-level Verilog file opened (this can be done by double-clicking that file) in the HDL editor window in the right half of the Project Navigator, and with the project in the Module view, the Implement Design option can be seen in the Process view. The Design Entry Utilities and Generate Programming File options can also be seen in the Process view. The former can be used to include user constraints, if any, and the latter will be discussed later.
To synthesize the design, double-click on the Synthesize Design option in the Processes window.
To implement the design, double-click the Implement Design option in the Processes window. It will go through steps such as Translate, Map, and Place & Route. If any of these steps could not be completed or completed with errors, an X mark will be placed in front of it; otherwise, a tick mark will be placed after each of them to indicate successful completion. If everything is done successfully, a tick mark will be placed before the Implement Design option. If there are warnings, you will see a warning mark in front of the option. You can still look at the warnings or errors in the Console window at the bottom of the Navigator window. Every time the design file is saved, all these marks disappear, prompting a fresh compilation.
Implementing the Design (snapshot from Xilinx ISE software)
The schematic diagram of the synthesized Verilog code can be viewed by double-clicking View RTL Schematic under the Synthesize-XST menu in the Process window. This can be a useful way to debug the code if the output is not meeting our requirements on the prototype board.
Double-clicking it opens the top-level module, showing only the input(s) and output(s), as shown below.
Top Level Hierarchy of the design
By double clicking the rectangle, it opens the realized internal logic as shown below.
Realized logic by the Xilinx ISE for the verilog code

CHAPTER 6
RESULTS
The results for the 16-bit Kogge-Stone adder are shown below.
Similarly, the results for the 32-bit Kogge-Stone adder are shown below.

The RTL structure is shown below.
The area of the proposed 32-bit Kogge-Stone adder is shown below.

AREA:
Similarly, the delay is shown below.
The power consumption is shown below.
CONCLUSION
In this article, the following tasks have been solved: analysis of a promising architecture for constructing various multi-bit PPA schemes; derivation of formulas for estimating the hardware complexity of multi-bit PPAs; schematic implementation of the standard 16-bit and 32-bit Kogge-Stone adders; and schematic implementation of 16-bit and 32-bit modified parallel prefix adders. A comparative analysis of the parameters and simulation results of the presented adders was then carried out. The results show that the modified parallel prefix adder proposed in this work has an advantage in terms of hardware complexity compared with the known Kogge-Stone adder structure. Additionally, in terms of speed, the proposed parallel prefix adder has an advantage over group-prefix and carry-lookahead adders, as well as over the well-known Sklansky and Brent-Kung parallel prefix adders. As future work, this architecture could be reduced further by applying new algorithms or by reducing the number of cells to speed up the device.