
INTRODUCTION
Today's technologies make possible powerful computing devices with multimedia
capabilities. Consumers' attitudes are shifting toward better accessibility and mobility, and this
has created demand for an ever-increasing number of portable applications requiring low
power and high throughput. For example, notebook and handheld computers are now made with
computational capabilities competitive with those found in desktop machines. Equally demanding
are personal communication applications in a pocket-sized device. In these applications, not only
voice but also data and video are transmitted via wireless links. It is important that these high
computational capabilities be placed in a low-power, portable environment. The weight and size
of these portable devices are determined by the amount of power required, and the battery
lifetime for such products is crucial.
Hence, a well-planned low-energy design strategy must be in place. As the density of
integrated circuits and the size of chips and systems continue to grow, it becomes more and more
difficult to provide adequate cooling for the systems. In addition to heat removal, there are also
economic and environmental reasons for low-power development. In the United States, computer
equipment accounts for about 2-3% of national electricity consumption. This figure is expected
to increase with the tremendous growth in household computer applications, Web phones,
handheld computers, and internal terminals. These economic and environmental reasons have
driven the requirement for energy-efficient computers.
To meet the demand of computationally intensive applications, the clock rate is steadily
increasing, and clock skew is becoming an increasingly significant part of the clock cycle. The
energy consumed by low-skew clock distribution networks keeps growing. Clock-related power
consumption can reach 30-40% or more of the total power of a microprocessor and is becoming
a larger fraction of the chip power. In addition, the number of logic gate delays in a clock period
is reduced by 25% per generation. As a result, the latency of flip-flops or latches is becoming a
larger portion of the cycle time. To achieve a design that is both high-performance and
power-efficient, careful attention must be paid to the design of the flip-flops and latches.
1. LOW POWER AND LOW VOLTAGE CMOS DESIGN
The design of portable devices requires consideration for peak power consumption to
ensure reliability and proper operation. However, the time averaged power is often more critical
as it is linearly related to the battery life. There are four sources of power dissipation in digital
CMOS circuits: switching power, short-circuit power, leakage power and static power. The
following equation describes these four components of power:
$$P = P_{switching} + P_{shortcircuit} + P_{leakage} + P_{static} = \alpha C_L V_s V_{dd} f_{ck} + I_{sc} V_{dd} + I_{leakage} V_{dd} + I_{static} V_{dd}$$
Pswitching is the switching power. For a properly designed CMOS circuit, this power
component usually dominates, and may account for more than 90% of the total power. α denotes
the transition activity factor, which is defined as the average number of power-consuming
transitions made at a node in one clock period. Vs is the voltage swing, which in most
cases is the same as the supply voltage, Vdd. CL is the node capacitance. It can be broken into
three components: the gate capacitance, the diffusion capacitance, and the interconnect
capacitance. The interconnect capacitance is in general a function of the placement and routing.
fck is the clock frequency. The switching power for static CMOS is derived as follows.
During the low-to-high output transition, the path from Vdd to the output node is conducting
to charge CL. Hence, the energy provided by the supply source is
$$E = \int_0^{\infty} V_{dd}\, i(t)\, dt$$
where $i(t) = \frac{V_{dd}}{R} e^{-t/(R C_L)}$ is the current drawn from the supply. Here, R is the
resistance of the path between Vdd and the output node. Therefore, for a full-swing output
(Vs = Vdd), the energy can be rewritten as
$$E = \frac{V_{dd}^2}{R} \int_0^{\infty} e^{-t/(R C_L)}\, dt = C_L V_{dd}^2 \qquad (2.4)$$
During the high-to-low transition, no energy is supplied by the source. Hence, the average power
consumed during one clock cycle is
$$P = E f_{ck} = C_L V_{dd}^2 f_{ck} \qquad (2.5)$$
Eq. (2.4) and Eq. (2.5) estimate the energy and the power of a single gate only. From a system
point of view, α is used to account for the actual number of gates switching at a point in time.
Pshortcircuit is the short-circuit power. It is a type of dynamic power and is typically
much smaller than Pswitching. Isc is known as the direct-path short circuit current. It refers to
the conducting current from power supply directly to ground when both the NMOS and PMOS
transistors are simultaneously active during switching.
Pleakage is the leakage power. Ileakage refers to the leakage current. It is primarily
determined by fabrication technology considerations and originates from two sources. The first is
the reverse leakage current of the parasitic drain-/source-substrate diodes. This current is on the
order of a few femtoamperes per diode, which translates into a few microwatts of power for a
million transistors. The second source is the subthreshold current of MOSFETs, which is on the
order of a few nanoamperes. For a million transistors, the total subthreshold leakage current
results in a few milliwatts of power.
Pstatic is the static power and Istatic is the static current. This current arises from circuits that
have a constant source of current between the power supplies, such as bias circuitry and
pseudo-NMOS logic families. For the CMOS logic family, power is dissipated only when the circuits
switch, with no static power consumption.
Energy is independent of the clock frequency. Reducing the frequency will lower the
power consumption but will not change the energy required to perform a given operation, as
depicted by Eq. (2.4) and Eq. (2.5). It is important to note that the battery life is determined by
energy consumption, whereas the heat dissipation considerations are related to the power
consumption.
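The distinction between energy and power can be checked numerically. The following is a minimal sketch based on Eq. (2.4) and Eq. (2.5); the capacitance, supply voltage, cycle count, and function names are hypothetical values chosen only for illustration, not figures from this work.

```python
def switching_energy(c_load, v_dd, alpha=1.0):
    """Energy per clock cycle (J) drawn from the supply: alpha * C_L * Vdd^2."""
    return alpha * c_load * v_dd ** 2


def switching_power(c_load, v_dd, f_clk, alpha=1.0):
    """Average switching power (W): energy per cycle times clock frequency."""
    return switching_energy(c_load, v_dd, alpha) * f_clk


def energy_for_task(c_load, v_dd, f_clk, cycles, alpha=1.0):
    """Energy (J) for a task of `cycles` clock cycles: power times run time."""
    run_time = cycles / f_clk
    return switching_power(c_load, v_dd, f_clk, alpha) * run_time


# Hypothetical example values, chosen only for illustration.
C_LOAD = 50e-15     # 50 fF node capacitance
V_DD = 1.2          # supply voltage in volts
CYCLES = 1_000_000  # length of the task in clock cycles

for f_clk in (100e6, 50e6):
    p = switching_power(C_LOAD, V_DD, f_clk)
    e = energy_for_task(C_LOAD, V_DD, f_clk, CYCLES)
    print(f"f = {f_clk / 1e6:.0f} MHz: power = {p * 1e6:.2f} uW, "
          f"task energy = {e * 1e9:.2f} nJ")
```

Halving the clock frequency halves the power, but the task then runs twice as long, so the energy drawn from the battery for the same number of cycles is unchanged.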
Four factors influence the power dissipation of CMOS circuits: technology, circuit design
style, architecture, and algorithm. The challenge of meeting the
conflicting goals of high performance and low-power system operation has motivated the
development of low-power process technologies and the scaling of device feature sizes.
Design considerations for low power should be applied at all levels of the design hierarchy,
namely 1) fundamental, 2) material, 3) device, 4) circuit, and 5) system.
2.1 LOW VOLTAGE
Power consumption is linearly proportional to the voltage swing (Vs) and the supply voltage
(Vdd), as indicated in Eq. (2.5). For most CMOS logic families, the swing is typically rail-to-rail.
Hence, power consumption is also said to be proportional to the square of the supply voltage,
Vdd. Therefore, lowering Vdd is an efficient approach to reducing both energy and power,
presuming that the signal voltage swing can be freely chosen. This comes, however, at the expense
of circuit delay. The delay, td, can be shown to be proportional to
$V_{dd}/(V_{dd} - V_{th})^{\gamma}$, where Vth is the threshold voltage. The
exponent γ is between 1 and 2. It tends to be closer to 1 for MOS transistors in the deep
sub-micrometer region, where carrier velocity saturation may occur, and increases toward 2 for
longer-channel transistors.
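The trade-off between energy and delay can be illustrated with a short sketch. It assumes the commonly used alpha-power-law form of the delay, td proportional to Vdd/(Vdd - Vth)^γ, and an energy proportional to Vdd squared; the threshold voltage, exponent, and supply values below are hypothetical and serve only as an example.

```python
def relative_delay(v_dd, v_th, gamma):
    """Gate delay up to a constant factor: Vdd / (Vdd - Vth)^gamma."""
    if v_dd <= v_th:
        raise ValueError("the model only applies for Vdd above Vth")
    return v_dd / (v_dd - v_th) ** gamma


def relative_energy(v_dd):
    """Switching energy up to a constant factor: Vdd^2 (full-swing signals)."""
    return v_dd ** 2


# Hypothetical parameters for illustration only.
V_TH = 0.3    # threshold voltage in volts
GAMMA = 1.3   # exponent between 1 (velocity saturated) and 2 (long channel)
V_NOM = 1.2   # nominal supply used as the reference point

for v_dd in (1.2, 1.0, 0.8, 0.6):
    delay = relative_delay(v_dd, V_TH, GAMMA) / relative_delay(V_NOM, V_TH, GAMMA)
    energy = relative_energy(v_dd) / relative_energy(V_NOM)
    print(f"Vdd = {v_dd:.1f} V: energy x{energy:.2f}, delay x{delay:.2f}")
```

Under these assumptions the energy falls quadratically as Vdd is lowered, while the delay grows increasingly fast as Vdd approaches Vth.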
The current technology trends are to reduce feature size and lower the supply voltage.
Lowering Vdd leads to increased circuit delays and therefore lower functional throughput.
Smaller feature size, however, reduces gate delay, as the delay is proportional to the square of
the effective channel length of the devices. In addition, thinner gate oxides impose a voltage
limitation for reliability reasons. Hence, the supply voltage must be lowered for smaller
geometries. The net effect is that circuit performance improves as CMOS technologies scale
down, despite the Vdd reduction. Therefore, the new technology has made it possible to fulfill
the conflicting requirements of low power and high throughput.
The various techniques currently used to scale the supply voltage include
optimizing the technology and device reliability, trading off area for low power in an
architecture-driven approach, and exploiting concurrency through algorithmic transformations.
In all cases, voltage scaling is ultimately limited by the threshold voltage Vth.
In applications such as digital processing, where throughput is of more concern than
speed, the architecture can be designed to reduce the supply voltage at the expense of speed
without throughput degradation. Hence, the performance of the system can be maintained.
2.2 SWITCHING ACTIVITY REDUCTION
CMOS circuits dissipate power only when switching; it is therefore important to minimize
the switching activity in low-power applications. Switching decreases when the data rate is
low. Switching activity can be reduced by algorithmic optimization, architecture optimization,
logic topology, and circuit optimization, which are discussed below.
Algorithmic optimization depends heavily on the application and on the characteristics of
the data. Furthermore, the data representation may have a significant impact on the switching
activity. Research has shown that using a Gray code for address bits, where the data changes
sequentially, results in fewer transitions than using binary code.
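The effect of the data representation can be seen by counting bit transitions on a sequentially incremented address bus. The sketch below is an illustration only; the 4-bit address range and the helper names are assumptions made for the example.

```python
def to_gray(value):
    """Convert a binary-coded integer to its reflected Gray code."""
    return value ^ (value >> 1)


def bit_flips(a, b):
    """Number of bit positions that differ between two code words."""
    return bin(a ^ b).count("1")


def total_transitions(codes):
    """Total bit transitions when the codes appear on a bus one after another."""
    return sum(bit_flips(a, b) for a, b in zip(codes, codes[1:]))


# Sequential 4-bit addresses 0..15 (an assumed example access pattern).
addresses = list(range(16))

binary_flips = total_transitions(addresses)
gray_flips = total_transitions([to_gray(a) for a in addresses])

print("binary code transitions:", binary_flips)  # 26
print("Gray code transitions:  ", gray_flips)    # 15
```

Each Gray-code step changes exactly one bit, whereas a binary increment flips every trailing 1 as well as the bit above them, so the saving applies to sequential access patterns such as the one assumed here.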
Architecture optimization can be achieved through delay balancing, precomputation logic,
and power management schemes. Balanced tree topologies are often used to balance path delays
and hence reduce glitching. Precomputation logic predicts the output signal one clock cycle ahead
with minimal circuit overhead. It generally allows only a small subset of the inputs to pass to
the combinational blocks, and hence minimizes the switching activity of the system as a whole.
Figure 2.1 shows a latch with clock gating.
Figure 2.1 Latches with Clock Gating
The XOR gate compares the values of D and Q. If D and Q are the same, the output of
the XOR gate is 0, and the AND gate prevents the clock from triggering the latch. If, on the
other hand, D and Q are different, the XOR-AND logic passes the clock
signal. This scheme eliminates unnecessary clock switching internal to the latch.
Power management is one of the most effective approaches to switching activity reduction.
This power-down method puts circuits in a sleep mode when they are idle. It can be applied
at different levels of the hierarchy, from the module to the chip level, and even at the printed
circuit board level.
Circuit optimization may come down to the choice of logic family as well as gate topology.
The selection is also application-oriented.
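The XOR-AND clock gating scheme described in this section can be modeled behaviorally. The following sketch is a simplified cycle-level model, not a circuit-accurate simulation; the function name and the input sequence are hypothetical.

```python
def gated_latch_trace(d_values, q_init=0):
    """Cycle-level model of a latch whose clock is gated by XOR(D, Q).

    Returns the final Q and the number of clock pulses that reached the latch.
    """
    q = q_init
    pulses = 0
    for d in d_values:
        clk = 1                  # one clock pulse arrives every cycle
        enable = d ^ q           # XOR gate: 1 only when D differs from Q
        gated_clk = clk & enable # AND gate: pass the clock only when enabled
        if gated_clk:
            q = d                # the latch captures the new value
            pulses += 1
    return q, pulses


# Assumed example input with long runs of unchanged data.
d_sequence = [0, 0, 1, 1, 1, 0, 0, 0]
final_q, delivered = gated_latch_trace(d_sequence)
print(f"clock pulses delivered: {delivered} of {len(d_sequence)}")
```

In this model the latch sees a clock pulse only in the two cycles where D differs from Q; in the remaining six cycles the internal clock nodes do not switch.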
2.3 SWITCHING CAPACITANCE REDUCTION

Energy consumption is proportional to the switching capacitance, as shown in Eq. (2.4).
Switching capacitance consists of transistor parasitic capacitance as well as wire capacitance from
metal interconnects. In general, the lower the transistor count, the smaller the parasitic gate-oxide
and source/drain diffusion capacitances. The Complementary Pass-gate Logic (CPL) family
has the lowest transistor count compared to the dynamic and static logic families. The
interconnect capacitance can be further divided into three main components, as shown in Figure
2.2: parallel-plate capacitance, fringing-field capacitance, and wire-to-wire capacitance. These
three are inter-related through the width (W) and the height (H) of the wire, as well as the
thickness of the dielectric (tox). As tox increases, the parallel-plate capacitance decreases. But
when tox becomes comparable to W and H, the fringing-field effect dominates. When W is much
larger than tox, the parallel-plate capacitance dominates. But when W is smaller than H, the
wire-to-wire capacitance dominates. The optimum ratio for minimum capacitance is obtained when
W/H is 1.75.
Figure 2.2 Interconnect Capacitance
Moreover, for low-power design, the rule is to size up only the transistors that are on
critical paths to meet the speed requirement and to keep the rest of the transistors at minimum
size as much as possible. Layout optimization is also crucial. Appropriate layout styles
minimize not only the diffusion capacitances but also the interconnect length, and hence lead to
significant power savings.
2.4 LOW POWER CLOCK GATING DESIGN

Dynamic power can contribute up to 50% of the total power dissipation. Clock-gating is
the most common RTL optimization for reducing dynamic power. Effective clock-gating
implementation requires skillful application and comprehensive verification. There is a vast array
of clock-gating techniques available to designers. Clearly not all of these are equal when it
comes to reducing switching activity. Many transformations are simple, while others are highly
guarded, patented algorithms.
Most clock-gating is done at the Register Transfer Level (RTL). RTL clock-gating
algorithms can be grouped into three categories: system-level, sequential, and combinational.
System-level clock-gating stops the clock for an entire block, effectively disabling all
functionality. In contrast, combinational and sequential clock-gating selectively suspend
clocking while the block continues to produce output. Combinational clock-gating is a
straightforward substitution in the RTL code. It reduces power by disabling the clock on registers
when the output is not changing. Opportunities to insert combinational clock-gating can be found
by looking for conditional assignments in the code.
Clock-gating logic is substituted when code like "if (cond) out <= in" is present.
Combinational clock-gating is now a feature in RTL compilers. Power-aware synthesis tools
identify RTL coding patterns and make the appropriate substitution. Hardware designers only
need to understand some simple RTL coding guidelines to gain the benefits of combinational
clock-gating.
Since combinational clock-gated flops maintain a one-to-one state mapping with the
original RTL, Combinational Equivalence Checking tools can be used for functional
verification. This makes verification simple to set up and comprehensive. On the other hand,
because switching activity is eliminated only when data is not changing, the actual power
savings are limited. In typical designs, combinational clock-gating can reduce dynamic power by
about 5 to 10%.
Figure 2.3 Combinational Clock gating
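The substitution made for a conditional assignment such as "if (cond) out <= in" can be illustrated with a behavioral model. The sketch below compares a register that is clocked every cycle and holds its value through feedback with one whose clock is gated by the condition; the function name and stimulus are hypothetical and only illustrate the idea.

```python
def run_register(stimulus, gated):
    """Model a register described by 'if (cond) out <= in'.

    stimulus: list of (cond, data) pairs, one per clock cycle.
    gated:    if True, the clock reaches the register only when cond is 1;
              if False, the register is clocked every cycle and recirculates
              its old value when cond is 0.
    Returns the per-cycle output trace and the number of clock edges seen.
    """
    out = 0
    clock_edges = 0
    trace = []
    for cond, data in stimulus:
        if gated:
            if cond:                  # clock enabled only when needed
                clock_edges += 1
                out = data
        else:
            clock_edges += 1          # register clocked unconditionally
            out = data if cond else out
        trace.append(out)
    return trace, clock_edges


# Assumed example stimulus.
stimulus = [(1, 5), (0, 7), (0, 9), (1, 3), (0, 1)]
plain_trace, plain_edges = run_register(stimulus, gated=False)
gated_trace, gated_edges = run_register(stimulus, gated=True)
print("same outputs:", plain_trace == gated_trace)
print("clock edges:", plain_edges, "ungated vs", gated_edges, "gated")
```

Both versions hold the same state in every cycle, which is why the one-to-one state mapping with the original RTL is preserved; only the number of clock edges reaching the register differs.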
Sequential clock-gating alters the RTL micro-architecture without affecting design
functionality. Power is optimized by identifying unused computations, data-dependent functions,
and don't-care cycles in the original code. There are many types of sequential clock-gating
transformations. Identifying opportunities for sequential clock-gating is difficult and requires
sequential analysis. One example of a sequential optimization is turning off subsequent pipeline
stages based on a propagated valid condition. Because of the additional logic, this transformation
makes sense only if the datapath is multiple bits wide.
Sequential clock-gating is a multi-cycle optimization with multiple implementation
tradeoffs and RTL modifications. Consequently, there is a greater demand on functional
verification resources. On the other hand, sequential clock-gating can save significant power,
typically reducing switching activity by 15 to 25% on a given block.
Figure 2.4 Sequential Clock gating
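The pipeline example mentioned above, where a later stage is turned off based on a propagated valid condition, can be sketched as a cycle-level model. The two-stage structure, the doubling operation in stage 2, and all names below are assumptions made for illustration.

```python
def run_pipeline(inputs, gate_stage2):
    """Two-stage pipeline with a valid bit propagated alongside the data.

    inputs:      list of (data, valid) pairs, one per clock cycle.
    gate_stage2: if True, the stage-2 data register is clocked only when
                 stage 1 holds valid data.
    Returns the valid outputs and the number of stage-2 data register clocks.
    """
    s1_data, s1_valid = 0, 0
    s2_data, s2_valid = 0, 0
    stage2_clocks = 0
    outputs = []
    for data, valid in inputs:
        # Stage 2 data register: hypothetical operation (doubling).
        if s1_valid or not gate_stage2:
            s2_data = s1_data * 2
            stage2_clocks += 1
        s2_valid = s1_valid            # 1-bit valid register, always clocked
        s1_data, s1_valid = data, valid
        if s2_valid:
            outputs.append(s2_data)    # downstream logic only reads valid data
    return outputs, stage2_clocks


# Assumed stimulus; the last two entries are bubbles that flush the pipeline.
stimulus = [(3, 1), (9, 0), (4, 1), (7, 0), (0, 0), (0, 0)]
plain_out, plain_clocks = run_pipeline(stimulus, gate_stage2=False)
gated_out, gated_clocks = run_pipeline(stimulus, gate_stage2=True)
print("valid outputs equal:", plain_out == gated_out)
print("stage-2 clocks:", plain_clocks, "ungated vs", gated_clocks, "gated")
```

Unlike the combinational case, the gated and ungated designs hold different values in the stage-2 register during invalid cycles. Only the outputs observed under the valid condition match, which is why this kind of change needs sequential rather than combinational equivalence checking.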
Since sequential optimizations change the state of the design, Combinational Equivalence
Checking tools cannot be used for verification. This is not the case for Sequential Equivalence
Checking (SEC). SEC tools can comprehensively verify sequential changes to RTL such as
clock-gating.
System-level clock-gating is designed into the original hardware architecture and coded
as part of the RTL functionality. For example, sleep modes in a cell phone may strategically
disable the display, keyboard, or radio depending on the phone's current operational mode.
System-level clock-gating shuts off entire RTL blocks. Because large sections of logic are not
switching for many cycles, it has the most potential to save power. On the other hand, these
modifications are invasive to the design function. The enable logic is part of an overall power
management strategy and sometimes includes consideration for software control.
Verification of system-level power optimizations must be thought through in the
system-level test plan. Most hardware engineers understand how to write RTL in such a way that
synthesis tools can recognize and automate combinational clock-gating. Likewise, hardware
architects recognize and build in system-level clock-gating opportunities. Even with these
optimizations in place, there are substantial dynamic power saving opportunities remaining in the
RTL if designers understand the cost/reward tradeoffs of sequential clock-gating.
2.5 SEQUENTIAL CLOCK-GATING IN THE DESIGN FLOW

A standard practice is for design teams to create a block-wise power budget at the
beginning of a project. As blocks are implemented, designers optimize those blocks that are over
budget. Accurate power analysis for technologies at 90 nm and below depends on physical
place-and-route information. Unfortunately, this information is not available until late in the
design flow. This means sequential clock-gating is done late in the project, further highlighting
the importance of comprehensive verification. Identifying the enable condition for
sequential clock-gating is difficult, and the enable logic can become very complex. Multi-cycle
analysis of the design is needed, making it nearly impossible to ensure correctness by construction.
To adequately verify sequential clock-gating with simulation, test benches must be
monitored and modified to cover all enable/disable conditions. Further complicating the
verification is the fact that clock-gating typically crosses hierarchies and interacts with
neighboring blocks. Previously, the cost of verification limited the designer's ability to apply
sequential clock-gating, and only mandatory sequential optimizations were allowed. That has
changed with the availability of Sequential Equivalence Checking tools. SEC functionally verifies
sequential optimizations by comparing the clock-gated RTL to the corresponding original design.
SEC uses formal sequential analysis to verify all possible input sequences that enable and disable
clocks without test benches or assertions. This has the advantage of saving the time of modifying
test benches and running regressions. Additionally, SEC efficiently verifies clock-gating schemes
that cross hierarchies and block boundaries without having to conceive specific test bench
sequences.
2.6 CONVENTIONAL CLOCK DISTRIBUTION TECHNIQUES
There are a number of conventional clock distribution techniques that could be considered
for high-speed synchronous designs, each with specific advantages and disadvantages. Clock
distribution can be divided broadly into two parts: the final-stage driver and the pre-driver
network. The final-stage driver feeds the clock to the final load, which consists of clocked
circuits such as flip-flops and latches. A pre-driver network is used to distribute the clock
signal from the central clock source to the final-stage drivers.
2.6.1 TAPERED CLOCK BUFFER CHAIN

The simplest pre-driver technique used for clock distribution is to utilize a large buffer
chain. The clock signal is buffered through a number of gain stages to the final clock drivers.
Usually the technique is used to distribute the clock in a wide wire across the chip, from where
the clock is distributed using a more local clock distribution technique. The skew from the clock
source to the output of the buffer chain is not of any concern, because the output is only one
point, which means that the skew at the main clock wire on the chip is considered to be zero.
However, if the clock signals are tapped at different distances from the main clock wire the skew
will increase, and the skew follows the contour of the clock wire. Figure 2.5 shows an n-stage
tapered buffer chain where each inverter is upsized with a certain tapering factor compared to the
preceding one, thus providing a large final gain capable of driving the large clock load.
Figure 2.5 n-Stages Clock Buffer Chain
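The sizing of an n-stage tapered buffer chain can be sketched as follows. This is a simplified first-order model that ignores self-loading and wire capacitance; the input capacitance, clock load, and target tapering factor are hypothetical values, and the function name is chosen only for this example.

```python
import math


def buffer_chain(c_in, c_load, target_taper):
    """Size a tapered inverter chain driving c_load from an input of c_in.

    Picks the smallest stage count whose per-stage tapering factor does not
    exceed target_taper, then spreads the total gain evenly over the stages.
    Returns (stages, actual taper factor, relative size of each inverter).
    """
    total_gain = c_load / c_in
    stages = max(1, math.ceil(math.log(total_gain) / math.log(target_taper)))
    taper = total_gain ** (1.0 / stages)
    sizes = [taper ** i for i in range(stages)]
    return stages, taper, sizes


# Assumed example: a minimum inverter (relative input load 1) driving a
# clock load 1000 times larger, with a target tapering factor of 4.
stages, taper, sizes = buffer_chain(1.0, 1000.0, 4.0)
print(f"{stages} stages, tapering factor {taper:.2f}")
print("relative inverter sizes:", [round(s, 1) for s in sizes])
```

Each inverter is larger than the preceding one by the same factor, so every stage drives the same relative load and the last stage is large enough to drive the clock load.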
2.6.2 CLOCK TREES

If the final clock drivers are distributed across the chip, the pre-driver network must also
be able to distribute the clock. One technique for doing this is to use so-called binary trees, also
known as H-trees. Instead of buffering the clock through a large buffer chain placed at one spot
on the chip, the pre-driver buffers can be distributed across the H-tree network. To reduce the
skew in the tree, all branches need to be balanced so that their delays are matched. This
balancing is done by matching the RLC delay of each wire segment, which can be quite a
cumbersome task. However, there are tools available that can do this automatically.
2.6.3 GRID CLOCK DISTRIBUTION

Clock grids are common as final driver networks because they require no matching of
delays from the source to the final loads. The clock is also accessible at many more points
than in a binary tree, which makes the grid approach appealing to designers as it is
forgiving of late design changes. Clock skew is usually not considered an issue for a clock grid,
because the grid is usually implemented either dense or small, so that the delay between
different taps is limited. In principle, the drivers feed the clock signal directly into the grid. An
obvious disadvantage of the grid approach is that it leads to a large amount of wiring, which
results in excessive loading of the clock drivers and thus additional clock power.
2.6.4 LENGTH-MATCHED SERPENTINES

Length-matched serpentines are a clocking technique that is similar to the clock tree, but
the length of the branches is adjusted to balance for any load mismatch. This way the skew
compared to the other serpentines can be minimized. This approach is also appealing for the
cases where the clock load is asymmetrically distributed. It is compatible with the buffer chain in
that it can distribute the clocks from the main clock wire to the final loads with minimum skew.
2.7 CONVENTIONAL LOW-POWER CLOCKING TECHNIQUES

Because clock power dissipation is significant, a number of techniques have been proposed to
reduce the power dissipation in conventional clock networks. If most of the power
dissipated in the clock distribution networks is assumed to be due to active switching power,
there exist a limited number of possibilities to reduce the power dissipation, according to the
switching power expression
$$P_{clk} = \alpha\, C_L\, V_{dd}^2\, f_{clk}$$
2.7.1 FREQUENCY AND VOLTAGE REDUCTIONS

The clock frequency (fclk) has a direct linear relationship with the switching power
dissipation, which means that if the clock frequency can be reduced, the clock power will
decrease. However, with an unchanged amount of hardware, the number of operations that can be
done each second naturally decreases. This leads to a decrease in the throughput of the circuit.
To reduce this performance penalty, an increased amount of parallelism can be
incorporated. This leads to an area penalty and an increase in the clock load because more
circuits are needed, including more clocked circuits. Nevertheless, frequency is a
powerful tool for limiting the average clock power in a system, especially for systems where
the highest performance is required only temporarily. During the more normal
operation modes, which require lower computing speed, the clock frequency can be lowered, and
the average clock power dissipation can thereby be reduced. Moreover, the switching power
varies as the square of the power supply voltage (Vdd). Hence, considerable power savings are
possible by scaling down the power supply. As the delay of digital gates increases with lower
power supply voltage, this will also lead to a reduction of the throughput. However, as in
the case of temporarily changing the frequency, the power supply voltage can also be reduced
when the required computation speed is low. Dynamic voltage and frequency scaling during
operation is commonly used in microprocessors, which can incorporate a
number of different power-saving modes. When the computation need is low, both the power supply
voltage and the clock frequency of the chip are reduced, leading to considerable average clock and
total power reductions.
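The saving from dynamic voltage and frequency scaling can be estimated by averaging the switching power over the operating modes. The sketch below assumes that switching power dominates and that the workload spends a known fraction of time in each mode; the clock load, voltages, frequencies, and time fractions are hypothetical.

```python
def mode_power(c_load, v_dd, f_clk, alpha=1.0):
    """Switching power (W) in one operating mode: alpha * C_L * Vdd^2 * f_clk."""
    return alpha * c_load * v_dd ** 2 * f_clk


def average_power(modes, c_load, alpha=1.0):
    """Time-weighted average power over modes given as (fraction, v_dd, f_clk)."""
    total_fraction = sum(fraction for fraction, _, _ in modes)
    if abs(total_fraction - 1.0) > 1e-9:
        raise ValueError("time fractions must sum to 1")
    return sum(fraction * mode_power(c_load, v_dd, f_clk, alpha)
               for fraction, v_dd, f_clk in modes)


# Hypothetical operating points, for illustration only.
C_CLOCK = 100e-12            # total clock load in farads
FULL_SPEED = (1.2, 1.0e9)    # (Vdd in volts, clock frequency in Hz)
LOW_SPEED = (0.96, 0.5e9)    # 80% of the voltage, half the frequency

always_fast = average_power([(1.0, *FULL_SPEED)], C_CLOCK)
scaled = average_power([(0.2, *FULL_SPEED), (0.8, *LOW_SPEED)], C_CLOCK)
print(f"average clock power with scaling: {scaled / always_fast:.1%} of full speed")
```

With these assumed numbers the low-speed mode dissipates 0.8 squared times 0.5, or 32%, of the full-speed power, so spending 80% of the time in it brings the average to roughly 46% of the always-fast case. The estimate only applies when the low-speed periods coincide with low computational demand.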
2.7.2 LOW-SWING CLOCKING

An alternative to scaling down the power supply for all circuits, including the clock
network, would be to scale down the power supply voltage for only the clock drivers. Using
low-swing clock signals on the global clock grid, followed by level conversion to a high-voltage
clock used locally, has been proposed. However, the major part of the clock power is dissipated
in the final driver, so a global low-swing clock technique can only reduce a minor part of the
clock power. Instead, techniques using a low-swing clock signal all the way to the clock load
have been proposed and proved feasible, showing substantial clock power savings. However, as the
driving strength of the clock buffers in the clock network is reduced at a lower power supply,
the clock buffers need to be upsized to maintain driving performance and robustness, which can
offset some of the savings.
2.7.3 CLOCK GATING

The clock signal is the only signal in a synchronous design that has an activity ratio of
one, meaning that it switches all the time. A common technique to limit the activity ratio, at least
locally in a system, is to incorporate so-called clock gating. Clock gating means that the clock is
masked using a control signal, which blocks the clock signal coming from the clock source from
reaching circuits further downstream in the network. Clock gating can be performed at many
different levels of granularity. At the local level, the final clock driver is gated, reducing both the
power in the final driver and the switching power in the clocked elements. The gating can also be
done higher in the clock network hierarchy, by gating the clock to larger blocks. Different
architectural methods to find the granularity of the gating and the potential blocks to be gated
have been proposed.
2.7.4 CLOCK LOAD REDUCTION

As Eq. (2.5) indicates, the clock switching power changes linearly with the clock load,
CL, of the clock network. The clock load can be reduced, for instance, by
designing clocked elements that present less loading to the clock network, which could be
accomplished by downsizing the transistors in the flip-flops. However, this has a negative
impact on the performance of the flip-flops, and therefore leads to a global performance penalty
for the design.
2.8 CHALLENGES WITH THE CLOCKED DESIGN

Most digital circuits are synchronous, which means that their operation is controlled by a
clock. Although the use of a clock has certain advantages in the design of a digital circuit, it also
introduces a number of significant problems that are becoming more serious and more prevalent
as technology becomes smaller and faster.
The following are some of the challenges with clocked design:
Chip partitioned into multiple timing domains: this makes the logic susceptible to
metastability and adds latency.
Clock distribution and clock skew.
Performance overhead: because the design is synchronous, a single slow component or logic
block slows down the whole chip.
Clock power: the clock consumes a large part of the chip power (40-70%).
Designing reusable components: the design normally has to be altered when
migrating to a new SoC due to additional clocking and system constraints.
Probably the most significant problem is clock skew, which is the difference in arrival
time of the clock signal at different parts of a circuit. When a circuit is large and slow, the clock
skew is insignificant. But as circuits shrink and their speeds grow, this difference becomes very
significant, and extra design time and often extra circuitry are needed to solve the problem.
It is becoming difficult to distribute the clock as the network spreads over the die and may have
an irregular layout.
With all of the problems caused by the clock, it is very tempting to simply remove it from
the system. This is the fundamental idea behind asynchronous design. However, it is not as
simple as just removing the clock, since the operation of the circuit must still be controlled
somehow. Asynchronous circuits essentially govern themselves, and are therefore called
self-timed circuits.
2.9 POWER OPTIMIZATION

Eliminating glitching is one of the most important techniques for power reduction in
CMOS logic. Glitch reduction can often be applied more effectively in sequential systems than is
possible in combinational logic. Sequential machines can use registers to stop the propagation of
glitches, independent of the logic function being implemented.
Figure 2.6 Power optimization
Many sequential timing optimizations can be thought of as retiming. Figure 2.6 illustrates
how flip-flops can be used to reduce power consumption by blocking glitches from propagating
to high-capacitance nodes. (The flip-flop and its clock connection do, of course, consume some
power of their own.) A well-placed flip-flop will be positioned after the logic with high signal
transition probabilities and before high-capacitance nodes on the same path.
Beyond retiming, glitch propagation can also be blocked by adding extra levels of registers.
Adding registers can be useful when there are more glitch-producing
segments of logic than there are ranks of flip-flops to catch the glitches. Such changes,
however, will change the number of cycles required to compute the machine's outputs and must
be compatible with the rest of the system.
Proper state assignment may help reduce power
consumption. For example, a one-hot encoding requires only two signal transitions per cycle on
the old-state and new-state signals. However, one-hot encoding requires a large number of
memory elements. The power consumption of the logic that computes the required current-state
and next-state functions must also be taken into account.
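The effect of state assignment on register switching can be illustrated by counting bit transitions for the same state sequence under two encodings. The sketch below is an example only; the eight-state machine, the state sequence, and the function names are hypothetical.

```python
def hamming(a, b):
    """Number of bit positions in which two code words differ."""
    return bin(a ^ b).count("1")


def binary_code(state):
    """Plain binary state assignment: state k is stored as the number k."""
    return state


def one_hot_code(state):
    """One-hot state assignment: state k sets only bit k."""
    return 1 << state


def register_transitions(states, encode):
    """Total state-register bit flips while stepping through `states`."""
    codes = [encode(s) for s in states]
    return sum(hamming(a, b) for a, b in zip(codes, codes[1:]))


# Assumed state sequence of an eight-state machine.
STATE_SEQUENCE = [0, 3, 4, 7, 0]

binary_flips = register_transitions(STATE_SEQUENCE, binary_code)
one_hot_flips = register_transitions(STATE_SEQUENCE, one_hot_code)
print("binary encoding :", binary_flips, "flips using 3 flip-flops")
print("one-hot encoding:", one_hot_flips, "flips using 8 flip-flops")
```

Every one-hot state change flips exactly two bits, one for the old state and one for the new. A binary assignment can flip more or fewer bits depending on the sequence, and the one-hot version pays for its predictability with more memory elements.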
90 NM TECHNOLOGY
The 90 nm process refers to the level of CMOS process technology that was
reached in the 2002-2003 timeframe by most leading semiconductor companies, such as Intel,
AMD, Infineon, Texas Instruments, IBM, and TSMC.
The origin of the 90 nm value is historical, as it reflects a trend of 70% scaling
every 2-3 years. The naming is formally determined by the International Technology Roadmap
for Semiconductors (ITRS).
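The 70% scaling trend can be checked with simple arithmetic. The sketch below starts from the 130 nm node mentioned in this section and applies the factor repeatedly; the function name is hypothetical, and the published node names are rounded values.

```python
SCALE_PER_GENERATION = 0.7  # each generation is about 70% of the previous one


def scaled_nodes(start_nm, generations):
    """Feature sizes obtained by applying the scaling factor repeatedly."""
    nodes = [start_nm]
    for _ in range(generations):
        nodes.append(nodes[-1] * SCALE_PER_GENERATION)
    return nodes


for value in scaled_nodes(130.0, 3):
    print(f"{value:.1f} nm")
```

The computed values land close to the 90 nm, 65 nm, and 45 nm node names.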
The 193 nm wavelength was introduced by many (but not all) companies for
lithography of critical layers mainly during the 90 nm node. Yield issues associated with this
transition (due to the use of new photoresists) were reflected in the high costs associated with
this transition. Even more significantly, the 300 mm wafer size became mainstream at the 90 nm
node. The previous wafer size was 200 mm in diameter.
As of 2009, 45 nm technology is largely replacing 90 nm and 65 nm technology in
leading-edge chip products. However, some products, notably chipsets, have moved from older
130 nm technology to the 90 nm process.
Example: Elpida 90 nm DDR2 SDRAM process
Use of 300 mm wafer size
Use of KrF (248 nm) lithography with optical proximity correction
512 Mbit
1.8 V operation
Derivative of earlier 110 nm and 100 nm processes
MAJOR FABRICATION STEPS IN DEEP SUBMICRON TECHNOLOGY
Advanced silicon integration technology for 3D packaging now offers post-processing of CMOS
such as wafer thinning to 50 µm and through-wafer vias of <10 µm. These technologies might be
applied to create new tracking detectors that can handle vertexing under difficult rate
conditions. The sensor layers can be only ~50 µm thick, with low noise performance and better
radiation hardness achieved by using small-volume pixels. Multi-layer sensors with integrated
coincidence signal processing could discriminate real tracks from various sources of background.
p and n wells
Shallow trench isolation
Threshold shift and anti-punch through implants
Thin oxide and gate polysilicon
Lightly doped drains and sources
Sidewall spacer
Heavily doped drains and sources
Siliciding (Salicide and Polycide)
Bottom metal, tungsten plugs, and oxide
Higher level metals, tungsten plugs/vias, and oxide
Top level metal, vias and protective oxide
Starting Material
The substrate should be highly doped to act like a good conductor.
Step 1 - n and p wells
These are the areas where the transistors will be fabricated - NMOS in the p-well and PMOS in
the n-well. Done by implantation followed by a deep diffusion.
Step 2 - Shallow Trench Isolation
The shallow trench isolation (STI) electrically isolates one region/transistor from another. An
important advantage of STI is that it minimizes the heat cycle needed for n+ or p+ isolation
compared to LOCOS. This is a significant advantage for any process where there are implants
before STI.
Step 3 - Threshold Shift and Anti-Punch Through Implants
The natural threshold of the NMOS is about 0 V and that of the PMOS is about -1.2 V. A p-implant
is used to make the NMOS harder to invert and the PMOS easier, resulting in threshold voltages
balanced around zero volts. An implant can also be applied to create a higher-doped region
beneath the channels to prevent punch-through from the drain depletion region extending to the
source depletion region.
Step 4 - Thin Oxide and Polysilicon Gates
A thin oxide is deposited, followed by polysilicon. These layers are removed where they are not
wanted.
Step 5 - Lightly Doped Drains and Sources
A lightly-doped implant is used to create a lightly-doped source and drain next to the channel of
the MOSFETs.
Step 6 - Sidewall Spacer
A layer of dielectric is deposited on the surface and removed in such a way as to leave
sidewall spacers next to the thin-oxide-polysilicon-polycide sandwich. These sidewall spacers
prevent the part of the source and drain next to the channel from becoming heavily doped.
Step 7 - Implantation of the Heavily Doped Sources and Drains
This step not only completes the sources and drains but also allows ohmic
contact to the wells and substrate.
Step 8 - Siliciding (Salicide and Polycide)
This step reduces the resistance of the bulk diffusions and polysilicon and forms an ohmic contact
with the material on which it is deposited. Salicide = self-aligned silicide.
Step 9 - Intermediate Oxide Layer
An oxide layer is used to cover the transistors and to planarize the surface.
Step 10 - First-Level Metal
Tungsten plugs are built through the lower intermediate oxide layer to provide contact from
the devices, wells and substrate to the first-level metal.
Completed Fabrication
After multiple levels of metal are applied, the fabrication is completed with a thicker top-level
metal and a protective layer to hermetically seal the circuit from the environment. Note that metal
is used for the upper-level metal vias. The chip is electrically connected by removing the
protective layer over large bonding pads.
DSM technology typically has a minimum channel length between 0.35 µm and 0.1 µm.
DSM technology addresses the problem of excessive depletion region widths in junction
isolation techniques by using shallow trench isolation.
DSM technology may have from 4 to 8 levels of metal.
Lightly doped drains and sources are a key aspect of DSM technology.
EXISTING SYSTEM
The pulse-triggered flip-flop (P-FF) is considered a popular alternative to the
conventional master-slave-based FF in high-speed applications. Besides the
speed advantage, its circuit simplicity also helps lower the power consumption of
the clock tree system. A P-FF consists of a pulse generator for generating strobe signals and a
latch for data storage. Since the triggering pulses generated on the transition edges of the clock
signal are very narrow, the latch acts like an edge-triggered FF. The circuit
complexity of a P-FF is reduced since only one latch, as opposed to the two used in the conventional
master-slave configuration, is needed. P-FFs also allow time borrowing across clock cycle
boundaries and feature a zero or even negative setup time. P-FFs are thus less sensitive to clock
jitter. Despite these advantages, the pulse generation circuitry requires delicate pulse-width control in
the face of process variation and the configuration of the pulse clock distribution network.
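The pulse generator plus latch arrangement can be illustrated with a small discrete-time behavioural model. This is only a sketch under stated assumptions: time is divided into unit steps, the pulse is formed as the clock ANDed with an inverted copy of the clock delayed by `delay` steps, and the function names are hypothetical. It does not model any transistor-level effect.

```python
def pulse_generator(clk, delay):
    """pulse[t] = clk[t] AND NOT clk[t - delay].

    The result is high for `delay` steps after each rising clock edge.
    The clock is assumed to have been low before t = 0.
    """
    pulse = []
    for t, c in enumerate(clk):
        delayed = clk[t - delay] if t >= delay else 0
        pulse.append(c & (1 - delayed))
    return pulse


def pulse_triggered_ff(clk, data, delay=1, q0=0):
    """Single latch that is transparent only while the generated pulse is high."""
    q, out = q0, []
    for p, d in zip(pulse_generator(clk, delay), data):
        if p:
            q = d          # narrow transparency window: capture the data
        out.append(q)      # otherwise the latch holds its value
    return out


if __name__ == "__main__":
    clk = [0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1]
    data = [1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1]
    print(pulse_generator(clk, 1))
    print(pulse_triggered_ff(clk, data))
```

Because the window lasts only `delay` steps, changes of the data while the clock is still high are ignored, which is why a single latch behaves like an edge-triggered flip-flop.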
Depending on the method of pulse generation, pulse-triggered flip-flops can be classified
into two types, implicit and explicit. In implicit-pulse triggered flip-flops (ip-FF), the pulse is
generated inside the flip-flop; examples are the hybrid latch flip-flop (HLFF), the semi-dynamic
flip-flop (SDFF), and the implicit-pulsed data-close-to-output flip-flop (ip-DCO). In explicit-pulse
triggered flip-flops (ep-FF), the pulse is generated externally; an example is the explicit-pulsed
data-close-to-output flip-flop (ep-DCO).
One effective way to save power inside a flip-flop follows from the observation that
many high-speed flip-flops share a dynamic
structure. This dynamic behavior wastes considerable power through unnecessary
internal switching activity, especially in environments with moderate or low data activity. Reducing
this activity reduces the overall power dissipation. In this regard,
several existing approaches to reducing the internal switching activity are surveyed and classified
into conditional precharge and conditional capture techniques.
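The link between switching activity and power can be made concrete with the standard first-order expression for dynamic power, P = alpha * C * VDD^2 * f. The sketch below uses this textbook formula; the capacitance, supply voltage and frequency values are arbitrary assumptions chosen for illustration, not measurements from this work.

```python
def dynamic_power(alpha, c_farads, vdd, freq_hz):
    """First-order dynamic power: activity factor * capacitance * VDD^2 * frequency."""
    return alpha * c_farads * vdd ** 2 * freq_hz


if __name__ == "__main__":
    # Assumed example values: 1 fF internal node, 1.0 V supply, 1 GHz clock.
    every_cycle = dynamic_power(1.0, 1e-15, 1.0, 1e9)   # node toggles every cycle
    conditional = dynamic_power(0.25, 1e-15, 1.0, 1e9)  # node toggles in 25% of cycles
    print(f"unconditional: {every_cycle * 1e6:.3f} uW")
    print(f"conditional:   {conditional * 1e6:.3f} uW")
```

Since the power is linear in the activity factor, any technique that suppresses redundant transitions of an internal node reduces that node's dynamic power in the same proportion.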
In an implicit-type P-FF, the pulse generator is a built-in logic of the latch
design, and no explicit pulse signals are generated. In an explicit-type P-FF, the designs of pulse
generator and latch are separate. Implicit pulse generation is often considered to be more power
efficient than explicit pulse generation. This is because the former merely controls the
discharging path while the latter needs to physically generate a pulse train. Implicit-type designs,
however, face a lengthened discharging path in latch design, which leads to inferior timing
characteristics. The situation deteriorates further when low-power techniques such as conditional
capture, conditional precharge, conditional discharge, or conditional data mapping are applied.
TECHNIQUES FOR REDUCING SWITCHING ACTIVITY

Most of the flip-flops presented here are dynamic in nature: some internal nodes are
precharged and evaluated in each cycle without producing any useful activity at the output when
the input is stable. Reducing this redundant switching activity substantially reduces
the power dissipation, and many techniques for this purpose have been presented in the literature. A
brief survey of such techniques is conducted in this work, and the main techniques are
classified into conditional precharge and conditional capture.
CONDITIONAL PRECHARGE TECHNIQUE
The general idea of this technique is that the precharging path is controlled to avoid precharging
the internal node when D stays HIGH. The figure below shows the general scheme of the conditional
precharge technique. In the absence of the pMOS precharge control, and when D stays HIGH for
a long time, the discharge path is on during the evaluation periods, causing node X to
discharge after each precharging phase. To eliminate this charging/discharging activity, a
pMOS transistor is inserted in the precharging path, which prevents the precharging of node
X while the data input is stable HIGH.
Conditional Precharge Technique
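The saving from conditional precharge can be estimated with a cycle-level count of the transitions at node X. The following is a simplified behavioural sketch, not a circuit simulation: each list element is the value of D during one clock cycle, every cycle has a precharge phase followed by an evaluation phase, and the function name is an assumption made for this example.

```python
def count_x_transitions(data, conditional):
    """Count charge/discharge events at node X over a sequence of clock cycles."""
    x = 1                 # node X starts precharged (HIGH)
    transitions = 0
    for d in data:
        # Precharge phase: pull X high, unless the extra pMOS blocks it because D is HIGH.
        precharge_blocked = conditional and d == 1
        if x == 0 and not precharge_blocked:
            x = 1
            transitions += 1
        # Evaluation phase: the discharge path conducts when D is HIGH.
        if d == 1 and x == 1:
            x = 0
            transitions += 1
    return transitions


if __name__ == "__main__":
    data = [1, 1, 1, 1, 0, 0, 1, 1]
    print("always precharge:     ", count_x_transitions(data, conditional=False))
    print("conditional precharge:", count_x_transitions(data, conditional=True))
```

In this model the conditional version only switches node X when D actually changes, whereas the unconditional version charges and discharges X in every cycle in which D is HIGH.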
CONDITIONAL CAPTURE TECHNIQUE

This technique is based on the clock-gating idea; the figure shows its general scheme.
It is mainly applied to implicit pulse-triggered flip-flops such as
CCFF. Essentially, these flip-flops employ an internal clock-gating approach. Flip-flops in this
category feature a transparent window period that is used to sample the input. This window,
created by an implicit pulse generator, is determined by the time when both clocked transistors in
the first stage are simultaneously on. After sampling a HIGH state at the input, the output Q will
be HIGH. This output state can be used to shut the transparent window as long as it is HIGH,
preventing redundant activity at the internal node X. In this technique, a Q-controlled gate
is inserted on the path of the delayed clock to the first stage.
The conditional capture flip-flop (CCFF) was introduced to reduce redundant power
at the internal node. This flip-flop employs a scheme much like the JK-type flip-flop, but it adds
one more gate that switches with the clock compared to HLFF. This addition leads to an
increase in the power consumed by the clock system, and it may offset the savings gained from
reducing the internal redundant switching power. Moreover, applying the double-edge
triggered technique would be complicated and would increase the transistor count, because it
requires the duplication of the NOR gate and other clocked transistors. A revised conditional
capture flip-flop, shown in the figure, is proposed to improve the energy-delay product (EDP). A further
enhancement of this flip-flop could be employed to reduce the switching activity on the internal
node, which may further improve the EDP.
Conditional Capture Technique

Proposed Conditional Capture Technique
PULSE TRIGGERED FLIP FLOP (PTFF)
Implicit-type P-FF designs, which are used as the reference designs in later performance
comparisons, are reviewed first. A state-of-the-art P-FF design is given in Figure 4.4. It contains an
AND logic-based pulse generator. Inverters I5 and I6 are used to latch data, and inverters I7 and
I8 are used to hold the internal node X. The pulse generator takes complementary and
delay-skewed clock signals to generate a transparent window equal in size to the delay of inverters I1-I3.
Figure 4.4 Pulse-triggered Flip Flop
Two practical problems exist in this design. First, during the rising edge, nMOS transistors
N2 and N3 are turned on. If data remains high, node X will be discharged on every rising edge of
the clock. This leads to a large switching power. The other problem is that node X controls two
larger MOS transistors (P2 and N5). The large capacitive load to node X causes speed and power
performance degradation.
HYBRID LATCH FLIP-FLOP (HLFF)

An improved P-FF design, named MHLLF, employs a static latch structure.
Node X is no longer precharged periodically by the clock signal. A weak pull-up transistor P1
controlled by the FF output signal Q is used to maintain the node X level at high when Q is zero.
This design eliminates the unnecessary discharging problem at node X. However, it encounters a
longer Data-to-Q (D-to-Q) delay during "0" to "1" transitions because node X is not
pre-discharged. Larger transistors N3 and N4 are required to enhance the discharging capability.
Figure 4.5 Hybrid Latch Flip Flop
SCCER DESIGN
SCCER is a low-power P-FF design that uses a conditional discharge technique. In this design,
the keeper logic (back-to-back inverters I7 and I8 in Fig. 1(a)) is replaced by a weak pull-up
transistor P1 in conjunction with an inverter I2 to reduce the load capacitance of node X. The
discharge path contains nMOS transistors N2 and N1 connected in series. In order to eliminate
superfluous switching at node X, an extra nMOS transistor N3 is employed. Since N3 is
controlled by Q_fdbk, no discharge occurs if the input data remains high. The worst-case timing of
this design occurs when the input data is "1" and node X is discharged through four transistors in
series, i.e., N1 through N4, while fighting the pull-up transistor P1. Powerful pull-down
circuitry is thus needed to ensure node X can be properly discharged. This implies wider N1 and
N2 transistors and a longer delay from the delay inverter I1 to widen the discharge pulse width.
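The cost of a tall transistor stack can be illustrated with a first-order RC estimate. The sketch below is a rough model under loud assumptions: each nMOS device is treated as a resistor of r_unit / width, the load at node X is a single lumped capacitor, the opposing current of the pull-up P1 is ignored, and the numerical values of r_unit and c_load are arbitrary. It is not derived from the SCCER schematic.

```python
def stack_resistance(widths, r_unit=10e3):
    """Series on-resistance of an nMOS stack; each device contributes r_unit / width."""
    return sum(r_unit / w for w in widths)


def discharge_delay(widths, c_load=5e-15, r_unit=10e3):
    """First-order time for node X to fall to 50%: 0.69 * R_stack * C_load."""
    return 0.69 * stack_resistance(widths, r_unit) * c_load


if __name__ == "__main__":
    four_high = discharge_delay([1, 1, 1, 1])      # four minimum-width devices in series
    widened = discharge_delay([2, 2, 1, 1])        # two of the four devices doubled in width
    two_high = discharge_delay([1, 1])             # a shorter stack at minimum width
    print(f"4-stack: {four_high * 1e12:.1f} ps")
    print(f"4-stack, two devices widened: {widened * 1e12:.1f} ps")
    print(f"2-stack: {two_high * 1e12:.1f} ps")
```

In this model, widening devices in a four-high stack recovers only part of the delay, while removing devices from the stack reduces it directly, which is the motivation for shortening the discharge path.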
PROPOSED SYSTEM
The proposed design, as shown in Fig. 2, adopts two measures to overcome the
problems associated with existing P-FF designs. The first is reducing the number of nMOS
transistors stacked in the discharging path. The second is supporting a mechanism to
conditionally enhance the pull-down strength when the input data is "1". Referring to Fig. 2, the upper
part of the latch design is similar to the one employed in the SCCER design [12]. As opposed to the
transistor stacking design in Fig. 1(a) and (c), transistor N2 is removed from the discharging
path. Transistor N2, in conjunction with an additional transistor N3, forms a two-input pass
transistor logic (PTL)-based AND gate [13], [14] to control the discharge of transistor N1. Since
the two inputs to the AND logic are mostly complementary (except during the transition edges of
the clock), the output node Z is kept at zero most of the time. When both input signals equal
"0" (during the falling edges of the clock), temporary floating at node Z is basically harmless. At
the rising edges of the clock, both transistors N2 and N3 are turned on and collaborate to pass a
weak logic high to node Z, which then turns on transistor N1 for a time span defined by the delay
inverter I1. The switching power at node Z can be reduced due to a diminished voltage swing.
Unlike the MHLLF design [11], where the discharge control signal is driven by a single transistor,
parallel conduction of two nMOS transistors (N2 and N3) speeds up pulse
generation. With this design measure, the number of stacked transistors along the discharging
path is reduced, and the sizes of transistors N1-N5 can be reduced as well.
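The behaviour of the two-transistor PTL AND gate described above can be tabulated with a small switch-level model. The wiring used here is an assumption made for illustration (one nMOS gated by the delayed inverted clock and passing the clock, the other gated by the clock and passing the delayed inverted clock); the function name and the string labels are hypothetical.

```python
def ptl_and(clk, clkb_delayed):
    """Switch-level model of a two-nMOS pass-transistor AND gate.

    Assumed wiring: N2 is gated by clkb_delayed and passes clk;
    N3 is gated by clk and passes clkb_delayed.
    Returns "0", "weak 1" or "floating".
    """
    driven_values = []
    if clkb_delayed == 1:          # N2 conducts and passes clk to the output
        driven_values.append(clk)
    if clk == 1:                   # N3 conducts and passes clkb_delayed to the output
        driven_values.append(clkb_delayed)
    if not driven_values:
        return "floating"          # both pass transistors are off
    if all(v == 1 for v in driven_values):
        return "weak 1"            # an nMOS passes a degraded high level
    return "0"


if __name__ == "__main__":
    for clk in (0, 1):
        for clkb_delayed in (0, 1):
            print(clk, clkb_delayed, "->", ptl_and(clk, clkb_delayed))
```

The only input combination that produces a high output is the short interval after a rising clock edge, when the clock is already high and its delayed inverse has not yet fallen; both transistors then conduct in parallel.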
In this design, the longest discharging path is formed when the input data is "1" while the Qbar
output is "1". To enhance the discharging under this condition, transistor P3 is added. Transistor
P3 is normally turned off because node X is pulled high most of the time. It steps in when node X
is discharged to |VTP| below VDD. This provides an additional boost to node Z (from VDD - VTH to
VDD). The generated pulse is taller, which enhances the pull-down strength of transistor N1. After
the rising edge of the clock, the delay inverter I1 drives node Z back to zero through transistor N3
to shut down the discharging path. The voltage level of node X rises and eventually turns off
transistor P3. With the intervention of P3, the width of the generated discharging pulse is
stretched out. This means that, to create a pulse with sufficient width for correct data capturing,
a bulky delay inverter design, which constitutes most of the power consumption in the pulse
generation logic, is no longer needed. It should be noted that this conditional pulse enhancement
technique takes effect only when the FF output Q is subject to a data change from 0 to 1. This
leads to better power performance than schemes using an indiscriminate pulse-width enhancement
approach. Another benefit of this conditional pulse enhancement scheme is the reduction in leakage
power due to the shrunken transistors in the critical discharging path and in the delay inverter.
Schematic of the proposed P-FF design with pulse Control Scheme
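The benefit of a taller pulse on the gate of N1 can be estimated roughly. The sketch below reads the boost as raising the pulse height from VDD - VTH (the degraded high passed by nMOS transistors) to the full VDD; this reading, the alpha-power current law, the shared threshold value and the example voltages are all assumptions made for illustration, not data from this design.

```python
def drive_ratio(vdd, vth, alpha=1.3):
    """Ratio of N1 drive current with a full-rail pulse to that with a degraded pulse.

    Assumptions (not from the source): alpha-power law I ~ (Vgs - Vth)**alpha,
    the same threshold vth for the pass transistors and N1, and no body effect.
    """
    overdrive_boosted = vdd - vth              # pulse peaks at VDD
    overdrive_degraded = (vdd - vth) - vth     # pulse peaks at VDD - VTH
    if overdrive_degraded <= 0:
        raise ValueError("degraded pulse does not turn N1 on in this model")
    return (overdrive_boosted / overdrive_degraded) ** alpha


if __name__ == "__main__":
    # Assumed example values: 1.0 V supply, 0.3 V threshold.
    for alpha in (1.0, 1.3, 2.0):
        print(f"alpha = {alpha}: drive improves by {drive_ratio(1.0, 0.3, alpha):.2f}x")
```

Under these assumptions even a modest increase in pulse height gives a noticeably stronger pull-down, which is consistent with the claim that smaller transistors and a smaller delay inverter become sufficient.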

DESIGN TOOLS
Microwind is a tool for designing and simulating circuits at layout level. The tool features
full editing facilities (copy, cut, paste, duplicate, move), various views (MOS characteristics, 2D
cross section, 3D process viewer), and an analog simulator. DSCH is software for logic design.
Based on primitives, a hierarchical circuit can be built and simulated. It also includes delay and
power consumption evaluation. Silicon is a tool for 3D display of the atomic structure of silicon, with
emphasis on the silicon lattice, the dopants, and the silicon dioxide.
5.1 Tools from Microwind

Microwind
MICROWIND is integrated EDA software that covers IC design from concept to
completion. It combines the traditionally separate front-end and back-end chip design steps into
one flow, accelerating the design cycle and reducing design complexity. It tightly integrates mixed-signal
implementation with digital implementation, circuit simulation, transistor-level extraction and
verification, and serves as an educational tool that helps individuals develop the skills
needed for design positions in virtually every domain of the IC industry.
DSCH (Schematic Editor and Digital Simulator)

The DSCH program is a logic editor and simulator. DSCH is used to validate the
architecture of the logic circuit before the microelectronics design is started. DSCH provides a
user-friendly environment for hierarchical logic design, and fast simulation with delay analysis,
which allows the design and validation of complex logic structures.
DSCH also features the symbols, models and assembly support for 8051 and 16F84
controllers. Designers can create logic circuits for interfacing with these controllers and verify
software programs using DSCH.
Its main features are:

User-friendly environment for rapid design of logic circuits.
Supports hierarchical logic design.
Handles both conventional pattern-based logic simulation and intuitive on-screen mouse-driven simulation.
Improved built-in extractor which generates a SPICE netlist from the schematic diagram
(compatible with PSPICE and WinSpice).
Generates a Verilog description of the schematic for layout conversion.
Immediate access to symbol properties (delay, fanout).
Model and assembly support for 8051 and PIC 16F84 microcontrollers.
Sub-micron, deep-submicron and nanoscale technology support.
Supported by a large symbol library.
Prothumb (Mixed-signal Simulator)

No SPICE or external simulator is needed to verify CMOS circuits. Microwind
has a built-in analog-like simulator which supports MOS Level 1, Level 3 and BSIM4
models. Features such as fast time-domain simulation, voltage and current estimation, intuitive
post-processing, frequency estimation and delay estimation make PROthumb a time saver. Even the power
consumption of the simulated circuit can be checked on-screen.
Its main features are:

Built-in SPICE-like analog simulator.
Fast time-domain simulation with voltage and current estimation and intuitive
post-processing: frequency estimation, delay estimation (no external SPICE/analog
simulator required).
Supports Level 1, Level 3 and BSIM4 models for all technologies from 1.2 µm to 22 nm.
MOS characteristic viewer with access to parameters of main model.
Time-domain voltage and current waveforms available at the press of a single button.
DC/AC characteristics, signal frequency vs. time, eye diagrams, and Min/Typ/Max analog
simulation.
Eye diagram view for signal output.
On-screen power estimation.
On-screen storage of waveforms for result hold-on.
Forward and backward buttons to move through simulation results.
Nanolambda (Precision CMOS Layout Editor)

MICROWIND possesses a precision CMOS layout editor, which supports technologies
from 1.2 µm to 22 nm and offers strong illustration capabilities. Its enhanced editing
commands and layout control shorten development time.
Some important points are:
Technology support down to 22 nm.
Design-error-free cell library (contacts, vias, MOS devices, etc.).
Powerful automatic compiler from a Verilog circuit description into layout.
On-line design rule checker with large rule base.
Protutor (MOS Characteristics Tutor)

A screen for understanding the MOS characteristics, with a user interface suited to
designers. Change the model parameters and see their effects on Id/Vd, Id/Vg,
Id(log)/Vg, and threshold vs. length. The simulations can also be fitted to measurements made on
test chips fabricated in 0.35, 0.25 and 0.18 µm. The manual gives a tutorial on MOS models,
with details on all parameters. Some points are:
Change the model parameters and see their effects on Id/Vd, Id/Vg, Id(log)/Vg, and threshold
vs. length.
Fit the simulations to measurements made on test chips fabricated in
0.35, 0.25 and 0.18 µm.
A full-length tutorial on MOS models is provided in the manual, with details on all parameters.
Documentation covers several aspects of MOS modeling.
MEMSim (Floating Gate Memory Simulator)

The double-gate MOS has been introduced in MICROWIND for the simulation of non-volatile
memories such as EPROM, EEPROM and FLASH. The command "UV exposure" erases
floating gates by removing all electrons. Programming is performed with a very high voltage
on the gate (7 V in 0.12 µm) and a 1.2 V voltage difference between drain and source. Some
electrons are sufficiently accelerated to pass through the gate oxide by the hot-electron effect.
Highlights are:
Simulation of non-volatile memories such as EPROM, EEPROM and FLASH using double-gate MOS.
Erasure of floating gates and removal of all electrons.
Programming can be performed by a very high voltage supply on the gate.
VirtualFab (Cross-sectional and 3D Viewer)

VirtualFab changes the way deep-submicron technology is taught: it lets you
view cross sections of the silicon layers and a 3D view of circuits.
MICROWIND v3.1 can draw real-time images of the layout and navigate in full 3D on
the surface or inside the IC. This command is based on OpenGL and offers high picture
quality. The user can modify the viewing position in X, Y, Z and adjust light sources to create
illustrative views of the layout. Some points are:
3D fabrication process simulator with cross-sectional viewer.
Step-by-step 3D visualization of fabrication for any portion of layout.
See how the contacts and metallization are created.
See the self-aligned diffusion after the polysilicon gate is fabricated.
Check planes of VDD, VSS, and other signals.
5.2 MICROWIND TOOL DESIGN FLOW

Integrated circuits have changed the way we live. Almost every gadget relies on
the power of silicon, which has made such complex electronic circuits possible. Integrated
circuits come in many different flavors these days. User-designed chips, in particular CPLDs and
FPGAs, have revolutionized system design. But ASICs remain in the lead due to their
speed, power and performance advantages. Every critical system design relies on ASICs.
To learn the IC design process, its techniques and the handling of critical requirements, engineers
practice for hours on EDA tools to master design fundamentals.
Modern ASIC design tools like DSCH and MICROWIND provide an easy-to-follow
design flow for CMOS IC designs. They support traditional schematic circuit building
methods, layout editing, various analysis and verification methods, and fab sign-off. But more than
the ins and outs of the IC design flow, it is the basic design methodology and circuit building
techniques that lead to success in fabrication. Even so, engineers often face hurdles during
simulation and failures in prototyping.
EDA tools like MICROWIND and DSCH offer a complete IC design flow, which
starts with schematic building of digital circuits, followed by conversion into a Verilog file for
compilation into a CMOS layout using the MICROWIND layout compiler. Every engineer needs to
verify the circuit before going to fabrication. FPGAs are the best available platform for ASIC
prototyping.
A prototype is a system model used to test and develop the product before its final
implementation. Field Programmable Gate Arrays (FPGAs) are built around Look-Up
Tables (LUTs) and a switch matrix, and are rich in resources. Advantages such as high gate density,
flexibility and moderate speed give ASIC designers an ideal platform for prototyping their
designs before going to ASIC fabrication.
MICROWIND supports the entire front-end to back-end design flow. For front-end
design, DSCH (digital schematic editor) provides a built-in pattern-based
simulator for digital circuits. The user can also build analog circuits, convert them into SPICE
files and use third-party simulators such as WinSpice or PSPICE.
DSCH can convert the digital circuits into Verilog file which can be further synthesized
for FPGA/CPLD devices of any vendor. The same Verilog file can be compiled for layout
conversion in MICROWIND.
The back-end design of circuits is supported by MICROWIND. The user can design digital
circuits and compile them here from a Verilog file. MICROWIND automatically generates an error-free
CMOS layout, although the place and route is not fully optimized because the tool does not use
complex place-and-route algorithms.
The user can also create a CMOS layout by compiling a one-line Verilog syntax or
by drawing the layout manually.
The CMOS layouts can be verified using the built-in mixed-signal simulator and analyzed further
for DRC, crosstalk, delays, 2D cross section, 3D view, etc.
5.3 SYNCHRONOUS DESIGN TECHNIQUES
Here are some general design guidelines for the successful implementation of digital circuits on ASIC
and FPGA platforms.
1. Use a single master clock and maintain synchronous flow of data.
2. Use a single master set or reset. Preferably, use asynchronous resets because they work
independently from the clock. When an asynchronous reset establishes the initial state, it puts the
entire circuit into a known state and helps make logic simulation and manufacturing test easier.
Keep in mind that CMOS ASIC technology prefers active-low asynchronous set or reset, but
often FPGAs use active high.
3. Avoid race conditions when de-asserting concurrent set and reset signals. You cannot predict in
simulation how the flip-flop will behave when both set and reset are de-asserted close in time.
4. Do not use delayed logic or monostable pulse generators, which rely on delays for their
operation (they are unpredictable in ASICs and FPGAs). Instead, use synchronous pulse generators,
which have known timing and do not generate glitches.
5. Use clock-enabled flip-flops for clock division. In many FPGA implementations, ripple clock
dividers are popular. Not only can ripple clock dividers cause problems with EDA tools, but the
generated clock will also experience a phase delay.
6. Use clock-enabled flip-flops to avoid glitching state decoders. FPGAs are sometimes tolerant
when a state decoder goes through 11 while changing from 01 to 10. In ASICs, this causes
implementation-dependent glitches. Using clock-enabled flip-flops not only avoids glitches but
also adds no additional clock delays.
7. Have resets and transition states for Finite State Machines. Although FSMs are usually
synchronous, they still can have issues during the conversion process. Make sure there are no
dead states because during power-up the FSM can enter an unused state. Make sure reset is also
available on your FSM to make life easier during simulation and test vector generation.
8. Avoid latches; use flip-flops instead. Latches cause complications with static timing and
timing-driven layout tools. Latches are difficult to analyze, and the gate savings between a latch
and a flop are less important with submicron technology.
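Two of the guidelines above (clock division with clock enables, and glitching state decoders) can be illustrated with a cycle-based sketch. This is a behavioural model only: each loop iteration stands for one rising edge of the single master clock, and the function names are assumptions made for this example.

```python
def clock_enable_divider(n_cycles, divide_by):
    """Divide activity by `divide_by` without creating a second clock.

    A counter produces a one-cycle enable; the slow flip-flop is still clocked by the
    master clock and simply ignores edges on which the enable is low.
    Returns a list of (enable, slow_q) pairs, one per master clock edge.
    """
    count, slow_q, trace = 0, 0, []
    for _ in range(n_cycles):
        enable = count == divide_by - 1
        if enable:
            slow_q ^= 1                      # the enabled flip-flop toggles
        count = 0 if enable else count + 1   # counter wraps on the enabled edge
        trace.append((int(enable), slow_q))
    return trace


def possible_transients(old, new):
    """States a decoder may briefly see if the two state bits switch at different times."""
    return {(new[0], old[1]), (old[0], new[1])} - {old, new}


if __name__ == "__main__":
    print(clock_enable_divider(8, 4))
    print(possible_transients((0, 1), (1, 0)))
```

The divider never generates a derived clock, so every flip-flop stays on the master clock. The second function shows why a change from 01 to 10 can pass through 11 (or 00) when the bits do not switch simultaneously, whereas a single-bit change has no intermediate state.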
THE MOS AS A SWITCH

The MOS transistor is basically a switch. When used in logic cell design, it can be on or
off: when on, current can flow between drain and source; when off, no current flows. The MOS is
turned on or off depending on the gate voltage. In CMOS
technology, both n-channel (nMOS) and p-channel (pMOS) devices exist. The nMOS
and pMOS symbols are shown below, together with the symbols for the ground voltage source (0 or VSS)
and the supply (1 or VDD).
The n-channel MOS device requires a logic value 1 (or a supply VDD) to be on. In
contrast, the p-channel MOS device requires a logic value 0 to be on. When the MOS device is
on, the link between the source and drain is equivalent to a resistance. This on resistance is
on the order of 100 Ω to 5 kΩ. The off resistance is considered infinite at first order, as its value
is several MΩ.
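The switch view of the MOS device can be turned into a small model of a CMOS inverter, in which the output voltage follows from the resistive divider formed by the pull-up and pull-down devices. This is a sketch under assumptions: the on and off resistance values (1 kΩ and 5 MΩ) are example figures chosen for illustration, and the function names are hypothetical.

```python
def nmos_is_on(gate):
    """An nMOS switch conducts when its gate is at logic 1."""
    return gate == 1


def pmos_is_on(gate):
    """A pMOS switch conducts when its gate is at logic 0."""
    return gate == 0


def inverter_vout(a, vdd=1.0, r_on=1e3, r_off=5e6):
    """Output voltage of a CMOS inverter modelled as a resistive divider.

    The pMOS connects the output to VDD and the nMOS connects it to VSS (0 V).
    r_on and r_off are assumed example values.
    """
    r_up = r_on if pmos_is_on(a) else r_off
    r_down = r_on if nmos_is_on(a) else r_off
    return vdd * r_down / (r_up + r_down)


if __name__ == "__main__":
    for a in (0, 1):
        print(f"input {a} -> output {inverter_vout(a):.4f} V")
```

Because the off resistance is thousands of times larger than the on resistance, the output sits within a fraction of a millivolt of the rail, which justifies treating the off device as an open circuit at first order.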
5.4 THREE LEVELS OF DESIGN IN MICROWIND AND DSCH
The specifications shown here may differ between foundries and technologies.
Design Example (3 Levels): NOR Gate
Logic Design
Circuit Design
Layout Design
Microwind / DSCH NOR Example: NOR Gate Logic

Open the Schematic Editor in Microwind (DSCH3). Click on the transistor symbol in the
symbol Library on the right.
Instantiate NMOS or PMOS transistors from the symbol library and place them in the
editor window.
Instantiate 2 NMOS and 2 PMOS transistors.
Connect the drains and sources of transistors.
Connect Vdd and GND to the schematic.
Connect input button and output LED.
Now we have NOR schematic ready.
Use the logic simulator to verify the functionality of your schematic.
The next step is to simulate the circuit and check for functionality.
Click on Simulate -> Start simulation.
This brings up a Simulation Control Window.
Click on the input buttons to set them to 1 or 0. Red color in a switch indicates a '1'.
The simulation output can be observed as a waveform after the inputs are
applied. Click on the timing diagram icon in the icon menu to see the timing diagram of the input
and output waveforms.
Simulate your system with your hand calculated transistor sizes.
Click File -> Make Verilog File. The Verilog, Hierarchy and Netlist window appears. This
window shows the verilog representation of NOR gate.
Click OK to save the Verilog as a .txt file.
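Before simulating the schematic, the expected truth table can be derived from the transistor networks themselves. The sketch below models the NOR gate at switch level, assuming the usual CMOS arrangement of two pMOS devices in series to VDD and two nMOS devices in parallel to ground; the function name is hypothetical.

```python
def nor_gate(a, b):
    """Switch-level CMOS NOR: series pMOS pull-up, parallel nMOS pull-down."""
    pull_up = (a == 0) and (b == 0)      # both series pMOS must conduct
    pull_down = (a == 1) or (b == 1)     # either parallel nMOS conducts
    # In a static CMOS gate exactly one of the two networks conducts.
    assert pull_up != pull_down
    return 1 if pull_up else 0


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, "->", nor_gate(a, b))
```

The waveform seen in the timing diagram should match this table: the output LED is on only when both input buttons are at 0.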
Microwind / DSCH NOR Example: Circuit Design
Open the layout editor window in Microwind. Click File -> Select Foundry and select
X.rul. This sets your layout designs in X technology.
Click on Compile -> Compile Verilog File. An Open window appears. Select the .txt
Verilog file saved before and open it.
After selecting the .txt file, a new window called Verilog File appears. Click on Size in the
top-right menu. This shows the NMOS and PMOS sizes. Set the sizes as required.
Click Compile and then Back to editor in the Verilog File window. This creates a layout in the
layout editor window using the automatic layout generation procedure.
Add a capacitance to the output of the design. The value of the capacitance depends on your
choice.
Click on OK. The capacitance is shown in the bottom-left corner with a value of 0.015 fF.
Click Simulate -> Run simulation. A simulation window appears with the inputs and output and
shows the tphl, tplh and tp of the circuit. The power consumption is also shown in the
bottom-right portion of the window.
If you are unable to meet the specifications of the circuit, change the transistor sizes.
Generate the layout again and run the simulations until you achieve your target delays.
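The size-simulate-resize loop described above can be mimicked with a first-order RC delay model. The sketch below is a rough hand-calculation aid, not a substitute for the layout simulation: the unit resistances, the load capacitance, the resizing step and the function names are all assumptions, and the extra capacitance added by wider transistors is ignored. The propagation delay is taken as the average of tphl and tplh.

```python
def nor_delays(wn, wp, c_load, rn_unit=10e3, rp_unit=20e3):
    """First-order delays of a 2-input NOR driving c_load.

    tphl: worst case, a single nMOS of width wn pulls the output down.
    tplh: two pMOS of width wp in series pull the output up.
    """
    tphl = 0.69 * (rn_unit / wn) * c_load
    tplh = 0.69 * (2 * rp_unit / wp) * c_load
    tp = (tphl + tplh) / 2
    return tphl, tplh, tp


def size_for_target(target_tp, c_load, wn=1.0, wp=1.0, step=1.25, max_iter=50):
    """Widen the slower network until the average delay meets the target."""
    for _ in range(max_iter):
        tphl, tplh, tp = nor_delays(wn, wp, c_load)
        if tp <= target_tp:
            return wn, wp, tp
        if tplh > tphl:
            wp *= step      # pull-up is the bottleneck
        else:
            wn *= step      # pull-down is the bottleneck
    raise ValueError("target delay not reached within max_iter resizing steps")


if __name__ == "__main__":
    wn, wp, tp = size_for_target(target_tp=100e-12, c_load=10e-15)
    print(f"wn = {wn:.2f}, wp = {wp:.2f}, tp = {tp * 1e12:.1f} ps")
```

In this model the series pMOS pull-up is the slower path, so the loop widens the pMOS devices first, which mirrors the common practice of sizing NOR pull-ups larger than the pull-downs.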
Microwind / DSCH NOR Example: Layout Design
Design the layout manually. Open the layout editor window in Microwind. Click File ->
Select Foundry and select X.rul, Vdd and GND rails are of Metal1. The top rail is used as
Vdd and the bottom one as GND. Click on Metal 1 in the palette and then creates the
required rectangle in the layout window.
The next step is to build the NMOS transistors. Click on the transistor symbol in the
palette. Set the W, L of the transistor
Then click on Generate device. The source of the transistor is connected to the GND rail.
Create another NMOS and place it in parallel to the first NMOS device. We share the two
devices' drain diffusions. A DRC check can be run by clicking on Analysis -> Design
Rule Checker.
The next step is to place two PMOS transistors in series. Place the PMOs transistor on
layout close to the Vdd rail on the top. To construct two PMOS transistors in series,
diffusions are shifted to a side and another poly line is added as second transistor. The
diffusion is shared to save area and reduce capacitance.
The next step is to connect the inputs and the output of the two transistors. Poly inputs is
connected. Metal output is connected.
The next step is to connect the poly to metal1 and then to metal2. The first symbol in the
first row of the palette is the poly to metal1 contact.
Then we connect the metal1 to metal2 contact to the previous contact. This is the 4th
contact on the first row
The next step is to connect the output Metal1 to Metal2. Once again use the 4 th contact in
the first row.
Now we connect metal2 to the two inputs and one output and bring them to the top to go
out of the cell. Observe the two inputs (left & right) and an output (middle) above the
Vdd rail in dark blue color.
Now we label the inputs and output as In1, In2 and out. Click on Add a Pulse Symbol in
the palette (5th from the right in the 3rd row). Then click on the metal2 of one of the inputs. A
window appears. Change the name of the input signal. Insert a 01 sequence and click on Insert.
Then click on Assign. Similarly, assign a pulse to the 2nd input.
Select the Visible Node symbol from the palette (7th in the third row) and click
on the output. The 'Add a Visible Property' window appears. Change the label name to out.
Select Visible in Simulation. Click on Assign. Now the output is also labeled.
Select Vdd Supply and GND from the palette (third row). Also click on the capacitor (3rd
in 2nd row) symbol and add it to the output. Also, extend the n-well into the Vdd rail. Then click
on Edit -> Generate -> Contacts. Select PATH and then in Metal choose Metal1 and N+
polarization.
To run the simulation of your circuit, click on Simulate -> Start Simulation. Depending on the input
sequences assigned at the inputs, the output is observed in the simulation. The power value is also
given.
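To judge whether the simulated output waveform is correct, the expected NOR response to the assigned input sequences can be worked out beforehand. The sketch below assumes one bit per time step and ignores gate delay. The function name and the two example sequences are hypothetical.

```python
def expected_nor(seq1: str, seq2: str) -> str:
    """Bitwise NOR of two equal-length 0/1 strings, one character per time step."""
    if len(seq1) != len(seq2):
        raise ValueError("input sequences must have the same length")
    return "".join(
        "1" if a == "0" and b == "0" else "0" for a, b in zip(seq1, seq2)
    )


if __name__ == "__main__":
    in1 = "0101"  # hypothetical: a repeated 01 pattern on In1
    in2 = "0011"  # hypothetical: the same pattern at half the rate on In2
    print(expected_nor(in1, in2))
```

The output should be high only in the time steps where both In1 and In2 are low.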
SIMULATION RESULTS
[Figures: simulation waveforms for ip-DCO, MHLLF, SCCER and the proposed P-FF]
APPLICATION
Nanometer CMOS technology

Communication signal processing ICs
Multimedia signal processing ICs
DSP processors
FPGA system arithmetic logic units
CONCLUSION AND FUTURE WORK

In this paper, various flip-flop designs, namely ip-DCO, MHLLF and SCCER, are discussed.
These were also designed in the Microwind tool, and the resulting waveforms are discussed.
A comparison table is also included to verify the designed methods. Across all of these results, SCCER
performed better than the ip-DCO and MHLLF designs.
Future work
To improve the performance of the P-flip-flop design, the pulse enhancement scheme will be
designed, and its results will be discussed alongside the existing pulse-triggered flip-flops.

