
Dynamic and Leakage Power Reduction in MTCMOS Circuits Using an Automated Efficient Gate Clustering Technique

Mohab Anis, Shawki Areibi *, Mohamed Mahmoud and Mohamed Elmasry


VLSI Research Group, University of Waterloo, Canada
* School of Engineering, University of Guelph, Canada

Presentation Outline
- Low Power Design in DSM
- Concept of sleep transistors
- Previous work
- Sizing the sleep transistor
- Bin-Packing technique
- Set-Partitioning technique
- Conclusion and extended work done

Why Low Power Design?

- Growing market of mobile and handheld electronic systems.
- Difficulty in providing adequate cooling. Fans create noise and add to cost.
- Heat dissipation impacts packaging technology and cost.
- Increasing standby time of portable devices.

In DSM regimes, leakage power has become as big a problem as dynamic power.

Concept of sleep transistors


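- MTCMOS technology is an increasingly popular technique to reduce leakage power.
- Proper ST sizing is a key issue:
  - Increasing the ST size increases area, Pdynamic and Pleakage.
  - Decreasing the ST size increases delay.

[Figure: LVT logic blocks connected at the virtual ground node VX to an HVT sleep transistor driven by SLEEP. Modeling of a sleep transistor as a resistor R carrying current I.]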

First Approach [1]


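- Single ST to support the whole circuit.
- Increase in interconnect resistance for distant blocks.
- ST size must increase to compensate for the added resistance, which increases area, Pdynamic and Pleakage.
- More significant in the DSM regime.

[Figure: LVT logic circuit gated by a single HVT sleep transistor driven by SLEEP.]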


[1] S. Mutoh et al., "1-V Power Supply High-Speed Digital Circuit Technology with Multi-Threshold Voltage CMOS," IEEE J. of Solid-State Circuits, pp. 847-853, 1995.

Second Approach [2]


- Single ST is sized according to a mutual exclusive discharge pattern algorithm.
- ST assignments are wasteful.
- Increase in interconnect resistance for distant blocks.
- ST size must increase to compensate for the added resistance, which increases Pdynamic and Pleakage.
- More significant in the DSM regime.

[Figure: example circuit of gates G1 to G10.]


[2] J. Kao et al., "MTCMOS Hierarchical Sizing Based on Mutual Exclusive Discharge Patterns," in Proc. of 35th DAC, pp. 495-500, Las Vegas, 1998.

Sizing the sleep transistor


Objective: Constant ST size, causing 5% degradation in circuit speed.

$$(W/L)_{sleep} = \frac{I_{sleep}}{0.05\,\mu_n C_{ox}\,(V_{dd}-V_{tL})(V_{dd}-V_{tH})}$$

Isleep is chosen to be 250 µA, giving (W/L)sleep ≈ 6 for 0.18 µm CMOS technology, with VtL = 350 mV and VtH = 500 mV.
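The sizing expression can be evaluated numerically. The sketch below is only an illustration: the function name is hypothetical, and the supply voltage (1.8 V) and the process transconductance mu_n*Cox (350 µA/V²) are assumed placeholder values that do not appear in the slides, so the resulting ratio will not necessarily match the value quoted for the 0.18 µm process.

```python
def sleep_transistor_wl(i_sleep, mu_n_cox, vdd, vt_low, vt_high, degradation=0.05):
    """Evaluate (W/L)sleep = Isleep / (degradation * mu_n*Cox * (Vdd-VtL) * (Vdd-VtH)).

    All quantities are in SI units (A, A/V^2, V). `degradation` is the
    tolerated fractional speed loss (0.05 for the 5% objective).
    """
    return i_sleep / (degradation * mu_n_cox * (vdd - vt_low) * (vdd - vt_high))


# Isleep, VtL and VtH come from the slide; Vdd and mu_n*Cox are ASSUMED values.
wl = sleep_transistor_wl(
    i_sleep=250e-6,    # 250 uA (from the slide)
    mu_n_cox=350e-6,   # assumed, not from the slides
    vdd=1.8,           # assumed, not from the slides
    vt_low=0.35,       # VtL = 350 mV (from the slide)
    vt_high=0.5,       # VtH = 500 mV (from the slide)
)
print(f"(W/L)sleep = {wl:.2f}")
```

Because the ratio scales linearly with Isleep, fixing Isleep fixes the ST size, which is what allows every ST to be treated as an identical bin of capacity Imax in the clustering steps.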

4-bit CLA Adder

Preprocessing of Gate Currents


- Random inputs are applied to the CLA adder; the highest current discharge of each gate is monitored and multiplied by the corresponding switching activity.
- The peak current value, its time of occurrence and its duration are monitored.
- Currents are combined into a single equivalent current Ieq = max{Ii} when the sum of the currents Ii over time does not exceed max{Ii}.
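A minimal sketch of the combining rule, assuming each gate current is available as a list of samples on a common time grid. The function name and the three waveforms are hypothetical and chosen only for illustration; they are not the measured CLA adder data.

```python
def equivalent_current(waveforms):
    """Return Ieq = max{Ii} if the gates discharge at mutually exclusive times.

    `waveforms` is a list of equal-length lists of current samples.
    The currents may be merged only when the peak of their sample-by-sample
    sum does not exceed the largest individual peak; otherwise None is returned.
    """
    largest_peak = max(max(w) for w in waveforms)
    summed = [sum(samples) for samples in zip(*waveforms)]
    if max(summed) <= largest_peak:
        return largest_peak
    return None


# Hypothetical waveforms: i1 and i2 do not overlap in time, i1 and i3 do.
i1 = [0, 11, 33, 65, 33, 11, 0, 0, 0, 0, 0, 0]
i2 = [0, 0, 0, 0, 0, 0, 0, 18, 49, 79, 49, 18]
i3 = [0, 0, 20, 40, 20, 0, 0, 0, 0, 0, 0, 0]

print(equivalent_current([i1, i2]))  # merged: the larger of the two peaks
print(equivalent_current([i1, i3]))  # None: the summed peak exceeds both peaks
```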

Timing Diagram

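[Figure: timing diagram of the discharge currents of two gates, G1 and G2 (fan-outs FO=2 and FO=4), plotted against time. I1 (G1) peaks at 65 and I2 (G2) peaks at 79. Marked times: T1 = 80 psec, T1+T2 = 210 psec, 120 psec and 260 psec.]

Sampled current vectors, I1 (G1) followed by I2 (G2):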

0 0 11 22 33 43 54 65 54 43 33 22 11 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 6 12 18 24 30 37 43 49 55 61 67 73 79 73 67 61 55 49 43 37 30 24 18 12 6 0 0 0 0 0 0 0


Preprocessing Heuristic
1. Initialize current vectors
2. Set all gates free to move to a sub-cluster
3. For all gates in circuit
      If gate i is not clustered yet
         assign gate i to new cluster k
         update cluster current vector
         calculate max current, start, end time
         For all other gates in circuit
            If (gate j is not clustered yet)
               add current of gate j to cluster k
               If (combination ≤ max current)
                  append gate to cluster
                  update cluster info
                  set gate j locked in cluster k
         End For
   End For
4. Return all clusters formed.
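The preprocessing heuristic can be sketched as a greedy pass over the gates. This is one possible reading of the pseudocode, not the authors' implementation: the function name, the dictionary-based data layout and the sample waveforms are hypothetical, and "max current" is interpreted here as the largest individual peak among the cluster members and the candidate gate.

```python
def preprocess_clusters(waveforms):
    """Greedily merge gates whose currents do not overlap in time.

    `waveforms` maps a gate name to a list of current samples (same length
    for every gate). Returns a list of (member_gates, ieq) tuples.
    """
    clustered = set()
    clusters = []
    for gate_i, wave_i in waveforms.items():
        if gate_i in clustered:
            continue
        # Gate i seeds a new cluster k.
        members = [gate_i]
        total = list(wave_i)   # cluster current vector
        peak = max(wave_i)     # largest individual peak so far
        clustered.add(gate_i)
        for gate_j, wave_j in waveforms.items():
            if gate_j in clustered:
                continue
            combined = [a + b for a, b in zip(total, wave_j)]
            limit = max(peak, max(wave_j))
            if max(combined) <= limit:
                # Gate j is locked into cluster k.
                members.append(gate_j)
                total = combined
                peak = limit
                clustered.add(gate_j)
        clusters.append((members, peak))
    return clusters


# Hypothetical sample data, for illustration only.
waves = {
    "G1": [0, 30, 60, 30, 0, 0, 0, 0],
    "G2": [0, 0, 0, 0, 20, 50, 20, 0],
    "G3": [0, 20, 40, 20, 0, 0, 0, 0],
    "G4": [0, 0, 0, 0, 0, 10, 30, 10],
}
for gates, ieq in preprocess_clusters(waves):
    print(gates, ieq)
```

With this data, G3 overlaps G1 in time and therefore starts its own cluster, while G2 and G4 join G1 because the summed waveform never exceeds the largest individual peak.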

Bin-Packing Technique
Objective: Minimize the number of used STs.
Subject to:
1. Σ Ieq ≤ Imax for any ST.
2. Each Ieq is assigned only once.
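To make the formulation concrete, the sketch below packs equivalent currents into sleep transistors of capacity Imax. It uses the first-fit decreasing heuristic as a simple stand-in; the slides do not state which solver was used, so this is not presented as the authors' method. The function name and the individual IEQ values are hypothetical (the slides give only per-ST totals), and currents are expressed in integer µA.

```python
def first_fit_decreasing(ieq, i_max):
    """Assign each equivalent current to exactly one sleep transistor (bin).

    `ieq` maps a current name to its value; `i_max` is the ST capacity.
    Returns a list of bins, each a list of current names.
    """
    bins = []  # each entry: [remaining_capacity, [names]]
    for name, current in sorted(ieq.items(), key=lambda kv: kv[1], reverse=True):
        if current > i_max:
            raise ValueError(f"{name} exceeds the capacity of a single ST")
        for entry in bins:
            if current <= entry[0]:
                entry[0] -= current
                entry[1].append(name)
                break
        else:
            # No existing ST can take this current: open a new one.
            bins.append([i_max - current, [name]])
    return [names for _, names in bins]


# Hypothetical equivalent currents in uA; Imax = 250 uA as in the sizing slide.
ieq_values = {"IEQ1": 90, "IEQ2": 70, "IEQ3": 120, "IEQ4": 80,
              "IEQ5": 50, "IEQ6": 40, "IEQ7": 40}
assignment = first_fit_decreasing(ieq_values, i_max=250)
for index, names in enumerate(assignment, start=1):
    print(f"ST{index}: {names} -> {sum(ieq_values[n] for n in names)} uA")
```

First-fit decreasing is fast but not guaranteed optimal; an exact integer-programming formulation would be needed to guarantee the minimum number of STs.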

Currents Assignment
Sleep transistor 1
  Equivalent currents: IEQ1, IEQ2, IEQ5, IEQ6
  Assigned gates: G1, G2, G3, G4, G9, G10, G11, G12, G13, G15, G17, G19, G20, G21, G22, G24, G25, G26, G27, G28
  Current: 250 µA

Sleep transistor 2
  Equivalent currents: IEQ3, IEQ4, IEQ7
  Assigned gates: G5, G6, G7, G8, G14, G16, G18, G23
  Current: 240 µA

Clustering of CLA adder


Set-Partitioning Technique
[Figure: row-based layout of gates G1 to G28 placed between Vdd and gnd rails. Labels: Cell Lmin, Cell Height, Sleep Device cavity, Ground rail.]

Cost Function
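$$C_j = (w_1 \cdot C_{j1}) + (w_2 \cdot C_{j2})$$

$$C_{j1} = \text{Sleep\_Transistor max\_current} - \sum_i \text{current}_i$$

$$C_{j2} = \sum d_{uv} \quad \text{over the gate pairs in a group } S_j$$

[Figure: a group Sj containing gates Gu, Gv and Gw, with pairwise distances duv, dvw and dwu.]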
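The cost of one candidate group can be computed as in the sketch below. The function name, the gate coordinates, the currents and the weights are hypothetical, and the distance metric is assumed to be Euclidean since the slides do not specify one.

```python
import math
from itertools import combinations


def cluster_cost(gates, currents, positions, i_max, w1, w2):
    """Cj = w1*Cj1 + w2*Cj2 for one group of gates.

    Cj1 is the unused ST capacity (max current minus the group's currents).
    Cj2 is the sum of pairwise distances between the gates of the group
    (Euclidean distance is an ASSUMPTION of this sketch).
    """
    c_j1 = i_max - sum(currents[g] for g in gates)
    c_j2 = sum(math.dist(positions[u], positions[v])
               for u, v in combinations(gates, 2))
    return w1 * c_j1 + w2 * c_j2


# Hypothetical group of three gates: currents in uA, positions in layout units.
currents = {"Gu": 60, "Gv": 70, "Gw": 80}
positions = {"Gu": (0, 0), "Gv": (3, 0), "Gw": (3, 4)}
cost = cluster_cost(["Gu", "Gv", "Gw"], currents, positions,
                    i_max=250, w1=1.0, w2=1.0)
print(cost)
```

The first term rewards groups that fill the sleep transistor's capacity, and the second rewards groups whose gates lie close together, so the weights w1 and w2 trade utilization against routing length.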

Clustering Heuristic
Create_Clusters ( )
1. Calculate distances between all gates;
2. Initialize maxgates_per_cluster = n;
3. Create clusters with single gates;
4. For cl = 2; cl ≤ maxgates_per_cluster
      Create_n_Gate_Clusters (cl)
5. For all clusters created
      calculate_cost ( )

Create_n_Gate_Clusters (cl)
1. For cluster of type cl
      create_new_cluster ( )
      While not done
         Choose gate with minimum distances
         If sum of currents ≤ capacity
            append gate to newly created cluster
         End If
         If total gates within cluster reaches limit
            break;
      End While
   End For
2. Return newly created cluster
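One way to read the clustering heuristic is sketched below: every gate seeds a candidate cluster of each size, and its nearest neighbours are appended while the current capacity allows. The function name, the per-seed strategy, the Euclidean distance and the sample data are all assumptions of this sketch; the pseudocode leaves these details open.

```python
import math


def create_clusters(positions, currents, i_max, max_gates_per_cluster):
    """Generate candidate gate clusters that respect the ST current capacity.

    Returns a set of frozensets of gate names: all single-gate clusters plus,
    for each size from 2 up to the limit, clusters grown from each seed gate
    by adding its nearest neighbours.
    """
    gates = list(positions)
    clusters = {frozenset([g]) for g in gates}
    for size in range(2, max_gates_per_cluster + 1):
        for seed in gates:
            cluster = [seed]
            load = currents[seed]
            # Visit the other gates from nearest to farthest from the seed.
            neighbours = sorted(
                (g for g in gates if g != seed),
                key=lambda g: math.dist(positions[seed], positions[g]),
            )
            for g in neighbours:
                if load + currents[g] <= i_max:
                    cluster.append(g)
                    load += currents[g]
                if len(cluster) == size:
                    break
            if len(cluster) == size:
                clusters.add(frozenset(cluster))
    return clusters


# Hypothetical layout: four gates on a line, 100 uA each, Imax = 250 uA.
positions = {"A": (0, 0), "B": (1, 0), "C": (5, 0), "D": (6, 0)}
currents = {"A": 100, "B": 100, "C": 100, "D": 100}
candidates = create_clusters(positions, currents, i_max=250, max_gates_per_cluster=3)
for c in sorted(candidates, key=lambda s: (len(s), sorted(s))):
    print(sorted(c))
```

In this example no three-gate cluster is produced, because any three gates would draw 300 µA and exceed the assumed capacity.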

Set-Partitioning Technique
Objective: Minimize $\sum_j C_j S_j$
Subject to:
1. Sum of currents for Sj ≤ Imax.
2. Groups must cover all gates with no repetition.
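For a handful of gates, the set-partitioning objective can be illustrated by exhaustive search. This brute-force enumeration is only a teaching sketch and does not scale to real circuits; the slides do not say which solver was used. The function name and the candidate costs are hypothetical, and the candidates are assumed to already satisfy the Imax constraint.

```python
from itertools import combinations


def best_partition(gates, candidates):
    """Pick the cheapest set of groups covering every gate exactly once.

    `candidates` maps a frozenset of gate names to its cost Cj.
    Returns (total_cost, chosen_groups), or None if no exact cover exists.
    """
    all_gates = frozenset(gates)
    groups = list(candidates)
    best = None
    for r in range(1, len(groups) + 1):
        for chosen in combinations(groups, r):
            covered = [g for group in chosen for g in group]
            # Exact cover: no gate repeated and none left out.
            if len(covered) == len(all_gates) and frozenset(covered) == all_gates:
                total = sum(candidates[group] for group in chosen)
                if best is None or total < best[0]:
                    best = (total, chosen)
    return best


# Hypothetical candidate groups and costs Cj.
candidate_costs = {
    frozenset({"A"}): 150, frozenset({"B"}): 150,
    frozenset({"C"}): 150, frozenset({"D"}): 150,
    frozenset({"A", "B"}): 51, frozenset({"C", "D"}): 51,
    frozenset({"B", "C"}): 54,
}
total_cost, chosen_groups = best_partition(["A", "B", "C", "D"], candidate_costs)
print(total_cost, [sorted(g) for g in chosen_groups])
```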

Grouping of gates
[Figure: the same row-based layout of gates G1 to G28 between Vdd and gnd rails, showing the resulting grouping of gates. Labels: Cell Lmin, Cell Height, Sleep Device cavity, Ground rail.]

Computational Time
[Figure: BP/SP CPU time. Time (secs), axis from -200 to 2000, against number of gates (28, 30, 31, 61, 160, 204), with one series for SP CPU time and one for BP CPU time.]

Results (% Savings)
Bin-Packing (BP)

Benchmark                             No. of gates  Pdynamic to [1]  Pdynamic to [2]  Pleakage to [1]  Pleakage to [2]  ST_Area [1],[2]
4-bit CLA adder                       28            14 %             12 %             96 %             93 %             95, 92 %
32-bit Parity Checker                 31            18 %             16 %             92 %             85 %             92, 85 %
6-bit Multiplier                      30            31 %             23 %             95 %             78 %             95, 78 %
4-bit 74181 ALU                       61            17 %             14 %             93 %             83 %             93, 83 %
32-bit Single Error Correcting C499   202           20 %             19 %             95 %             89 %             95, 89 %
27-channel interrupt controller C432  160           2 %              0 %              99 %             89 %             99, 88 %

Set-Partitioning (SP)

Benchmark                             No. of gates  Pdynamic to [1]  Pdynamic to [2]  Pleakage to [1]  Pleakage to [2]  ST_Area [1],[2]
4-bit CLA adder                       28            7 %              5 %              87 %             78 %             87, 77 %
32-bit Parity Checker                 31            9 %              6 %              85 %             70 %             84, 69 %
6-bit Multiplier                      30            19 %             9 %              85 %             35 %             85, 34 %
4-bit 74181 ALU                       61            11 %             8 %              86 %             66 %             86, 67 %
32-bit Single Error Correcting C499   202           9 %              8 %              87 %             71 %             86, 70 %
27-channel interrupt controller C432  160           2 %              0 %              98 %             77 %             98, 76 %

% Power Savings (Bin-Packing)


[Figure: chart of % savings (0 to 100) for the benchmarks CLA, Parity, Mult, ALU, Error and C432. Series: Pdyn/1, Pdyn/2, Pleak/1, Pleak/2.]

% Power Savings (Set-Partitioning)


[Figure: chart of % savings (0 to 100) for the benchmarks CLA, Parity, Mult, ALU, Error and C432. Series: Pdyn/1, Pdyn/2, Pleak/1, Pleak/2.]

% ST Area Saving (Bin-Packing)


[Figure: chart of % ST area saving (0 to 100) for the benchmarks CLA, Parity, Mult, ALU, Error and C432. Series: St-Area[1], St-Area[2].]

% ST Area Saving (Set-Partitioning)


[Figure: chart of % ST area saving (0 to 100) for the benchmarks CLA, Parity, Mult, ALU, Error and C432. Series: St-Area[1], St-Area[2].]

Conclusion
- The BP technique clusters gates in MTCMOS circuits. Pdynamic and Pleakage are reduced by 15% and 90%, respectively, compared to [1] and [2].
- The SP technique takes routing complexity into consideration. Pdynamic and Pleakage are reduced by 11% and 77%, respectively, compared to [1] and [2].

Extended Work Done


- A hybrid clustering technique that combines the BP and SP techniques is devised to produce a more efficient and faster solution.
- Noise associated with ground bounce is taken as a design criterion (< 50 mV).
- Investigating the effect of different ST sizes on circuit parameters.
- Investigating the effect of the cost function weights w1 and w2 on circuit parameters.
