major priority of a designer attempting to meet implementation requirements.
Power optimization: reduction of the switching activity and of the switched capacitance of the circuit.
Optimization techniques for combinational and sequential circuits are presented.
Existing techniques are presented following the optimization design flow.

Introduction
Reliability, hot-spot detection, electromigration and packaging.
Power consumption is a major concern.
It must be addressed at all levels of the design flow, starting from the system level and reaching down to the layout level.
Gate level design techniques
Logic level optimization
The design task in which an RT-level circuit description is optimized for criteria such as area, time and power.
The output is an optimized gate netlist.
Two basic steps:
1) Technology independent optimization
2) Technology dependent optimization

In the technology independent step, the Boolean description of the circuit is optimized, ignoring the technology.
Cost measures: number of literals of the Boolean function, number of levels of the Boolean network.
These two measures are used to estimate area and delay.
Although the estimation is inaccurate, the technology independent optimization step is important:
The optimized Boolean network can be reused after a future technology change.
Decomposing the whole optimization problem into two subproblems and solving each one individually reduces the complexity of the initial problem and achieves better results.
The output of technology independent optimization is then optimized considering the adopted technology.

Technology independent optimization
Technology independent optimization techniques are classified as:
Optimization techniques for combinational circuits
Optimization techniques for sequential circuits
Combinational circuit techniques are further classified by the number of circuit levels:
Two-level logic minimization targets two-level circuit implementations such as PLAs and PLDs.
Multilevel logic optimization targets cell-based designs such as standard cells, gate arrays or FPGAs.
Sequential circuit
State assignment step: encode the states of the FSM, aiming at minimization of the circuit area.
Logic optimization step: techniques such as retiming, aiming at reduction of the circuit delay.

Technology dependent optimization
The logic circuit is optimized considering the adopted technology library.
Three steps:
1) Technology decomposition, where the circuit is decomposed into basic gates
2) Technology mapping, where the circuit is mapped to the gates of the library
3) Post-mapping optimization, where techniques such as rewiring are applied
The output of the technology dependent step is an optimized gate netlist used for floor planning, placement, routing and layout optimization.

Optimization of a logic circuit
Many techniques and algorithms have been developed in the past for the optimization of a logic circuit.
Because of dynamic power consumption, the problem becomes a three-dimensional optimization (area, time, power).
Dynamic power dissipation is proportional to the switching activity of the circuit.
Appropriate switching activity estimators are used to calculate the switching activity of the circuit quickly and accurately during optimization.

Combinational circuit technology independent optimization
Two level logic minimization
Minimum area for programmable devices or multilevel circuits.
Two-level Boolean logic minimization algorithms for area: Espresso, Espresso-Exact, MINI.
A method that employs the statistical characteristics of the input signals and modifies the expand routine of Espresso has been proposed.
It uses a cost function that reduces the switching activity.
Drawbacks:
The static probability is assumed to be equal (0.5) for every input signal.
The input signals are assumed to be spatiotemporally uncorrelated.
The power dissipation of the input signals and their temporal correlation are not taken into account.
Power prime implicants (PPIs) were proposed to identify the set of all implicants that are necessary and sufficient for obtaining a minimum-power solution.
Since all signals are assumed to be spatiotemporally uncorrelated, the proposed solution is not accurate.
A more accurate method has been presented in which the temporal correlation of the signals and the power consumption of the primary input signals are taken into consideration.
It forces the outputs of the AND gates to stay at the low logic level for a number of combinations of the applied input vectors.
The result is reduced switching activity and power consumption.

Two level circuits
Goal: signals with low switching activity should appear as many times as possible in the first level of the circuit.
In the example, x3 is the blocking variable.
The method uses the main routines of Espresso.
A series of iterations produces a circuit with reduced power consumption.
The number of iterations is user specified.
Upper and lower bounds of the average power consumption per gate.
Number of groups of logic variables with similar statistical parameters.
MULTILEVEL CIRCUITS
Techniques are classified as:
1) Boolean optimization techniques, which use the don't-care set
2) Algebraic optimization techniques, which do not use don't cares
In algebraic techniques a subset of De Morgan's laws is used; the distributive law is applied, while the complement of a logic variable is not defined.
Algebraic techniques are faster than Boolean techniques, but the quality of the result is lower.
BOOLEAN TECHNIQUES
Don't cares reduce area and improve timing.
Total power consumption may nevertheless increase, because the functions of the transitive fanout nodes are changed.
To optimize without increasing the switching activity, the impact of a transformation on the function f must be exactly known when g is optimized.
The switching activity of node f is expressed as a function of g when g is optimized.
The don't-care set of node g consists of:
1) External don't-care set (EDC)
2) Satisfiability don't-care set (SDC)
3) Observability don't-care set (ODC)
ODC CALCULATION
The ODC of node g can be computed as ODC_g = ODC_f + ODC_g^f, with ODC_g^f the observability don't care of g with respect to its fanout f.
A subset of the ODCs of node f is called the propagated power relevant observability don't care (PPODC_f).
Using PPODC_f to compute ODC_g ensures that any change in the function of g does not increase the switching activity of f.
ODC CALCULATION
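The observability don't care of g with respect to a fanout f can be illustrated on truth tables. The sketch below assumes the standard textbook definition (the input vectors for which f takes the same value whether g is 0 or 1); the slide's own formula is not reproduced in the text, and the power relevant variants are not modelled. The function names and the three-input example are hypothetical.

```python
from itertools import product

def odc_wrt_fanout(f, num_inputs):
    """Return the set of primary-input vectors x for which the value of g
    cannot be observed at f, i.e. f(x, g=0) == f(x, g=1)."""
    return {x for x in product((0, 1), repeat=num_inputs)
            if f(x, 0) == f(x, 1)}

# Hypothetical example: inputs x = (a, b, c), node g = a OR b, fanout f = g AND c.
# f is written as a function of the primary inputs and of the value at node g.
def f(x, g):
    a, b, c = x
    return g & c

# g is unobservable at f exactly when c = 0.
odc = odc_wrt_fanout(f, 3)
```

Any change to g that only affects input vectors inside this set leaves f unchanged, which is what makes the set usable as optimization freedom for g.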
p(f): signal probability of f.
The ODC of node g is modified into the power relevant observability don't care (PODC):
PODC_g = PPODC_f + ODC_g^f
To ensure that the switching activity of any other node in the transitive fanout of g (for instance h) is not increased, the monotone power relevant observability don't care (MPODC_g) of node g was introduced.
ALGEBRAIC TECHNIQUES
Power saving factor
Extraction affects the loading on the inputs and the amount of logic sharing.
A multilevel network can be represented by two-level logic functions.
Efficient methods exist for node elimination, factorization and logic decomposition.
If E_a(sw) + E_g(sw) > E_c(sw) + E_h(sw), then P_A > P_B.
If prob(a) = 0.5 and prob(b) = prob(c) = 0.25, then P_A - P_B = 13/128.
Power savings of 18% are expected (compared to the minimum-literal network).
LEXICOGRAPHIC COST
An increase in the number of literals in the circuit results in increased switched capacitance and hence more power consumption.
Lexicographic cost (Da, Dp):
Da: literal saving factor
Dp: power saving factor
The power saving factor is very expensive to compute, so it is better to calculate it only for a subset of candidate divisors.
GUARDED EVALUATION
Shutdown techniques
Tl(s): latest arrival time of signal s
Te(I): earliest arrival time of signal I
At I, transparent latches can be added to reduce the switching activity of F.
SEQUENTIAL CIRCUIT - STATE ASSIGNMENT
Given the state transition graph (STG) of an FSM, assign binary codes to all states such that a given cost function G is minimized.
The number of transitions on the state lines between two successive clock periods is to be minimized.
Combining reduced switching activity on the state lines with a low-power implementation of the combinational logic gives a significant power reduction.
HAMMING DISTANCE
To minimize the number of transitions on the state lines, minimize the Hamming distance between the codes assigned to pairs of states.
The transition probability from one state to another is not the same for every pair of states.
The Hamming distance between each pair of states must therefore be weighted with the corresponding transition probability.
COST FUNCTION
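Weighting the Hamming distance of each state pair by its transition probability can be sketched as a small cost evaluator. This is only an illustration: the four-state STG, its edge weights and the two candidate encodings are hypothetical, and the function names are not taken from any tool.

```python
def hamming(code_a, code_b):
    """Number of bit positions in which two integer state codes differ."""
    return bin(code_a ^ code_b).count("1")

def encoding_cost(weights, codes):
    """Sum of transition probability times Hamming distance over all STG edges.

    weights: {(state_i, state_j): transition probability}
    codes:   {state: integer binary code}
    """
    return sum(w * hamming(codes[si], codes[sj])
               for (si, sj), w in weights.items())

# Hypothetical 4-state ring STG with assumed edge probabilities.
weights = {("S0", "S1"): 0.4, ("S1", "S2"): 0.3,
           ("S2", "S3"): 0.2, ("S3", "S0"): 0.1}

binary_codes = {"S0": 0b00, "S1": 0b01, "S2": 0b10, "S3": 0b11}
gray_codes = {"S0": 0b00, "S1": 0b01, "S2": 0b11, "S3": 0b10}

cost_binary = encoding_cost(weights, binary_codes)  # two edges flip both bits
cost_gray = encoding_cost(weights, gray_codes)      # every edge flips one bit
```

For this particular STG the Gray-style encoding gives the lower cost, because every edge connects codes at Hamming distance 1.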
Cost function: G = Σ_{i,j} W_ij · H(S_i, S_j)
H(S_i, S_j) is the Hamming distance between the binary codes assigned to states S_i and S_j.
W_ij is the transition probability between states S_i and S_j.
p_ij: transition probability; the corresponding edges of the STG are labelled with these values.
The objective during the encoding procedure is to minimize the number of state bits, and hence the number of flip-flops.
However, fewer flip-flops imply a more complex combinational circuit to evaluate the next-state function, which increases the switching activity and the power.
STATE TRANSITION PROBABILITY
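Exact transition probabilities weight each conditional probability p_ij by the probability P_i of the machine being in state S_i, where the P_i form the stationary distribution of the STG viewed as a Markov chain. The sketch below is a minimal illustration under assumptions: it uses power iteration rather than solving the linear system directly, it assumes the chain is ergodic, and the two-state machine is hypothetical.

```python
def steady_state(p, iterations=1000):
    """Approximate the stationary state probabilities P_i by power iteration.

    p[i][j] is the conditional probability of moving from state i to state j
    (each row sums to 1). The chain is assumed to be ergodic.
    """
    n = len(p)
    dist = [1.0 / n] * n
    for _ in range(iterations):
        dist = [sum(dist[i] * p[i][j] for i in range(n)) for j in range(n)]
    return dist

def exact_transition_probabilities(p):
    """Return the matrix P_ij = P_i * p_ij."""
    state_prob = steady_state(p)
    n = len(p)
    return [[state_prob[i] * p[i][j] for j in range(n)] for i in range(n)]

# Hypothetical 2-state machine.
p = [[0.9, 0.1],
     [0.5, 0.5]]
state_prob = steady_state(p)
exact = exact_transition_probabilities(p)
```

In this example the machine spends most of its time in the first state, so the edge leaving the second state is weighted much less than its conditional probability of 0.5 alone would suggest.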
p_ij is only an approximation of the exact state transition probability P_ij.
p_ij is computed by considering only the input combinations, ignoring the probability of the machine being in state S_i.
Exact transition probability: P_ij = P_i · p_ij, where the P_i are obtained by solving a system of linear equations called the Chapman-Kolmogorov equations.
To account for the impact of state assignment on the power consumed by the combinational circuit, a number of heuristics were introduced.
With these heuristics, the combinational circuit is optimized for both area and low power consumption.
W_ij is modified to take into account the exact state transition probability.
Integer linear programming and genetic algorithms have been used to solve the state assignment problem.
Cost functions have been proposed for the cases where the combinational circuit is implemented as a two-level or as a multilevel gate network.
LOGIC OPTIMIZATION PRE COMPUTATION
The output logic values of a circuit are selectively precomputed one clock cycle before they are required, and used to reduce the internal switching activity of the combinational logic in the succeeding clock cycle.
The inputs are partitioned into two sets, corresponding to registers R1 and R2.
If function g1 or g2 equals 1, the value of the output function f is fully determined.
The variables of g1 and g2 are a subset of the input signals.
The remaining signals (Xk+1, ..., XN) can then be frozen.
PRE COMPUTATION
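Precomputation can be illustrated with a comparator, a common textbook example that is assumed here and is not taken from the slides. For f = (A > B), the most significant bits alone decide the result whenever they differ, so the remaining bits can be frozen in those cycles. The operand width N and all function names are hypothetical.

```python
from itertools import product

N = 3  # hypothetical operand width in bits

def msb(x):
    """Most significant bit of an N-bit value."""
    return (x >> (N - 1)) & 1

def f(a, b):
    """Full comparator output."""
    return int(a > b)

def g1(a, b):
    """Precomputation predicate: 1 when f is known to be 1 from the MSBs."""
    return msb(a) & (1 - msb(b))

def g2(a, b):
    """Precomputation predicate: 1 when f is known to be 0 from the MSBs."""
    return (1 - msb(a)) & msb(b)

def frozen_fraction():
    """Check the predicates on all input pairs and return the fraction of
    pairs for which the low-order bits (register R2) need not be loaded."""
    frozen = 0
    for a, b in product(range(2 ** N), repeat=2):
        assert not (g1(a, b) and g2(a, b))   # never both 1
        if g1(a, b):
            assert f(a, b) == 1
        if g2(a, b):
            assert f(a, b) == 0
        frozen += g1(a, b) | g2(a, b)
    return frozen / 4 ** N

fraction = frozen_fraction()
```

With uniformly distributed inputs, the low-order register stays frozen for half of the input pairs in this example; whether that saves power overall depends on the cost of the added g1 and g2 logic.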
Assumption: g1 and g2 are never both evaluated to 1.
If the logic level of g1 or g2 is high during clock cycle T, the enable signal of register R2 is low.
The outputs of R2 are then unchanged during clock cycle T+1.
The outputs of R1 are updated, so the function f is still evaluated correctly.
Only part of the inputs of block A change, so the switching activity of this block is reduced.
g1 and g2 occupy extra area and consume additional power, so an appropriate trade-off analysis is needed.
RETIMING
A novel method to reduce the power of pipelined sequential circuits is known as retiming.
Retiming techniques reposition the flip-flops of the circuit; they are conventionally used to minimize either the area or the delay.
For low power, the idea is to place a flip-flop at a circuit node with high glitching activity and high load capacitance.
Glitches are then not propagated to the transitive fanout of the node, resulting in a reduction of the total switching activity.
Care is needed, because the switching activity of some nodes of the circuit may change due to retiming and increase the power consumption.
The number of registers used should be minimized, since the power dissipation of the registers and the clock is not negligible.
The timing behavior of the circuit must be preserved when retiming for low power.
SYNTHESIS OF FSM WITH GATED CLOCK
During the operation of an FSM there are conditions in which the state and the outputs of the FSM do not change.
Clocking the circuit in these conditions wastes power in both the combinational logic and the registers.
Solution: stop the clock during idle conditions.
Benefits of the gated clock while the clock is stopped:
No power is consumed by the combinational circuit, since its inputs are unchanged.
No power is consumed by the flip-flops and the gated clock line.
A gated clock signal GCLK is generated from a new activation signal Fa.
Fa, the activation signal, uses the primary inputs and the state lines of the machine.
Power savings of 10-30% are obtained when Fa is true with high probability during circuit operation.
TECHNOLOGY DEPENDENT OPTIMIZATION
PATH BALANCING
The way the gates of a logic circuit are interconnected can strongly affect the overall switching activity and hence the power dissipation.
Timing skew between signals in a circuit can cause spurious transitions, resulting in extra power.
The delays of the paths that converge at each gate must be balanced.
Example: f = abcd implemented in two ways.
The tree implementation of f eliminates glitches, effectively reducing the total power.
Before technology mapping, path balancing is achieved by selective collapsing and logic decomposition.
PATH BALANCING
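The effect of the two implementations of f = abcd on path balance can be sketched by computing, for every gate, the difference between its latest and earliest input arrival times. The sketch assumes a unit gate delay, primary inputs arriving at time 0, and that the non-tree implementation is a chain of two-input gates; the nested-tuple netlist format is hypothetical.

```python
def arrival_and_skews(node, gate_delay=1):
    """Return (output arrival time, list of input skews of every gate).

    A leaf is a string naming a primary input (arrival time 0).
    A gate is a tuple of its fanin nodes.
    """
    if isinstance(node, str):
        return 0, []
    arrivals, skews = [], []
    for fanin in node:
        time, fanin_skews = arrival_and_skews(fanin, gate_delay)
        arrivals.append(time)
        skews.extend(fanin_skews)
    skews.append(max(arrivals) - min(arrivals))
    return max(arrivals) + gate_delay, skews

chain = ((("a", "b"), "c"), "d")   # ((a.b).c).d
tree = (("a", "b"), ("c", "d"))    # (a.b).(c.d)

chain_delay, chain_skews = arrival_and_skews(chain)
tree_delay, tree_skews = arrival_and_skews(tree)
```

In the chain, the later gates see inputs that arrive one and two gate delays apart, which is where spurious transitions can be generated; in the tree, every gate has zero input skew under this delay model.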
After technology mapping, path balancing is achieved by delay insertion and pin reordering.
By selectively collapsing the fanins of a node, the arrival time at the output of the node can be changed.
Logic decomposition and extraction minimize the level difference between the inputs of a node.
By inserting variable-delay buffers, the delays of all paths in the circuit can be made equal.
The aim is to use the minimum number of delay elements to achieve the maximum reduction in glitching activity.
Path delays can also be balanced by an appropriate signal-to-pin assignment.

TECHNOLOGY DECOMPOSITION
The next step in the logic synthesis of a network is to convert it into another network that contains only two-input AND/NAND gates and inverters; this is called decomposition.
It is carried out before mapping.
A decomposition scheme that minimizes the total switching activity of the network leads to power-efficient technology mapping.
Given the switching activity at each input of a node, Tsui et al. suggested a technique for AND decomposition.

TECHNOLOGY DECOMPOSITION
The technique reduces the total switching activity under a zero-delay model; two-input AND gates are used in the decomposition model.
Signal d, which has the highest switching activity, is injected last in configuration A, giving better power performance.
The technique was found to be optimal for dynamic CMOS circuits and also produces good results for static CMOS circuits.
The decomposition procedure reduces the total switching activity by 5% over the conventional balanced-tree decomposition method.

TECHNOLOGY MAPPING
Some design techniques for low power consumption:
The main concept is to hide nodes with high switching activity inside the gates, where they drive smaller load capacitances.
Two steps:
1) Computation of the power-delay curves (power consumption vs. arrival time) of all nodes in the network
2) Generation of the mapping solution according to these curves and the required time at the primary inputs
18% power saving at a 16% increase in area, without any penalty in network performance.

TECHNOLOGY MAPPING
This implies that the power-delay mapper reduces the number of high switching activity subnetworks at the expense of increasing the number of low switching activity ones.
It reduces the average load of the network.
The total power cost of a match (steady-state transitions and hazards) can be calculated from the power-delay curves computed at the inputs of the gate and the power-delay characteristics of the gate itself.

POST MAPPING OPTIMIZATION
Power estimation in the technology independent phase is inaccurate: the final structure of the circuit and the load capacitance of each node are unknown.
Post-mapping optimization allows power and timing analysis to be performed on the mapped circuit, which provides a more realistic estimation.
However, the freedom for optimization is reduced after the circuit has been mapped.
One idea is based on redundancy addition and removal.
Some existing connections that consume high power become redundant and can be removed, giving a remarkable power reduction.
Assumptions of the example: zero-delay gate model; circuit signals are uncorrelated.
Switching activity of a node i: E_i = 2 p_i (1 - p_i)
The input capacitance of every gate is Cg.
Probabilities of the input lines: pa = pb = pd = pe = and pc =
A new connection l1 is added to the circuit; this connection is redundant.
With l1 added, the connections from C to g3 become redundant (i.e. stuck-at-1 faults on these lines are undetectable) and g4 is ignored.
The switching activity is reduced, so the power consumption of the transformed circuit is smaller.

POST MAPPING OPTIMIZATION
An automatic test pattern generator (ATPG) is able to identify permissible transformations on the network that may reduce power consumption.
Gate resizing: smaller gates are slower, so resizing targets the non-critical gates of the circuit; algebraic decision diagrams are used to compute the timing behavior of the circuit.
POST MAPPING OPTIMIZATION
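The switching activity model used in the redundancy example (zero-delay gates, uncorrelated signals, E_i = 2 p_i (1 - p_i)) can be sketched as follows. The helper names and the input probabilities of 0.5 are assumptions for illustration only; the probability values of the original example are not recoverable from the text.

```python
def and_probability(*input_probs):
    """Signal probability at the output of an AND gate with independent inputs."""
    result = 1.0
    for p in input_probs:
        result *= p
    return result

def or_probability(*input_probs):
    """Signal probability at the output of an OR gate with independent inputs."""
    all_zero = 1.0
    for p in input_probs:
        all_zero *= (1.0 - p)
    return 1.0 - all_zero

def switching_activity(p):
    """E = 2 p (1 - p) for a node with signal probability p."""
    return 2.0 * p * (1.0 - p)

# Hypothetical input probabilities.
pa = pb = 0.5
e_input = switching_activity(pa)
e_and = switching_activity(and_probability(pa, pb))
e_or = switching_activity(or_probability(pa, pb))
```

Summing E_i times the capacitance driven by each node (a multiple of Cg under the stated assumption) before and after a transformation gives the kind of comparison the example relies on.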