

Power Minimization Techniques at the Algorithm Level


After examining methods for estimating power consumption at the algorithm level, the next logical step is to examine power minimization techniques at this level. We will start with some of the general approaches to power minimization and then look at specific techniques that can be used to minimize both the algorithm-inherent dissipation and the implementation overhead.

1.1 General approaches to power minimization

The recurring theme in low power design at all levels of abstraction is voltage reduction. At the algorithm level, functional pipelining, retiming, algebraic transformations and loop transformations can be used to increase speed and thereby allow lower supply voltages. These approaches translate into a larger silicon area, hence the approach has been termed trading area for power. Another technique for low power design is avoiding wasteful activity. At the algorithm level, the size and complexity of a given algorithm (e.g. operation counts, word length) determine the activity. If there are several algorithms for a given task, the one with the least number of operations is generally preferable.
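To put rough numbers on the area-for-power trade-off, the following is a minimal first-order sketch. It assumes a simple Td ~ Vdd/(Vdd − Vt)² delay model, an illustrative threshold voltage of 0.7 V, a 5 V reference design and roughly 15% capacitance overhead for the parallel version; none of these values come from the text.

```python
# Illustrative "trading area for power": duplicating a datapath unit
# halves the speed each unit must run at, which allows a lower supply
# voltage and hence a large energy saving. All parameters are assumed.

VT = 0.7          # assumed threshold voltage (V)
C_UNIT = 1.0      # normalised switched capacitance per operation

def delay(vdd):
    """First-order CMOS delay model: Td ~ Vdd / (Vdd - Vt)^2 (normalised)."""
    return vdd / (vdd - VT) ** 2

def energy_per_op(vdd, cap):
    """Switching energy per operation: E = C * Vdd^2 (normalised)."""
    return cap * vdd ** 2

v_ref = 5.0                      # reference single-unit design at 5 V
t_ref = delay(v_ref)

# Two units in parallel: each unit may be twice as slow, so lower Vdd
# until the unit delay reaches 2 * t_ref (simple linear search).
v = v_ref
while v - 0.01 > VT and delay(v - 0.01) <= 2 * t_ref:
    v -= 0.01

ratio = energy_per_op(v, 1.15 * C_UNIT) / energy_per_op(v_ref, C_UNIT)
print(f"parallel version can run at about {v:.2f} V")
print(f"energy per operation drops to roughly {ratio:.0%} of the reference")
```

With these assumed numbers the parallel version runs near 3.1 V and consumes less than half the energy per operation, at the cost of roughly twice the area.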

1.2 Reducing the algorithm-inherent dissipation

Important transformations in this category include operation reduction and strength reduction. Operation reduction includes common sub-expression elimination, algebraic transformations and dead code elimination. Strength reduction refers to replacing energy-consuming operations by a combination of simpler ones. The most common example in this category is the expansion of multiplications by constants into shift and add operations, as in the sketch below. Though this transformation typically results in lower power, it may sometimes have the opposite effect if it increases the critical path. Another drawback is that it introduces extra overhead in the form of registers and control.
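As a concrete, purely illustrative instance of strength reduction, the sketch below expands a multiplication by the constant 231 into shifts and add/subtract operations, using 231 = 256 − 32 + 8 − 1; the constant is an assumption chosen for the example.

```python
# Strength reduction sketch: a constant multiplication is replaced by
# shifts and add/subtract operations, since 231 = 256 - 32 + 8 - 1.

def mul_by_231(x):
    """x * 231 using 3 shifts and 3 add/subtract operations."""
    return (x << 8) - (x << 5) + (x << 3) - x

# Sanity check against the direct multiplication.
for x in range(-512, 512):
    assert mul_by_231(x) == x * 231
```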
1.3 Minimizing the implementation overhead

The implementation overhead can be reduced by reducing the chip area, as this translates into reduced bus capacitances. Retiming can be used to increase resource utilization by distributing operations more uniformly. However, a larger area can also help to reduce the algorithm-inherent power. This shows that power minimization is a subtle trade-off process.

Design Example 1: Vector Quantization, Algorithmic Optimization

Continuing with this design example, the properties of operation count and critical path will be used to aid in the choice and optimization of algorithms.

Tree Search Vector Quantization (TSVQ)

Tree search vector quantization encoding requires far less computation than full search. TSVQ performs a binary search of the vector space instead of a full search. As a result, the computational complexity is proportional to log2 N rather than N, where N is the number of vectors in the codebook. Fig 1.1 diagrams the structure of the tree search. At each level of the tree, the input vector is compared with two codebook entries. If at level 1, for example, the input vector is closer to the left entry, then the right branch of the tree is analyzed no further and an index bit 0 is transmitted. This process is repeated until a leaf of the tree is reached. Hence only 2*log2(256) = 16 distortion comparisons have to be made, compared to 256 distortion calculations in the FSVQ.

Fig 1.1 : Tree Search Encoding
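The following is a small sketch of the tree-search encoding just described. The codebook-tree representation (a dict mapping a node index to its two child codevectors) and the random data are assumptions made for illustration; the point is that a depth-8 tree over 256 codevectors needs only 2*8 = 16 distortion evaluations per input vector.

```python
# Sketch of tree-search VQ encoding with an assumed codebook-tree layout.

import random

def distortion(x, c):
    """Mean squared error between input vector x and codevector c."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))

def tsvq_encode(x, tree, depth):
    """Binary search down the codebook tree; returns the path bits."""
    node, bits = 1, []
    for _ in range(depth):
        left, right = tree[node]
        bit = 0 if distortion(x, left) <= distortion(x, right) else 1
        bits.append(bit)
        node = 2 * node + bit          # descend to the chosen child
    return bits                        # log2(N) index bits

# Toy example: depth-8 tree (256 leaves), 16-dimensional vectors.
random.seed(0)
depth, dim = 8, 16
tree = {n: ([random.random() for _ in range(dim)],
            [random.random() for _ in range(dim)])
        for n in range(1, 2 ** depth)}
x = [random.random() for _ in range(dim)]
print(tsvq_encode(x, tree, depth))     # 2*8 = 16 distortion evaluations vs 256
```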



1.4 Mathematical Optimizations

In TSVQ, a large computational reduction is available by mathematically rearranging the computation of the difference between the distortions of the input vector X with respect to the two codevectors Ca and Cb, originally given by

MSEab = Σi=0..15 (Xi − Cai)² − Σi=0..15 (Xi − Cbi)²                         ...eq 1

Since a given node in the comparison tree always compares the same two codevectors, the calculation of the two errors can be combined under one summation. With the quadratics expanded, this yields

MSEab = Σi=0..15 [ (Cai² − 2·Xi·Cai + Xi²) − (Cbi² − 2·Xi·Cbi + Xi²) ]      ...eq 2

which can be simplified and regrouped into eq 3

MSEab = Σi=0..15 (Cai² − Cbi²) + Σi=0..15 2·Xi·(Cbi − Cai)                  ...eq 3
The first summation can be precomputed once the codebook is known and stored in a single memory location. The quantities 2·(Cbi − Cai) may also be calculated and pre-stored. Therefore, at each level of the tree the number of multiplications is reduced from 32 to 16, while memory accesses and add/subtracts go from 32 to 17 and from 33 to 17, respectively. The impact of algorithm selection and mathematical transformations is summarized in Table 1.1 below for a 256-vector codebook; Table 1.2 shows the resulting algorithm-inherent dissipation. A small code sketch of the precomputed comparison follows Table 1.1.

                    Full Search    Tree Search    Optimized Tree
Memory Access           4096           256              136
Multiplications         4096           256              128
Add/Subtract            8448           520              136

Table 1.1 VQ Operation count summary
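A brief sketch of the precomputed comparison of eq 3 (illustrative, not the original implementation): the constant K = Σ(Cai² − Cbi²) and the differences Di = 2·(Cbi − Cai) are stored per tree node, so the on-line decision needs only 16 multiplications and 17 additions/subtractions.

```python
# Precomputed node constants for the eq 3 form of the branch decision.

def precompute_node(ca, cb):
    """Stored once per node: K = sum(Ca_i^2 - Cb_i^2), D_i = 2*(Cb_i - Ca_i)."""
    k = sum(a * a - b * b for a, b in zip(ca, cb))
    d = [2 * (b - a) for a, b in zip(ca, cb)]
    return k, d

def branch_bit(x, k, d):
    """Sign of MSEa - MSEb: 16 multiplies, 17 add/subtracts per node."""
    diff = k + sum(xi * di for xi, di in zip(x, d))
    return 0 if diff <= 0 else 1       # 0 -> take the left (a) branch

# Check against the direct two-distortion computation on a toy vector.
ca, cb, x = [1.0, 2.0], [0.5, 3.0], [1.5, 2.5]
k, d = precompute_node(ca, cb)
direct = (sum((xi - a) ** 2 for xi, a in zip(x, ca))
          - sum((xi - b) ** 2 for xi, b in zip(x, cb)))
assert abs(direct - (k + sum(xi * di for xi, di in zip(x, d)))) < 1e-9
```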



                    Full Search    Tree Search    Optimized Tree
Memory Access         363.7 nF       22.7 nF          12.1 nF
Multiplications        66.3 nF        4.1 nF           2.1 nF
Add/Subtract            5.1 nF        155 pF           82.7 pF

Table 1.2 VQ Algorithmic Inherent Dissipation

Design Example 2: FIR Filter, Algorithmic Optimization

The algorithmic transformations described in this section represent one of the most powerful and widely applicable classes of optimization techniques. The direct FIR form has 12 additions and 1 multiplication in the critical path and cannot meet the throughput constraint below 3 V for the given hardware library. The required throughput is 3.125 MHz. By retiming, the critical path can be reduced to only 1 multiplication and 1 addition, which allows the supply voltage to be reduced below 3 V while maintaining the same throughput; a small sketch of the two structures follows. The area-energy trade-offs of both versions as the supply voltage varies, as generated by algorithmic estimation tools, are shown in Fig 1.2. The retimed version allows the voltage to be reduced to 1.5 V, thus reducing the power consumption drastically. The designer can choose the voltage that best suits the design, simultaneously taking into account area, throughput and energy.
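To make the critical-path argument concrete, here is a small behavioural sketch; the coefficients and word-level arithmetic are assumed for illustration and are not the hardware library of the text. The direct form chains all adders behind a multiplier, while the transposed (retimed) form places a register between successive adders, leaving one multiply and one add on the critical path; both compute the same output.

```python
# 13-tap FIR in direct and transposed (retimed) form, with assumed
# coefficients. Direct form: 1 multiply + 12 adds on the critical path.
# Transposed form: registers break the adder chain -> 1 multiply + 1 add.

TAPS = [1, 2, 3, 4, 5, 6, 7, 6, 5, 4, 3, 2, 1]   # assumed coefficients

def fir_direct(samples):
    """Direct form: per output, 13 multiplies feed a 12-adder chain."""
    hist = [0] * len(TAPS)
    out = []
    for s in samples:
        hist = [s] + hist[:-1]
        out.append(sum(h * x for h, x in zip(TAPS, hist)))
    return out

def fir_transposed(samples):
    """Transposed form: each adder output is registered (retimed)."""
    regs = [0] * (len(TAPS) - 1)
    out = []
    for s in samples:
        y = TAPS[0] * s + regs[0]
        regs = ([TAPS[k] * s + regs[k] for k in range(1, len(TAPS) - 1)]
                + [TAPS[-1] * s])
        out.append(y)
    return out

data = list(range(40))
assert fir_direct(data) == fir_transposed(data)
print("critical path: direct = 1 mult + 12 adds, transposed = 1 mult + 1 add")
```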

Fig 1.2 : Effect of voltage scaling on the direct and retimed versions


Architectural estimation/analysis
Estimating power at the architectural level can be more accurate for two reasons: more precise information can be obtained regarding the signal statistics, yielding more accurate models for the hardware operators and modules; and the implementation overhead is now precisely defined in terms of controllers, memories and buses and can thus be estimated more accurately. Reliable power analysis at the module level requires two important entities: capacitance models for hardware modules and activity models for data or control signals.

2.1 Capacitance models for hardware modules

The capacitance of RTL-level modules such as an adder, multiplier or memory can be expressed as a function of the complexity parameters of the module. For example, the switching capacitance of a multiplier is proportional to the square of its input wordlength. The wordlength is a common complexity parameter for most modules, but some modules require more parameters. For example, the capacitance model for a logarithmic shifter is given by

CT = C0·N + C1·L + C2·N·L + C3·N²·L + C4·M·N·L + C5·S·N·L

where N is the wordlength, S and M are the actual and maximum shift values, and L = ⌈log2(M+1)⌉ is the number of shift stages.
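A minimal sketch of such a parameterised capacitance model is shown below; the coefficient values are placeholders standing in for characterisation data and are not taken from the text.

```python
# Parameterised module capacitance model for a logarithmic shifter.
# Coefficients c0..c5 are illustrative placeholders, not measured data.

import math

def log_shifter_cap(N, S, M, c=(0.1, 0.05, 0.02, 0.001, 0.003, 0.004)):
    """Total switched capacitance of a logarithmic shifter (pF, assumed).
    N: word length, S: actual shift, M: maximum shift,
    L = ceil(log2(M + 1)): number of shift stages."""
    L = math.ceil(math.log2(M + 1))
    c0, c1, c2, c3, c4, c5 = c
    return (c0 * N + c1 * L + c2 * N * L + c3 * N * N * L
            + c4 * M * N * L + c5 * S * N * L)

print(log_shifter_cap(N=16, S=3, M=15))
```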

2.2 Activity models for data signals

The average dissipation of a module is a strong function of the applied signals. Enumerating the capacitance model over all possible input patterns is clearly infeasible. The power factor approximation (PFA) technique uses a heuristically or experimentally determined weighting factor, called the power factor, to model the average power consumed by a given module over a range of designs. The approach works well if the applied data resembles random white noise, but produces a large error for correlated data.


A more accurate model for incorporating the effect of the switching activity is based on the realization that two's complement data words can be divided into two regions based on their activity behavior. The activity in the higher-order sign bits depends on the temporal correlation of the data, whereas the lower-order bits behave like white-noise data.

The distinct activity of the two bit types is depicted in Fig 2.1, which displays the 0→1 bit transition probabilities for data streams with different temporal correlation. A module is now completely characterized by its capacitance models in the MSB and the LSB regions. The break-points between the regions can be determined from the applied signal statistics, as obtained from simulation or theoretical analysis. This model, called the dual bit type or DBT model, has been used to accurately determine datapath and memory power consumption. Fig 2.2 shows a comparison between the power predicted by the DBT model and by a switch-level estimator (IRSIM) for the logarithmic shifter mentioned previously. The overall rms error is only 9.9%.

Fig 2.1 : Transition activity versus bit for typical data streams
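The kind of data behind Fig 2.1 can be reproduced with a short experiment like the sketch below (the stream parameters and word length are assumptions): per-bit 0→1 transition probabilities are measured for a strongly correlated stream and for white noise, showing near-random activity in the low-order bits and correlation-dependent activity in the sign bits.

```python
# Measure per-bit 0->1 transition probabilities of two's complement
# streams: the data behind the dual bit type (DBT) idea.

import random

def bit(value, i, width=16):
    """i-th bit of a two's complement word of the given width."""
    return ((value & ((1 << width) - 1)) >> i) & 1

def transition_probabilities(stream, width=16):
    counts = [0] * width
    for prev, cur in zip(stream, stream[1:]):
        for i in range(width):
            if bit(prev, i, width) == 0 and bit(cur, i, width) == 1:
                counts[i] += 1
    return [c / (len(stream) - 1) for c in counts]

# Strongly correlated stream (slow random walk) vs white noise.
random.seed(1)
walk, x = [], 0
for _ in range(20000):
    x = max(-30000, min(30000, x + random.randint(-200, 200)))
    walk.append(x)
noise = [random.randint(-32768, 32767) for _ in range(20000)]

print("correlated:", [round(p, 2) for p in transition_probabilities(walk)])
print("white     :", [round(p, 2) for p in transition_probabilities(noise)])
```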


Fig 2.2 : Log shifter: Switch level power vs module level power model

2.3 Activity models for control signals

The DBT model is restricted to two's complement (or similar) signals and is, for instance, not applicable to control buses. For those signals, a simpler model has to be employed that uses the average number of bits switching per event as its main statistical parameter. This so-called activity-based control (ABC) model can be used efficiently to estimate the power consumption of random-logic controllers.

The power dissipation of such a module is, once again, a function of a number of complexity parameters: NI, the total number of inputs to the combinational logic implementing the controller; NO, the total number of outputs (primary and next state) from the logic block; and NM, an estimate of the number of min-terms of the given logic function. The capacitive weighting coefficients are a function of the switching parameters αI and αO, which represent the average number of bits switching on the input and output buses. Fig 2.3 compares the power consumption estimated by the ABC model to that of a switch-level simulator (IRSIM) for a standard-cell implementation. For the more predictable ROM and PLA implementations, the estimation error is within 10%.
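Since the characterized form of the ABC model is not reproduced here, the following is only a hedged sketch of the idea: a linear capacitance model in the complexity parameters NI, NO and NM, weighted by the measured switching activities αI and αO. The functional form, the coefficients and the example numbers are all assumptions for illustration.

```python
# Hedged sketch of an ABC-style controller power estimate: capacitance
# grows with the complexity parameters and is weighted by the average
# number of bits switching per cycle. Coefficients are placeholders.

def abc_controller_cap(n_in, n_out, n_minterms, alpha_in, alpha_out,
                       k=(0.05, 0.08, 0.02)):
    """Estimated switched capacitance per clock cycle (pF, assumed units)."""
    k_in, k_out, k_min = k
    return (k_in * alpha_in * n_in
            + k_out * alpha_out * n_out
            + k_min * alpha_in * n_minterms)

# Example controller: 12 inputs, 9 outputs, ~40 minterms.
cap = abc_controller_cap(12, 9, 40, alpha_in=2.3, alpha_out=1.7)
print(f"energy per clock at 3.3 V: about {cap * 3.3 ** 2:.1f} pJ (E = C*V^2)")
```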

Fig 2.3 : Random logic controller : IRSIM CAP vs ABC model

Uncertainties in the final placement and routing make it difficult to estimate interconnect power consumption at the architectural level. Possible solutions to this dilemma include interconnect estimation based on derivatives of Rent's rule, or back-annotation after early floorplanning. A rough sketch of a Rent's-rule-style estimate is given below.
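As an illustration of the first option, here is a rough Rent's-rule-style sketch; the Rent constants, gate pitch and wiring capacitance per millimetre are all assumed values, not data from the text.

```python
# Pre-layout interconnect estimate: Rent's rule T = t * G^p for the
# terminal count, plus a crude average wire-length guess from the
# block area. All constants below are illustrative assumptions.

def rent_terminals(gates, t=3.5, p=0.6):
    """Rent's rule: external terminals of a block of `gates` gates."""
    return t * gates ** p

def interconnect_cap_estimate(gates, gate_pitch_mm=0.02, cap_pf_per_mm=0.2):
    """Very rough total interconnect capacitance for the block (pF)."""
    side_mm = gate_pitch_mm * gates ** 0.5     # square block assumption
    avg_wire_mm = 0.5 * side_mm                # crude average wire length
    wires = gates * 2                          # ~2 nets per gate, assumed
    return wires * avg_wire_mm * cap_pf_per_mm

print(round(rent_terminals(10_000)), "terminals,",
      round(interconnect_cap_estimate(10_000), 1), "pF (rough)")
```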



Reference: Low Power Design Methodologies, J. M. Rabaey and M. Pedram (Eds.), Kluwer Academic Publishers, 1996.

