
Low Power ALU Cluster

In this project an ALU cluster is designed in Verilog HDL.

The code is FPGA as well as ASIC proven and can be used for any design. The ALU is designed so that it also incorporates low-power methodologies. The low-power methods used include Gray codes, pipelining, clock gating and a pre-computation kernel. The ALU performs addition, subtraction, multiplication, division, AND, OR, XOR and NOT. Other operations can be added to extend the functionality. The operation is selected using opcodes that are Gray codes, so consecutive opcodes differ in only one bit.

Opcode   Operation
000      Addition
001      Subtraction
011      Multiplication
010      Division
110      And
111      Or
101      Xor
100      Not
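As a software illustration only (the project itself is written in Verilog HDL), the following Python sketch models the opcode table and a decoder that raises exactly one enable per unit. The short unit names ("add", "sub", and so on) and the function names are assumptions made for this example, not identifiers from the design.

```python
# Opcode table from the text, in the order listed (a 3-bit Gray sequence).
# The short unit names are hypothetical labels chosen for this sketch.
OPCODES = [
    ("000", "add"),
    ("001", "sub"),
    ("011", "mul"),
    ("010", "div"),
    ("110", "and"),
    ("111", "or"),
    ("101", "xor"),
    ("100", "not"),
]


def hamming_distance(a: str, b: str) -> int:
    """Count the bit positions in which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


def decode(opcode: str) -> dict:
    """Return one enable flag per functional unit; exactly one flag is True."""
    table = dict(OPCODES)
    if opcode not in table:
        raise ValueError(f"unknown opcode: {opcode}")
    return {name: (name == table[opcode]) for _, name in OPCODES}


# Distance between each opcode and the next one in the table,
# including the wrap-around from the last entry back to the first.
distances = [
    hamming_distance(OPCODES[i][0], OPCODES[(i + 1) % len(OPCODES)][0])
    for i in range(len(OPCODES))
]
```

Every entry of `distances` is 1, which is the Gray-code property: stepping through the table toggles a single opcode bit at a time.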

The operations can be implemented as individual modules, each with an enable signal generated by the opcode decoder that decodes the table above. The operations can be realized in two ways:

1. Use the operators directly, like + for addition, - for subtraction and * for multiplication (first ensure that the synthesis tool used supports synthesizing them).
2. Write custom HDL modules for all the operations, like a full adder for addition and a multiplier for multiplication.

Both are valid coding styles, and the choice depends on the requirements and the time available. For the second method, the following steps are offered as suggestions.

1. For Addition: A ripple-carry adder built from full adders, a carry look-ahead adder or even a carry-save adder can be used. The ripple-carry adder is the simplest and most straightforward; the carry look-ahead adder is faster but somewhat more complex. Other architectures can also be used for addition.

2. For Subtraction: A full subtractor can be used, or subtraction can be carried out with the 2's complement method using an adder.

3. For Multiplication: A multiplication unit can be built from the following explanation. The following diagram shows an RTL implementation with datapath and controller for multiplication.

Binary multiplication requires only shifting and adding. The following example shows how each partial product is added in as soon as it is formed. This eliminates the need to add more than two binary numbers at a time.

The multiplication 13 × 11 is reworked below, showing the location of the bits in the registers at each clock time.

The detailed explanation can be found in Fundamentals of Logic Design by Charles Roth.
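The shift-and-add procedure described above can be traced in software. The Python sketch below is a behavioural model under stated assumptions, not the RTL: it assumes unsigned 4-bit operands and a single accumulator of 2 × width + 1 bits whose low half initially holds the multiplier. The function and variable names are invented for this example.

```python
def shift_add_multiply(multiplicand: int, multiplier: int, width: int = 4):
    """Multiply two unsigned `width`-bit integers by shifting and adding.

    Returns (product, trace). `trace` lists the accumulator contents, as bit
    strings of 2 * width + 1 bits, before the first step and after each step.
    """
    mask = (1 << width) - 1
    if not (0 <= multiplicand <= mask and 0 <= multiplier <= mask):
        raise ValueError("operands must fit in `width` bits")

    acc_bits = 2 * width + 1          # one extra bit holds the adder carry
    acc = multiplier                  # multiplier starts in the low half
    trace = [format(acc, f"0{acc_bits}b")]

    for _ in range(width):
        if acc & 1:                   # current multiplier bit is 1: add
            acc += multiplicand << width
        acc >>= 1                     # shift the whole accumulator right
        trace.append(format(acc, f"0{acc_bits}b"))

    return acc, trace


product, trace = shift_add_multiply(13, 11)
```

For 13 × 11 the final accumulator value is 143, and `trace` holds one register snapshot per clock step, so only one addition of two numbers happens at a time.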

4. For Division: Successive subtraction can be used to build the division module; the details can be found in Fundamentals of Logic Design by Charles Roth. The following shows the datapath and controller for the divider.
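As a minimal sketch of division by successive subtraction, the Python model below subtracts the divisor until the remainder is smaller than the divisor. It assumes unsigned operands, and the error flag for a zero divisor is an assumption of this example; it is not taken from the divider datapath in the textbook.

```python
def divide_by_subtraction(dividend: int, divisor: int):
    """Unsigned division by repeated subtraction.

    Returns (quotient, remainder, error). `error` is True when the divisor is
    zero, in which case quotient and remainder are reported as 0.
    """
    if dividend < 0 or divisor < 0:
        raise ValueError("operands must be unsigned")
    if divisor == 0:
        return 0, 0, True

    quotient = 0
    remainder = dividend
    while remainder >= divisor:       # one subtraction per iteration
        remainder -= divisor
        quotient += 1
    return quotient, remainder, False
```

The number of iterations grows with the quotient, so a hardware design would normally bound the cycle count, for example with a shift-and-subtract scheme.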

5. For the remaining logic operations (AND, OR, XOR and NOT), the corresponding operators can be used directly.

When modules are not being used, their clocks can be switched off using clock gating, which reduces power consumption. Clock gating is used in close conjunction with a pre-computation kernel, which also acts as the control logic for the clock gates. The pre-computation kernel is primarily a combinational logic block that detects certain input cases, switches off the operation modules and provides a deterministic output. It handles the following cases.

1. For multiplication, if either input A or B, or both, is zero, the product is zero. In this case the kernel switches off the gated clocks of the multiplier unit and the output is obtained as zero without any internal computation. Likewise, if one of the two inputs is one, the output equals the other input. The gated clocks of the multiplier are switched off in this case as well.

2. Similar logic applies to division (A/B). If B is zero, the kernel generates an error signal; if A is zero, the output is zero; if B is one, the output is the value of A. In all these cases the gated clocks are switched off by the signal generated by the pre-computation logic.

3. The same technique can be applied to addition: if both inputs are zero, the output is zero and the adder logic is switched off by the pre-computation kernel. Likewise, if the two inputs to the subtractor are equal, the output is zero and the kernel switches off the gated clock.

4. The same technique is used for the AND, OR and XOR logic. For the AND gate, if either input or both are zero, the output is zero; for the XOR gate, if both inputs are equal, the output is zero. In both cases the gated clocks are switched off by the signal generated by the pre-computation logic.
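The cases listed above can be summarized in a small behavioural model. The Python sketch below is one possible reading of the kernel, not the actual combinational logic: the operation names, the (bypass, result, error) tuple and the `clock_enable` helper are assumptions introduced for this example.

```python
def precompute(op: str, a: int, b: int):
    """Return (bypass, result, error) for the trivial input cases.

    bypass=True means the result is known without running the functional
    unit, so that unit's gated clock can stay off.
    """
    if op == "mul":
        if a == 0 or b == 0:
            return True, 0, False
        if a == 1:
            return True, b, False
        if b == 1:
            return True, a, False
    elif op == "div":
        if b == 0:
            return True, 0, True      # divide by zero: raise the error flag
        if a == 0:
            return True, 0, False
        if b == 1:
            return True, a, False
    elif op == "add":
        if a == 0 and b == 0:
            return True, 0, False
    elif op in ("sub", "xor"):
        if a == b:
            return True, 0, False
    elif op == "and":
        if a == 0 or b == 0:
            return True, 0, False
    return False, None, False         # no shortcut: the unit must compute


def clock_enable(op: str, a: int, b: int) -> bool:
    """Gated-clock enable for the selected unit: on only when no shortcut applies."""
    bypass, _, _ = precompute(op, a, b)
    return not bypass
```

For example, `precompute("mul", 1, 9)` reports a bypass with result 9, while `clock_enable("add", 3, 4)` is true because the adder has to do real work.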
The following figure shows the low-power ALU path.

From the above figure, it can be seen how the low-power methods work together. In place of the multiplier there can similarly be the divider, the adder or any other unit. The inputs to the pre-computation kernel are the operands A and B. Before the pre-computation kernel, the operation is decoded by the decoder, a combinational logic block that implements the Gray-code opcode table given earlier. The clock gating logic is shown below.

The above figure shows the implementation of the clock gating logic. The buffers are a necessary part of this design: they remove any skew that may arise when the clock is switched on and off by the signal generated by the pre-computation kernel. The outputs of the ALU are stored in an external common register, which allows the results of the current operation to be written back to the ALU inputs, depending on the instructions. The basic ALU core is as follows.

The cluster ALU is built by including multiple ALU cores of the above type and designing a cluster-level decoder to decode the instructions. The cluster ALU can run many such instructions in parallel, including instructions with dependencies between them, by exchanging previous results stored in the Write Back Register (WBR). While one unit is performing an operation, the other units are switched off by disabling their clock gate inputs.

Implementation tips:

1. The decoder within each ALU may be designed using a case statement.
2. The decoder will have control or enable signals that enable the respective units.
3. The pre-computation kernel must be purely combinational logic.
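The exchange of results through the Write Back Register can be illustrated with a toy model. The Python sketch below is only a rough illustration: the instruction format `(alu_id, op, src_a, src_b)`, the `("wbr", k)` source notation and the one-slot-per-ALU register layout are all hypothetical, and the model executes instructions one after another, so it shows the data dependencies but not the parallel timing. Division and NOT are left out to keep it short.

```python
import operator

# Hypothetical operation table for this sketch (division and NOT omitted).
OPS = {
    "add": operator.add,
    "sub": operator.sub,
    "mul": operator.mul,
    "and": operator.and_,
    "or": operator.or_,
    "xor": operator.xor,
}


def run_cluster(program, num_alus: int = 2):
    """Execute instructions in order and return the final write-back register.

    Each instruction is (alu_id, op, src_a, src_b). A source is either an
    integer literal or ("wbr", k), meaning the last result written by ALU k.
    """
    wbr = [0] * num_alus              # one write-back slot per ALU core

    def read(src):
        if isinstance(src, tuple):    # ("wbr", k): reuse ALU k's last result
            return wbr[src[1]]
        return src

    for alu_id, op, src_a, src_b in program:
        wbr[alu_id] = OPS[op](read(src_a), read(src_b))
    return wbr


program = [
    (0, "add", 3, 4),                     # ALU 0: 3 + 4 = 7
    (1, "mul", ("wbr", 0), 2),            # ALU 1: 7 * 2 = 14
    (0, "xor", ("wbr", 1), ("wbr", 0)),   # ALU 0: 14 ^ 7 = 9
]
result = run_cluster(program)
```

Here the second and third instructions depend on earlier results, which they read from the write-back slots instead of receiving them as fresh inputs.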
