2 Anamika Sinha Dept. of ECE, CUTM, Paralakhemundi, Odisha Sinha.anamika185@gmail.com
3 Shivdhari & Gourab Dept. of ECE, CUTM, Paralakhemundi, Odisha Shiv09jitm@gmail.com, gourab.siuli@gmail.com
Abstract - In digital signal processing (DSP) applications, the critical operations usually involve many multiplications and/or accumulations. For real-time signal processing, a high-throughput multiplier-accumulator (MAC) is therefore a key element in achieving high performance, because speed and throughput rate are always concerns in DSP systems. Low-power design is also a major consideration, because the limited battery energy of portable products restricts the power consumption of the system. The goal of this project is the design and VLSI implementation of a MAC for high-speed DSP applications. For designing the MAC, various multipliers and one-bit full adders are considered. The design is coded in VHDL, then synthesized and simulated using Xilinx ISE 10.1.
Key Words: Adders, CAD tools, multipliers, VHDL.
1. INTRODUCTION
In the majority of digital signal processing (DSP) applications, the critical operations involve many multiplications and/or accumulations. For real-time signal processing, a high-throughput multiplier-accumulator (MAC) is therefore a key element in achieving high performance. In the last few years, the main consideration in MAC design has been to enhance its speed, because speed and throughput rate are always concerns in DSP systems. With the increase in portable electronic products, low-power design has also become a major consideration, because the limited battery energy of these products restricts the power consumption of the system. The main motivation is therefore to investigate pipelined MAC architectures, circuits and design techniques that are suitable for implementing high-throughput signal processing algorithms. The goal of this project is the design and VLSI implementation of a pipelined MAC for high-speed DSP applications. For designing the MAC, various architectures of multipliers and one-bit full adders are considered. Static and dynamic one-bit full adders are implemented as the basic block. The design is described in VHDL and finally implemented on a Spartan 3 board.
2. OVERVIEW OF MAC UNIT
A MAC is composed of an adder, a multiplier and an accumulator. The adders used are usually Carry-Select or Carry-Save adders, as speed is of utmost importance in DSP. The multiplier can be implemented, for example, as a parallel array multiplier. The inputs of the MAC are fetched from memory and fed to the multiplier block, which performs the multiplication and passes the result to the adder. The adder accumulates the result and stores it into a memory location. This entire process is to be achieved in a single clock cycle. The architecture of the MAC unit is as follows. The design consists of one 17-bit register, one 8-bit Wallace tree multiplier, a 17-bit accumulator built from a ripple carry adder, and two 18-bit accumulator registers. To multiply the values of A and B, a Wallace tree multiplier is used instead of a conventional multiplier because it can increase the speed of the MAC unit. A Ripple Carry Adder (RCA) is used as the accumulator in this design. By combining the Wallace tree multiplier approach, a carry save adder in the final stage of the Wallace tree multiplier and a ripple carry adder as the accumulator, this MAC unit design not only reduces standby power consumption but also enhances the MAC unit speed, giving better system performance. Each product Ai x Bi is fed into the 17-bit ripple carry accumulator and added to the previously accumulated result before the next product Ai x Bi arrives. This MAC unit is capable of multiplying and adding to the previous product consecutively up to eight times. Operation:

$$\text{Output} = \sum_{i} A_i B_i \qquad (2.1)$$

An 8x8 multiplier unit is designed that can perform accumulation on 17-bit numbers. This MAC unit has an 18-bit output, and its operation is to add the multiplication results repeatedly. The total design area is also assessed from the total transistor count.
Power delay product is calculated by multiplying the power consumption result with the time delay.
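The datapath described in this section can be illustrated with a small behavioural model. The following Python sketch is not the authors' VHDL; it is a simplified word-level model that assumes unsigned 8-bit operands, reduces the partial products with 3:2 carry-save compression in the spirit of a Wallace tree, and uses a bit-serial ripple-carry helper both for the final addition and for the accumulator. The function names, the 18-bit accumulator width used as a default, and the wrap-around behaviour on overflow are assumptions made for illustration.

```python
def full_adder(a, b, c):
    """One-bit full adder: returns (sum, carry_out)."""
    return a ^ b ^ c, (a & b) | (b & c) | (a & c)


def ripple_carry_add(x, y, width):
    """Add two unsigned integers one bit at a time; the result wraps to `width` bits."""
    result, carry = 0, 0
    for i in range(width):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result


def carry_save_multiply(a, b, n=8):
    """Multiply two n-bit unsigned values using 3:2 carry-save reduction."""
    # One partial-product row per bit of b.
    rows = [(a << i) if (b >> i) & 1 else 0 for i in range(n)]
    rows += [0] * max(0, 2 - len(rows))  # guarantee two rows for the final add
    # Compress every group of three rows into a sum row and a carry row
    # until only two rows remain.
    while len(rows) > 2:
        reduced = []
        for j in range(0, len(rows) - 2, 3):
            x, y, z = rows[j:j + 3]
            reduced.append(x ^ y ^ z)                           # bitwise sums
            reduced.append(((x & y) | (y & z) | (x & z)) << 1)  # carries, shifted left
        leftover = len(rows) % 3
        reduced.extend(rows[len(rows) - leftover:])  # rows not grouped this pass
        rows = reduced
    # Final two rows are added with the ripple-carry helper (a simplification).
    return ripple_carry_add(rows[0], rows[1], 2 * n)


def mac_accumulate(pairs, acc_width=18):
    """Accumulate the products of (A_i, B_i) pairs; wraps on overflow (assumed)."""
    acc = 0
    for a, b in pairs:
        acc = ripple_carry_add(acc, carry_save_multiply(a, b), acc_width)
    return acc


if __name__ == "__main__":
    print(mac_accumulate([(3, 4), (5, 6)]))  # 3*4 + 5*6 = 42
```

Each carry-save pass keeps the sum of all rows equal to the product, so only the last addition needs carry propagation. This is the reason tree multipliers of this kind are generally faster than adding the partial products one after another.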
All copyrights Reserved by NCRTCST-2012,Departments of Computer Science and Engineering & Information Technology,CMR College of Engineering and Technology,Hyderabad,A.P,India. Published by IJECCE (www.ijecce.org) 79
International Journal of Electronics Communication and Computer Engineering Volume 3, Issue (1) NCRTCST, ISSN 2249 071X National Conference on Research Trends in Computer Science and Technology - 2012
3. OPERATION
A single MAC unit is composed of a multiplier, an adder and an accumulator. The most typical feature that differentiates a DSP from a general-purpose processor (GPP) is the multiply and accumulate unit. All DSP algorithms require some form of the multiplication and accumulation operation, which makes the MAC the most important block in DSP systems. The adders implemented in DSPs are usually Ripple Carry, Carry-Select or Carry-Save adders, as speed is of utmost importance in a DSP. The multiplier multiplies the inputs and passes the result to the adder, which adds it to the previously accumulated result. This operation eases the computation of sums of products of the form b(n)x(n-k), which are needed in filters, Fourier analyzers, etc. The inputs of the MAC are fetched from memory and fed to the multiplier block, which performs the multiplication and passes the result to the adder. The adder accumulates the result and, if needed, stores it into a memory location. This entire process is to be achieved in a single clock cycle.
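As an illustration of how a MAC serves a filter computation, the following Python sketch evaluates one output sample of an FIR filter with one multiply-accumulate step per tap. It assumes the standard convolution form y(n) = sum over k of b[k] * x[n-k], with samples before x[0] treated as zero. The class name `MacUnit` and the function `fir_output` are hypothetical names introduced here, and the model ignores word-length limits.

```python
class MacUnit:
    """Behavioural MAC model: each call to step() stands for one clock cycle."""

    def __init__(self):
        self.acc = 0

    def clear(self):
        """Reset the accumulator before starting a new sum of products."""
        self.acc = 0

    def step(self, a, b):
        """Multiply the two inputs and add the product to the accumulated result."""
        self.acc += a * b
        return self.acc


def fir_output(b, x, n):
    """Return y(n) = sum_k b[k] * x[n - k], with x taken as zero outside its range."""
    mac = MacUnit()
    for k in range(len(b)):
        if 0 <= n - k < len(x):
            mac.step(b[k], x[n - k])
    return mac.acc


if __name__ == "__main__":
    coeffs = [1, 2, 3]
    samples = [4, 5, 6, 7]
    print(fir_output(coeffs, samples, 2))  # 1*6 + 2*5 + 3*4 = 28
```

A filter with N taps therefore needs N MAC steps per output sample, which is why the throughput of the MAC unit largely determines the throughput of the filter.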
Fig 1. Multiply Accumulate Unit
Table 1 Delay, power, speed and power delay product of the various blocks
The following table shows pin definitions for a 4-bit unsigned up accumulator with an asynchronous clear.
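The behaviour of such an accumulator can be sketched in Python as follows. This is only a behavioural model under stated assumptions: the method names `clock` and `clear`, the attribute `q`, and the active-on-call clear are illustrative choices and are not taken from the pin table.

```python
class UpAccumulator4:
    """4-bit unsigned up accumulator with an asynchronous clear (behavioural model)."""

    MASK = 0xF  # 4-bit register: values 0..15

    def __init__(self):
        self.q = 0  # accumulated output value

    def clear(self):
        """Asynchronous clear: resets the output at once, without waiting for a clock edge."""
        self.q = 0

    def clock(self, d):
        """Rising clock edge: add the 4-bit input d to the stored value, modulo 16."""
        self.q = (self.q + (d & self.MASK)) & self.MASK
        return self.q


if __name__ == "__main__":
    acc = UpAccumulator4()
    for d in (5, 7, 6):
        print(acc.clock(d))  # running sum, wrapping at 16
    acc.clear()
    print(acc.q)
```

Because the register is only four bits wide, the running sum wraps around once it exceeds 15. The wider accumulator registers of the MAC unit behave in the same way at their own word length.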
4. EXPERIMENTAL RESULTS
The VHDL code of the MAC process is synthesized and simulated using Xilinx ISE 10.1 and implemented on the xc3s50-5pq208 device.
Table 2 Device Utilization Summary (estimated values) of AES encryption process

| Logic Utilization          | Used | Available | Utilization |
|----------------------------|------|-----------|-------------|
| Number of Slices           | 9    | 768       | 1%          |
| Number of Slice Flip Flops | 8    | 1536      | 0%          |
| Number of 4 input LUTs     | 16   | 1536      | 1%          |
| Number of bonded IOBs      | 18   | 124       | 14%         |
| Number of BRAMs            | 1    | 4         | 25%         |
| Number of GCLKs            | 1    | 8         | 12%         |
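The utilization column of Table 2 can be checked from the Used and Available counts. The short Python sketch below assumes that the percentages are truncated to whole numbers rather than rounded, which is consistent with every row of the table (for example, 18 of 124 IOBs is about 14.5% and is listed as 14%). The function name `utilization_percent` is a hypothetical helper.

```python
# (used, available, listed utilization in %) as given in Table 2
TABLE_2 = {
    "Number of Slices": (9, 768, 1),
    "Number of Slice Flip Flops": (8, 1536, 0),
    "Number of 4 input LUTs": (16, 1536, 1),
    "Number of bonded IOBs": (18, 124, 14),
    "Number of BRAMs": (1, 4, 25),
    "Number of GCLKs": (1, 8, 12),
}


def utilization_percent(used, available):
    """Percentage of a resource in use, truncated to a whole number."""
    return (100 * used) // available


if __name__ == "__main__":
    for name, (used, available, listed) in TABLE_2.items():
        computed = utilization_percent(used, available)
        print(f"{name}: {used}/{available} -> {computed}% (table: {listed}%)")
```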
Figure 6 shows the RTL schematic of the Mix Column operation. It has two input ports and one output port.
Fig. 6 RTL view of Mix Column operation
Figure 7 displays the simulation waveform of the Mix Column operation.
Fig. 7 Simulation waveform of Mix Column Step
5. CONCLUSION
The MAC process is coded in VHDL and synthesized using Xilinx ISE 10.1. The mix column process is implemented on the Xilinx xc3s50-5pq208 FPGA device.