
International Journal of Electronics Communication and Computer Engineering

Volume 3, Issue (1) NCRTCST, ISSN 2249 071X


National Conference on Research Trends in Computer Science and Technology - 2012

Hardware Implementation of MAC Unit

1
Pratap Kumar Dakua
Dept. of ECE, CUTM,
Paralakhemundi, Odisha
dakuapratapkumar@gmail.com


2
Anamika Sinha
Dept. of ECE, CUTM,
Paralakhemundi, Odisha
Sinha.anamika185@gmail.com


3
Shivdhari & Gourab
Dept. of ECE, CUTM,
Paralakhemundi, Odisha
Shiv09jitm@gmail.com,
gourab.siuli@gmail.com
Abstract - In digital signal processing (DSP) applications, the
critical operations usually involve many multiplications and/or
accumulations. For real-time signal processing, a high-throughput
multiplier-accumulator (MAC) is therefore a key element of a
high-performance system, because speed and throughput rate are
always concerns in digital signal processing. Low-power design
has also become a major consideration, because the limited
battery energy of portable products restricts the power
consumption of the system. The goal of this project is the design
and VLSI implementation of a MAC for high-speed DSP
applications. Various multipliers and one-bit full adders are
considered in designing the MAC. The complete design is coded in
VHDL, then synthesized and simulated using Xilinx ISE 10.1.

Key Words: Adders, CAD tools, multipliers, VHDL

1. INTRODUCTION

In the majority of digital signal processing (DSP)
applications, the critical operations usually involve many
multiplications and/or accumulations. For real-time signal
processing, a high-throughput multiplier-accumulator (MAC)
is therefore a key element of a high-performance system. In
the last few years, the main consideration in MAC design has
been to enhance its speed, because speed and throughput rate
are always concerns in digital signal processing systems.
With the growth of portable electronic products, low-power
design has also become a major consideration, because the
limited battery energy of these products restricts the power
consumption of the system. The main motivation is therefore
to investigate the pipelined MAC architectures, circuits and
design techniques that are suitable for implementing
high-throughput signal processing algorithms. The goal of
this project is the design and VLSI implementation of a
pipelined MAC for high-speed DSP applications. Various
multiplier architectures and one-bit full adders are
considered in designing the MAC. Static and dynamic one-bit
full adders were implemented as the basic block. The
hardware is described in VHDL. Finally, the complete design
is implemented on a Spartan 3 board.

2. OVERVIEW OF MAC UNIT

A MAC unit is composed of a multiplier, an adder and an
accumulator. The adders are usually Carry-Select or
Carry-Save adders, as speed is of utmost importance in DSP.
One possible implementation of the multiplier is a parallel
array multiplier. The inputs for the MAC are fetched from
memory and fed to the multiplier block, which performs the
multiplication and passes the result to the adder; the adder
accumulates the result and then stores it in a memory
location. The MAC unit architecture is intended to complete
this entire process in a single clock cycle. The design
consists of one 17-bit register, one 8-bit Wallace tree
multiplier, a 17-bit accumulator using ripple carry and two
18-bit accumulator registers. A Wallace tree multiplier is
used to multiply the values of A and B instead of a
conventional multiplier because it can increase the speed of
the MAC unit. A Ripple Carry Adder (RCA) is used as the
accumulator in this design. By combining the Wallace tree
multiplier, a carry save adder in the final stage of the
Wallace tree multiplier and a Ripple Carry adder as the
accumulator, this MAC unit design not only reduces the
standby power consumption but also increases the speed of the
MAC unit, giving better system performance. Each product
Ai x Bi is fed into the 17-bit Ripple Carry accumulator, fed
back, and then added to the next product Ai x Bi. This MAC
unit can multiply and add to the previous result
consecutively up to eight times. Operation:
$\text{Output} = \sum_{i} A_i B_i$ (2.1)
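The multiplier stage described above can be illustrated with a small bit-level software model. The following is a minimal sketch in Python, not the authors' VHDL: it assumes unsigned operands, reduces the partial-product columns using full adders only (a simplified Wallace-style reduction without half adders), and adds the two remaining rows with a ripple-carry chain. The names `full_adder` and `wallace_multiply` are hypothetical and chosen for illustration.

```python
def full_adder(a, b, cin):
    """One-bit full adder (3:2 compressor): returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout


def wallace_multiply(a, b, width=8):
    """Unsigned width x width multiply using column-wise 3:2 reduction.

    Assumption: this is a behavioral illustration only; the real design's
    exact tree layout is not given in the text.
    """
    if not (0 <= a < (1 << width) and 0 <= b < (1 << width)):
        raise ValueError("operands must fit in `width` bits")

    # columns[w] holds every partial-product bit of weight 2**w
    columns = [[] for _ in range(2 * width - 1)]
    for i in range(width):
        for j in range(width):
            columns[i + j].append(((a >> i) & 1) & ((b >> j) & 1))

    # Reduce layer by layer until no column holds more than two bits.
    while any(len(col) > 2 for col in columns):
        nxt = [[] for _ in range(len(columns) + 1)]
        for w, col in enumerate(columns):
            k = 0
            while len(col) - k >= 3:
                s, c = full_adder(col[k], col[k + 1], col[k + 2])
                nxt[w].append(s)        # sum keeps the same weight
                nxt[w + 1].append(c)    # carry moves one column left
                k += 3
            nxt[w].extend(col[k:])      # 0, 1 or 2 leftover bits pass through
        while nxt and not nxt[-1]:      # drop unused top columns
            nxt.pop()
        columns = nxt

    # Final stage: add the two remaining rows with a ripple-carry chain.
    result, carry = 0, 0
    for w, col in enumerate(columns):
        x = col[0] if len(col) > 0 else 0
        y = col[1] if len(col) > 1 else 0
        s, carry = full_adder(x, y, carry)
        result |= s << w
    result |= carry << len(columns)
    return result


print(wallace_multiply(255, 255))  # 65025
```

Each full adder replaces three bits of one weight with a sum bit of the same weight and a carry of the next weight, so the weighted total is preserved at every layer and the final result equals the ordinary product.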
An 8x8 multiplier unit is designed whose products are
accumulated as 17-bit numbers. This MAC unit has an 18-bit
output, and its operation is to add the multiplication
results repeatedly. The total design area is assessed from
the total transistor count. The power-delay product is
calculated by multiplying the power consumption by the time
delay.
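The accumulate step can be sketched as a behavioral model. This is a minimal Python sketch under stated assumptions, not the authors' implementation: the text does not specify how the 17-bit adder and the 18-bit registers are wired together, so a single 18-bit accumulator that wraps around on overflow is assumed. `MacUnit`, `mac_sum` and `ripple_carry_add` are hypothetical names.

```python
ACC_BITS = 18  # output width stated in the text; wrap-around is an assumption


def ripple_carry_add(x, y, width):
    """Add two unsigned values bit by bit, carry rippling from LSB to MSB.

    The carry out of the top bit is discarded, so the result wraps
    modulo 2**width.
    """
    result, carry = 0, 0
    for i in range(width):
        a = (x >> i) & 1
        b = (y >> i) & 1
        s = a ^ b ^ carry
        carry = (a & b) | (a & carry) | (b & carry)
        result |= s << i
    return result


class MacUnit:
    """Behavioral model of an 8x8 multiply-accumulate unit."""

    def __init__(self):
        self.acc = 0

    def clear(self):
        self.acc = 0

    def step(self, a, b):
        """Multiply two 8-bit unsigned inputs and add the product to acc."""
        if not (0 <= a <= 0xFF and 0 <= b <= 0xFF):
            raise ValueError("inputs must be 8-bit unsigned values")
        self.acc = ripple_carry_add(self.acc, a * b, ACC_BITS)
        return self.acc


def mac_sum(pairs):
    """Accumulate the products of all (a, b) pairs, starting from zero."""
    mac = MacUnit()
    for a, b in pairs:
        mac.step(a, b)
    return mac.acc


print(mac_sum([(10, 10)] * 8))  # 800
```

Under this assumed 18-bit width, four worst-case products (255 x 255 = 65025 each) sum to 260100 and still fit, while a fifth exceeds 2^18 - 1 = 262143 and wraps. Eight consecutive accumulations are therefore exact in this model only when the operands are small enough.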

All copyrights Reserved by NCRTCST-2012,Departments of Computer Science and Engineering &
Information Technology,CMR College of Engineering and Technology,Hyderabad,A.P,India.
Published by IJECCE (www.ijecce.org) 79



3. OPERATION

A single MAC unit has a multiplier, an adder and an
accumulator. The feature that most typically differentiates
a DSP from a general-purpose processor (GPP) is the multiply
and accumulate unit. Most DSP algorithms require some form
of the multiplication and accumulation operation, which
makes the MAC the most important block in DSP systems. The
adders implemented in DSPs are usually Ripple Carry,
Carry-Select or Carry-Save adders, as speed is of utmost
importance in a DSP. The multiplier multiplies the inputs
and passes the result to the adder, which adds it to the
previously accumulated result. This operation eases the
computation of the sum-of-products formula
$\sum_{k} b(k)\,x(n-k)$, which is needed in filters, Fourier
analyzers, etc. The inputs for the MAC are fetched from
memory and fed to the multiplier block, which performs the
multiplication and passes the result to the adder; the adder
accumulates the result and, if needed, also stores it in a
memory location. This entire process is to be achieved in a
single clock cycle.
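To show how the MAC operation maps onto the filter formula, here is a minimal Python sketch of a direct-form FIR filter built from repeated multiply-accumulate steps. The function names and the convention that x(n) is zero for indices outside the input are assumptions made for illustration; they are not taken from the paper.

```python
def mac(acc, a, b):
    """One multiply-accumulate step: returns acc + a * b."""
    return acc + a * b


def fir_output(b, x, n):
    """Compute y(n) = sum over k of b[k] * x[n - k].

    Samples of x outside the range 0 .. len(x) - 1 are treated as zero.
    """
    acc = 0
    for k in range(len(b)):
        if 0 <= n - k < len(x):
            acc = mac(acc, b[k], x[n - k])
    return acc


def fir_filter(b, x):
    """Filter the whole input sequence, one MAC chain per output sample."""
    return [fir_output(b, x, n) for n in range(len(x))]


# A unit impulse at the input reproduces the coefficients at the output.
print(fir_filter([1, 2, 3], [1, 0, 0, 0]))  # [1, 2, 3, 0]
```

Each output sample needs one MAC step per coefficient, which is why the throughput of the MAC unit bounds the sample rate of the filter.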

Fig 1. Multiply Accumulate Unit



Table 1 Various Blocks delay, power, speed and power delay product

The following table shows pin definitions for a 4-bit unsigned up accumulator with an asynchronous clear.
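Since the pin table itself is not reproduced here, the behavior of such an accumulator can only be sketched under assumptions. The Python model below assumes a rising-edge clock, an active-high asynchronous clear, a 4-bit data input and a 4-bit output that wraps modulo 16. The class and method names are hypothetical and do not come from the paper.

```python
class UpAccumulator4:
    """Behavioral model of a 4-bit unsigned up accumulator.

    Assumptions: the output wraps modulo 16, and clear is asynchronous,
    meaning it takes effect immediately without waiting for a clock edge.
    """

    MASK = 0xF  # 4-bit output

    def __init__(self):
        self.q = 0

    def clear(self):
        """Asynchronous clear: forces the output to zero at once."""
        self.q = 0

    def rising_edge(self, d):
        """On a clock edge, add the 4-bit input d to the stored value."""
        if not 0 <= d <= self.MASK:
            raise ValueError("d must be a 4-bit unsigned value")
        self.q = (self.q + d) & self.MASK
        return self.q


acc = UpAccumulator4()
acc.rising_edge(5)
acc.rising_edge(7)
print(acc.q)  # 12
```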









4. EXPERIMENTAL RESULTS

The VHDL code of the MAC unit was synthesized and
simulated using Xilinx ISE 10.1 and implemented on an
xc3s50-5pq208 device.



Table 2 Device Utilization Summary (estimated values) of AES encryption process
Logic Utilization             Used   Available   Utilization
Number of Slices                 9         768            1%
Number of Slice Flip Flops       8        1536            0%
Number of 4 input LUTs          16        1536            1%
Number of bonded IOBs           18         124           14%
Number of BRAMs                  1           4           25%
Number of GCLKs                  1           8           12%

Figure 6 shows the RTL schematic of the Mix Column operation. It has two input ports and one output port.



Fig. 6 RTL view of Mix Column operation
Figure 7 displays the simulation waveform of Mix Column operation.


Fig. 7 Simulation waveform of Mix Column Step



5. CONCLUSION

The MAC unit is coded in VHDL and synthesized using
Xilinx ISE 10.1. The mix column process is implemented on
the Xilinx xc3s50-5pq208 FPGA device.

