
EECS150 - Digital Design

Lecture 21 - Multipliers & Shifters

April 9, 2013
John Wawrzynek

Spring 2013


EECS150 - Lec21-mult-shift

Multiplication

                    a3    a2    a1    a0      Multiplicand
              x     b3    b2    b1    b0      Multiplier
------------------------------------------
                  a3b0  a2b0  a1b0  a0b0      Partial products
            a3b1  a2b1  a1b1  a0b1
      a3b2  a2b2  a1b2  a0b2
a3b3  a2b3  a1b3  a0b3
------------------------------------------
...                      a1b0+a0b1  a0b0      Product

Many different circuits exist for multiplication.


Each one has a different balance between
speed (performance) and amount of logic (cost).

Shift and Add Multiplier

Sums each partial product, one at a time.
In binary, each partial product is a shifted version of A or 0.

Cost ∝ n, T = n clock cycles.

What is the critical path for determining the minimum clock period?

Control Algorithm:
1. P ← 0, A ← multiplicand, B ← multiplier
2. If LSB of B == 1 then add A to P, else add 0
3. Shift [P][B] right 1
4. Repeat steps 2 and 3 n-1 times.
5. [P][B] has product.
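
The control algorithm can be traced in software. The following is a minimal Python sketch for unsigned operands, not the lecture's circuit; the function name `shift_add_multiply` and the choice to let `p` hold the adder's carry-out as an extra bit are assumptions made for illustration.

```python
def shift_add_multiply(a: int, b: int, n: int) -> int:
    """Multiply two unsigned n-bit values with the shift-and-add algorithm."""
    mask = (1 << n) - 1
    a &= mask          # multiplicand register A
    b &= mask          # multiplier register B (becomes low half of product)
    p = 0              # accumulator P (becomes high half of product)
    for _ in range(n):
        if b & 1:      # step 2: LSB of B selects "add A" or "add 0"
            p += a     # the carry-out is kept as bit n of p (assumption)
        # step 3: shift [P][B] right by 1; LSB of P moves into MSB of B
        b = (b >> 1) | ((p & 1) << (n - 1))
        p >>= 1
    return (p << n) | b   # step 5: [P][B] holds the 2n-bit product


print(shift_add_multiply(3, 2, 2))
```

The loop body runs once per multiplier bit, which matches the n-cycle latency stated on the slide.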


Shift and Add Multiplier


Signed Multiplication:
Remember that for 2's complement numbers the MSB has negative weight:

ex: $-6 = 11010_2 = 0 \cdot 2^0 + 1 \cdot 2^1 + 0 \cdot 2^2 + 1 \cdot 2^3 - 1 \cdot 2^4 = 0 + 2 + 0 + 8 - 16 = -6$

Therefore for multiplication:
a) subtract final partial product
b) sign-extend partial products

Modifications to shift & add circuit:
a) adder/subtractor
b) sign-extender on P shift register
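
The two rules (subtract the final partial product, sign-extend the others) can be checked numerically. This is a hedged Python sketch rather than a model of the modified circuit; `signed_multiply` and `to_signed` are hypothetical names, and Python's unbounded integers stand in for sign extension.

```python
def signed_multiply(a: int, b: int, n: int) -> int:
    """Multiply two n-bit two's complement bit patterns; returns a signed int."""

    def to_signed(x: int) -> int:
        # Interpret an n-bit pattern as two's complement.
        return x - (1 << n) if x & (1 << (n - 1)) else x

    a_val = to_signed(a)      # sign-extended multiplicand (rule b)
    product = 0
    for i in range(n):
        if (b >> i) & 1:
            partial = a_val << i
            if i == n - 1:
                product -= partial   # MSB of B has negative weight (rule a)
            else:
                product += partial
    return product


print(signed_multiply(0b11010, 0b00011, 5))   # -6 * 3
```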

Bit-serial Multiplier

Bit-serial multiplier ($n^2$ cycles, one bit of result per n cycles):

Control Algorithm:
repeat n cycles {            // outer (i) loop
  repeat n cycles {          // inner (j) loop
    shiftA, selectSum, shiftHI
  }
  shiftB, shiftHI, shiftLOW, reset
}

Note: The occurrence of a control signal x means x=1. The absence of x means x=0.


Array Multiplier
Single cycle multiply: Generates all n partial products simultaneously.
Each row: n-bit adder with AND gates

What is the critical path?
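
As a rough functional illustration of the row structure (AND gates forming one partial product, then an adder row), here is a Python sketch. It models values only and not gate delays; `array_multiply` is a hypothetical name.

```python
def array_multiply(a: int, b: int, n: int) -> int:
    """Functional model of an n x n unsigned array multiplier."""
    acc = 0
    for i in range(n):                      # one row per multiplier bit
        b_i = (b >> i) & 1
        # n AND gates: each bit of A gated by b_i
        row = sum((((a >> j) & 1) & b_i) << j for j in range(n))
        acc += row << i                     # adder row, offset by i columns
    return acc


print(array_multiply(13, 11, 4))
```

In hardware all rows exist at once, so the delay depends on the longest chain through the adders rather than on a loop count.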



Carry-Save Addition

Speeding up multiplication is a
matter of speeding up the
summing of the partial products.
Carry-save addition can help.
Carry-save addition passes
(saves) the carries to the output,
rather than propagating them.

Example: sum three numbers, $3_{10} = 0011$, $2_{10} = 0010$, $3_{10} = 0011$

    3     0011
  + 2     0010
    c     0100 = 4     carry-save add
    s     0001 = 1

    3     0011
    c     0010 = 2     carry-save add
    s     0110 = 6
          1000 = 8     carry-propagate add

In general, carry-save addition takes in 3 numbers and produces 2, whereas carry-propagate takes 2 and produces 1.
With this technique, we can avoid carry propagation until the final addition.
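
A carry-save adder is a row of independent full adders, so it can be written as two bitwise expressions. The sketch below is illustrative Python; `carry_save_add` is a hypothetical name, and it reproduces the 3 + 2 + 3 example.

```python
def carry_save_add(x: int, y: int, z: int) -> tuple[int, int]:
    """3:2 compression: return (carry, sum) with carry + sum == x + y + z."""
    s = x ^ y ^ z                               # per-bit sum, no propagation
    c = ((x & y) | (x & z) | (y & z)) << 1      # per-bit carry, weighted by 2
    return c, s


c1, s1 = carry_save_add(3, 2, 0)     # first carry-save add
c2, s2 = carry_save_add(3, c1, s1)   # fold in the third operand
print(c1, s1, c2, s2, c2 + s2)       # final + is the carry-propagate add
```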

Carry-save Circuits

When adding sets of numbers, carry-save can be used on all but the final sum.
Standard adder (carry propagate) is used for final sum.
Carry-save is fast (no carry propagation) and cheap (same cost as ripple adder).


Array Multiplier using Carry-save Addition

[Figure label: fast carry-propagate adder]


Carry-save Addition
CSA is associative and commutative. For example:
$(((X_0 + X_1) + X_2) + X_3) = ((X_0 + X_1) + (X_2 + X_3))$

A balanced tree can be used to reduce the logic delay.

This structure is the basis of the Wallace Tree Multiplier.
Partial products are summed with the CSA tree. Fast CPA (ex: CLA) is used for final sum.
Multiplier delay $\propto \log_{3/2} N + \log_2 N$
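
To see where the $\log_{3/2} N$ term comes from, the sketch below reduces N partial products with layers of 3:2 carry-save adders until two operands remain, then does one ordinary addition. It is a simplified Python model, not a gate-accurate Wallace tree; `csa`, `csa_tree_sum`, and `wallace_multiply` are hypothetical names.

```python
def csa(x: int, y: int, z: int) -> tuple[int, int]:
    """3:2 carry-save adder: carry + sum equals x + y + z."""
    return ((x & y) | (x & z) | (y & z)) << 1, x ^ y ^ z


def csa_tree_sum(operands: list[int]) -> tuple[int, int]:
    """Reduce operands with CSA layers; return (total, number of CSA layers)."""
    ops = list(operands)
    levels = 0
    while len(ops) > 2:
        full = len(ops) - len(ops) % 3          # operands that fill whole CSAs
        nxt = []
        for i in range(0, full, 3):
            nxt.extend(csa(ops[i], ops[i + 1], ops[i + 2]))
        nxt.extend(ops[full:])                  # leftovers pass to next layer
        ops = nxt
        levels += 1
    return sum(ops), levels                     # sum() models the final CPA


def wallace_multiply(a: int, b: int, n: int) -> tuple[int, int]:
    partials = [(a << i) if (b >> i) & 1 else 0 for i in range(n)]
    return csa_tree_sum(partials)


print(wallace_multiply(200, 123, 8))
```

Each layer shrinks the operand count by roughly a factor of 3/2, so eight partial products need four CSA layers in this model.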


Constant Multiplication
Our discussion so far has assumed both the multiplicand
(A) and the multiplier (B) can vary at runtime.
What if one of the two is a constant?
Y=C*X
Constant Coefficient multiplication comes up often in
signal processing and other hardware. Ex:
$y_i = \alpha y_{i-1} + x_i$

where $\alpha$ is an application-dependent constant that is hard-wired into the circuit.
How do we build an array-style (combinational) multiplier that takes advantage of the constancy of one of the operands?

Multiplication by a Constant

If the constant C in C*X is a power of 2, then the multiplication is simply a shift of X.
Ex: 4*X

What about division?

What about multiplication by non-powers of 2?


Multiplication by a Constant
In general, a combination of fixed shifts and addition:
Ex: 6*X = $0110_2$ * X = $(2^2 + 2^1)$ * X

Details:


Multiplication by a Constant
Another example: C = $23_{10}$ = $010111_2$

In general, the number of additions equals the number of 1s in the constant minus one.
Using carry-save adders (for all but one of these) helps reduce the delay and cost, but the number of carry-save adders is still the number of 1s in C minus 2, plus one final carry-propagate adder.
Is there a way to further reduce the number of adders (and thus the cost and delay)?
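
The adder count can be checked with a small Python sketch that builds C*X from one fixed shift per 1 bit in C. The name `const_multiply` is hypothetical, and the returned adder count assumes plain two-input adders.

```python
def const_multiply(x: int, c: int) -> tuple[int, int]:
    """Return (c * x, number of two-input additions) using fixed shifts of x."""
    shifts = [i for i in range(c.bit_length()) if (c >> i) & 1]
    terms = [x << i for i in shifts]        # fixed shifts are only wiring
    adds = max(len(terms) - 1, 0)           # number of 1s in c, minus one
    return sum(terms), adds


print(const_multiply(5, 6))    # 6 = 0110: two terms, one addition
print(const_multiply(5, 23))   # 23 = 010111: four terms, three additions
```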

Multiplication using Subtraction


Subtraction is ~ the same cost and delay as addition.
Consider C*X where C is the constant value $15_{10}$ = $01111_2$.
C*X requires 3 additions.

We can recode 15
from 01111 = $(2^3 + 2^2 + 2^1 + 2^0)$
to $1000\bar{1}$ = $(2^4 - 2^0)$
where $\bar{1}$ means negative weight.

Therefore, 15*X can be implemented with only one subtractor.
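
A two-line Python check of the recoding, with hypothetical function names, comparing the three-adder form against the single-subtractor form:

```python
def times_15_adds(x: int) -> int:
    # 15 = 2^3 + 2^2 + 2^1 + 2^0: three additions
    return (x << 3) + (x << 2) + (x << 1) + x


def times_15_sub(x: int) -> int:
    # 15 = 2^4 - 2^0: one subtraction
    return (x << 4) - x


print(times_15_adds(7), times_15_sub(7))
```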


Canonic Signed Digit Representation


CSD represents numbers using 1, $\bar{1}$, & 0 with the least possible number of non-zero digits.
Strings of 2 or more non-zero digits are replaced.
Leads to a unique representation.

To form CSD representation might take 2 passes:
First pass: replace all occurrences of 2 or more 1s: 01..10 by 10..$\bar{1}$0
Second pass: same as above, plus replace 0$\bar{1}$10 by 00$\bar{1}$0

Examples:
011101 = 29
100$\bar{1}$01 = 32 - 4 + 1

0010111 = 23
001100$\bar{1}$
010$\bar{1}$00$\bar{1}$ = 32 - 8 - 1

0110110 = 54
10$\bar{1}$10$\bar{1}$0
100$\bar{1}$0$\bar{1}$0 = 64 - 8 - 2
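
The examples can be reproduced programmatically. The Python sketch below does not implement the two-pass string rewriting described on the slide; it swaps in the standard right-to-left non-adjacent-form recoding, which yields the same canonical digits for non-negative integers. The name `to_csd` is hypothetical, and digits are returned least-significant first with -1 standing for the negative-weight digit.

```python
def to_csd(c: int) -> list[int]:
    """Canonic signed-digit form of a non-negative integer, LSB first."""
    digits = []
    while c != 0:
        if c & 1:
            d = 2 - (c & 3)   # +1 when c mod 4 == 1, -1 when c mod 4 == 3
            c -= d            # leaves c divisible by 4, so next digit is 0
        else:
            d = 0
        digits.append(d)
        c >>= 1
    return digits


for value in (29, 23, 54):
    print(value, to_csd(value)[::-1])   # reversed to read MSB first
```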

Can we further simplify the multiplier circuits?



Constant Coefficient Multiplication (KCM)


Binary multiplier: Y = 231*X = $(2^7 + 2^6 + 2^5 + 2^2 + 2^1 + 2^0)$*X

CSD helps, but the multipliers are limited to shifts followed by adds.
CSD multiplier: Y = 231*X = $(2^8 - 2^5 + 2^3 - 2^0)$*X

How about shift/add/shift/add?
KCM multiplier: Y = 231*X = 7*33*X = $(2^3 - 2^0)$*$(2^5 + 2^0)$*X

No simple algorithm exists to determine the optimal KCM representation.
Most use an exhaustive search method.
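
The three decompositions of 231*X can be compared side by side. This Python sketch uses hypothetical function names; the adder counts in the comments count two-input adders or subtractors.

```python
def times_231_binary(x: int) -> int:
    # Six terms: 5 adders
    return (x << 7) + (x << 6) + (x << 5) + (x << 2) + (x << 1) + x


def times_231_csd(x: int) -> int:
    # Four signed terms: 3 adders/subtractors
    return (x << 8) - (x << 5) + (x << 3) - x


def times_231_kcm(x: int) -> int:
    # shift/add/shift/add: 2 adders/subtractors
    t = (x << 3) - x          # 7 * x
    return (t << 5) + t       # 33 * (7 * x)


print(times_231_binary(3), times_231_csd(3), times_231_kcm(3))
```

The factored form reuses the intermediate value `t`, which is what a flat sum of shifted copies of X cannot do.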

Fixed Shifters / Rotators

Fixed shifters "hardwire" the shift amount into the circuit.
Ex: Verilog: X >> 2 (right shift X by 2 places)

[Figure panels: Logical Shift, Rotate, Arithmetic Shift]

Fixed shift/rotator is nothing but wires! So what?
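
The three fixed-shift flavors named on this slide differ only in what fills the vacated bit positions. The Python sketch below models them on n-bit words; the function names are hypothetical, and the code assumes 0 <= k < n for the shifts.

```python
def logical_shift_right(x: int, k: int, n: int) -> int:
    mask = (1 << n) - 1
    return (x & mask) >> k                      # vacated bits filled with 0


def arithmetic_shift_right(x: int, k: int, n: int) -> int:
    mask = (1 << n) - 1
    x &= mask
    shifted = x >> k
    if (x >> (n - 1)) & 1:                      # replicate the sign bit
        shifted |= (mask << (n - k)) & mask
    return shifted


def rotate_right(x: int, k: int, n: int) -> int:
    mask = (1 << n) - 1
    x &= mask
    k %= n
    return ((x >> k) | (x << (n - k))) & mask   # low bits wrap to the top


x = 0b10110101
print(format(logical_shift_right(x, 2, 8), "08b"))
print(format(arithmetic_shift_right(x, 2, 8), "08b"))
print(format(rotate_right(x, 2, 8), "08b"))
```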


Variable Shifters / Rotators

Example: X >> S, where S is unknown when we synthesize the circuit.

Uses: shift instruction in processors (ARM includes a shift on every instruction), floating-point arithmetic, division/multiplication by powers of 2, etc.

One way to build this is a simple shift-register:
a) Load word, b) shift enable for S cycles, c) read word.

Worst case delay O(N), not good for processor design.

Can we do it in O(log N) time and fit it in one cycle?



Log Shifter / Rotator


Log(N) stages, each shifts (or not) by a power of 2 places, S=[s2;s1;s0]:

[Figure: stages labeled "Shift by N/2", "Shift by 2", "Shift by 1"]
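
The staged structure can be modeled as one conditional fixed rotate per bit of S. This Python sketch assumes N is a power of 2 and implements a right rotate; `log_rotate_right` is a hypothetical name, and each loop iteration stands for one layer of 2-to-1 muxes.

```python
def log_rotate_right(x: int, s: int, n: int) -> int:
    """Rotate an n-bit word right by s using log2(n) conditional stages."""
    mask = (1 << n) - 1
    x &= mask
    stages = n.bit_length() - 1             # log2(n) for n a power of 2
    for i in reversed(range(stages)):       # shift by N/2 first, by 1 last
        if (s >> i) & 1:                    # select bit s_i controls stage i
            k = 1 << i
            x = ((x >> k) | (x << (n - k))) & mask
    return x


print(format(log_rotate_right(0b10110101, 5, 8), "08b"))
```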


LUT Mapping of Log shifter

Efficient with 2-to-1 multiplexors, for instance, 3-LUTs.
Virtex-6 has 6-LUTs. Naturally makes 4-to-1 muxes:
Reorganize shifter to use 4-to-1 muxes.

Final stage uses F7 mux

Improved Shifter / Rotator


How about this approach? Could it lead to even less delay?
What is the delay of these big muxes?
Look at a transistor-level implementation?


Barrel Shifter

Cost/delay?
(don't forget the decoder)
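
Since the slide's figure is not available here, the following Python sketch is only one plausible reading of a barrel shifter: a decoder turns S into a one-hot vector, and each output bit is connected to the input bit selected by the active line. The rotate-right direction and the name `barrel_rotate_right` are assumptions.

```python
def barrel_rotate_right(x: int, s: int, n: int) -> int:
    """Rotate right by s using a one-hot decoded shift amount."""
    one_hot = [int(k == s % n) for k in range(n)]    # the decoder
    bits = [(x >> j) & 1 for j in range(n)]
    out = 0
    for i in range(n):                               # one row per output bit
        bit = 0
        for k in range(n):                           # n switch points per row
            bit |= one_hot[k] & bits[(i + k) % n]
        out |= bit << i
    return out


print(format(barrel_rotate_right(0b10110101, 3, 8), "08b"))
```

The nested loops visit n*n switch points, which hints at why the cost question on the slide matters.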


Connection Matrix
Generally useful structure: $N^2$ control points.
What other interesting functions can it do?


Cross-bar Switch

N log(N) control signals.

Supports all interesting permutations: all one-to-one and one-to-many connections.

Commonly used in communication hardware (switches, routers).
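
A crossbar can be modeled as N output multiplexers, each with its own select field. The Python sketch below is illustrative; `crossbar` and `control_bits` are hypothetical names, and the control-bit count assumes each output uses ceil(log2 N) select bits.

```python
def crossbar(inputs: list, select: list[int]) -> list:
    """Output i is driven by inputs[select[i]]."""
    return [inputs[s] for s in select]


def control_bits(n: int) -> int:
    """N outputs, each with ceil(log2 N) select bits."""
    return n * (n - 1).bit_length()


data = ["a", "b", "c", "d"]
print(crossbar(data, [3, 2, 1, 0]))   # one-to-one: a permutation
print(crossbar(data, [0, 0, 0, 0]))   # one-to-many: broadcast input 0
print(control_bits(8))
```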
