I like to use my daughter's hamster as an intuitive example. After having a hamster as a family pet, I've learned that hamsters basically have four states: Sleeping, Eating, RunningOnTheWheel, and TryingToEscape. They spend most of their day sleeping (being nocturnal), a bit of time eating or running on the wheel, and the rest of their time desperately trying to escape from their cage.
As a more electronics-oriented example, let's design a system that repeatedly sets an output x to 0 for one clock cycle and to 1 for one clock cycle. The system clearly has only two states, which we'll call Off and On. In state Off, x = 0; in state On, x = 1. We can show those states, and the transitions between them, using the state diagram in Figure 3.36.
Figure 3.36 A simple state diagram (left) and the timing diagram describing the state diagram's behavior (right). Above the timing diagram, we see the FSM going from one state to the other in each clock cycle. "clk^" represents the rising edge of the clock signal.
Assume we start in state Off. The diagram shows that x is set to 0 while the system is in state Off. The diagram also shows that on the next rising edge of the clock signal, clk^, the system transitions to state On, and the diagram shows that x is set to 1 in that state. On the next rising edge of the clock, the diagram shows that the system transitions to state Off again. A timing diagram showing the system's behavior is shown in Figure 3.36.
Recall in Example 3.3 that we wanted a system that held its output high for three cycles. Toward that end, let's extend the simple state diagram of Figure 3.36 to have one off state and three on states, as shown in Figure 3.37. The output will be 0 for one cycle, and then 1 for three cycles, as shown in the timing diagram of the figure.
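The two state diagrams can be tried out in software. The following is a minimal Python sketch, not the book's method: each state maps to its output x and its single next state, and one loop iteration stands for one rising clock edge. The state names On1, On2, and On3 and the helper run are assumptions made for illustration; the text only says there are three on states.

```python
# Each state maps to (value of x in that state, next state on a rising clock edge).
# On1..On3 are hypothetical names for the "three on states" mentioned in the text.
TOGGLE = {"Off": (0, "On"), "On": (1, "Off")}
THREE_HIGH = {
    "Off": (0, "On1"),
    "On1": (1, "On2"),
    "On2": (1, "On3"),
    "On3": (1, "Off"),
}

def run(fsm, start, cycles):
    """Return the value of x during each clock cycle, starting in state `start`."""
    state, trace = start, []
    for _ in range(cycles):
        x, next_state = fsm[state]
        trace.append(x)
        state = next_state  # transition taken on the rising clock edge
    return trace

print(run(TOGGLE, "Off", 4))      # [0, 1, 0, 1]
print(run(THREE_HIGH, "Off", 8))  # [0, 1, 1, 1, 0, 1, 1, 1]
```

The second trace shows x low for one cycle and then high for three, repeating, which matches the behavior described for the extended diagram.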
Sequential Logic Design- Controllers
Figure 3.53 panels: (a) FSM with states A (bo=0), B (bo=1), and C (bo=0); FSM inputs: bi; FSM outputs: bo. (b) Controller architecture with combinational logic and a two-bit state register (s1, s0; n1, n0; clk). (c) FSM with encoded states A: 00, B: 01, C: 10, and 11 unused. (d) State table. (e) Equations.
(d) State table
         Inputs       Outputs
         s1 s0 bi     n1 n0 bo
A        0  0  0      0  0  0
A        0  0  1      0  1  0
B        0  1  0      0  0  1
B        0  1  1      1  0  1
C        1  0  0      0  0  0
C        1  0  1      1  0  0
unused   1  1  0      0  0  0
unused   1  1  1      0  0  0
(e) Equations
n1 = s1's0bi + s1s0'bi
n0 = s1's0'bi
bo = s1's0bi' + s1's0bi = s1's0
Step 2: Create the architecture. Since the FSM has three states, the architecture has a two-bit state register, as shown in Figure 3.53(b).
Step 3: Encode the states. We can straightforwardly encode the three states as 00, 01, and 10, as shown in Figure 3.53(c).
Step 4: Create the state table. We convert the FSM with encoded states to a state table as shown in Figure 3.53(d). For the unused state 11, we have chosen to output bo = 0 and return to 00.
Step 5: Implement the combinational logic. We derive the equations for each combinational logic output, as shown in Figure 3.53(e), and then create the final circuit as shown.
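One way to cross-check steps 4 and 5 is to evaluate the equations for n1, n0, and bo on every input combination and compare the result with the state table. The Python sketch below does that; the function names logic and state_table are assumptions for illustration, and complement is written as 1 - v.

```python
def logic(s1, s0, bi):
    """Combinational logic from the equations for n1, n0, and bo."""
    n1 = ((1 - s1) & s0 & bi) | (s1 & (1 - s0) & bi)  # n1 = s1's0bi + s1s0'bi
    n0 = (1 - s1) & (1 - s0) & bi                     # n0 = s1's0'bi
    bo = (1 - s1) & s0                                # bo = s1's0
    return n1, n0, bo

def state_table():
    """Enumerate all 8 rows: (s1, s0, bi) -> (n1, n0, bo)."""
    rows = []
    for s1 in (0, 1):
        for s0 in (0, 1):
            for bi in (0, 1):
                rows.append(((s1, s0, bi), logic(s1, s0, bi)))
    return rows

for inputs, outputs in state_table():
    print(inputs, "->", outputs)
```

Both rows for the unused state 11 produce next state 00 with bo = 0, which agrees with the choice made in step 4.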
EXAMPLE 3.10 Sequence generator
We want to design a circuit with four outputs: w, x, y, and z. The circuit should generate the following sequence of output patterns: 0001, 0011, 1100, and 1000. After 1000, the circuit should repeat the sequence, starting from 0001 again. We want the circuit to generate the next pattern only on a rising clock edge.
Sequence generators are common in a range of systems. For example, we might want to blink a set of four lights in a particular pattern, such as in a festive lights display. We might instead want to rotate an electric motor a fixed number of degrees on each clock cycle by powering magnets around the motor in a specific sequence to attract the magnetized motor to the next position in the rotation. Such a motor is known as a stepper motor, since the motor rotates in steps.
We can design the sequence generator controller using our five-step process.
Step 1: Capture the FSM. We capture the system's behavior as the FSM shown in Figure 3.54. The FSM has four states, which we've labeled A, B, C, and D (though any other four unique names would do just fine).
Step 2: Create the architecture. The standard controller architecture for the sequence generator will have a 2-bit state register to represent the four possible states, no inputs to the logic, and outputs w, x, y, z from the logic, along with outputs n1 and n0, as shown in Figure 3.55.
Step 3: Encode the states. We can encode the states as A: 00, B: 01, C: 10, D: 11. Any other encoding with a unique code for each state would also do fine.
Step 4: Create the state table. The state table for the FSM with encoded states is shown in Table 3.4.
Figure 3.54 Sequence generator FSM. Inputs: none; Outputs: w, x, y, z.
Figure 3.55 Sequence generator controller architecture.
TABLE 3.4 State table for sequence generator controller.
     Inputs    Outputs
     s1 s0     w  x  y  z  n1 n0
A    0  0      0  0  0  1  0  1
B    0  1      0  0  1  1  1  0
C    1  0      1  1  0  0  1  1
D    1  1      1  0  0  0  0  0
Step 5: Implement the combinational logic. We derive the equations for each output of the combinational logic from the table. After some algebraic simplification, the equations are as follows:
w = s1
x = s1s0'
y = s1's0
z = s1'
n1 = s1 xor s0
n0 = s0'
The final circuit is shown in Figure 3.56.
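As a sanity check on step 5, the equations can be stepped through in software to confirm that they produce the required pattern sequence. This is a small Python sketch under the encoding A = 00 from step 3; the equations are written as they follow from Table 3.4, and the names logic and generate are assumptions for illustration.

```python
def logic(s1, s0):
    """Evaluate the sequence generator equations for the present state (s1, s0)."""
    w = s1
    x = s1 & (1 - s0)      # x = s1s0'
    y = (1 - s1) & s0      # y = s1's0
    z = 1 - s1             # z = s1'
    n1 = s1 ^ s0           # n1 = s1 xor s0
    n0 = 1 - s0            # n0 = s0'
    return (w, x, y, z), (n1, n0)

def generate(cycles, state=(0, 0)):
    """Return the wxyz pattern seen in each clock cycle, starting in state A (00)."""
    patterns = []
    for _ in range(cycles):
        outputs, state = logic(*state)  # the state register loads n1n0 on the clock edge
        patterns.append("".join(str(bit) for bit in outputs))
    return patterns

print(generate(5))  # ['0001', '0011', '1100', '1000', '0001']
```

The fifth pattern is 0001 again, showing that state D returns to state A and the sequence repeats.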
EXAMPLE 3.11 Secure car key controller (continued)
3.4 Controller Design
Figure 3.56 Sequence generator controller architecture.
Let's complete the design for the secure car key controller from Example 3.5. We already carried out the Capture the FSM step of the five-step process, with the FSM shown in Figure 3.41. The remaining steps are as follows.
Step 2: Create the architecture. Since the FSM has five states, we'll need a 3-bit state register. A 3-bit state register can represent eight states, so three states will be unused. The input to the logic is signal a, while the outputs are signal r and next state outputs n2, n1, and n0. The architecture is shown in Figure 3.57.
Step 3: Encode the states. Let's encode the states using a straightforward binary encoding of 000 through 100. The FSM with state encodings is shown in Figure 3.58.
Figure 3.57 Secure car key controller architecture.
Figure 3.58 Secure car key FSM with encoded states. Inputs: a; Outputs: r.
Step 4: Create the state table. The FSM converted to a state table is shown in Table 3.5. For the unused states, we have chosen to set r = 0 and the next state to 000.
Step 5: Implement the combinational logic. We can design four circuits, one for each output, to implement the combinational logic. We leave this step as an exercise for the reader.
More on Controller Design
Converting a Circuit to an FSM
We showed in Section 2.6 that a circuit, truth table, and equation were all ways of representing the same combinational function. Similarly, a circuit, state table, and FSM are all ways of representing the same sequential function.
We have been converting an FSM to a circuit using a five-step process. We can also convert a circuit to an FSM by applying the five-step process of Table 3.2 in reverse. In general, converting a circuit to an equation or FSM is known as reverse engineering the behavior of the circuit.
TABLE 3.5 State table for secure car key controller.
[Table body not recoverable from the scan. Columns: inputs s2, s1, s0, a; outputs r, n2, n1, n0. Rows: Wait, K1, K2, K3, K4, and the unused states.]
EXAMPLE 3.12 Converting a sequential circuit to an FSM
Given the sequential circuit in Figure 3.59, find an equivalent FSM.
Figure 3.59 A sequential circuit with unknown behavior.
We start from step 5 of the 5-step process described in Table 3.2. The combinational circuit has already been implemented, and we can proceed to step 4, where we create a state table.
The combinational logic in the controller architecture has 3 inputs: 2 inputs, s0 and s1, represent the contents of the state register, and 1 input, x, is an external input. Thus the state table will have 8 rows, since there are 2^3 = 8 possible combinations of inputs.
After we set up the state table and enumerate all possible combinations of inputs (e.g., s1s0x = 000, ..., s1s0x = 111), we use the techniques described in Section 2.6 to fill in the values of the outputs. For example, consider the output y. From the combinational circuit, we see that y = s1'. Knowing this, we add a 1 in the y column of the state table in every row where s1 = 0, and we add a 0 to the remaining spaces in the y column. Now consider n0, which we see has the Boolean equation n0 = s1's0'x. Accordingly, we set n0 to 1 when s1 = 0 and s0 = 0 and x = 1. We fill in the columns for z and n1 using a similar analysis and move on to the next step.
In step 3, we must encode the states. Naturally, the states have already been encoded, but we can still name each state. We arbitrarily choose the labels A, B, C, and D, seen in Table 3.6.
TABLE 3.6 State table for sequential circuit.
[Table body not recoverable from the scan. Columns: inputs s1, s0, x; outputs n1, n0, y, z. Rows are labeled A, B, C, and D.]
Step 2 calls for the creation of the standard architecture. This step requires no work since the controller architecture was already defined.
Finally, in step 1, we capture the FSM. Initially, we can set up an FSM diagram with the four states we've labeled in step 3, shown in Figure 3.60(a). Next, we list the values of the FSM outputs y and z next to each state. For example, in state A (s1s0 = 00), the outputs y and z are 1 and 0, respectively, so we list "yz = 10" with state A in the FSM.
Figure 3.60 Converting a state table to an FSM diagram: (a) initial FSM, (b) FSM with outputs specified, and (c) FSM with outputs and transitions specified. Inputs: x; Outputs: y, z.
After listing the outputs for states B, C, and D, shown in Figure 3.60(b), we turn to the state transitions specified in the state table by n1 and n0. Consider the first row of the state table, which says that n1n0 = 00 when s1s0x = 000. In other words, when in state A (s1s0 = 00), the next state is state A (n1n0 = 00) if x is 0. We can represent this in the FSM diagram by drawing an arrow from state A back to state A and labeling the new transition "x'." Now consider the second row of the state table, which indicates that from state A, we transition to state B when x = 1. We add a transition arrow from state A to B and label it "x." After labeling all the transitions, we are left with the FSM in Figure 3.60(c).
You may notice that state D cannot be reached from any other state and transitions to state A on any input. We reasonably infer that the original FSM had only three states and state D is an extra, unused state. For completeness, it is preferable to leave state D in the final diagram, however.
Given any synchronous circuit consisting of logic gates and flip-flops, we can redraw the circuit as consisting of a state register and logic (our standard controller architecture) just by grouping all the flip-flops together. Thus, the approach described above works for any synchronous circuit, not just a circuit drawn in the form of our standard controller architecture.
Common Pitfalls
Mistakes are commonly made when capturing an FSM, relating to properties regarding the transitions leaving a state. In short, one and only one transition condition should ever evaluate to true during any rising clock edge. The properties are:
1. Only one condition should be true. For a given state, for any rising clock edge, no more than one transition condition should be true. For example, consider an FSM with inputs a and b, and a state State1 with two outgoing transitions, one labeled "a", and the other labeled "b." What happens when a = 1 and b = 1? Which transition should the FSM take? The FSM designer must ensure that the conditions are exclusive, meaning only one could possibly ever be true at one time. In the example, the designer might label the transitions "a" and "a'b" to solve the problem. Actually, a particular type of FSM, known as a nondeterministic FSM, does allow more than one condition to be true and chooses among them in some arbitrary way, but when designing circuits, we usually want deterministic behavior, so we don't consider nondeterministic FSMs further.
2. One condition should be true. For a given state, for any rising clock edge, one of the transitions from that state must be taken. In other words, every input combination should be accounted for in every state. Designers sometimes forget to ensure this. For example, consider an FSM with inputs a and b, and a state State1 with two outgoing transitions, one labeled "a", and the other labeled "a'b." What happens if the FSM is in State1, and a = 0 and b = 0? Neither of the two transitions from State1 has a true condition. The FSM is not fully specified; we need to add a third transition, indicating what state to go to if a'b' is true. With that third condition, we have covered all possible values of a and b. A commonly forgotten transition is a transition pointing from a state back to itself.
We can verify the above two properties using Boolean algebra. For the first property of only one condition being true, we can check that the AND of every pair of conditions on transitions of a state always results in 0. For example, if a state has two transitions, one with condition a and the other with condition a'b, using transformations of Boolean algebra we obtain:
a * a'b
= (a*a')*b
= 0 * b
= 0
For the second property of one condition being true, we can check that the OR of all the conditions on transitions of a state always results in 1. Considering the same example of a state that has two transitions, one with condition a and the other with condition a'b, using transformations of Boolean algebra we obtain:
a + a'b
= a*(1+b) + a'b
= a + ab + a'b
= a + (a+a')b
= a + b
Clearly, the OR of those two conditions is not 1, but rather a+b. Thus, if a and b were both 0, neither condition would be true, and therefore the next state would not be specified in the FSM. Above, we fixed this problem by adding another transition, a'b'. Checking yields:
a + a'b + a'b'
= a + a'(b+b')
= a + a'*1
= a + a'
= 1
Analyzing the equations of every state and proving they equal either 1 or 0 is a lot of work. Therefore, a good FSM capture tool will detect the above two situations and inform the designer of the situation.
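For a handful of inputs, the two properties can also be checked by brute force rather than algebra, which is roughly what such a tool might do. The sketch below is a hypothetical Python helper, not a real FSM capture tool: check_transitions tries every input combination for one state and reports the combinations where more than one condition is true (overlaps) and where no condition is true (gaps).

```python
from itertools import product

def check_transitions(conditions, num_inputs):
    """Check the conditions on one state's outgoing transitions.

    conditions: functions mapping input bits to a truthy or falsy value.
    Returns (overlaps, gaps): input combinations where more than one
    condition is true, and combinations where no condition is true.
    """
    overlaps, gaps = [], []
    for bits in product((0, 1), repeat=num_inputs):
        true_count = sum(1 for cond in conditions if cond(*bits))
        if true_count > 1:
            overlaps.append(bits)
        elif true_count == 0:
            gaps.append(bits)
    return overlaps, gaps

cond_a = lambda a, b: a                        # "a"
cond_b = lambda a, b: b                        # "b"
cond_na_b = lambda a, b: (not a) and b         # "a'b"
cond_na_nb = lambda a, b: (not a) and (not b)  # "a'b'"

print(check_transitions([cond_a, cond_b], 2))                 # ([(1, 1)], [(0, 0)])
print(check_transitions([cond_a, cond_na_b], 2))              # ([], [(0, 0)])
print(check_transitions([cond_a, cond_na_b, cond_na_nb], 2))  # ([], [])
```

The second result reproduces the algebraic finding that a + a'b equals a + b: the only uncovered combination is a = 0, b = 0. Exhaustive enumeration grows as 2^n in the number of inputs, so it suits only small condition sets.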
EXAMPLE 3.13 Verifying transition properties for the code detector FSM
(Margin note: As evidence that this "pitfall" is indeed common, we admit that the mistake we made in Figure 3.46 was genuine, and not just made for educational purposes. A reviewer of the book caught it. We left the mistake in and added this example, to stress the point that the mistake is common.)
Figure 3.46 shows an FSM for a code detector. We want to verify the "only one condition should be true" property for the transitions leaving state Start. There are three conditions: ar, a', and a(r'+b+g). We thus have three pairs of conditions, which we AND and prove each equal 0 as follows:
ar * a'
= (a*a')r
= 0*r
= 0

a' * a(r'+b+g)
= (a'*a)*(r'+b+g)
= 0*(r'+b+g)
= 0

ar * a(r'+b+g)
= (a*a)*r*(r'+b+g)
= a*r*(r'+b+g)
= arr' + arb + arg
= 0 + arb + arg
= arb + arg
= ar(b+g)
It appears our FSM is not properly specified, as the AND of the third pair of conditions does not result in 0, which in turn means both conditions could be true at the same time, resulting in a nondeterministic FSM (if both conditions are true, what is the next state?). Recall from the code detector problem description that we want to transition from the Start state to the Red1 state when a button is pressed (a = 1) and that button is the red button, and no other colored button is pressed. The FSM in Figure 3.46 has the condition ar. Our mistake was underspecifying this condition; it should instead be arb'g'. In other words, a button has been pressed (a) and it is the red button (r) and the blue button has not been pressed (b') and the green button has not been pressed (g'). The transition from Start back to the Start state could then be written as a(rb'g')' (which is the same as in Figure 3.46 after applying DeMorgan's Law). After this change, we can again try to verify the "only one condition is true" property for all pairs of the three conditions arb'g', a', and a(rb'g')':
arb'g' * a'
= aa' * rb'g'
= 0 * rb'g'
= 0

a' * a(rb'g')'
= 0 * (rb'g')'
= 0

arb'g' * a(rb'g')'
= a*a*(rb'g')*(rb'g')'    (write rb'g' as Y for clarity)
= a*a*Y*Y'
= a*a*0
= 0
We would need to change the transition conditions of the other states similarly, and then check the pairs of conditions for those transitions too.
To verify the "one condition is true" property for state Start, we OR the three conditions and prove they equal 1:
arb'g' + a' + a(rb'g')'
= a' + arb'g' + a(rb'g')'    (write rb'g' as Y for clarity)
= a' + aY + aY'
= a' + a(Y+Y') = a' + a(1)
= a' + a
= 1
We would need to check the property for all other states too.
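The same checks for state Start can be repeated by exhaustive enumeration over the four inputs a, r, b, and g. The Python sketch below is illustrative only; the helper classify and the list names original and corrected are assumptions. It compares the original conditions ar, a', a(r'+b+g) with the corrected conditions arb'g', a', a(rb'g')'.

```python
from itertools import product

def classify(conditions):
    """Return (overlaps, gaps) over all values of the inputs a, r, b, g."""
    overlaps, gaps = [], []
    for a, r, b, g in product((False, True), repeat=4):
        count = sum(bool(cond(a, r, b, g)) for cond in conditions)
        if count > 1:
            overlaps.append((a, r, b, g))   # more than one transition enabled
        elif count == 0:
            gaps.append((a, r, b, g))       # no transition enabled
    return overlaps, gaps

original = [
    lambda a, r, b, g: a and r,                            # ar
    lambda a, r, b, g: not a,                              # a'
    lambda a, r, b, g: a and ((not r) or b or g),          # a(r'+b+g)
]
corrected = [
    lambda a, r, b, g: a and r and not b and not g,        # arb'g'
    lambda a, r, b, g: not a,                              # a'
    lambda a, r, b, g: a and not (r and not b and not g),  # a(rb'g')'
]

orig_overlaps, orig_gaps = classify(original)
fixed_overlaps, fixed_gaps = classify(corrected)
print(len(orig_overlaps), len(orig_gaps))    # 3 0
print(len(fixed_overlaps), len(fixed_gaps))  # 0 0
```

The three overlapping combinations for the original conditions are exactly those with a = 1, r = 1, and at least one of b or g equal to 1, which is the term ar(b+g) found algebraically.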
Simplifying FSM Notations: Unassigned Outputs
We already introduced the simplifying FSM notation of every transition being implicitly ANDed with a rising clock edge. Another commonly used simplification involves assigning outputs. If an FSM has many outputs, listing the assignment of every output in every state can become cumbersome, and make the relevant behavior of the FSM hard to discern. A common simplifying notation is as follows: if an output is not explicitly assigned in a state, the output is implicitly assigned a 0.
Simplifying Circuit Drawings: Implicit Clock Connections
Many if not most sequential circuits have a single clock signal connected to all sequential components. We know a component is sequential because of the small triangle input drawn on the component's block symbol. Many circuit drawings therefore use a simplification wherein the clock signal is assumed to be connected to all sequential components. This simplification leads to less cluttered wiring in the drawing.
Mathematical Formalisms in Combinational and Sequential Circuit Design
We have described two mathematical formalisms, Boolean functions and FSMs, for designing combinational and sequential circuits, respectively. Note that we did not have to use those formalisms to design circuits. Recall that our first attempt at building a three-cycles-high laser timer in Figure 3.35 just had us connecting components together in the hopes of creating a correctly working circuit. However, using those formalisms provides for a structured and sound method of designing circuits. Those formalisms also provide the basis for powerful automated tools to assist us with design, such as a tool that would automatically check for the common pitfalls described earlier in this section, tools that automatically convert Boolean equations or FSMs into circuits, tools that verify that two circuits are equivalent, tools that simulate our systems, etc. We have only touched on the benefits of those mathematical formalisms relating to automating the various aspects of designing circuits and verifying that the circuits behave properly. The importance of using sound mathematical formalisms to guide design cannot be overstated.
3.5 MORE ON FLIP-FLOPS AND CONTROLLERS
Other Flip-Flop Types
Today, designers generally use registers to implement their bit storage needs, and those registers typically are built from D flip-flops. However, in the past, transistors were more scarce than today. Thus, designers often utilized other types of flip-flops, having more functionality than D flip-flops, to reduce the logic gates required outside of the flip-flops, and hence to reduce the number of ICs necessary to implement a circuit. Those flip-flop types included SR, JK, and T flip-flops.
SR Flip-Flop
The SR flip-flop is similar to the SR latch described earlier, with additional logic to make the circuit triggered by the edge of a clock, rather than just the level of the clock.
JK Flip-Flop
The JK flip-flop is similar to an SR flip-flop, with J corresponding to S, and with K corresponding to R (I remember this by thinking of "K" standing for "Klear" or clear). The JK flip-flop's behavior differs from the SR flip-flop when both inputs are 1. Recall that an SR flip-flop's behavior is undefined when both inputs are 1. A JK flip-flop, in contrast, toggles when both inputs are set to 1 (at the next clock edge, of course). To toggle means to change to the opposite state, meaning if the present stored bit is 1, the next stored bit would be 0. Likewise, if the present stored bit is 0, the next stored bit would be 1.
T Flip-Flop
A T flip-flop acts like a JK flip-flop with the JK inputs tied together to form the T input. In other words, whenever T is 0, the flip-flop maintains its current state, but whenever T is 1, the flip-flop toggles (think of "T" for "Toggle").
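The differences among the flip-flop types can be summarized as next-state functions: given the present stored bit q and the inputs, what bit is stored after the next clock edge? The following Python sketch is a behavioral model only, with assumed function names; it says nothing about how the flip-flops are built internally.

```python
def d_next(q, d):
    """D flip-flop: the next stored bit is simply D."""
    return d

def sr_next(q, s, r):
    """SR flip-flop: set, reset, or hold; S = R = 1 is undefined."""
    if s and r:
        raise ValueError("SR flip-flop behavior is undefined when S = R = 1")
    return 1 if s else 0 if r else q

def jk_next(q, j, k):
    """JK flip-flop: like SR, but J = K = 1 toggles the stored bit."""
    if j and k:
        return 1 - q
    return 1 if j else 0 if k else q

def t_next(q, t):
    """T flip-flop: a JK flip-flop with J and K tied together."""
    return jk_next(q, t, t)

print(jk_next(0, 1, 1), jk_next(1, 1, 1))  # 1 0
print(t_next(1, 0), t_next(1, 1))          # 1 0
```

Writing t_next in terms of jk_next mirrors the description of the T input as the tied-together J and K inputs.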
Nonideal Flip-Flop Behavior
[Margin figure: setup time and hold time of input D relative to the clock edge.]
Generally, when we first learn about digital design, we assume ideal behavior for logic gates and flip-flops, just like when we first learn physics of motion, we assume there's no friction or wind resistance. There is, however, a nonideal behavior of flip-flops, metastability, that is such a common problem in real digital design practice that we feel obliged to discuss the issue briefly here. Digital designers in practice should study metastability and possible solutions quite thoroughly before doing serious designs.
Metastability comes from failing to meet flip-flop setup or hold times, which we now introduce.
Setup Times and Hold Times
Flip-flops are built from wires and logic gates, and wires and logic gates have delays. Thus, a real flip-flop imposes some restrictions on when the flip-flop's inputs can change relative to the clock edge, in order to ensure correct operation despite those delays. Two important restrictions are:
Setup time: The inputs of a flip-flop (e.g., the D input) must be stable for a minimum amount of time, known as the setup time, before a clock edge arrives. This intuitively makes sense: the input values must have time to propagate through any internal logic and be waiting at the internal gates' inputs before the clock pulse arrives.
Hold time: The inputs of a flip-flop must remain stable for a minimum amount of time, known as the hold time, after a clock edge arrives. This also makes intuitive sense: the clock signal must have time to propagate through the internal gates to create a stable feedback state.
A related restriction is on the minimum clock pulse width: the pulse must be wide enough to ensure that the correct values propagate through the internal logic and create a stable feedback state.
A flip-flop typically comes with a datasheet describing setup times, hold times, and minimum clock pulse widths.
Figure 3.61 illustrates an example of a setup time violation. D changed to 0 too close to the rising clock. The result is that R was not 1 long enough to create a stable feedback in the cross-coupled NOR gates with Q being 0. Instead, Q glitches to 0 briefly. That glitch feeds back to the top NOR gate, causing Q' to glitch to 1. That glitch feeds back to the bottom NOR gate, and so on. The oscillation would likely continue until a race condition caused the circuit to settle into a stable situation of Q = 0 or Q = 1, or the circuit could enter a metastable state, which we now describe.
Figure 3.61 Setup time violation: D changed to 0 (1) too close to the rising clock. u changed to 1 after the inverter delay (2), and then R changed to 1 after the AND gate delay (3). But then the clock pulse was over, causing R to change back to 0 (4) before a stable feedback situation with Q = 0 occurred in the cross-coupled NOR gates. R's change to 1 did cause Q to change to 0 after the NOR gate delay (5), but R's change back to 0 caused Q to change right back to 1 (6). The glitch of a 0 on Q fed back into the top NOR gate, causing Q' to glitch to 1 (7). That glitch of a 1 fed back to the bottom NOR gate, causing another glitch of a 0 on Q. That glitch runs around the cross-coupled NOR gate circuit (oscillation); a race condition would eventually cause Q to settle to 1 or 0, or possibly enter a metastable state (to be discussed).
Metastability
If a designer fails to ensure that a circuit obeys the setup and hold times of a flip-flop, the result could be that the flip-flop enters a metastable state. A flip-flop in a metastable state is in a state other than a stable 0 or a stable 1. Metastable in general means that a system is only marginally stable; the system has other states that are far more stable. A flip-flop in a metastable state may have an output with a value that is not a 0 or a 1, instead outputting a voltage somewhere between that of a 0 and that of a 1. That voltage may also oscillate somewhat. That's a problem. Since a flip-flop's output is connected to other components like logic gates and other flip-flops, that strange voltage value may cause other components to output strange values, and soon the values throughout our entire circuit can be in bad shape.
Why would we ever violate setup and hold times? After all, within a circuit we design, we can measure the longest possible path from any flip-flop output to any flip-flop input. As long as we make the clock period sufficiently longer than that longest path, we can ensure our circuit obeys setup times. Likewise, we can ensure that hold times are satisfied too. The problem is that our circuit likely has to interface to external inputs, and we can't control when those inputs change, meaning those inputs may violate setup and hold times when connected to flip-flop inputs. For example, an input may be connected from a button being pressed by a user. The user can't be told to press the button so many nanoseconds before a clock edge and to be sure to hold the button so many nanoseconds after the edge so that setup and hold times are satisfied. So metastability is a problem primarily when a flip-flop has inputs that are not synchronized with the circuit's clock; such inputs are said to be asynchronous.
Designers usually try to synchronize a circuit's asynchronous input to the circuit's clock before propagating that input to components in the circuit. A common way to synchronize an asynchronous input is to first feed the asynchronous input into a single D flip-flop, and then use the output of that flip-flop wherever the input is needed, as shown for the asynchronous input ai in Figure 3.62. Using a single flip-flop as shown also eliminates a second problem of different values of the same signal appearing at the various internal flip-flops at a clock edge, due to different path delays.
Figure 3.62 Feeding external inputs into a single flip-flop can reduce metastability problems.
"Hold on now!" you might say. Doesn't that synchronizing flip-flop experience the setup and hold time problem, and hence the same metastability issue? Yes, that's true. But at least the asynchronous input directly affects only one flip-flop, rather than perhaps several or dozens of flip-flops and other components. And that synchronizer flip-flop is specifically introduced for synchronization purposes and has no other purpose, whereas other flip-flops are being used to store bits for other purposes. We can therefore choose a flip-flop for the synchronizer that minimizes the metastability problem: we can choose an extremely fast flip-flop, and/or one with very small setup and hold times, and/or one with special circuitry to minimize metastability. That flip-flop may be bigger than normal or consume more power than normal, but there's only one such flip-flop per asynchronous input, so those issues aren't a problem. Bear in mind that no matter what we do, though, the synchronizer flip-flop could still become metastable, but at least we can minimize the odds of a metastable state happening by choosing a good flip-flop.
Another thing to consider is that a flip-flop will typically not stay metastable for long. Eventually, the flip-flop will "topple" over to a stable 0 or a stable 1. It's like how a coin tossed onto the ground may spin for a while (a metastable state) but will topple over to a stable head or tail. What many designers therefore do is introduce two or more flip-flops in series for synchronization purposes, as shown in Figure 3.63, so even if the first flip-flop becomes metastable, that flip-flop will likely reach a stable state before the next clock cycle, and thus the second flip-flop is even less likely to go metastable. Thus the odds of a metastable signal actually making it to our circuit's normal flip-flops are very low. This approach has the obvious drawback of delaying changes on the input signal by several cycles. In Figure 3.63, the rest of the circuit won't see a change on the input ai for three cycles.
Figure 3.63 Synchronizer flip-flops reduce probability of metastability in our regular flip-flops. The probability of a flip-flop being metastable drops with each flip-flop along the synchronizer chain, from low to incredibly low.
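The delay introduced by a chain of synchronizer flip-flops can be illustrated with an idealized model. The Python sketch below assumes three synchronizer flip-flops (inferred from the three-cycle delay mentioned for Figure 3.63), assumes every flip-flop starts at 0, and deliberately ignores metastability itself: each flip-flop simply loads its neighbor's previous value on every rising clock edge. The function name synchronize is hypothetical.

```python
def synchronize(samples, stages=3):
    """Shift sampled input values through `stages` flip-flops in series.

    samples[i] is the value of the asynchronous input at rising clock edge i.
    Returns the output of the last flip-flop after each edge.
    Idealized model: no setup/hold violations or metastability are simulated.
    """
    chain = [0] * stages                  # all synchronizer flip-flops start at 0
    seen = []
    for value in samples:
        chain = [value] + chain[:-1]      # each flip-flop loads its neighbor's old value
        seen.append(chain[-1])            # what the rest of the circuit sees
    return seen

print(synchronize([1, 1, 1, 1, 1]))  # [0, 0, 1, 1, 1]
```

With three stages, a change on the input reaches the rest of the circuit only on the third clock edge after it is first sampled, which is the cost paid for the lower odds of metastability.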
As clock periods become shorter and shorter, the odds of the first flip-flop stabilizing before the next clock cycle decrease, so metastability is becoming a more challenging issue as clock periods shrink. Many advanced methods have been proposed to deal with the issue.
Nevertheless, no matter how hard we try, metastability will always be a possibility, meaning our circuit may fail. We can minimize the likelihood of failure, but we can't completely eliminate failures due to metastability. Designers often rate their designs using a measure called mean time between failures, or MTBF. Designers typically aim for MTBFs of many years. Many students find this concept, that we can't design failproof circuits, somewhat disconcerting. Yet that concept is the real situation in design.
Designers of serious high-speed digital circuits should study the problem of metastability, and modern solutions to the problem, thoroughly.
Flip-Flop Reset and Set Inputs
Some D flip-flops (as well as other flip-flop types) come with extra inputs that can force the flip-flop to 0 or 1, independently of the D input. One such input is a clear, or reset, input, which forces the flip-flop to 0. Another such input is a set input, which forces the flip-flop to 1. Reset and set inputs are very useful for initializing flip-flops to an initial value (e.g., initializing all flip-flops to 0s) when powering up or resetting a system. These reset and set inputs should not be confused with the R and S inputs of an RS latch or flip-flop; the reset and set inputs are control inputs to any type of flip-flop (D, RS, T, JK) that take priority over the normal data inputs of the flip-flop.

Figure 3.64 D flip-flops with: (a) synchronous reset R, (b) asynchronous reset AR, and (c) asynchronous reset and set.

The reset and set inputs of a flip-flop may be either synchronous or asynchronous. A synchronous reset input forces the flip-flop to 0, regardless of the value on the D input, during a rising clock edge. For the flip-flop in Figure 3.64(a), setting R to 1 forces the flip-flop to 0 on the next clock edge. Likewise, a synchronous set input forces the flip-flop to 1 on a rising clock edge. The reset and set inputs thus have priority over the D input. If a flip-flop has both a synchronous reset and a synchronous set input, the flip-flop datasheet must inform the flip-flop user which has priority if both inputs are set to 1.
An asynchronous reset forces the flip-flop to 0 independently of the clock signal; the clock does not need to be rising, or even be 1, for the asynchronous reset to occur, hence the term "asynchronous." Likewise, an asynchronous set, also known as preset, can be used to asynchronously force the flip-flop to 1.
We omit discussion of how synchronous/asynchronous reset/set inputs would be internally designed in a flip-flop.
Sample behavior of a flip-flop's asynchronous reset input is shown in Figure 3.65. We assume the flip-flop initially stores 1. Setting AR to 1 forces the flip-flop to 0, independent of any clock edge. When the next clock edge appears, AR is still 1, so the flip-flop stays 0 even though the input D is 1. When AR returns to 0, the flip-flop follows the D input on successive clock edges, as shown.
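
The asynchronous reset behavior just described can be sketched as a small behavioral model. This is only an illustration under our own assumptions, not a description of a real flip-flop's internals: the class name `DFlipFlop`, its `update` method, and the sample input values are hypothetical, and time is modeled as a list of input samples rather than as continuous waveforms.

```python
class DFlipFlop:
    """Behavioral sketch of a rising-edge D flip-flop with asynchronous reset AR."""

    def __init__(self, initial=0):
        self.q = initial
        self._prev_clk = 0  # remembered so a rising edge can be detected

    def update(self, clk, d, ar):
        """Apply one sample of the inputs and return the stored value Q."""
        if ar == 1:
            # Asynchronous reset: Q is forced to 0 whether or not a clock edge occurs.
            self.q = 0
        elif self._prev_clk == 0 and clk == 1:
            # Rising clock edge with AR at 0: the flip-flop loads D.
            self.q = d
        self._prev_clk = clk
        return self.q


# Each sample is (clk, D, AR); the values are illustrative only.
samples = [
    (0, 1, 0),  # flip-flop still holds its initial 1
    (0, 1, 1),  # AR goes to 1 between clock edges: Q drops to 0 at once
    (1, 1, 1),  # rising edge while AR is still 1: Q stays 0 even though D is 1
    (0, 1, 0),  # AR returns to 0, but no edge yet: Q is still 0
    (1, 1, 0),  # next rising edge: Q follows D
    (0, 0, 0),
    (1, 0, 0),  # rising edge: Q follows D again
]

ff = DFlipFlop(initial=1)
q_trace = [ff.update(clk, d, ar) for clk, d, ar in samples]
print(q_trace)  # [1, 0, 0, 0, 1, 1, 0]
```

A synchronous reset could be modeled in the same sketch by checking the reset input only inside the rising-edge branch, which is the essential difference between the two kinds of reset.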
Figure 3.65 Asynchronous reset forces the flip-flop to 0, independent of clk or D. (Timing diagram signals: clk, D, AR, and Q over cycles 1 through 3.)

Initial State of a Controller
Particularly observant readers may have come up with a question when we implemented FSMs as controllers in this section: what happened to the indication of the initial state of an FSM when we designed the controller implementing the FSM? The initial state of an FSM is the state that the FSM starts in when the FSM is first activated, or in controller terms, when the controller is first powered on. For example, the laser timer controller FSM in Figure 3.39 has an initial state of Off. When we converted our graphical FSM to a state table in this section, we ignored the initial state information. Thus, all of our controller circuits start in some random state based on whatever values happen to appear in the state register when we power up the circuit. Not knowing the initial state of a circuit could pose a problem; for example, we don't want our laser timer controller to start in a state that immediately turns on the laser.
One solution is to add an additional input, reset, to every controller. Setting reset to 1 should cause a load of the initial state into the state register. This initial state should be forced into the state register. The reset and set inputs of a flip-flop are useful in this situation. We can simply connect the controller's reset input to the reset and set inputs of the state register's flip-flops in a way that sets the flip-flops to the initial state when reset is 1. For example, if the initial state of a two-bit state register should be 01, then we could connect the controller's reset input to the reset and set inputs of the two flip-flops, as shown in Figure 3.66.
Of course, for this reset functionality to work as desired, the designer must ensure that the controller's reset input is 1 when the system is first powered up. Ensuring the reset input is 1 during power up can be handled using an appropriate electronic circuit connected to the on/off switch, the description of which is beyond our scope.
Note that, if the synchronous reset or set inputs of a flip-flop are used, then the earlier-discussed setup and hold times, and associated metastability issues, apply to those reset and set inputs.
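
The idea of wiring the controller's reset input to each flip-flop's reset or set input, depending on the desired initial bit, can be sketched as follows. The function names `reset_wiring` and `state_after_reset` are hypothetical helpers of our own, and the sketch assumes every state-register flip-flop has both a reset and a set input.

```python
def reset_wiring(initial_state):
    """Decide, per state-register bit, which flip-flop input the controller's reset drives.

    `initial_state` is a bit string such as "01", most significant bit first.
    A bit that must start at 0 uses the flip-flop's reset input;
    a bit that must start at 1 uses the flip-flop's set input.
    """
    n = len(initial_state)
    wiring = {}
    for i, bit in enumerate(initial_state):
        name = f"s{n - 1 - i}"  # leftmost character is the highest-numbered bit
        wiring[name] = "set" if bit == "1" else "reset"
    return wiring


def state_after_reset(wiring):
    """State loaded into the register when the controller's reset input is 1."""
    n = len(wiring)
    return "".join("1" if wiring[f"s{i}"] == "set" else "0"
                   for i in range(n - 1, -1, -1))


wiring = reset_wiring("01")
print(wiring)                     # {'s1': 'reset', 's0': 'set'}
print(state_after_reset(wiring))  # 01
```

For an initial state of 01, the sketch connects reset to the reset input of the s1 flip-flop and to the set input of the s0 flip-flop.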
Figure 3.66 Three-cycles-high laser timer with a reset input that loads the state register with the initial state 01.

Nonideal Controller Behavior: Output Glitches
Glitching is the presence of temporary values on a wire, typically caused by different delays of different logic paths leading to that wire. We saw an example of glitching in Figure 3.13. Glitching will also often occur when a controller changes states, due to different path lengths from each of the controller's state register flip-flops to the controller's outputs. Consider the three-cycles-high laser timer design in Figure 3.50. The laser should be off (output x=0) in state s1s0=00 and on (x=1) in states s1s0=01, s1s0=10, and s1s0=11. However, the delay from s1 to x's OR gate in the figure could be longer than the delay from s0 to that OR gate. The result could be that when the state register changes state from s1s0=01 to s1s0=10, the OR gate's inputs could momentarily see 00. The OR gate would thus output 0 momentarily (a glitch). In the laser timer example, that glitch could momentarily shut off the laser, an undesired situation. Even worse would be glitches that momentarily turn on a laser.
Real designers must determine whether such glitching would really pose a problem in their particular system, and if so, those designers should take action to avoid glitching. One solution in the laser timer example might be to insert a D flip-flop after x's OR gate in Figure 3.50. Doing so would shift the x output later by 1 clock cycle (still resulting in three cycles high, however), but should eliminate glitches seen at the x output, as only the stable value appearing at the output would be loaded into the flip-flop on a clock edge.
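
To make the glitch concrete, the following sketch simulates the OR gate's inputs with unequal path delays. It is a simplified discrete-time model under our own assumptions: the delay values `S1_DELAY` and `S0_DELAY`, the clock edge times, and the helper `delayed` are all hypothetical and are not taken from Figure 3.50.

```python
def delayed(signal, delay):
    """Signal as seen at the end of a path with the given delay (in time steps).

    Until the delay has elapsed, the path still carries the signal's first value.
    """
    return [signal[max(0, t - delay)] for t in range(len(signal))]


# Flip-flop outputs over 10 time steps; the state changes from s1s0=01 to s1s0=10 at t=2.
s1 = [0, 0] + [1] * 8
s0 = [1, 1] + [0] * 8

S1_DELAY = 3  # hypothetical: longer path from s1 to the OR gate
S0_DELAY = 1  # hypothetical: shorter path from s0 to the OR gate

# x is the OR of the two signals as they actually arrive at the gate.
x = [a | b for a, b in zip(delayed(s1, S1_DELAY), delayed(s0, S0_DELAY))]
glitch_times = [t for t, value in enumerate(x) if value == 0]

# An output flip-flop loads x only at clock edges (assumed here at t=2 and t=8),
# capturing the value that was stable just before each edge.
CLOCK_EDGES = [2, 8]
registered_x = [x[t - 1] for t in CLOCK_EDGES]

print(x)             # [1, 1, 1, 0, 0, 1, 1, 1, 1, 1]
print(glitch_times)  # [3, 4]
print(registered_x)  # [1, 1]
```

In this sketch the raw output x dips to 0 at time steps 3 and 4 even though both the old and new states should yield x=1, while the values captured at the clock edges never show the dip, provided the clock period is longer than the slowest path delay.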
Active-Low Inputs (Negative Logic)
Until now, we have assumed active-high inputs on flip-flops and other components. An active-high input is a control input whose associated operation is activated by setting the input to 1. For example, if an input can reset a flip-flop, we assumed that input caused a reset when the input's value was 1. However, a component can instead have an active-low input. An active-low input (also known as a negative logic input) is a control input whose operation is activated by setting the input to 0. Figure 3.67 depicts a D flip-flop with an active-low synchronous reset input; the circle at the R input indicates that the R input is active-low. Thus, to reset the flip-flop to 0, we would set R to 0, whereas for normal D flip-flop operation, we would set R to 1.

Figure 3.67 D flip-flop with active-low synchronous reset input.

Active-low inputs can occur on any component with a control input, not just on flip-flops. For example, the enable control input on a decoder could be active-low: setting that enable to 0 (meaning the decoder is enabled) would cause normal decoder operation, while setting the input to 1 (meaning the decoder is disabled) would result in all outputs being 0.
When discussing the behavior of a component, designers will often use the term assert to mean setting a control input to the value that activates the associated operation. Thus, we might say that one must "assert" the R input of the D flip-flop in Figure 3.67 in order to reset the flip-flop to 0. Using the term assert avoids possible confusion that could occur when some control inputs are active-high and others are active-low.
Active-low inputs typically exist when the internal design of the component requires fewer gates when implemented with an active-low input than with an active-high input.
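
The decoder example and the term assert can be illustrated with a short sketch. The function names `is_asserted` and `decoder2x4`, and the choice of a 2x4 decoder, are our own assumptions for illustration.

```python
def is_asserted(value, active_low=False):
    """True if `value` is the level that activates the control input's operation."""
    return value == (0 if active_low else 1)


def decoder2x4(i1, i0, enable, enable_active_low=True):
    """2x4 decoder sketch returning outputs [d0, d1, d2, d3].

    With an active-low enable, enable=0 gives normal decoding and
    enable=1 disables the decoder, making all outputs 0.
    """
    outputs = [0, 0, 0, 0]
    if is_asserted(enable, enable_active_low):
        outputs[2 * i1 + i0] = 1  # exactly one output goes to 1
    return outputs


print(decoder2x4(1, 0, enable=0))  # [0, 0, 1, 0]  enable asserted (active-low)
print(decoder2x4(1, 0, enable=1))  # [0, 0, 0, 0]  enable not asserted
```

Writing the check in terms of "asserted" rather than "equal to 1" keeps the rest of the model the same whether the enable input is active-high or active-low.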
3.6 SEQUENTIAL LOGIC OPTIMIZATIONS AND TRADEOFFS
(SEE SECTION 6.3)
The earlier sections described how to design basic sequential logic. This section, which physically appears in this book as Section 6.3, describes how to create better sequential logic (smaller, faster, etc.) using optimizations and tradeoffs. One use of this book describes sequential logic design optimizations and tradeoffs after introducing basic sequential logic design, meaning now. An alternative use describes sequential logic design optimizations and tradeoffs later, after completing the introduction of basic datapath components and RTL design (Chapters 4 and 5).
3.7 SEQUENTIAL LOGIC DESCRIPTION USING HARDWARE
DESCRIPTION LANGUAGES (SEE SECTION 9.3)
This section, which physically appears in this book as Section 9.3, introduces the use of HDLs for describing sequential logic. One use of this book introduces such HDL use immediately after introducing basic sequential logic design, meaning now. An alternative use introduces such HDL use later.
3.8 PRODUCT PROFILE-PACEMAKER
A pacemaker is an electronic device that provides electrical stimulation to the heart to help regulate a heart's beating, steadying a heart whose natural "pacemaker" is not working properly, perhaps due to disease. Implantable pacemakers, which are surgically placed under the skin as shown in Figure 3.68, are worn by over 1/2 million Americans. They are powered by a battery that lasts ten years or more. Pacemakers have improved the quality of life as well as lengthened the lives of many millions of people.
A heart has two atria (left and right) and two ventricles (left and right). The ventricles push the blood out to the arteries, while the atria receive the blood from the veins. A very simple pacemaker has one sensor to detect a natural contraction in the heart's right ventricle, and one output wire to deliver electrical stimulation to that right ventricle if the natural contraction doesn't occur within a specified time period, typically just under one second. Such electrical stimulation causes a contraction not only in the right ventricle, but also the left ventricle.

Figure 3.68 Pacemaker with leads (left), and pacemaker's location under the skin (right). Courtesy of Medtronic, Inc.
We can describe the behavior of a simple pacemaker's controller using the FSM in Figure 3.69. The left side of the figure shows the pacemaker, consisting of a controller and a timer. The timer has an input t, which resets the timer when t=1. Upon being reset, the timer begins counting down from 0.8 seconds. If the timer counts down to 0, the timer sets its output z to 1. The timer could be reset before reaching 0, in which case the timer does not set z to 1, and instead the timer restarts counting down from 0.8 seconds again. The controller has an input s, which is 1 when a contraction is sensed in the right ventricle. The controller has an output p, which the controller sets to 1 when the controller wants to cause a paced contraction.

Figure 3.69 A basic pacemaker's controller.
The right side of the figure shows the controller's behavior as an FSM. Initially, the controller resets the timer in state ResetTimer by setting t=1. Normally, the controller waits in state Wait, and stays in that state as long as a contraction is not detected (s') and the timer does not reach 0 (z'). If the controller detects a natural contraction (s), then the controller again resets the timer and returns to waiting again. On the other hand, if the controller sees that the timer has reached 0 (z=1), then the controller goes to state Pace, which paces the heart by setting p=1, after which the controller returns to waiting again. Thus, as long as the heart contracts naturally, the pacemaker applies no stimulation to the heart. But if the heart doesn't contract naturally within 0.8 seconds of the last contraction (natural or paced), the pacemaker forces a contraction.
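
The basic pacemaker FSM can be sketched in software as a next-state function plus a table of state outputs. This is our reading of the description above, not a transcription of Figure 3.69: we assume that state Pace returns to waiting by way of ResetTimer, and that a sensed contraction (s) takes priority if s and z are both 1 in the same cycle. The names `next_state`, `OUTPUTS`, and `run` are hypothetical.

```python
# Outputs set in each state: t resets the timer, p paces the heart.
OUTPUTS = {
    "ResetTimer": {"t": 1, "p": 0},
    "Wait":       {"t": 0, "p": 0},
    "Pace":       {"t": 0, "p": 1},
}


def next_state(state, s, z):
    """Next state given sensed contraction s and timer-expired signal z."""
    if state == "ResetTimer":
        return "Wait"
    if state == "Wait":
        if s:                    # natural contraction: restart the timer
            return "ResetTimer"
        if z:                    # timer reached 0 with no contraction: pace
            return "Pace"
        return "Wait"            # s' and z': keep waiting
    if state == "Pace":
        return "ResetTimer"      # assumed: restart the timer after pacing
    raise ValueError(f"unknown state {state!r}")


def run(inputs, state="ResetTimer"):
    """Return the (state, t, p) seen in each clock cycle for a list of (s, z) inputs."""
    trace = []
    for s, z in inputs:
        trace.append((state, OUTPUTS[state]["t"], OUTPUTS[state]["p"]))
        state = next_state(state, s, z)
    return trace


# One natural contraction (s=1), then a timeout (z=1) that forces a paced contraction.
inputs = [(0, 0), (0, 0), (1, 0), (0, 0), (0, 1), (0, 0), (0, 0)]
for row in run(inputs):
    print(row)
```

The printed trace visits ResetTimer, Wait, Wait, ResetTimer, Wait, Pace, ResetTimer, with p=1 only in the Pace cycle.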
The atria receive blood from the veins, and contract to push the blood into the ventricles. The atrial contractions occur just before the ventricular contractions. Therefore, many pacemakers, known as "atrioventricular" pacemakers, sense and pace not just the ventricular contractions, but also the atrial contractions. Such pacemakers thus have two sensors, and two output wires for electrical stimulation, and may provide better cardiac output, with the desirable result being higher blood pressure (Figure 3.70).

Figure 3.70 An atrioventricular pacemaker's controller FSM (using the convention that FSM outputs not explicitly set in a state are implicitly set to 0). Inputs: sa, za, sv, zv. Outputs: pa, ta, pv, tv.
The pacemaker has two timers, one for the right atrium (TimerA) and one for the right ventricle (TimerV). The controller initially resets TimerA in state ResetTimerA, and then waits for a natural atrial contraction, or for the timer to reach 0. If the controller detects a natural atrial contraction (sa), then the controller skips pacing of the atrium. On the other hand, if TimerA reaches 0 first, then the controller goes to state PaceA, which causes a contraction in the atrium by setting pa=1. After an atrial contraction (either natural or paced), the controller resets TimerV in state ResetTimerV, and then waits for a natural ventricular contraction, or for the timer to reach 0. If a natural ventricular contraction occurs, the controller skips pacing of the ventricle. On the other hand, if TimerV reaches 0 first, then the controller goes to state PaceV, which causes a contraction in the ventricle by setting pv=1. The controller then returns to the initial state.
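
The atrioventricular controller follows the same pattern twice in a row, once per chamber, which the following sketch makes explicit. The state names ResetTimerA, PaceA, ResetTimerV, and PaceV come from the text; the waiting-state names `WaitA` and `WaitV`, the helper names `step` and `run`, and the priority of a sensed contraction over a timeout are our own assumptions.

```python
# The single output set to 1 in each state; outputs not listed are implicitly 0.
ASSERTED = {"ResetTimerA": "ta", "PaceA": "pa", "ResetTimerV": "tv", "PaceV": "pv"}


def step(state, sa=0, za=0, sv=0, zv=0):
    """Next state of the atrioventricular pacemaker controller sketch."""
    if state == "ResetTimerA":
        return "WaitA"
    if state == "WaitA":
        if sa:                       # natural atrial contraction: skip pacing
            return "ResetTimerV"
        return "PaceA" if za else "WaitA"
    if state == "PaceA":
        return "ResetTimerV"
    if state == "ResetTimerV":
        return "WaitV"
    if state == "WaitV":
        if sv:                       # natural ventricular contraction: skip pacing
            return "ResetTimerA"
        return "PaceV" if zv else "WaitV"
    if state == "PaceV":
        return "ResetTimerA"         # back to the initial state
    raise ValueError(f"unknown state {state!r}")


def run(cycles, state="ResetTimerA"):
    """Return (state, asserted output or None) for each clock cycle."""
    visited = []
    for inputs in cycles:
        visited.append((state, ASSERTED.get(state)))
        state = step(state, **inputs)
    return visited


# Natural atrial contraction, followed by a ventricular timeout that must be paced.
cycles = [{}, {"sa": 1}, {}, {"zv": 1}, {}, {}]
for row in run(cycles):
    print(row)
```

In this run the atrium is never paced (pa stays 0), while the ventricle is paced once because TimerV expires before a natural contraction is sensed.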
Most modern pacemakers can have the timer parameters programmed wirelessly through radio signals, so that doctors can try different treatments without having to surgically remove, reprogram, and reimplant the pacemaker.
This example demonstrates the usefulness of FSMs in describing a controller's behavior. Real pacemakers have controllers with tens or even hundreds of states to deal with various details that we left out of the example for simplicity.
With the advent of very low-power microprocessors, a trend in pacemaker design is that of implementing the FSM on a microprocessor rather than with a custom sequential circuit. Microprocessor implementation yields the advantage of easy reprogramming of the FSM, expanding the range of treatments that a doctor can experiment with.
3.9 CHAPTER SUMMARY
Section 3.1 introduced the concept of sequential circuits, namely circuits that store bits, meaning the circuits have memory, known as state. Section 3.2 developed a series of increasingly robust bit storage blocks, including the SR latch, D latch, D flip-flop, and finally a register, which can store multiple bits. The section also introduced the concept of a clock, which synchronizes the loads of registers. Section 3.3 introduced finite-state machines (FSMs) for capturing the desired behavior of a sequential circuit, and a standard architecture able to implement FSMs, with an FSM implemented using the architecture known as a controller. Section 3.4 then described a five-step process for converting an FSM to a controller implementation. Section 3.5 highlighted some types of flip-flops other than the D flip-flop, those other types being popular in the past. That section also described several timing issues related to the use of flip-flops, including setup time, hold time, and metastability. The section introduced asynchronous clear and set inputs to flip-flops, and described their use for initializing an FSM to its initial state. Section 3.8 highlighted a cardiac pacemaker and illustrated the use of an FSM to describe the pacemaker's behavior.
Designing a combinational circuit begins by capturing the desired circuit behavior using either an equation or a truth table, and then following a several-step process to convert the behavior to a combinational circuit. Designing a sequential circuit begins by capturing the desired circuit behavior as an FSM, and then following a several-step process to convert the behavior to a circuit consisting of a register and a combinational circuit, known as a controller. Conceptually, then, with the knowledge in Chapters 2 and 3, we can build any digital circuit. However, many digital circuits deal with input data many bits wide, such as five 32-bit inputs. Imagine how complex our equations, truth tables, or FSMs would be if they involved 5*32 = 160 inputs. Fortunately, components have been developed specifically to deal with data inputs and thus simplify the design process, components that will be described in the next chapter.
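
To get a feel for why 160 inputs is impractical, recall that a truth table has one row for each combination of input values, so its size doubles with every added input. The short calculation below is only an illustration; the variable names are our own.

```python
num_inputs = 5 * 32            # five 32-bit inputs
rows = 2 ** num_inputs         # one truth table row per input combination

print(num_inputs)              # 160
print(len(str(rows)))          # 49 (the row count has 49 decimal digits)
print(f"{rows:.2e}")           # about 1.46e+48 rows
```

A table with roughly 10^48 rows could never be written down, which is why wide data inputs call for a different design approach than the truth tables and FSMs used so far.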
3.10 EXERCISES
Any problem noted with an asterisk (*) represents an especially challenging problem.
SECTION 3.2: STORING ONE BIT-FLIP-FLOPS
3.1 Compute the clock period for the following clock frequencies.
(a) 50 kHz (early computers)
(b) 300 MHz (Sony Playstation 2 processor)
(c) 3.4 GHz (Intel Pentium 4 processor)
(d) 10 GHz (PCs of the early 2000s)
(e) 1 THz (1 terahertz)
3.2 Compute the clock period for the following clock frequencies.
(a) 32.768 kHz
(b) 100 MHz
(c) 1.5 GHz
(d) 2.4 GHz
3.3 Compute the clock frequency for the following clock periods.
(a) 1 s
(b) 1 ms
(c) 20 ns
(d) 1 ns
(e) 1.5 ps
3.4 Compute the clock frequency for the following clock periods.
(a) 500 ms
(b) 400 ns
(c) 4 ns
(d) 20 ps
3.5 *Assume scientists have developed a chip having perfect transistors and wires with no resistance, meaning signals within this chip can travel at the speed of light, or 3x10^8 meters/second. Assuming our digital circuit has a width of [illegible] mm and a height of [illegible] mm, compute the clock period and clock frequency, where the longest distance any signal must travel in a single clock period is:
(a) one-eighth of the width of the circuit
(b) one-half the height of the circuit
(c) the width of the circuit
(d) diagonally across the circuit
(e) the perimeter of the circuit
3.6 Trace the behavior of an SR latch for the following situation: Q, S, and R are 0 and have been for a long time, then S changes to 1 and stays there for a long time, then S changes back to 0. Using a timing diagram, show the values that appear on every wire for every change on a wire. Assume logic gates have a tiny but nonzero delay.
3.7 Repeat Exercise 3.6, but assume that S was changed to 1 just long enough for the signal to propagate through one logic gate, after which S was changed back to 0; in other words, S did not satisfy the hold time of the latch.
3.8 Trace the behavior of a level-sensitive SR latch (see Figure [number illegible]) for the input pattern in Figure 3.71. Assume S1, R1, and Q are initially 0. Complete the timing diagram, assuming logic gates have a tiny but nonzero delay.
Figure 3.71 SR latch input pattern timing diagram for Exercise 3.8 (signals: C, S, R, S1, R1, Q).
3.9 Trace the behavior of a level-sensitive SR latch (see Figure [number illegible]) for the input pattern in Figure 3.72. Assume S1, R1, and Q are initially 0. Complete the timing diagram, assuming logic gates have a tiny but nonzero delay.
Figure 3.72 SR latch input pattern timing diagram for Exercise 3.9 (signals: C, S, R, S1, R1, Q).
3.10 Trace the behavior of a level-sensitive SR latch (see Figure [number illegible]) for the input pattern in Figure 3.73. Assume S1, R1, and Q are initially 0. Complete the timing diagram, assuming logic gates have a tiny but nonzero delay.
Figure 3.73 SR latch input pattern timing diagram for Exercise 3.10 (signals: C, S, R, S1, R1, Q).
3.11 Trace the behavior of a D latch (see Figure 3.18) for the input pattern in Figure 3.74. Assume Q is initially 0. Complete the timing diagram, assuming logic gates have a tiny but nonzero delay.
Figure 3.74 D latch input pattern timing diagram for Exercise 3.11 (signals: C, D, S, R, Q).
3.12 Trace the behavior of a D latch (see Figure 3.18) for the input pattern in Figure 3.75. Assume Q is initially 0. Complete the timing diagram, assuming logic gates have a tiny but nonzero delay.
Figure 3.75 D latch input pattern timing diagram for Exercise 3.12 (signals: C, D, S, R, Q).
3.13 Trace the behavior of an edge-triggered D flip-flop using the master-servant design (see Figure 3.24) for the input pattern in Figure 3.76. Assume each internal latch initially stores a 0. Complete the timing diagram, assuming logic gates have a tiny but nonzero delay.
Figure 3.76 Edge-triggered D flip-flop input pattern timing diagram for Exercise 3.13 (signals: C, D/Dm, Cm, Qm/Ds, Cs, Qs).
3.14 Trace the behavior of an edge-triggered D flip-flop using the master-servant design (see Figure 3.24) for the input pattern in Figure 3.77. Assume each internal latch initially stores a 0. Complete the timing diagram, assuming logic gates have a tiny but nonzero delay.
Figure 3.77 Edge-triggered D flip-flop input pattern timing diagram for Exercise 3.14 (signals: C, D/Dm, Cm, Qm/Ds, Cs, Qs).
3.15 Compare the behavior of D latch and D flip-flop devices by completing the timing diagram in Figure 3.78. Assume each device initially stores a 0. Provide a brief explanation of the behavior of each device.
Figure 3.78 D latch and D flip-flop input pattern timing diagram for Exercise 3.15 (signals: C, D, Q (D latch), Q (D flip-flop)).
3.16 Compare the behavior of D latch and D flip-flop devices by completing the timing diagram in Figure 3.79. Assume each device initially stores a 0. Provide a brief explanation of the behavior of each device.
Figure 3.79 D latch and D flip-flop input pattern timing diagram for Exercise 3.16 (signals: C, D, Q (D latch), Q (D flip-flop)).
3.17 Create a circuit of three level-sensitive D latches connected in series (the output of one is connected to the input of the next). Use a timing diagram to show how a clock with a long high-time can cause the value at the input of the first D latch to trickle through more than one latch during the same clock cycle.
3.18 Repeat Exercise 3.17 using edge-triggered D flip-flops, and use a timing diagram to show how the input of the first D flip-flop does not trickle through to the next flip-flop no matter how long the clock signal is high.
3.19 Using D flip-flops, create a circuit with an input X and an output Y, such that Y always equals X delayed by two clock cycles.
3.20 Using four registers, design a circuit that stores the previous four values seen at an 8-bit input D. The circuit should have a single 8-bit output that can be configured using two inputs s1 and s0 to output any one of the four registers. (Hint: use an 8-bit 4x1 mux.)
3.21 Consider three 4-bit registers connected together as shown in Figure 3.80. Assume the initial values in the registers are unknown. Trace the behavior of the registers by completing the timing diagram of Figure 3.81.
Figure 3.80 Register configuration (input a3..a0 feeding three 4-bit registers in series, with register outputs b3..b0, c3..c0, and d3..d0).
Figure 3.81 4-bit register input pattern timing diagram for Exercise 3.21 (signals: C, b3..b0, c3..c0, d3..d0).
3.22 Consider three 4-bit registers connected together as shown in Figure 3.83. Assume the initial values in the registers are unknown. Trace the behavior of the registers by completing the timing diagram of Figure 3.82.
Figure 3.82 4-bit register input pattern timing diagram for Exercise 3.22 (signals: C, b3..b0, c3..c0, d3..d0).
SECTION 3.3: FINITE-STATE MACHINES (FSMS) AND CONTROLLERS
3.23 Draw a state diagram for an FSM that has an input X and an output Y. Whenever X changes from 0 to 1, Y should become 1 for two clock cycles and then return to 0, even if X is still 1. (Assume for this problem and all other FSM problems that an implicit rising clock is ANDed with every FSM transition condition.)
3.24 Draw a state diagram for an FSM with no inputs and three outputs, x, y, and z. xyz should always follow the following sequence: 000, 001, 010, 100, repeat. The output should change only on a rising clock edge. Make 000 the initial state.
3.25 Do Exercise 3.24, but add an input I that can stop the sequence when set to 0. When input I returns to 1, the sequence resumes from where it left off.
3.26 Do Exercise 3.25, except the sequence starts from 000 whenever I returns to 1.
Figure 3.83 Register configuration (input a3..a0, clock c, output d3..d0).
3.27 A wristwatch display can show one of four items: the time, the alarm, the stopwatch, or the date, controlled by two signals s1 and s0 (00 displays the time, 01 the alarm, 10 the stopwatch, and 11 the date; assume s1s0 control an N-bit-wide mux that passes through the appropriate register). Pressing a button B (which sets B = 1) sequences the display to the next item (if the presently displayed item is the date, the next item is the current time). Create a state diagram for an FSM describing this sequencing behavior, having an input bit B, and two output bits s1 and s0. Be sure to only sequence forward by one item each time the button is pressed, regardless of how long the button is pressed; in other words, be sure to wait for the button to be released after sequencing forward one item. Use short but descriptive names for each state. Make displaying the time be the initial state.
3.28 Extend the state diagram you created in Exercise 3.27 by adding an input R. R=1 forces the FSM to return to the state that displays the time.
3.29 Draw a state diagram for an FSM with an input gcnt and three outputs, x, y, and z. The xyz outputs generate a sequence called a Gray code in which exactly one of the three outputs changes from 0 to 1 or from 1 to 0. The Gray code sequence that the FSM should output is 000, 010, 011, 001, 101, 111, 110, 100, repeat. The output should change only on a rising clock edge when the input gcnt = 1. Make the initial state 000.
3.30 Trace through the execution of the FSM you created in Exercise 3.29 by completing the timing diagram in Figure 3.84, where C is the clock input and S is the n-bit state register. Assume S is initially 000.
Figure 3.84 FSM input pattern timing diagram for Exercise 3.30 (signals: gcnt, C, S).
3.31 Draw a timing diagram for the FSM in Figure 3.85, such that the FSM starts in state Wait, reaches state EN, and returns to Wait. Describe the behavior of the circuit in English.
Figure 3.85 FSM for Exercise 3.31. Inputs: s, r. Outputs: a, en.
3.32 For FSMs with the following numbers of states, indicate the smallest possible number of bits for a state register representing those states:
(a) 4
(b) 8
(c) 9
(d) 23
(e) 900
3.33 How many possible states can be represented by a 16-bit register?
3.34 If an FSM has N states, what is the maximum number of possible transitions that could exist in the FSM (assuming there are a large number of inputs, meaning the number of transitions is not limited by the number of inputs)?
3.35 *Assuming one input and one output, how many possible four-state FSMs exist?
3.36 *Suppose you are given two FSMs that execute concurrently. Describe an approach for merging those two FSMs into a single FSM with identical functionality as the two separate FSMs, and provide an example. If the first FSM has N states and the second has M states, how many states will the merged FSM have?
3.37 Sometimes dividing a large FSM into two smaller FSMs results in simpler circuitry. Divide the FSM shown in Figure 3.88 into two FSMs, one containing G0-G3, the other containing G4-G7. You may add additional states, transitions, and inputs or outputs between the two FSMs, as required. Hint: you will need to introduce signals between the FSMs for one FSM to tell the other FSM to go to some state.
SECTION 3.4: CONTROLLER DESIGN
3.38 Using the five-step process for designing a controller, convert the FSM of Figure 3.86 to a controller, implementing the controller using a state register and logic gates.
Figure 3.86 FSM for Exercise 3.38.
3.39 Using the five-step process for designing a controller, convert the FSM of Figure 3.87 to a controller, implementing the controller using a state register and logic gates.
Figure 3.87 FSM for Exercise 3.39.
3.40 Using the five-step process for designing a controller, convert the FSM you created for Exercise 3.24 to a controller, implementing the controller using a state register and logic gates.
3.41 Using the five-step process for designing a controller, convert the FSM you created for Exercise 3.27 to a controller, implementing the controller using a state register and logic gates.
3.42 Using the five-step process for designing a controller, convert the FSM you created for Exercise 3.29 to a controller, implementing the controller using a state register and logic gates.
3.43 Using the five-step process for designing a controller, convert the FSM in Figure 3.88 to a controller, stopping once you have created the state table. Note: your state table will be quite large, having 32 rows; you might therefore want to use a computer tool, like a word processor or spreadsheet, to draw the table.
Figure 3.88 FSM for Exercises 3.37 and 3.43. Inputs: g, r. Outputs: x, y, z.
3.44 Create an FSM that has an input X and an output Y. Whenever X changes from 0 to 1, Y should become 1 for five clock cycles and then return to 0, even if X is still 1. Using the five-step process for designing a controller, convert the FSM to a controller, stopping once you have created the state table.
3.45 The FSM in Figure 3.89 has two problems: one state has two transitions whose conditions could simultaneously evaluate to true, and another state has transitions that aren't guaranteed to have at least one of the transition conditions true. By ORing and ANDing the conditions for each state's transitions, prove that these problems exist. Then, fix these problems by refining the FSM, taking your best guess as to what was the FSM creator's intent.
3.46 Reverse engineer the behavior of the sequential circuit shown in Figure 3.90.
Figure 3.90 A sequential circuit to be reverse engineered.
SECTION 3.5: MORE ON FLIP-FLOPS AND CONTROLLERS
3.47 Consider three T flip-flops connected as shown in Figure 3.92. Trace the behavior of the flip-flops by completing the timing diagram in Figure 3.91. Assume all the flip-flops initially contain 0s.
Figure 3.91 T flip-flop input pattern timing diagram for Exercise 3.47 (signals: T, C, Q1, Q2, Q3).
3.48 Show how to connect four T flip-flops together to create a circuit that counts from 0 to 15 in binary and back to 0 again; in other words, that counts 0000, 0001, 0010, ..., 1111, and back to 0000 again. Hint: consider using the Q output of a flip-flop as the clock input of another flip-flop. Assume all the flip-flops initially contain 0s.
Figure 3.92 Three T flip-flops.
3.49 Define metastability.
3.50 Design a controller with a 4-bit state register that gets synchronously initialized to state 1010 when an input reset is set to 1.
3.51 *Design a D flip-flop with asynchronous reset, AR, and asynchronous set, AS, inputs using basic logic gates.
DESIGNER PROFILE
Brian got his bachelor's degree in Electrical Engineering and then worked for several years. Realizing the future demand for digital design targeting an increasingly popular type of digital chip known as FPGAs (see Chapter 7), he returned to school to obtain a master's degree in Electrical Engineering with a thesis topic targeting digital design for FPGAs. He has been employed at two different companies, and is now working as an independent digital design consultant.
He has worked on a number of projects, including a system that prevents house fires by tripping a circuit breaker when current running in the circuit indicates arcing is occurring, a microprocessor architecture for speeding up the processing of digitized video, and a mammography machine for precise location detection of tumors in humans.
One of the projects he has found most interesting was a baggage scanner for detecting explosives. "In that system, there is a lot of data being acquired as well as motors running, x-rays being beamed, and other things happening, all at the same time. To be successful, you have to pay attention to detail, and you have to communicate with the other design teams so everyone is on the same page." He found that project particularly interesting because "I was working on a small part of a very large, complex machine. We had to stay focused on our part of the design, while at the same time being mindful of how all the parts were going to fit together in the end." Thus, being able to work alone as well as in large groups was important, requiring good communication and team skills. And being able to understand not only a part of the system, but also important aspects of the other parts was also important, requiring knowledge of diverse topics.
Brian is now an independent digital design consultant, something that many electrical engineers, computer engineers, and computer scientists choose to do after getting experience in their field. "I like the flexibility that being a consultant offers. On the plus side, I get to work on a wide variety of projects. The drawback is that sometimes I only get to work on a small part of a project, rather than seeing a product through from start to finish. And of course being an independent consultant means there's less stability than a regular position at a company, but I don't mind that."
Brian has taken advantage of the flexibility provided by consulting by taking a part-time job teaching an undergraduate digital design course and an embedded systems course at a university. "I really enjoy teaching, and I have learned a lot through teaching. And I enjoy introducing students to the field of embedded systems."
Asked what he likes most about the field of digital design, he says, "I like building products that make people's lives easier, or safer, or more fun. That's satisfying."
Asked to give advice to students, he says that one important thing is "to ask questions. Don't be afraid of looking dumb when you ask questions at a new job. People don't expect you to know everything, but they do expect you to ask questions when you are unsure. Besides, asking questions is an important part of learning."
4 Datapath Components
4.1 INTRODUCTION
Chapters 2 and 3 introduced increasingly complex building blocks that can be used to build digital circuits. Those blocks include logic gates, multiplexors, decoders, basic registers, and finally controllers. Controllers are good for implementing systems having some number of control input signals and generating some number of control output signals. For example, if we see a particular control input become 1 (corresponding perhaps to a button being pressed), then we may want to generate a 1 on a control output (corresponding perhaps to a light turning on). In this chapter, we instead focus on creating building blocks that are good for systems having data inputs and outputs. In general, digital systems have two types of inputs (and outputs as well):
Control: A control input is typically one bit, representing a particular event outside the system, like a button being pressed, or representing a particular state of something outside the system, like a door being closed or a car being at an intersection. Control inputs could sometimes be grouped into multiple bits, like 4 bits representing which of 16 buttons is pressed, or 2 bits representing each of 4 possible states of a door (closed, open 1/3rd, open 2/3rd, or fully open). Control inputs are typically used directly to influence a controller's present state.
Data: A data input is typically multiple bits, collectively representing a single entity. For example, a 32-bit input may represent a temperature in binary. A 7-bit input may represent the present floor location of an elevator in a 100-floor building. A data input may be a single bit, differing from a single-bit control input in that we don't directly rely on that bit's value to influence the controller's present state.
Not all inputs can be strictly classified as either control or data; there are some inputs that fall somewhere on the border in between the two types. But most inputs can be classified as one or the other. (And, of course, a digital system also has power inputs, ground inputs, and clock inputs too, in addition to control and data inputs.)
Controllers are a good building block for building systems consisting mainly of control inputs and control outputs. But we also need building blocks for systems consisting of data inputs and outputs. In particular, we need registers to hold the data, and functional units to operate on (e.g., add or divide) the data. Such components are known as register-transfer level (RTL) components, also known as datapath components, and a circuit composed of such components is known as a datapath.
Datapaths can become quite complex, and therefore it is crucial to build datapaths from a set of datapath components that each encapsulate an appropriately high level of functionality. For example, if you were asked what components make up an automobile, you would probably list components like an engine, tires, a chassis, a body, and so on. Each of those components encapsulates a high-level function of the automobile. You thought of a tire, not of the rubber, steel wires, valve stem, valve, sidewalls, and other parts that make up the tire. Those detailed parts make up the design of a tire, not an automobile. A tire is an appropriately high level of component when thinking of a car; a valve stem is not. Likewise, when we design datapaths, we must have a set of datapath components at the appropriately high level; logic gates are too low-level.
This chapter defines such a set of datapath components, and also introduces simple datapaths. In Chapter 5, we'll see how to create more advanced datapaths, and how to combine datapaths and controllers to build an even higher-level component known as a processor.
4.2 REGISTERS
An N-bit register is a sequential component able to store N bits. Typical register widths (the number of bits N) are 8, 16, and 32 bits, though any width is possible. The bits in a register often represent data, such as 8 bits representing a temperature as a binary number.
The common name used for storing data into a register is loading, although the words writing and storing are also used. The opposite action of loading a register is known as reading a register's contents. Reading consists merely of connecting to the register's outputs; note that reading therefore is not synchronized with the clock, and furthermore, note that reading does not remove the bits from the register or change them in any way.
Registers come in a variety of styles. We'll introduce some of the most common styles in this section. Registers are perhaps the most fundamental datapath component, so we will provide numerous examples of their design and their use.
Parallel Load Register
The most basic type of register, shown in Figure 3.30 in Chapter 3, consists of a set of flip-flops that get loaded on every clock cycle. That basic register is useful as the state register in a controller, since the state register is loaded on every clock cycle. However, for most other uses of registers, we want some way to control whether or not a register gets loaded on a particular clock cycle: on some cycles we want to load, whereas on other cycles we just want to keep the previous value.
WHY THE NAME "REGISTER"?
Historically, the term "register" referred to a sign or chalkboard onto which people could temporarily write out cash transactions, and later perform bookkeeping using those transactions. The term generally refers to a device for storing data. In this context, since a collection of flip-flops stores data, the name "register" seems quite appropriate.
Figure 4.1 4-bit parallel load register: (a) internal design, (b) paths when load=0 and load=1, and (c) register block symbol.
We can achieve control over loading of a register by adding a 2x1 multiplexor in front of each flip-flop, as shown for the 4-bit register in Figure 4.1(a). When the load signal is 0 and the clock signal rises, each flip-flop gets loaded with its own Q value, as shown in Figure 4.1(b). Because Q is the flip-flop's present contents, the register's contents do not change when load is 0. When the load signal is 1 and the clock signal rises, each flip-flop gets loaded with one of the data inputs I0, I1, I2, or I3; thus, the register gets loaded with the data inputs when load is 1.
A register with a load line that controls whether the register is loaded with inputs, with all those inputs being loaded in parallel, is known as a parallel load register. Figure 4.1(c) provides a block symbol for a 4-bit parallel load register. A block symbol of a component shows a component's inputs and outputs, without showing the component's internal details.
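The load behavior described above can be sketched as a small behavioral model. This is only an illustration in Python, not a hardware description: the class name `ParallelLoadRegister`, the method name `clock_edge`, and the initial contents of 0 are assumptions made for the sketch.

```python
class ParallelLoadRegister:
    """Behavioral model of an N-bit parallel load register (hypothetical name)."""

    def __init__(self, width=4):
        self.width = width
        self.q = 0  # assumed initial contents; real flip-flops start in an unknown state

    def clock_edge(self, load, data):
        """Model one rising clock edge and return the new contents."""
        mask = (1 << self.width) - 1
        # Each 2x1 mux passes the data input when load=1,
        # and the flip-flop's own Q output when load=0.
        self.q = (data & mask) if load else self.q
        return self.q


reg = ParallelLoadRegister(width=4)
reg.clock_edge(load=1, data=0b1011)  # load=1: the register takes the data inputs
reg.clock_edge(load=0, data=0b0110)  # load=0: the register keeps its contents
print(format(reg.q, "04b"))          # prints 1011
```

Reading the register corresponds to simply looking at `reg.q`; as in the text, reading involves no clock edge and does not change the contents.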
Because registers are such a fundamental component in datapaths, we present a number of examples involving registers, to ensure the reader gains sufficient comfort with registers.
EXAMPLE 4.1 Basic example using registers
Figure 4.2 shows a simple connection of three registers R0, R1, and R2. Suppose we are told that the inputs a3..a0 have the values shown in the timing diagram in Figure 4.3(a). We can then determine the values in registers R0, R1, and R2, as shown in Figure 4.3(b). Before the first clock edge, we do not know the values in the registers, so we show the registers' contents as "????". The contents are actually some combination of four 0 and 1 values, but we don't know what those particular values are.
Figure 4.2 Basic register example.
Before the first clock edge, we are given that a3..a0 become 1111. Thus, on the first clock edge, R0 will be loaded with 1111. At the same moment, R1 and R2 will be loaded with the value in R0, which is "????", so R1 and R2 will still have contents of "????".
Figure 4.3 Basic register example: (a) timing diagram, and (b) the contents of each register.
Before clock edge 2, we are given that a3..a0 change to 0001. Thus, on the second clock edge, R0 will be loaded with 0001. Simultaneously, R1 will be loaded with the value of R0, which was 1111, and R2 will be loaded with the value of R0 inverted, meaning 0000.
Before the third clock edge, we are given that a3..a0 change to 1010. On the third clock edge, R0 will be loaded with 1010, while simultaneously R1 gets 0001, and R2 gets 1110.
We are given that a3..a0 stay at 1010 before the fourth clock edge. On the fourth edge, R0 again will be loaded with 1010, while simultaneously R1 gets 1010 and R2 gets 0101.
As a3..a0 stay at 1010 before the fifth clock edge, then on the fifth edge, R0 again will be loaded with 1010, while R1 again gets 1010 and R2 again gets 0101.
The important feature to notice in this example is that the R0, R1, and R2 registers all get loaded simultaneously. Thus, even though R0 gets loaded with a new value on a clock edge, R1 and R2 get the previous value, not the new value, on that same clock edge.
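The trace above can be reproduced with a short simulation. This is a sketch under the connections stated in the example (R0 loads a3..a0, R1 loads R0, R2 loads R0 inverted); the function names `clock_edge` and `show` are hypothetical, and `None` is used here to stand for the unknown contents "????".

```python
def clock_edge(r0, r1, r2, a):
    """Return (R0, R1, R2) after one rising clock edge.

    Only the values from before the edge appear on the right-hand side,
    which models all three registers loading simultaneously.
    """
    inverted_r0 = None if r0 is None else (~r0) & 0b1111
    return a, r0, inverted_r0


def show(value):
    """Format 4-bit contents, using ???? for unknown contents."""
    return "????" if value is None else format(value, "04b")


regs = (None, None, None)  # contents are unknown before the first edge
history = []
for a in (0b1111, 0b0001, 0b1010, 0b1010, 0b1010):
    regs = clock_edge(*regs, a)
    history.append(tuple(show(v) for v in regs))

for edge, (r0, r1, r2) in enumerate(history, start=1):
    print(f"edge {edge}: R0={r0} R1={r1} R2={r2}")
```

Note that `r1` in the previous state is never used to compute the next state, because no register in this example loads from R1.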
EXAMPLE 4.2 Weight sampler
Consider a scale at a grocery store used to weigh fruit. The scale may have a display that shows the present weight. We want to add a second display, and a button that the user can press to remember the present weight (sometimes called "sampling"), so that when the fruit is removed, the remembered weight continues to be displayed on the second display. A diagram of the system is shown in Figure 4.4.
Assume the scale outputs the present weight as a 4-bit binary number, and the "Present weight" and "Saved weight" displays automatically convert their binary number to the proper displayed value. We can design the Weight Sampler block using a 4-bit parallel load register. We connect the button signal b to the load input of the register. The output connects to the "Saved weight" display. Whenever b is 1, the weight value gets loaded into the register, and thus appears on the second display. When b returns to 0, the register keeps its value, so the second display continues to show the same weight, even if other items are placed on the scale and the first display changes.
Figure 4.4 Weight sampler implemented using a 4-bit parallel load register.
EXAMPLE 4.3 Temperature history display using registers (again)
Recall Example 3.2 of Chapter 3, in which a timer generated a pulse on an input C every hour. We connected that input C to the clock inputs of three registers, and those registers were connected such that the first register would be loaded with the present temperature, the second register would get the previous temperature, and the third register would get the temperature before that one, on the rising edge of C. However, in practice, we typically do not connect any input other than a clock signal (from an oscillator) to a register's clock input. We can therefore redesign the system to use a clock signal as the register clock input, by using parallel load registers. We could then connect the input C to the load inputs of the registers, as shown in Figure 4.5.
The oscillator frequency can be faster than 1 pulse per hour. In fact, due to the nature of how oscillators are made (see "How Does It Work? Quartz Oscillators" on page 102 in Chapter 3), oscillator frequencies are usually at least in the kilohertz range.
Figure 4.5 Internal design of the TemperatureHistoryStorage component, using parallel load registers.
We must ensure that when the timer generates its hourly pulse on C, the pulse is 1 for only one clock cycle. Otherwise, the registers would get loaded more than once during a single pulse (because during that pulse, multiple rising clock edges would occur, and registers get loaded on each rising clock edge), and so the present temperature would get loaded into two or even all three registers. We can accomplish a single-cycle high output by using the same clock as input to the timer, and then designing the timer's internal state machine to only set C=1 for one state, similar to how we set an output to 1 for exactly three states in Example 3.7 in Chapter 3.
EXAMPLE 4.4 Automobile above-mirror display using parallel-load registers
In Chapter 2, we described an example of a system above a rearview mirror that could display one of four 8-bit inputs, T, A, I, and M. In that example, we assumed the car's central computer was connected to the above-mirror system using 32 lines (4*8). Thirty-two wires is a lot of wires to have to connect from the computer to above the rearview mirror. Instead, assume that the computer connects to the above-mirror system using 8 data lines (C), 2 control lines a1a0 that specify which data item presently appears on C (being T when a1a0=00, A when a1a0=01, I when a1a0=10, and M when a1a0=11), and a load control line load, for a total of 11 lines, rather than 32 lines. The computer can send the data items in any order, at any time. The above-mirror system should simply store data items in the appropriate register (according to a1a0) when the data items arrive, and thus the system needs four parallel-load registers in which to store each data item. The control lines a1a0 will therefore serve as the "address" that tells us which register to load. As in the earlier example, inputs xy determine which value to pass through to the 8-bit display output D (with xy sequenced by the user pressing the mode button).
We can design the system as shown in Figure 4.6. The figure uses a popular "shorthand" notation that replaces a group of wires by a single thicker wire having a slanted line and number indicating the number of wires in the group.
Figure 4.6 Above-mirror display design. a1a0, set by the car's central computer, determines which register to load with C, while load=1 enables such loading. xy, which are independent of a1a0 and are set by the user pressing the mode button, determine which register to output to the display D.
The decoder decodes a1a0 to enable exactly one of the four registers. The load line enables the decoder: if load is 0, no decoder output is 1 and so no register gets loaded. The multiplexer part of the system is the same as in the earlier example.
Let's see how this system works for a sample sequence of inputs. Suppose initially that all registers store 0s and xy=00. Thus, the display will show 0. If the user presses the mode button four times, the inputs xy will sequence through 01, 10, 11, and back to 00, for each press still displaying 0 (since all registers are 0s). Now suppose that during some clock cycle, the car's computer sets a1a0=01, load=1, and C=00001010. Then register 1 will be loaded with 00001010. Since xy=00, the display will still show the contents of register 0, and thus the display will show 0. Now, if the user presses the mode button, xy will become 01, and the display will show the decimal value of register 1's 00001010 value, which is ten in decimal. Pressing the mode button again will change xy to 10, and the display will show the contents of register 2, which is 0. At any time in the future, the car's computer can load the other registers, or reload register 1, with new values, in any order. Note that the loading of the registers is independent from the display of those registers.
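The sample sequence above can be followed in a small behavioral sketch of the design. The class name `AboveMirrorDisplay` and its method names are hypothetical, and the registers are assumed to start at 0 as in the walkthrough; the decoder and multiplexor are modeled functionally rather than gate by gate.

```python
class AboveMirrorDisplay:
    """Four 8-bit parallel load registers, a 2x4 decoder with enable, and a 4x1 mux."""

    def __init__(self):
        self.regs = [0, 0, 0, 0]  # assumed initial contents, as in the walkthrough

    def clock_edge(self, a1a0, load, c):
        """Model one rising clock edge with the given computer-side inputs."""
        # 2x4 decoder with enable: one output is 1 when load=1, none when load=0.
        decoder_out = [int(load == 1 and a1a0 == i) for i in range(4)]
        for i, register_load in enumerate(decoder_out):
            if register_load:
                self.regs[i] = c & 0xFF

    def display(self, xy):
        """8-bit 4x1 mux: xy selects which register drives the display output D."""
        return self.regs[xy]


system = AboveMirrorDisplay()
system.clock_edge(a1a0=0b01, load=1, c=0b00001010)  # computer writes register 1
print(system.display(xy=0b00))  # prints 0  (register 0 is still 0)
print(system.display(xy=0b01))  # prints 10 (register 1 holds 00001010)
print(system.display(xy=0b10))  # prints 0  (register 2 is still 0)
```

Loading (`clock_edge`) and displaying (`display`) use separate inputs, which mirrors the point that loading the registers is independent of displaying them.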
EXAMPLE 4.5 Computerized checkerboard
Checkers (known in some countries as "draughts") is one of the world's most popular board games. A checkerboard consists of 64 squares, formed from 8 columns and 8 rows. Each player starts with 12 checkers (pieces) on the board. A computerized checkerboard may replace the checkers by using an LED (light-emitting diode) in each square. An on LED represents a checker in a square; an off LED represents no checker. For simplicity of the example, ignore the issue of each player having his own color of checkers. An example board is shown in Figure 4.7(a).
Figure 4.7 An electronic checkerboard: (a) eight 8-bit registers (R7 through R0) can be used to drive the 64 LEDs, using one register per column, and (b) detail of how one register connects to a column's LEDs and how the value 10100010 stored in that register would light three LEDs.
A computerized checkerboard typically has a microprocessor that keeps track of where each piece is located, moves pieces according to user commands or according to a checker-playing program (when playing against the computer), keeps score, etc.
Notice that the microprocessor must set values for 64 bits, one bit for each square. However, the inexpensive type of microprocessor used in such a device does not have 64 pins. The microprocessor needs external registers to store those bits that drive the LEDs, and will write to those registers one at a time. The microprocessor writes to the registers so fast, though, that an observer would probably see all the LEDs change at the same time, not noticing that some LEDs are changing microseconds earlier than others.
Let's use one register per column, meaning we'll need eight registers total, as shown below the checkerboard in Figure 4.7(a), with those registers named R7 through R0. Each register's 8 bits correspond to a particular row in the register's column, indicating whether the respective LED is on or off, as shown in Figure 4.7(b). The eight registers are connected to the microprocessor. The microprocessor uses eight pins (D) for data, three pins (i2, i1, i0) for addressing the appropriate register (which is decoded into a load line for each of the 8 registers), and one pin (e) for the register load line (implemented using the decoder's enable), for a total of 12 pins, a number much more feasible than 64 pins. To configure the checkerboard for the beginning of a game, the microprocessor would create the sequence of register writes shown in Figure 4.8.
Figure 4.8 Timing diagram indicating an input sequence that can be used to initialize.
HOW DOES IT WORK? COMPUTERIZED BOARD GAMES
Many of you have played a computerized board game, like checkers, backgammon, or chess, either using boards with small displays to represent pieces, or perhaps using a graphics program on a personal computer or website. The main method the computer uses for choosing among possible next moves is called lookahead. For the current configuration of pieces on the board, the computer considers all possible single moves that it might make. For each such move, it might also consider all possible single moves by the opponent. For each new configuration resulting from possible moves, the computer evaluates the configuration's goodness, or quality, and picks a move that may lead to the best configuration. The number of moves that the computer looks ahead (one computer move, one opponent move, another computer move, another opponent move) is called the lookahead amount. Good programs might look ahead three, four, five moves, or more. Looking ahead is costly in terms of compute time and memory: if each player has 10 possible moves per turn, then looking ahead two moves results in 10*10 = 100 configurations to evaluate, three moves in 10*10*10 = 1000 configurations, four moves in 10,000 configurations, and so on. Good game-playing programs will "prune" configurations that appear to be very bad and thus unlikely to be chosen by an opponent, just as humans do, to reduce the configurations to be considered. Computers can examine millions of configurations, whereas humans can only mentally examine perhaps a few dozen. Chess, being perhaps the most complex of popular board games, has attracted extensive attention since the early days of computing. Alan Turing, considered one of the fathers of Computer Science, wrote much about using computers for chess, and is credited as having written the first computer chess program in 1950. However, humans proved better than computer chess programs until 1997, when IBM's Deep Blue computer defeated the reigning world champion in a classic chess match. Deep Blue had 30 IBM RS-6000 SP processors connected to 480 special-purpose chess chips, and could evaluate 200 million moves per second, and hence many billions of moves in a few minutes. Today, chess tournaments not only match humans against computers, but also programs against programs, many hosted by the International Computer Games Association.
(Source: Chess History, Bill Wall)
In the sequence of Figure 4.8, on the first rising clock edge, R0 gets loaded with 10100010. On the second rising clock edge, R1 gets loaded with 01000101. And so on. After eight clock cycles, the registers would contain the desired values, and the board's LEDs would be lit, as shown in Figure 4.9.
Figure 4.9 Checkerboard after loading registers for initial checker positions.
Shift Register
One thing we might want to do with a register is shift the register's contents to the left or to the right. Shifting to the right means to move each stored bit one flip-flop to the right. If a 4-bit register originally stores 1101, shifting right would result in 0110, as shown in Figure 4.10(a). We dropped the rightmost bit (in this case a 1), and we shifted a 0 into the leftmost bit. To build a register capable of shifting to the right, we conceptually need to connect the register's flip-flops in the manner similar to that shown in Figure 4.10(b).
Figure 4.10 Right shift example: (a) sample contents before and after a right shift, and (b) bit-by-bit view of the shift.
We can create a register able to shift to the right as shown in Figure 4.11. The register includes two control inputs, shr and shr_in. shr=1 causes a right shift on a rising clock edge, while shr=0 causes the register to maintain its present value. shr_in is the bit that we want to shift into the leftmost register bit during a shift operation.
Figure 4.11 Shift register: (a) implementation, (b) paths when shr=1, and (c) block symbol.
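The shr and shr_in behavior can be sketched as a next-state function. This is an illustrative model, not the circuit itself; the function name `shift_right_edge` and the use of a Python integer to hold the register contents are assumptions of the sketch.

```python
def shift_right_edge(q, shr, shr_in, width=4):
    """Return the register contents after one rising clock edge.

    q is the present contents, with the leftmost flip-flop as the top bit.
    """
    if not shr:
        return q  # shr=0: each mux feeds the flip-flop its own value back
    # shr=1: every bit moves one position right, the rightmost bit is dropped,
    # and shr_in enters the leftmost position.
    return (shr_in << (width - 1)) | (q >> 1)


q = shift_right_edge(0b1101, shr=1, shr_in=0)
print(format(q, "04b"))  # prints 0110
```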
Rotate Register
A rotate register is a slight variation of a shift register in which the outgoing bit gets shifted back in as the incoming bit. So on a right rotate, the rightmost bit gets shifted into the leftmost bit, as seen in Figure 4.12, where register contents of 1101 become 1110 after the rotate.
Figure 4.12 Right rotate example: (a) register contents before and after the rotate, and (b) bit-by-bit view of the rotate operation.
Implementing a rotate register is achieved by modifying the design of Figure 4.11, feeding the rightmost flip-flop output, rather than the shr_in input, into the leftmost mux's i1 input. A rotate register needs some way to get values into the register, either via a shift, or via parallel load.
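A right rotate differs from a right shift only in where the incoming bit comes from. The following sketch makes that explicit; the function name `rotate_right_edge` and the control input name `rotate` are hypothetical.

```python
def rotate_right_edge(q, rotate, width=4):
    """Return the register contents after one rising clock edge."""
    if not rotate:
        return q  # maintain the present value
    outgoing = q & 1  # rightmost flip-flop output
    # The outgoing bit takes the place of shr_in at the leftmost position.
    return (outgoing << (width - 1)) | (q >> 1)


q = rotate_right_edge(0b1101, rotate=1)
print(format(q, "04b"))  # prints 1110
```

Because no bit is ever dropped, rotating a 4-bit register four times returns it to its original contents.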
EXAMPLE 4.6 Above-mirror display using shift registers
In Example 4.4, we redesigned the connection between a car's central computer and an above-mirror display system to reduce the number of wires from 32 down to 8+2+1=11. However, even 11 wires is a lot of wires to have to run from the computer to above the mirror. Let's reduce the wires even further by using shift registers in the above-mirror system. The inputs to the above-mirror system from the car's computer will be one data bit C, two address lines a1a0, and a shift line shift, for a total of only 4 wires. When the computer wants to write to one of the above-mirror system's registers, the computer will set a1a0 appropriately and will then set shift to 1 for exactly eight clock cycles. For each of those eight clock cycles, the computer will set C to one bit of the 8-bit data to be loaded, starting with the least-significant bit on the first clock cycle, and ending with the most-significant bit on the eighth clock cycle. We can thus design the above-mirror system as shown in Figure 4.13.
Figure 4.13 Above-mirror display design using shift registers to reduce the number of lines coming from the car's computer. The computer sets a1a0 to the desired register to load, and then holds shift=1 for eight clock cycles, with C equaling the register contents bit-by-bit, one bit per clock cycle, resulting in the desired register being loaded with the sent 8-bit value. (Note: the C line is 1 bit, rather than 8 bits like before.)
HOW DOES IT WORK? COMPUTER COMMUNICATIONS IN AN AUTOMOBILE USING SERIAL DATA TRANSFER
Modern automobiles contain dozens of computers distributed throughout the car: some under the hood, some in the dash, some above the mirror, some in the door, some in the trunk, etc. Running wires throughout the car so those computers can communicate is a challenge. Thus, most automobile computers communicate serially, meaning one bit at a time, like the design in Example 4.6, to reduce the number of wires. A serial communication scheme in automobiles is known as the "CAN bus," short for Controller Area Network, which is now an international standard defined by ISO (International Organization for Standardization) standard number 11898.
When shift=1, the appropriate register gets a new value shifted in during the next eight clock cycles. This method achieves the same result as a parallel load from eight separate inputs, but utilizes fewer wires.
This example demonstrates a form of communication between digital circuits known as serial communication, in which the circuits communicate data by sending the data one bit at a time.
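The following sketch shows why sending the least-significant bit first into a right-shifting register reproduces the original 8-bit value after eight clock cycles. The function names `send_byte_serially` and `shift_in` are hypothetical, and the model covers only one selected register with shift held at 1.

```python
def send_byte_serially(value):
    """Computer side: yield the 8 bits of value, least-significant bit first."""
    for i in range(8):
        yield (value >> i) & 1


def shift_in(q, c):
    """Register side: one rising clock edge of an 8-bit right shift with shift=1."""
    return (c << 7) | (q >> 1)  # C enters the leftmost bit, the rest move right


q = 0  # assumed starting contents; any old contents are pushed out after 8 shifts
for bit in send_byte_serially(0b00001010):
    q = shift_in(q, bit)
print(format(q, "08b"))  # prints 00001010
```

The first bit sent ends up in the rightmost position after eight shifts, which is why the least-significant bit must go first for a right-shifting register.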
Multifunction Registers
Many registers can perform a variety of operations (also called functions), like load, shift right, shift left, rotate right, rotate left, etc. The register user selects the presently desired operation by setting the register's control inputs. We'll now introduce some multifunction registers.
Register with Parallel Load and Shift Right
A popular combination of operations on a register is that of both parallel load and shift. We can design a 4-bit register capable of parallel load and shift right, the details of which are shown in Figure 4.14(a). Figure 4.14(b) shows a block symbol of the register.
Figure 4.14 4-bit register with parallel load and shift right operations: (a) internal design, and (b) block symbol.
Notice that we used a 4x1 mux, rather than a 2x1 mux, in front of each flip-flop, because each flip-flop can now receive its next bit from one of three locations (the fourth mux input is unused). The register has two control inputs, with the control behavior shown in Figure 4.15.
s1  s0  Operation
0   0   Maintain present value
0   1   Parallel load
1   0   Shift right
1   1   (unused - let's load 0s)
Figure 4.15 Operation table of a 4-bit register with parallel load and shift right operations.
HOW DOES IT WORK? WIRELESS AND USB COMMUNICATION BETWEEN DIGITAL DEVICES
Serial communication between digital devices, such as between personal computers, laptops, printers, cameras, etc., is ubiquitous. The popular USB interface is a serial communication scheme (USB is short for Universal Serial Bus) used to connect personal computers and other devices together by wire. Furthermore, nearly all wireless communication schemes, such as WiFi and Bluetooth, use serial communication, sending one bit at a time over a radio frequency. While data communication between devices may be serial, computations inside devices are typically done in parallel. Thus, shift registers are commonly used inside circuits to convert internal parallel data into serial data to be sent to another device, and to receive serial data and convert that data into parallel data for internal device use.
Let's examine the mux and flip-flop of the rightmost bit. When s1s0=00, the mux passes the present flip-flop value back to the flip-flop, causing the flip-flop to get reloaded with its present value on the next rising clock, thus maintaining the present value. When s1s0=01, the mux passes the external I0 input to the flip-flop, causing the flip-flop to get loaded. When s1s0=10, the mux passes the present value of the flip-flop output from the left, Q1, thus causing a right shift. s1s0=11 is not a legal input to the register and thus should never occur; the mux passes 0s in this case.
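The behavior just described can be modeled in a few lines of Python. This is only a behavioral sketch under stated assumptions, not the gate-level design: the function name next_state, the list ordering [Q3, Q2, Q1, Q0], and the shift-in parameter name shr_in are illustrative choices.

```python
def next_state(q, s1, s0, i, shr_in):
    """Return the register contents after the next rising clock edge.

    q and i are lists ordered [bit3, bit2, bit1, bit0] (an assumed convention);
    shr_in is the bit that enters at the left during a right shift.
    """
    if (s1, s0) == (0, 0):
        return list(q)              # maintain present value
    if (s1, s0) == (0, 1):
        return list(i)              # parallel load
    if (s1, s0) == (1, 0):
        return [shr_in] + q[:-1]    # shift right: each bit takes its left neighbor's value
    return [0] * len(q)             # s1s0=11 is unused; the muxes pass 0s


q = [0, 0, 0, 0]
q = next_state(q, 0, 1, [1, 0, 1, 1], 0)   # load 1011
q = next_state(q, 1, 0, [0, 0, 0, 0], 0)   # shift right, shifting in a 0
print(q)
```

After the load and one right shift, the register holds 0101: the rightmost flip-flop has taken the old value of Q1, as the text describes.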
Register with Parallel Load, Shift Left, and Shift Right
Adding a shift left operation to the above 4-bit register is straightforward, and is shown in Figure 4.16. Instead of connecting 0s to the I3 input of each 4x1 mux, we instead connect the output from the flip-flop to the right. The rightmost mux's I3 input would be connected to an additional input shl_in.
Figure 4.16 4-bit register with parallel load, shift left, and shift right operations: (a) internal design, (b) block symbol.
UNUSED INPUTS.
The example in Figure 4.14 included a mux with 4 inputs of which we only used 3 inputs. Notice that we actually set the unused input to a particular value, rather than simply leaving the input unconnected. Remember that the input is controlling transistors inside the component. If we don't assign a value to the input, will the internal transistors conduct or not conduct? We don't really know, and so we could get undesired behavior from the mux. Leaving inputs unconnected should not be done. On the other hand, leaving outputs unconnected is no problem: an unconnected output will have a 1 or 0 that simply doesn't control anything else.
The register has the operations shown in Figure 4.17.

s1  s0   Operation
0   0    Maintain present value
0   1    Parallel load
1   0    Shift right
1   1    Shift left
Figure 4.17 Operation table of a 4-bit register with parallel load, shift left, and shift right operations.

Load/Shift Register with Separate Control Inputs for Each Operation
Registers typically don't come with control inputs that encode the operation into the minimum number of bits like the control inputs on the registers we designed above. Instead, each operation usually has its own control input. So a register with the operations of load, shift left, and shift right might have the inputs and operation table shown in Figure 4.18. The four possible operations (maintain, shift left, shift right, and load) really only require two control inputs, but the figure shows that the register has three control inputs: ld, shr, and shl.

ld  shr shl   Operation
0   0   0     Maintain present value
0   0   1     Shift left
0   1   0     Shift right
0   1   1     Shift right - shr has priority over shl
1   0   0     Parallel load
1   0   1     Parallel load - ld has priority
1   1   0     Parallel load - ld has priority
1   1   1     Parallel load - ld has priority
Figure 4.18 Operation table of a 4-bit register with separate control inputs for parallel load, shift left, and shift right.

Notice that if the user sets more than one control input to 1, we must decide what operation to perform. If the user sets both shr and shl, we'll give priority to shr. If the user asserts ld and either or both of shr and shl, we'll give priority to ld.
The internal design of such a register is similar to the load/shift register designed above, except that the three control inputs of ld, shl, and shr need to be mapped to the two control inputs s1 and s0 of the earlier register, using a simple combinational circuit, as shown in Figure 4.19.
Figure 4.19 A small combinational circuit maps the control inputs ld, shr, and shl to the mux select inputs s1 and s0.
(a) Complete operation table
    Inputs        Outputs   Note
    ld  shr shl   s1  s0    Operation
    0   0   0     0   0     Maintain value
    0   0   1     1   1     Shift left
    0   1   0     1   0     Shift right
    0   1   1     1   0     Shift right
    1   0   0     0   1     Parallel load
    1   0   1     0   1     Parallel load
    1   1   0     0   1     Parallel load
    1   1   1     0   1     Parallel load
(b) Compact version
    ld  shr shl   Operation
    0   0   0     Maintain value
    0   0   1     Shift left
    0   1   X     Shift right
    1   X   X     Parallel load
Figure 4.20 Truth tables describing operations of a register with left/right shift and parallel load along with the mapping of the register control inputs to the internal 4x1 mux select lines: (a) complete operation table defining the mapping of ld, shr, and shl to s1 and s0, and (b) a compact version of the operation table.
We can design that combinational circuit starting from a simple truth table shown in Figure 4.20(a).
We thus obtain the following equations for the register's combinational circuit:
s1 = ld'*shr'*shl + ld'*shr*shl' + ld'*shr*shl
s0 = ld'*shr'*shl + ld
Replacing the combinational circuit box in Figure 4.19 by the gates described by the above equations would complete the register's design.
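As a sanity check, the two equations can be compared against the priority rules (ld over shr over shl) for all eight input combinations. The sketch below is illustrative only; the function names select_lines and equations are assumptions, and the select encodings come from Figure 4.17.

```python
from itertools import product


def select_lines(ld, shr, shl):
    """Mux select lines (s1, s0) chosen by priority: ld, then shr, then shl."""
    if ld:
        return (0, 1)   # parallel load
    if shr:
        return (1, 0)   # shift right
    if shl:
        return (1, 1)   # shift left
    return (0, 0)       # maintain present value


def equations(ld, shr, shl):
    """The same mapping written as the sum-of-products equations from the text."""
    n = lambda x: 1 - x   # complement of a single bit
    s1 = (n(ld) & n(shr) & shl) | (n(ld) & shr & n(shl)) | (n(ld) & shr & shl)
    s0 = (n(ld) & n(shr) & shl) | ld
    return (s1, s0)


mismatches = [bits for bits in product((0, 1), repeat=3)
              if select_lines(*bits) != equations(*bits)]
print(mismatches)   # an empty list means the equations match the table
```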
Register datasheets typically show the register operation table in a compact form, taking advantage of the priorities among the control inputs, as shown in Figure 4.20(b). A single X in a row means that row is actually two rows in the complete table, with one row having 0 in the position of the X, the other row having 1. Two Xs in a row means that row is actually four rows in the complete table, one row having 00 in the positions of those Xs, another row having 01, another 10, and another 11. And so on for three Xs, representing 8 rows. Note that putting higher priority control inputs to the left in the table keeps the table's operations nicely organized.
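The expansion of X entries into complete rows can be sketched mechanically. In this illustrative example, the helper name expand and the string encoding of rows (ld, shr, shl from left to right) are assumptions.

```python
from itertools import product


def expand(row):
    """Expand a compact row such as '01X' into every concrete row it stands for."""
    choices = [("0", "1") if ch == "X" else (ch,) for ch in row]
    return ["".join(bits) for bits in product(*choices)]


# Compact operation table in the style of Figure 4.20(b); columns are ld, shr, shl.
compact = {
    "000": "Maintain value",
    "001": "Shift left",
    "01X": "Shift right",
    "1XX": "Parallel load",
}

full = {bits: op for row, op in compact.items() for bits in expand(row)}
for bits in sorted(full):
    print(bits, full[bits])
```

The four compact rows expand to the eight rows of the complete table: one X yields two rows and two Xs yield four.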
Register Design Process
Table 4.1 describes a general process for designing a register with any number of functions.
TABLE 4.1 Four-step process for designing a multifunction register.
Step and description:
1. Determine mux size: Count the number of operations (don't forget the maintain present value operation!) and add in front of each flip-flop a mux with at least that number of inputs.
2. Create mux operation table: Create an operation table defining the desired operation for each possible value of the mux select lines.
3. Connect mux inputs: For each operation, connect the corresponding mux data input to the appropriate external input or flip-flop output (possibly passing through some logic) to achieve the desired operation.
4. Map control lines: Create a truth table that maps external control lines to the internal mux select lines, with appropriate priorities, and then design the logic to achieve that mapping.
We'll illustrate the register design process with another example.
EXAMPLE 4.7 Register with load, shift, and synchronous clear and set
We want to design a register with the following operations: load, shift left, synchronous clear, and synchronous set, with unique control inputs for each operation (ld, shl, clr, set). The synchronous clear operation on a register means to load all 0s into the register on the next rising clock edge. The synchronous set operation means to load all 1s into the register on the next rising clock edge. The term synchronous is included because some registers come with asynchronous clear or asynchronous set operations. Following the register design method of Table 4.1, we perform the following steps:
Step 1: Determine mux size. There are 5 operations: load, shift left, synchronous clear, synchronous set, and maintain present value. Don't forget the maintain present value operation, as that operation is implicit.
Step 2: Create mux operation table. We'll use the first 5 inputs of an 8x1 mux for the desired 5 operations. For the remaining 3 mux inputs, we'll choose to maintain the present value, though those mux inputs should never be utilized. The table is shown in Figure 4.21.

s2  s1  s0   Operation
0   0   0    Maintain present value
0   0   1    Parallel load
0   1   0    Shift left
0   1   1    Synchronous clear
1   0   0    Synchronous set
1   0   1    Maintain present value
1   1   0    Maintain present value
1   1   1    Maintain present value
Figure 4.21 Operation table for a register with load, shift, and synchronous clear and set.

Step 3: Connect mux inputs. We connect the mux inputs as shown in Figure 4.22, which for simplicity shows only the nth flip-flop and mux of the register.
Figure 4.22 Nth bit-slice of a register with the following operations: maintain present value, parallel load, shift left, synchronous clear, and synchronous set.
Step 4: Map control lines. We'll give clr highest priority, followed by set, ld, and shl, so the register control inputs would be mapped to the 8x1 mux select lines as shown in Figure 4.23.

Inputs              Outputs
clr set ld  shl     s2  s1  s0   Operation
0   0   0   0       0   0   0    Maintain present value
0   0   0   1       0   1   0    Shift left
0   0   1   X       0   0   1    Parallel load
0   1   X   X       1   0   0    Set to all 1s
1   X   X   X       0   1   1    Clear to all 0s
Figure 4.23 Truth table for the control lines of a register with the Nth bit-slice shown in Figure 4.22.
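The priority mapping of Figure 4.23 can be checked exhaustively over all 16 control input combinations. The sketch below compares an if-chain that follows the stated priorities against sum-of-products expressions read off the table rows; the function names priority_select and equation_select, and the parameter name set_ (chosen to avoid shadowing Python's built-in set), are illustrative assumptions.

```python
from itertools import product


def priority_select(clr, set_, ld, shl):
    """Select lines (s2, s1, s0) chosen by priority: clr, set, ld, shl."""
    if clr:
        return (0, 1, 1)   # synchronous clear
    if set_:
        return (1, 0, 0)   # synchronous set
    if ld:
        return (0, 0, 1)   # parallel load
    if shl:
        return (0, 1, 0)   # shift left
    return (0, 0, 0)       # maintain present value


def equation_select(clr, set_, ld, shl):
    """The same mapping as sum-of-products expressions, one per select line."""
    n = lambda x: 1 - x   # complement of a single bit
    s2 = n(clr) & set_
    s1 = (n(clr) & n(set_) & n(ld) & shl) | clr
    s0 = (n(clr) & n(set_) & ld) | clr
    return (s2, s1, s0)


mismatches = [bits for bits in product((0, 1), repeat=4)
              if priority_select(*bits) != equation_select(*bits)]
print(mismatches)   # an empty list means the expressions match the table
```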
Looking at each output in Figure 4.23, we derive the equations describing the circuit that maps the external control inputs to the mux select lines as follows:
s2 = clr'*set
s1 = clr'*set'*ld'*shl + clr
s0 = clr'*set'*ld + clr
We could then create a combinational circuit implementing those equations, to map the external register control inputs to the mux select lines, hence completing the register's design.
Some registers come with asynchronous clear and/or asynchronous set control inputs. Those inputs could be implemented by connecting them to asynchronous clear or asynchronous set inputs that exist on the flip-flops themselves.
4.3 ADDERS
Adding two binary numbers is perhaps the most common operation performed on data in a digital system. An N-bit adder is a datapath component that adds two N-bit binary numbers A and B, and generates an N-bit sum S and a 1-bit carry C. For instance, a 4-bit adder adds two 4-bit numbers, like 0111 and 0001, resulting in a 4-bit sum, like 1000, with a carry of 0. 1111 + 0001 would result in a carry of 1 and a sum of 0000 (or 10000 if you treat the carry bit and sum bits as one 5-bit result). N is often referred to as the width of the adder. Designing fast yet size-efficient adders is a subject that has received considerable attention for many decades.
Although it appears that we could design an N-bit adder by following the combinational logic design process of Table 2.5, it turns out that building an N-bit adder following that process is not very practical when N is much larger than 4. A 4-bit adder has two 4-bit inputs, meaning eight inputs total, and has four sum outputs and a carry output. So we could design the adder using the standard combinational logic design process of Table 2.5. For example, a 2-bit adder, which adds two 2-bit numbers, could be designed by starting with the truth table depicted in Figure 4.24. We could then implement the logic using a two-level logic gate based implementation for each output.
Inputs          Outputs
a1  a0  b1  b0  c   s1  s0
0   0   0   0   0   0   0
0   0   0   1   0   0   1
0   0   1   0   0   1   0
0   0   1   1   0   1   1
0   1   0   0   0   0   1
0   1   0   1   0   1   0
0   1   1   0   0   1   1
0   1   1   1   1   0   0
1   0   0   0   0   1   0
1   0   0   1   0   1   1
1   0   1   0   1   0   0
1   0   1   1   1   0   1
1   1   0   0   0   1   1
1   1   0   1   1   0   0
1   1   1   0   1   0   1
1   1   1   1   1   1   0
Figure 4.24 Truth table for a 2-bit adder.
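A truth table like the one in Figure 4.24 can be generated rather than written by hand, which also makes the row count for wider adders easy to see. This is a minimal sketch; the function name adder_truth_table and the tuple layout (a bits, then b bits, most significant first) are assumptions made for illustration.

```python
from itertools import product


def adder_truth_table(n):
    """Return rows of ((a bits, b bits), (carry, sum bits)) for an n-bit adder."""
    rows = []
    for bits in product((0, 1), repeat=2 * n):
        a = int("".join(map(str, bits[:n])), 2)
        b = int("".join(map(str, bits[n:])), 2)
        total = a + b
        # Carry first, then the sum bits from most to least significant.
        outputs = tuple((total >> k) & 1 for k in range(n, -1, -1))
        rows.append((bits, outputs))
    return rows


table = adder_truth_table(2)
for inputs, outputs in table:
    print(*inputs, "|", *outputs)
print(len(table), "rows")
```

For n = 2 this produces 16 rows; each additional bit of width multiplies the row count by 4.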
The problem with such an approach is that for wider adders, the approach results in too large of truth tables and too many gates. A 16-bit adder has 16 + 16 = 32 inputs, meaning the truth table would have over four billion rows. A two-level logic gate based implementation of that table would likely require millions of gates. To illustrate this point, we performed an experiment in which we used the standard combinational logic design process to create adders of increasing width, starting with 1-bit adders on up. We used the most advanced commercial logic design tool available, and asked the tool to create a design using two levels of logic (one level of AND gates feeding into an OR gate for each output) and using the minimum number of gates (actually, transistors).
The plot in Figure 4.25 summarizes our results. Notice how fast the number of transistors grows as the adder width is increased. This fast growth is an effect of exponential growth: for an adder width of N, the number of truth table rows is proportional to 2^N (more precisely, to 2^(N+N)). Clearly, this exponential growth prohibits us from using the standard design process for adders wider than perhaps 8 to 10 bits. We could not complete our experiments for adders larger than 8 bits; the tool simply could not complete the design in a reasonable amount of time. The tool needed 3 seconds to build the 6-bit adder, 40 seconds to build the 7-bit adder, and 30 minutes for the 8-bit adder. The 9-bit adder didn't finish after one full day.
Figure 4.25 (plot of number of transistors versus adder width N) Why large adders aren't built using standard two-level combinational logic: notice the exponential growth. How many transistors would a 32-bit adder require?
Looking at this data, can you predict the number of transistors required by a 16-bit adder or a 32-bit adder using two levels of gates? From the figure, it looks like the number of transistors is doubling for each increase in N, with about 1000 transistors for N=5, 2000 transistors for N=6, 4000 transistors for N=7, and 8000 transistors for N=8. Assuming that trend continues for larger adders, then a 16-bit adder would have 8 more doublings beyond the 8-bit adder, meaning multiplying the size of the 8-bit adder by 2^8 = 256. So a 16-bit adder would require 8000 * 256 = about two million transistors. A 32-bit adder would require an additional 16 doublings, meaning multiplying by 2^16 = 64K, so 2 million * 64K = over 100 billion transistors. That's an outrageous number of transistors. We clearly need another approach for designing larger adders.
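The extrapolation can be reproduced with a short calculation. The sketch below assumes, as the text does, that the transistor count doubles with each added bit of width and uses the rough figure of 8000 transistors at N=8 as its starting point; the function names are illustrative.

```python
def estimated_transistors(n, base_width=8, base_count=8000):
    """Extrapolate the two-level adder size, assuming one doubling per added bit."""
    return base_count * 2 ** (n - base_width)


def truth_table_rows(n):
    """An n-bit adder has 2n inputs, so its truth table has 2^(2n) rows."""
    return 2 ** (2 * n)


for width in (8, 16, 32):
    print(width, estimated_transistors(width), truth_table_rows(width))
```

Under these assumptions a 16-bit adder needs roughly 2 million transistors and a 32-bit adder roughly 134 billion, consistent with the "over 100 billion" estimate above.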
Adder-Carry-Ripple Style
An alternative approach to the standard combinational logic design process for adding two binary numbers is to instead create a circuit that mimics how we add binary numbers by hand, which is one column at a time. Consider the addition of a binary number A=1111 (15 in base 10) and B=0110 (6 in base 10), column by column, shown in Figure 4.26.
   carries:   1 1 1 0
   A:           1 1 1 1
 + B:           0 1 1 0
   sum:       1 0 1 0 1
Figure 4.26 Adding two binary numbers by hand, column by column.
For each column, we add three bits together, and we generate a sum bit for the column and a carry bit for the next column. The first column is an exception in that we only add two bits together, but still generate a sum and a carry bit. The carry of the last column becomes the fifth bit of the sum. The sum is 10101 (21 in base 10).
We can create a combinational component to perform the required addition for a single column. The inputs and outputs of such components are shown in Figure 4.27. Thus, all we need to do is design those components that perform the addition in each column, and connect them together as shown in Figure 4.27 to create a 4-bit adder. Bear in mind, though, that this method of creating an adder is intended to enable efficient design of wider adders, like those with 8 bits and above. We are illustrating the method using only a 4-bit adder because that size adder keeps our figures small and readable, but if all we really needed was a 4-bit adder, the standard combinational logic design process for two-level logic would probably work just fine.
Figure 4.27 Using combinational components to add two binary numbers column by column.
We'll now design the components in each column of Figure 4.27.
Half-Adder
A half-adder is a combinational component that adds two bits (a and b), and generates a sum (s) and carry-out (co) bit. (Note that we did not say that a half-adder adds two 2-bit numbers; a half-adder merely adds two bits.) The component on the right in Figure 4.27 that adds the rightmost column's two bits (a and b) and generates the sum (s) and carry-out (co) bit is a half-adder. We can design a half-adder using the straightforward combinational logic design process from Chapter 2, as follows:
Inputs   Outputs
a   b    co  s
0   0    0   0
0   1    0   1
1   0    0   1
1   1    1   0
Figure 4.28 Truth table for a half-adder.
Step 1: Capture the function. We'll use a truth table to capture the function. The appropriate truth table is shown in Figure 4.28.
Step 2: Convert to equations. We can clearly see that co = ab and that s = a'b + ab'. Note that the equation s = a'b + ab' is the same as s = a xor b.
Step 3: Create circuit. The circuit for a half-adder, implementing the above equations, is shown in Figure 4.29(a). Figure 4.29(b) shows a block symbol of a half-adder.
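The half-adder equations are small enough to check against ordinary addition for all four input combinations. The following is a minimal sketch; the function name half_adder and the (co, s) return order are illustrative choices.

```python
def half_adder(a, b):
    """Add two bits; return (carry-out, sum)."""
    s = a ^ b     # s = a'b + ab', i.e. a xor b
    co = a & b    # co = ab
    return co, s


for a in (0, 1):
    for b in (0, 1):
        co, s = half_adder(a, b)
        print(a, b, "|", co, s)
```

Reading co and s together as a 2-bit number gives a + b in every row, matching Figure 4.28.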
Figure 4.29 Half-adder: (a) circuit, and (b) block symbol.
Full-Adder
A full-adder is a combinational component that adds three bits (a, b, and ci) and generates a sum (s) and a carry-out (co) bit. (Note that we did not say that a full-adder adds two 3-bit numbers; a full-adder merely adds three bits.) The three components in Figure 4.27 that add the two bits of a column (a and b) along with the carry from the column on the right (ci) and generate the sum (s) and carry-out (co) bits are full-adders. We can design a full-adder using the straightforward combinational logic design process, as follows:
Step 1: Capture the function. We'll use a truth table to capture the function, shown in Figure 4.30.

Inputs       Outputs
a   b   ci   co  s
0   0   0    0   0
0   0   1    0   1
0   1   0    0   1
0   1   1    1   0
1   0   0    0   1
1   0   1    1   0
1   1   0    1   0
1   1   1    1   1
Figure 4.30 Truth table for a full-adder.

Step 2: Convert to equations. We obtain the following equations for co and s. For simplicity, let's write ci as c. We'll use algebraic methods to simplify the equations.
co = a'bc + ab'c + abc' + abc
co = a'bc + abc + ab'c + abc + abc' + abc
co = (a'+a)bc + (b'+b)ac + (c'+c)ab
co = bc + ac + ab
s = a'b'c + a'bc' + ab'c' + abc
s = a'(b'c + bc') + a(b'c' + bc)
s = a'(b xor c) + a(b xor c)'
s = a xor b xor c
During algebraic simplification, for co, we noted that each of the first three terms could be combined with the last term abc, as each of the first three terms differed from the last term in just one literal. We thus created three instances of the last term abc (which doesn't change the function) and combined them with each of the first three terms. Don't worry if you aren't able to come up with that simplification on your own right now; Section 6.2 introduces methods to make such simplification more straightforward. If you have read that section, you might try using a K-map (introduced in that section) to simplify the equations.
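One way to gain confidence in the simplification is to compare the unsimplified sum-of-minterms forms with the simplified forms (co = bc + ac + ab and s = a xor b xor c) over all eight input combinations. This is an illustrative sketch; the function names are assumptions.

```python
from itertools import product


def full_adder_minterms(a, b, c):
    """Unsimplified equations, read directly off the full-adder truth table."""
    n = lambda x: 1 - x   # complement of a single bit
    co = (n(a) & b & c) | (a & n(b) & c) | (a & b & n(c)) | (a & b & c)
    s = (n(a) & n(b) & c) | (n(a) & b & n(c)) | (a & n(b) & n(c)) | (a & b & c)
    return co, s


def full_adder(a, b, c):
    """Simplified equations: co = bc + ac + ab, s = a xor b xor c."""
    co = (b & c) | (a & c) | (a & b)
    s = a ^ b ^ c
    return co, s


for a, b, c in product((0, 1), repeat=3):
    assert full_adder_minterms(a, b, c) == full_adder(a, b, c)
    co, s = full_adder(a, b, c)
    assert 2 * co + s == a + b + c   # the two output bits encode the arithmetic sum
print("simplified and unsimplified equations agree on all 8 rows")
```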
Step 3: Create the circuit. The circuit for a full-adder is shown in Figure 4.31(a), and the full-adder's block symbol is shown in Figure 4.31(b).
Figure 4.31 Full-adder: (a) circuit, and (b) block symbol.
4-Bit Carry-Ripple Adder
Using three full-adders and one half-adder, we can build a 4-bit carry-ripple adder, which adds two 4-bit numbers and generates a 4-bit sum, as shown in Figure 4.32. The 4-bit carry-ripple adder also generates a carry-out bit.
Figure 4.32 4-bit adder: (a) carry-ripple implementation with 3 full-adders and 1 half-adder, and (b) block symbol.
We can include a carry-in bit with the 4-bit adder, which enables us to connect 4-bit adders together to build larger adders. We include the carry-in bit by replacing the half-adder (which was in the rightmost bit position) by a full-adder, as shown in Figure 4.33.
Figure 4.33 4-bit adder: (a) carry-ripple implementation with 4 full-adders, with a carry-in input, and (b) block symbol.
Let's analyze the behavior of this adder. Suppose that all inputs have been 0s for a long time, meaning that S will be 0000, co will be 0, and all ci values of the full-adders will also be 0. Now suppose that A becomes 0111 and B becomes 0001 at the same time (whose sum we know should be 01000). Those new values of A and B will propagate through the full-adders. Suppose the delay of a full-adder is 2 ns. So 2 ns after A and B change, the sum outputs of the full-adders will change, as shown in Figure 4.34(a). So s3 will become 0+0+0=0 (with co3=0), s2 will become 1+0+0=1 (with co2=0), s1 will become 1+0+0=1 (with co1=0), and s0 will become 1+1=0 (with co0=1). But 0111 + 0001 should not be 00110; instead, the sum should be 01000. What went wrong?
Figure 4.34 Example of adding 0111+0001 using a 4-bit carry-ripple adder: (a) output after 2 ns (1 FA delay), (b) output after 4 ns (2 FA delays), (c) output after 6 ns (3 FA delays), (d) output after 8 ns (4 FA delays). The output will exhibit temporarily incorrect (spurious) results until the carry bit from the rightmost bit has had a chance to propagate (ripple) all the way through to the leftmost bit.
Nothing went wrong; the carry-ripple adder simply isn't done yet after just 2 ns. After 2 ns, co0 changed from 0 to 1. Now, we must allow time for that new value of co0 to proceed through the next full-adder. Thus, after another 2 ns, s1 will equal 1+0+1=0, and co1 will become 1. So after 4 ns (two full-adder delays), the output will be 00100, as shown in Figure 4.34(b).
The term "ripple-carry" adder is actually more common. I prefer the term "carry-ripple" for consistent naming with other adder types, like carry-select and carry-lookahead, which we describe in Chapter 6.
Keep waiting. After a third full-adder delay, the new value of co1 will have propagated through the next full-adder, resulting in s2 becoming 1+0+1=0, with co2 becoming 1. So after three full-adder delays, the output will be 00000, as shown in Figure 4.34(c).
Just a little more patience. After a fourth full-adder delay, co2 has had time to propagate through the last full-adder, resulting in s3 becoming 0+0+1=1, with co3 staying 0. Thus, after four full-adder delays, the output will be 01000, as shown in Figure 4.34(d), and 01000 is the correct result.
To recap, until the carry bits have had time to ripple through all the adders, from right to left, the output was not correct. The intermediate output values are known as spurious values. The delay of the 4-bit adder, meaning the time we must wait until the output is the stable correct value, is equal to the delay of four full-adders, or 8 ns in this case, which is the time for the carry bits to ripple through all the adders; hence the term carry-ripple adder.
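The rippling described above can be imitated with a simple step-by-step model in which every full-adder recomputes its outputs once per full-adder delay, using the carry its right neighbor produced in the previous step. This is a simplified sketch, not a gate-accurate simulation; the function names, the least-significant-bit-first lists, and the uniform one-step delay per full-adder are assumptions.

```python
def full_adder(a, b, ci):
    """Return (sum, carry-out) for three input bits."""
    return a ^ b ^ ci, (a & b) | (a & ci) | (b & ci)


def ripple_outputs(a_bits, b_bits, steps):
    """Return the adder output (carry-out followed by sum bits) after each delay step.

    a_bits and b_bits list the least significant bit first. All carries start
    at 0, as if the inputs had been 0 for a long time.
    """
    n = len(a_bits)
    carries = [0] * n
    outputs = []
    for _ in range(steps):
        new_sums, new_carries = [], []
        for i in range(n):
            ci = 0 if i == 0 else carries[i - 1]   # carry seen from the previous step
            s, co = full_adder(a_bits[i], b_bits[i], ci)
            new_sums.append(s)
            new_carries.append(co)
        carries = new_carries
        outputs.append(str(carries[-1]) + "".join(str(bit) for bit in reversed(new_sums)))
    return outputs


# A = 0111 and B = 0001, listed least significant bit first.
trace = ripple_outputs([1, 1, 1, 0], [1, 0, 0, 0], steps=4)
print(trace)
```

The trace passes through the spurious values 00110, 00100, and 00000 before settling at the correct 01000 after four steps, matching the 2 ns, 4 ns, 6 ns, and 8 ns snapshots of Figure 4.34.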
Students often initially confuse full-adders and N-bit adders. A full-adder adds 3 bits. In contrast, a 3-bit adder adds two 3-bit numbers. A full-adder produces one sum bit and one carry bit. In contrast, a 3-bit adder produces three sum bits and one carry bit. A full-adder is usually used to add only one column of two binary numbers, whereas an N-bit adder is used to add two N-bit numbers.
An N-bit adder often comes with a carry-in bit, so that the adder can be cascaded with other N-bit adders to form larger adders. Figure 4.35(a) shows an 8-bit adder built from two 4-bit adders. We would set the carry-in bit (ci) on the right to 0 when adding two 8-bit numbers. Figure 4.35(b) shows a block symbol of that 8-bit adder.
Figure 4.35 8-bit adder: (a) carry-ripple implementation built from two 4-bit carry-ripple adders, and (b) block symbol.
EXAMPLE 4.8 DIP-switch-based adding calculator
Let's design a very simple calculator that can add two 8-bit binary numbers and produce an 8-bit result. The input binary numbers will come from two 8-switch DIP switches, and the output will be displayed using 8 LEDs, as illustrated in Figure 4.36. An 8-bit DIP (Dual Inline Package) switch is a simple digital component having switches that a user can by hand move up or down, with up outputting a 1 on the corresponding pin, and down outputting a 0. An LED (light-emitting diode) is just a small light that illuminates when the LED's input is 1, and is dark when the input is 0.
We can implement this calculator by utilizing an 8-bit carry-ripple adder for the CALC block, as shown in Figure 4.36. When a user moves the switches on a DIP switch, the new values propagate through the adder's gates, generating intermittent outputs and hence causing rapid blinking of some of the LEDs, until the values have finally propagated through the entire circuit, at which point the output stabilizes and the LEDs display the correct new sum.
Figure 4.36 8-bit DIP-switch-based adding calculator. The addition 2+3=5 is shown.
If we want to avoid the blinking of the LEDs due to the intermittent values, we can introduce a button e (for "equals") to the system, which indicates when the new value should be displayed. We press e only after having configured both DIP switches to represent the new inputs to be summed. We can utilize the e input with a register, as shown in Figure 4.37. We connect the e input to the load input of a parallel load register. When a user moves switches on the DIP switches, new intermittent values appear at the adder outputs, but are blocked at the register's inputs, as the register holds its previous value and hence the LEDs display that previous value. When the e button is pressed, then on the next clock edge the register will be loaded, and the LEDs will then display the new value.
Notice that the displayed value will be correct only if the sum is 255 or less. We could connect co to a ninth LED to display sums between 256 and 511.
Figure 4.37 8-bit DIP-switch-based adding calculator, using a register to block spurious LED outputs. The LEDs only get updated after the button is pressed, which loads the output register.
Delay and Size of a Carry-Ripple Adder
Assuming full-adders are implemented using two levels of gates (ANDs followed by an OR), and that every gate has a delay of 1 ns, let's compute the total delay of a 32-bit carry-ripple adder. Let's also compute the size of such an adder.
To determine the delay, note first that the carry must ripple from the first full-adder to the 32nd full-adder. The delay of the first full-adder is 2 gates * 1 ns/gate = 2 ns. The new carry must now ripple through the second full-adder, resulting in another 2 ns. And so on. Thus, the total delay of the 32-bit carry-ripple adder is 2 ns/full-adder * 32 full-adders = 64 ns.
To determine the size, note that a full-adder requires approximately five gates (we say approximately because the 3-input OR gate in a full-adder requires more transistors than each 2-input AND gate, and the 3-input XOR gate requires even more transistors). Since the 32-bit adder has 32 full-adders, the total size of the 32-bit carry-ripple adder is 5 gates/full-adder * 32 full-adders = 160 gates.
The 32-bit carry-ripple adder has a long delay, but a reasonable number of gates. In Section 6.4, we'll see how to build faster adders, at the expense of using more gates, but still using a reasonable number of gates.
EXAMPLE 4.9 Compensating weight scale using an adder
A scale, such as a bathroom scale, uses a sensor to determine the weight of an object (e.g., a person) on the scale. The sensor's readings for the same object may change over time, due to wear and tear on the sensing system (such as a spring losing elasticity), resulting perhaps in reporting a weight that is a few pounds too low. Thus, the scale may have a knob that the user can turn to compensate for the low reported weight. The knob indicates the amount to add to a given weight before displaying the weight. Suppose that a knob can be set to change an input compensation amount by a value of 0, 1, 2, ..., 7, as shown in Figure 4.38.
We can implement the system using an 8-bit carry-ripple adder, as shown in the figure. On every rising clock edge, the display register will be loaded with the sum of the currently sensed weight plus the compensation amount.
Figure 4.38 Compensating scale: the dial outputs a number from 0 to 7 (000 to 111), which gets added to the sensed weight and then displayed.
4.4 SHIFTERS
Shifting is a common operation applied to data. Shifting can be used to manipulate bits, like when we want to reverse the bits of a number. Shifting is useful for communicating data serially, as was done in Example 4.6.
Shifting is also useful for multiplying or dividing by a factor of 2. In base 10, you are familiar with the idea that multiplying by 10 can be done by simply appending a 0 to a number. For example, 5 times 10 is 50. Appending a 0 is the same as shifting left one position. Likewise, in base 2, multiplying by 2 can be done by appending a 0, meaning by shifting left one position. So 0101 times 2 is 1010. Furthermore, in base 10, multiplying by 100 can be done by appending two 0s, or shifting left twice. So in base 2, multiplying by 4 can be done by shifting left twice. Shifting left three times in base 2 is like multiplying by 8. And so on. And since shifting left is the same as multiplying by 2, shifting right is the same as dividing by 2. So 1010 divided by 2 is 0101.
Although shifting can be done using a shift register, sometimes we find the need to use a separate combinational component that performs the shift, and that can shift by different numbers of positions and in different directions.
Simple Shifters
An N-bit shifter is a combinational component that can shift an N-bit input by some amount to generate an N-bit output.
The simplest shifter shifts one position in one direction. Say we want a shifter that shifts left by 1 position. That simple shifter's design is straightforward, consisting of just wires, as shown for the 4-bit left shifter in Figure 4.39(a). Note that the shifter has an additional input that is the value to shift into the rightmost bit.
Figure 4.39 Combinational shifters: (a) left shifter with block symbol shown at bottom, (b) left shift or pass component, (c) left/right shift or pass component.
A more advanced shifter can either shift one position when an additional input sh is 1, or can pass the inputs through to the outputs unshifted when sh is 0. We can design that shifter using 2x1 muxes, shown in Figure 4.39(b).
An even more advanced shifter can shift left or right one position, shown in Figure 4.39(c). When both shift control inputs are 0, the inputs pass through unchanged. When shL=1, the shifter shifts left, and when shR=1, the shifter shifts right. When both those control inputs are 1, the shifter could be designed to output 0s by connecting 0s to the I3 inputs of the muxes (not shown). Further extensions of the simple shifter are possible, such as allowing shifts of one position or two positions. Such multifunction shifters' internal designs require larger muxes, and mapping of the control signals to the mux select lines, just as was necessary in designing multifunction registers.
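The left/right shift or pass component can be modeled behaviorally as a pure function of its inputs, since a shifter is combinational and holds no state. In this sketch the function name shifter, the parameter names sh_left, sh_right, in_left, and in_right, and the [i3, i2, i1, i0] list ordering are illustrative assumptions.

```python
def shifter(i, sh_left, sh_right, in_left=0, in_right=0):
    """Model a left/right shift-or-pass component.

    i lists the input bits as [i3, i2, i1, i0]. in_right is the bit shifted
    into the rightmost position on a left shift; in_left is the bit shifted
    into the leftmost position on a right shift.
    """
    if sh_left and sh_right:
        return [0] * len(i)          # both asserted: output 0s
    if sh_left:
        return i[1:] + [in_right]    # each output takes the bit to its right
    if sh_right:
        return [in_left] + i[:-1]    # each output takes the bit to its left
    return list(i)                   # pass through unchanged


print(shifter([0, 1, 0, 1], 1, 0))   # 0101 shifted left:  0101 * 2
print(shifter([1, 0, 1, 0], 0, 1))   # 1010 shifted right: 1010 / 2
```

With 0 as the shift-in value, the left shift doubles 0101 to 1010 and the right shift halves 1010 to 0101.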
4.4 Shift ers 175
EXAMPLE 4.10 Approximate Celsius to Fahrenheit converter using a shifter
We are given a digital thermometer that digitizes a temperature in Celsius into an 8-bit binary number C. So 30 degrees Celsius would be digitized as 00011110. We want to convert that temperature to Fahrenheit, again using 8 bits. The equation for converting is:
F = C*9/5 + 32
Let's assume that we are not concerned about accuracy, so we'll replace the equation by a simpler one:
F = C*2 + 32
We can design the converter straightforwardly using a left shifter (with a shift-in value of 0) to compute C*2, and then an adder to add 32 (00100000), as in Figure 4.40.
Figure 4.40 Celsius to Fahrenheit converter.
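The approximate converter of Example 4.10 can be modeled in a few lines of Python. This is only a behavioral sketch, not a gate-level design; the function names `shift_left_8` and `approx_c_to_f` are made up for illustration, and the 8-bit masking assumes the adder simply drops any carry out of bit 7.

```python
def shift_left_8(value, shift_in=0):
    """8-bit left shift by one position; shift_in fills the rightmost bit."""
    return ((value << 1) | shift_in) & 0xFF


def approx_c_to_f(c):
    """Approximate F = C*2 + 32 using a left shifter followed by an adder."""
    doubled = shift_left_8(c, shift_in=0)   # C*2
    return (doubled + 32) & 0xFF            # assumed: carry out of bit 7 is dropped


c = 0b00011110                  # 30 degrees Celsius
print(approx_c_to_f(c))         # approximate conversion
print(c * 9 / 5 + 32)           # exact conversion, for comparison
```

For 30 degrees Celsius the approximation gives 92 where the exact equation gives 86, which shows how much accuracy the simpler equation gives up.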
FAHRENHEIT VERSUS CELSIUS
The U.S. represents temperature using Fahrenheit, whereas most of the world uses the metric system's Celsius. Presidents and other U.S. leaders have desired to switch to the metric system for almost as long as the U.S. has existed, and several acts have been passed over the centuries, the most recent being the Metric Conversion Act of 1975 (amended several times since). The Act designates the metric system as the preferred system of weights and measures for U.S. trade and commerce. Yet switching to metric has been slow, and few Americans today are comfortable with metric. The problem with such a slow transition was poignantly demonstrated in 1999 when the Mars Climate Orbiter, costing several hundred million dollars, was destroyed when entering the Mars atmosphere too quickly. The reason: "a navigation error resulted from some spacecraft commands being sent in English units instead of being converted to metric units." (Source: www.nasa.gov). Perhaps if all readers of this book in the U.S. use Celsius when they talk, we'll help speed up the transition? So instead of saying "It's a warm ninety degrees outside today," say "It's a warm thirty-two degrees outside today." Actually, we might say "It's a warm three ten and two degrees outside today" (remember correct counting in Chapter 1?).
EXAMPLE 4.11 Temperature averager
Recall Example 4.3, in which registers were used to save a history of temperature values over the last three clock periods. We want to extend this system to save the last four values instead of three. We also want the system to compute the average of the last four values and output that average on an output Tavg. The average of four values Ra, Rb, Rc, and Rd is (Ra+Rb+Rc+Rd)/4. Note that dividing by 4 is the same as shifting right by two. Thus, we can design the system using a right shifter that shifts by two places (with a shift-in value of 0), as shown in Figure 4.41.
Figure 4.41 Temperature averager using a right shift to divide by 4.
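The divide-by-4 step of the temperature averager in Example 4.11 can be sketched as follows. The function name `average_of_four` is hypothetical, and the sketch assumes the adders are wide enough to hold the full sum of four 8-bit values before the shift.

```python
def average_of_four(ra, rb, rc, rd):
    """Average four register values by adding them and shifting right by two."""
    total = ra + rb + rc + rd   # assumed: the sum is kept at full width (up to 10 bits)
    return total >> 2           # shifting right by two places divides by 4


print(average_of_four(30, 31, 32, 34))   # the sum is 127; the shift discards the remainder
```

Because the two shifted-out bits are discarded, the result is rounded down: a sum of 127 yields 31 rather than 31.75.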
Barrel Shifter
An N-bit barrel shifter is a general purpose N-bit shifter that can shift or rotate any number of positions. For simplicity, let's consider only left shifts for the moment. An 8-bit barrel shifter can shift left by 1 position, 2 positions, 3 positions, 4 positions, 5 positions, 6 positions, or 7 positions (and of course 0 positions, meaning no shift is done). An 8-bit barrel shifter therefore requires 3 control inputs, say x, y, and z, to specify the distance of the shift. xyz=000 may mean no shift, xyz=001 shift by 1 position, xyz=010 shift by 2 positions, etc.
We could design such a barrel shifter by placing an 8x1 mux in front of each of the 8 shifter outputs, connecting xyz to each of the eight muxes' select inputs, and then connecting the mux inputs with the appropriate shifter inputs for each configuration of x, y, and z. So I0 (corresponding to xyz=000, meaning no shift) of each mux would just get the present bit's shifter input. I1 (corresponding to xyz=001, meaning left shift by one position) would get the shifter input one position to the right. I2 (xyz=010, meaning left shift by two positions) would get the shifter input two positions to the right. And so on.
Such a design, while conceptually straightforward, has too many wires being routed about. And the design does not scale well to larger bit widths, such as a 32-bit barrel shifter: a 32x1 multiplexor cannot be built with two levels of gates (AND/OR), because gates with 32 inputs are too big to be implemented efficiently, and must instead be implemented using multiple levels of smaller gates.
A more elegant design for an 8-bit barrel shifter consists of 3 cascaded simple shifters, as shown in Figure 4.42. The first simple shifter can shift left four positions (or none), the second can shift left by two positions (or none), and the third by one position (or none). Notice that the shifts "add" to one another: shifting left by two, then by one, results in a total shift of three positions. Thus, by configuring each shifter appropriately, we can obtain a total shift of any amount between zero and seven. Connecting the control inputs xyz to the shifters is easy: just think of xyz as a binary number representing the amount of the shift, where x represents shifting by four, y shifting by two, and z shifting by one. So we just connect x to the left-by-four shifter, y to the left-by-two shifter, and z to the left-by-one shifter.
Figure 4.42 8-bit barrel shifter (left shift only).
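The cascade of three shift-or-pass stages can be checked with a short behavioral model. This is a sketch under stated assumptions: the names `shift_or_pass` and `barrel_shift_left` are invented here, each stage shifts in 0s, and values are plain 8-bit integers rather than individual wires.

```python
def shift_or_pass(value, amount, enable):
    """One simple shifter stage: shift left by a fixed amount if enabled, else pass."""
    return (value << amount) & 0xFF if enable else value


def barrel_shift_left(value, x, y, z):
    """8-bit left barrel shifter built from left-by-4, left-by-2, left-by-1 stages."""
    stage4 = shift_or_pass(value, 4, x)    # x controls the left-by-four shifter
    stage2 = shift_or_pass(stage4, 2, y)   # y controls the left-by-two shifter
    return shift_or_pass(stage2, 1, z)     # z controls the left-by-one shifter


# xyz = 011 means shift by 2 + 1 = 3 positions.
print(format(barrel_shift_left(0b00000101, 0, 1, 1), "08b"))
```

Treating xyz as a binary number works because each stage's fixed shift distance equals the weight of its control bit.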
The above design considered a barrel shifter that could only shift left. We can easily extend the barrel shifter to support both left and right shifts. We would replace the internal left shifters by shifters that could shift left or right, thus each having a control input indicating the direction. The barrel shifter would then have a direction control input also, connected to each internal shifter's direction control input.
Finally, we can easily extend the barrel shifter to support shifts and rotates. We would replace the internal shifters by rotators that could either shift or rotate, thus each having a control input indicating whether to shift or rotate. The barrel shifter would then have a shift-or-rotate control input also, connected to each internal shifter's shift-or-rotate control input.
4.5 COMPARATORS
We often want to compare two binary numbers to see if they are equal, or if one is greater than the other. For example, we might want to sound an alarm if a thermometer measuring body temperature reports a temperature greater than 103 degrees Fahrenheit (39.4 degrees Celsius). Comparator components perform such comparison of binary numbers.
Equality (Identity) Comparator
An N-bit equality comparator (sometimes called an identity comparator) is a datapath component that compares two N-bit inputs A and B, setting an output control signal to 1 if the two inputs are equal. Two N-bit inputs, say two 4-bit inputs A=a3a2a1a0 and B=b3b2b1b0, are equal if each of their corresponding bit pairs are equal. So A=B if a3=b3, a2=b2, a1=b1, and a0=b0.
Following the combinational logic design process of Table 2.5, we can start by capturing the function of a 4-bit equality comparator as an equation:
eq = (a3b3+a3'b3') * (a2b2+a2'b2') * (a1b1+a1'b1') * (a0b0+a0'b0')
Each term detects if the corresponding bits are equal, namely, if both bits are 1 or both bits are 0. The expressions inside each of the parentheses represent the behavior of an XNOR gate (recall from Chapter 2 that an XNOR gate outputs 1 if the gate's two input bits are equal), so we can replace the above equation by the equivalent equation:
eq = (a3 xnor b3) * (a2 xnor b2) * (a1 xnor b1) * (a0 xnor b0)
We convert the equation to the circuit in Figure 4.43.
Figure 4.43 Equality comparator: (a) internal design, and (b) block symbol.
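The XNOR-and-AND structure of the equality comparator translates directly into code. The sketch below is illustrative only; the names `xnor` and `equality_comparator` are assumptions, and bits are passed as lists with the high-order bit first.

```python
def xnor(a, b):
    """XNOR gate: outputs 1 when the two input bits are equal."""
    return 1 if a == b else 0


def equality_comparator(a_bits, b_bits):
    """AND together one XNOR per bit pair, as in eq = (a3 xnor b3) * ... * (a0 xnor b0)."""
    eq = 1
    for a, b in zip(a_bits, b_bits):
        eq &= xnor(a, b)
    return eq


print(equality_comparator([1, 0, 1, 1], [1, 0, 1, 1]))   # equal inputs
print(equality_comparator([1, 0, 1, 1], [1, 0, 0, 1]))   # one bit pair differs
```

Widening the comparator only lengthens the lists, which mirrors how the hardware grows by one gate per added bit.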
Of course, we could have built the comparator starting with a truth table, but that would be cumbersome for a large comparator, with too many rows in the truth table to easily work with by hand. A truth table approach enumerates all the possible situations for which all the bits are equal, since only those situations would have a 1 in the column for the output eq. For two 4-bit numbers, one such situation will be 0000=0000. Another will be 0001=0001. Clearly, there will be as many situations as there are 4-bit binary numbers, so there will be 2^4 = 16 situations where both numbers are equal. For two 8-bit numbers, there will be 256 equal situations. For two 32-bit numbers, there will be four billion equal situations. A comparator built with such an approach will be large if we don't minimize the equation, and that minimization will be hard with such large numbers of terms. Our XNOR-based design looks to be much simpler and scales to wide inputs wonderfully: widening the inputs by one more bit involves merely adding one more gate.
Magnitude Comparator: Carry-Ripple Style
An N-bit magnitude comparator is a datapath component that compares two N-bit binary numbers A and B, and indicates whether A>B, A=B, or A<B.
We have already seen several times that designing certain datapath components by starting with a truth table involves too large of a truth table. Let's instead design a magnitude comparator by considering how we compare numbers by hand. Consider comparing two 4-bit numbers A=a3a2a1a0=1011, B=b3b2b1b0=1001. We start by looking at the high-order bits of A and B, namely, a3 and b3. Since they are equal (both are 1), we look at the next pair of bits, a2 and b2. Again, since they are equal (both are 0), we look at the next pair of bits, a1 and b1. Since a1>b1 (1>0), we conclude that A>B.
Thus, comparing two binary numbers takes place by comparing from the high bit-pairs down to the low bit-pairs. As long as bit-pairs are equal, we need to compare the next lower bit-pair. As soon as a bit-pair is different, we conclude that A>B if ai=1 and bi=0, or that A<B if bi=1 and ai=0. We can thus design a magnitude comparator using the structure shown in Figure 4.44.
[Figure: Stage3 through Stage0 connected in a chain. Each stage has data inputs a and b, comparison inputs in_gt, in_eq, in_lt, and comparison outputs out_gt, out_eq, out_lt. The external inputs Igt, Ieq, Ilt feed Stage3, and Stage0 drives the external outputs AgtB, AeqB, AltB.]
Figure 4.44 4-bit magnitude comparator: (a) internal design using identical components in each stage, and (b) block symbol.
Each stage works as follows. If in_gt=1 (meaning a higher stage determined A>B), this stage need not compare bits, and just sets out_gt=1. Likewise, if in_lt=1 (meaning a higher stage determined A<B), this stage just sets out_lt=1. If in_eq=1 (meaning higher stages were all equal), this stage must compare bits, setting out_gt=1 if a=1 and b=0, setting out_lt=1 if a=0 and b=1, and setting out_eq=1 if a and b both equal 1 or both equal 0.
We could capture the function of a stage's block as a truth table with 5 inputs. For brevity, though, we'll simply use the following equations derived from the earlier explanation of how each stage works; the circuit for each stage would follow directly from these equations:
out_gt = in_gt + (in_eq * a * b')
out_lt = in_lt + (in_eq * a' * b)
out_eq = in_eq * (a XNOR b)
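The three stage equations can be exercised with a small model that chains identical stages from the high-order bit pair down to the low-order pair. This is a hedged sketch: the function names `stage` and `magnitude_compare` are hypothetical, bits are given high-order first, and the external inputs are fixed at Igt=0, Ieq=1, Ilt=0.

```python
def stage(a, b, in_gt, in_eq, in_lt):
    """One comparator stage, following the out_gt, out_lt, out_eq equations."""
    out_gt = in_gt | (in_eq & a & (1 - b))        # in_gt + in_eq*a*b'
    out_lt = in_lt | (in_eq & (1 - a) & b)        # in_lt + in_eq*a'*b
    out_eq = in_eq & (1 if a == b else 0)         # in_eq * (a XNOR b)
    return out_gt, out_eq, out_lt


def magnitude_compare(a_bits, b_bits):
    """Ripple the comparison result through the stages; returns (AgtB, AeqB, AltB)."""
    gt, eq, lt = 0, 1, 0                          # external inputs Igt=0, Ieq=1, Ilt=0
    for a, b in zip(a_bits, b_bits):              # high-order bit pair first
        gt, eq, lt = stage(a, b, gt, eq, lt)
    return gt, eq, lt


print(magnitude_compare([1, 0, 1, 1], [1, 0, 0, 1]))   # A=1011, B=1001
```

For A=1011 and B=1001 the first two stages pass "equal" along, the third stage decides A>B, and the last stage just forwards that decision.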
Figure 4.45 The "rippling" within a magnitude comparator.
Figure 4.45 shows how this comparator works for an input of A=1011 and B=1001. We can view the comparator's behavior as consisting of four stages.
In Stage3, shown in Figure 4.45(a), we start by setting the external input Ieq=1, to force the comparator to actually do the comparison. Stage3 has in_eq=1, and since a3=1 and b3=1, then out_eq will become 1, while out_gt and out_lt will become 0.
In Stage2, shown in Figure 4.45(b), we see that since out_eq of Stage3 connects to in_eq of Stage2, then Stage2's in_eq will be 1. Since a2=0 and b2=0, out_eq will become 1, while out_gt and out_lt will be 0.
In Stage1, shown in Figure 4.45(c), we see that since Stage2's out_eq is connected to Stage1's in_eq, Stage1's in_eq will be 1. Since a1=1 and b1=0, out_gt will become 1, while out_eq and out_lt will be 0.
In Stage0, shown in Figure 4.45(d), we see that the outputs of Stage1 cause Stage0's in_gt to become 1, which directly causes Stage0's out_gt to become 1 and causes out_eq and out_lt to be 0. Notice that the values of a0 and b0 are irrelevant. Since Stage0's outputs connect to the comparator's external outputs, AgtB will be 1, while AeqB and AltB will be 0.
Because of the way the result ripples through the stages in a manner similar to a carry-ripple adder, a magnitude comparator built this way is often referred to as having a carry-ripple style implementation, even though what's rippling is not really a carry bit.
The 4-bit magnitude comparator can be connected straightforwardly with another 4-bit magnitude comparator to build an 8-bit magnitude comparator, and likewise to build any size comparator, simply by connecting the comparison outputs of one comparator (AgtB, AeqB, AltB) with the comparison inputs of the next comparator (Igt, Ieq, Ilt).
If each stage is built from two levels of logic, and a gate has a 1 ns delay, then each stage will have a 2 ns delay. So the delay of a carry-ripple style 4-bit magnitude comparator is 4 stages * 2 ns/stage = 8 ns. A 32-bit comparator built with this style will have a delay of 32 stages * 2 ns/stage = 64 ns.
EXAMPLE 4.12 Computing the minimum of two numbers using a comparator
We want to design a combinational component that takes two 8-bit inputs A and B, and outputs an 8-bit output C that is the minimum of A and B. We can use a magnitude comparator and an 8-bit 2x1 multiplexor to implement this component, as shown in Figure 4.46.
Figure 4.46 A combinational component to compute the minimum of two numbers: (a) internal design using a magnitude comparator, and (b) block symbol.
If A<B, the comparator's AltB output will be 1. In this case, we want to pass A through the mux, so we connect AltB to the 8-bit 2x1 mux select input, and A to the mux's I1 input. If AltB is 0, then either AgtB=1 or AeqB=1. If AgtB=1, we want to pass B. If AeqB=1, we can pass either A or B (since they are identical), and so let's pass B. We thus simply connect B to the I0 input of the 8-bit 2x1 mux. In other words, if A<B, we'll pass A, and if A is not less than B, we'll pass B.
Notice that we set the comparator's Ieq control input to 1, and the Igt and Ilt inputs to 0. These values force the comparator to compare its data inputs.
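The comparator-plus-mux arrangement of Example 4.12 can be sketched behaviorally. The names `mux_2x1` and `min8` are invented for this illustration, and the comparator is stood in for by Python's `<` operator rather than modeled gate by gate.

```python
def mux_2x1(i1, i0, select):
    """2x1 mux: passes i1 when select is 1, otherwise i0."""
    return i1 if select else i0


def min8(a, b):
    """Minimum of two 8-bit values: AltB drives the mux select, A on I1, B on I0."""
    a_lt_b = 1 if a < b else 0      # stand-in for the comparator's AltB output
    return mux_2x1(a, b, a_lt_b)


print(min8(25, 200))
print(min8(200, 25))
print(min8(77, 77))                 # equal inputs: B is passed, which is the same value
```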
4.6 COUNTERS
Up-Counter
An N-bit counter is an extended N-bit register component that can increment or decrement its own value on each clock cycle, when a count enable control input is 1. Increment means to add 1, while decrement means to subtract 1. A counter that can increment is known as an up-counter, a counter that can decrement is known as a down-counter, and a counter that can increment and decrement is known as an up/down-counter. A 4-bit up-counter would thus count the following sequence: 0000, 0001, 0010, 0011, 0100, 0101, 0110, 0111, 1000, 1001, 1010, 1011, 1100, 1101, 1110, 1111, 0000, 0001, etc. Notice that a counter wraps around (also known as rolling over) from the highest value (1111) to 0. Likewise, a down-counter would wrap around from 0 to the highest value. A control output on the counter, often called terminal count, or tc, becomes 1 during the clock cycle that the counter has reached its last (terminal) count value, after which the counter will roll over.
Figure 4.47 shows the block symbol of a 4-bit up-counter. When cnt=1, the counter increments its own value on every clock cycle. When cnt=0, the counter maintains its present value. On the cycle that the counter rolls over from 1111 to 0000, the counter sets tc=1 for that cycle, returning tc to 0 on the next cycle.
Figure 4.47 4-bit up-counter block symbol.
We can design an N-bit up-counter using the register design process described in Table 4.1: the incremented value of the register would be fed into a mux input, and the counter's control lines would be mapped to the mux select lines. A simpler view of an up-counter design is shown in Figure 4.48, assuming an incrementer component exists to add 1 to the present value. When cnt=0, the register should maintain its present value. When cnt=1, the register should be loaded with its present value plus 1. Note that the 4-input AND gate causes terminal count tc to become 1 when the counter reaches 1111.
Figure 4.48 4-bit up-counter internal design.
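A cycle-by-cycle model makes the roles of cnt and tc concrete. The class below is a minimal sketch, not the book's circuit: the name `UpCounter4` and its `clock` method are assumptions, and each call to `clock` stands for one rising clock edge.

```python
class UpCounter4:
    """Behavioral model of a 4-bit up-counter with count enable and terminal count."""

    def __init__(self, value=0):
        self.value = value & 0xF

    @property
    def tc(self):
        # Models the 4-input AND gate: tc is 1 only while the register holds 1111.
        return 1 if self.value == 0b1111 else 0

    def clock(self, cnt):
        """One rising clock edge: load value+1 when cnt=1, otherwise hold."""
        if cnt:
            self.value = (self.value + 1) & 0xF   # wraps from 1111 to 0000


counter = UpCounter4()
for _ in range(17):
    print(format(counter.value, "04b"), counter.tc)
    counter.clock(cnt=1)
```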
Incrementer
We need to design a combinational circuit for the incrementer. We could simply use an N-bit adder, by setting the B input to 0001 and the carry-in to 0. But using an N-bit adder is overkill: we don't need all the logic involved in an N-bit adder, because B is always just 0001. Instead, observe in Figure 4.49 that adding 1 to a binary number involves only two bits per column, not three bits per column like when adding two general binary numbers. Recall that a half-adder adds two bits (see Section 4.3). Thus, a simple incrementer could be built using half-adders, as shown in Figure 4.50.
Figure 4.50 4-bit incrementer: (a) internal design, and (b) block symbol.
We could instead design an incrementer using the combinational logic design process from Chapter 2. We would start with a truth table, shown in Figure 4.51. We obtain each output row simply by adding 1 to the corresponding input row binary number. We would then derive an equation for each output. For example, we can easily see that the equation for c0 is c0=a3a2a1a0. We can also easily see that s0=a0'. We would derive equations for the remaining outputs, and then implement the circuit for each output. The resulting incrementer would have a total delay of only two gate levels, which is less delay than the incrementer in Figure 4.50 built from half-adders.
carries:  0 1 1
          0 0 1 1
        +       1
        0 0 1 0 0
Figure 4.49 Adding 1 to a binary number requires only 2 bits per column.

     Inputs        Outputs
a3 a2 a1 a0   c0 s3 s2 s1 s0
 0  0  0  0    0  0  0  0  1
 0  0  0  1    0  0  0  1  0
 0  0  1  0    0  0  0  1  1
 0  0  1  1    0  0  1  0  0
 0  1  0  0    0  0  1  0  1
 0  1  0  1    0  0  1  1  0
 0  1  1  0    0  0  1  1  1
 0  1  1  1    0  1  0  0  0
 1  0  0  0    0  1  0  0  1
 1  0  0  1    0  1  0  1  0
 1  0  1  0    0  1  0  1  1
 1  0  1  1    0  1  1  0  0
 1  1  0  0    0  1  1  0  1
 1  1  0  1    0  1  1  1  0
 1  1  1  0    0  1  1  1  1
 1  1  1  1    1  0  0  0  0
Figure 4.51 Truth table for four-bit incrementer.
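The half-adder-based incrementer can be checked against the truth table with a short sketch. The function names `half_adder` and `increment4` are assumptions made for illustration; the constant 1 is fed to the lowest half-adder and each carry feeds the next one up.

```python
def half_adder(a, b):
    """Half-adder: returns (sum, carry) for two input bits."""
    return a ^ b, a & b


def increment4(a3, a2, a1, a0):
    """4-bit incrementer from four chained half-adders; returns (c0, s3, s2, s1, s0)."""
    s0, carry = half_adder(a0, 1)        # the lowest column adds the constant 1
    s1, carry = half_adder(a1, carry)
    s2, carry = half_adder(a2, carry)
    s3, c0 = half_adder(a3, carry)
    return c0, s3, s2, s1, s0


print(increment4(0, 0, 1, 1))            # 0011 + 1
print(increment4(1, 1, 1, 1))            # 1111 + 1 produces a carry out
```

Running all 16 inputs through `increment4` reproduces the rows of the truth table, including c0 = a3a2a1a0 and s0 = a0'.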
We could use the same combinational logic design process to build larger incrementers. Recall that we said in Section 4.3 that building adders using the combinational logic design process was not very practical. Yet here we built an incrementer using the combinational logic design process. A key difference to note is that a 4-bit adder has 8 inputs, whereas a 4-bit incrementer has only 4 inputs. Thus, we can build wider incrementers as two-level logic implementations using the combinational logic design process. Of course, at some point, even the number of inputs for an incrementer gets too large, in which case we might chain smaller incrementers together to make a wider incrementer.
EXAMPLE 4.13 Up-counter used in the above-mirror display
In Example 4.4 and Example 4.6, we assumed pressing a mode button would cause inputs xy to sequence from 00, 01, 10, 11, and back to 00 again. A simple design to achieve such sequencing, assuming the mode input is 1 for exactly one clock cycle per button press (see Example 3.9), utilizes an up-counter, as shown in Figure 4.52.
Figure 4.52 Sequencer for xy inputs of above-mirror display.
EXAMPLE 4.14 1 Hz pulse generator using a 256 Hz oscillator
Suppose we have a 256 Hz oscillator, but we want a 1 Hz pulse signal. We can convert the 256 Hz signal to a 1 Hz signal p using an 8-bit counter. The 8-bit counter wraps around every 256 cycles, so we can simply connect the oscillator signal to the counter's clock input, set the counter's count enable input cnt to 1, and then use the counter's tc output as the pulse signal, as shown in Figure 4.53. A 1 Hz signal may be useful for driving a clock or a watch, for example, since 1 Hz means 1 pulse per second.
Figure 4.53 Clock divider.
Down-Counter
A down-counter can be designed similarly to an up-counter, replacing the incrementer by a decrementer, as shown for the 4-bit down-counter in Figure 4.54. A decrementer could be designed in a similar manner as an incrementer, starting from a truth table like that in Figure 4.51. Note that the terminal count tc becomes 1 when the down-counter reaches 0000, implemented using a NOR gate (recall that NOR outputs 1 when all its inputs are 0). The reason the down-counter detects 0000 for tc, rather than 1111 like the up-counter, is because a down-counter wraps around after 0000, as in the following count sequence: 0100, 0011, 0010, 0001, 0000, 1111, 1110, and so on.
Figure 4.54 4-bit down-counter design.
Up/Down-Counter
An up/down-counter can count either up or down. It requires an input signal dir to indicate the count direction, in addition to the count enable signal cnt. We'll let dir=0 mean to count up and dir=1 mean to count down. Figure 4.55 shows the design of such a 4-bit up/down-counter with synchronous clear. A 2x1 mux passes either the decremented or incremented value, with dir selecting among the two: dir=0 (count up) passes the incremented value and dir=1 (count down) passes the decremented value. The passed value gets loaded into the 4-bit register if cnt=1. dir also selects whether to pass the NOR or AND output to the terminal count tc external output: dir=0 (count up) selects the AND, while dir=1 (count down) selects the NOR.
Figure 4.55 4-bit up/down-counter design.
Alternatively, we could design an up/down-counter using the register design process of Section 4.2, by directly connecting the incrementer and decrementer outputs to muxes in front of each flip-flop, and mapping the clr, cnt, and dir control signals to the mux select lines.
Notice that we also added a control input clr, which we could have added to the up-counter and down-counter too, that when 1 synchronously clears the register, meaning resetting the register to all 0s on a rising clock edge. We used a 4-bit register with clear to support the clear operation.
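The up/down-counter's behavior, including the direction-dependent terminal count, can be sketched as below. The class name `UpDownCounter4` is hypothetical, and the sketch assumes that clr takes precedence over cnt, which the description above does not state explicitly.

```python
class UpDownCounter4:
    """Behavioral model of a 4-bit up/down-counter with synchronous clear."""

    def __init__(self):
        self.value = 0

    def tc(self, direction):
        # direction=0 selects the AND output (1111); direction=1 selects the NOR output (0000).
        terminal = 0b0000 if direction else 0b1111
        return 1 if self.value == terminal else 0

    def clock(self, clr, cnt, direction):
        """One rising clock edge. Assumed priority: clr over cnt."""
        if clr:
            self.value = 0
        elif cnt:
            step = -1 if direction else 1     # the mux picks decremented or incremented value
            self.value = (self.value + step) & 0xF


counter = UpDownCounter4()
counter.clock(clr=0, cnt=1, direction=1)      # counting down from 0000 wraps to 1111
print(format(counter.value, "04b"), counter.tc(0), counter.tc(1))
```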
EXAMPLE 4.15 Light sequencer
We want to design a strip of 8 light bulbs, such that the bulbs illuminate one at a time, right to left, and then repeat illuminating in that sequence. The sequence should proceed at the rate of one bulb per second. Such a lighting display might be attractive outside a restaurant or movie theater, for example.
For simplicity, assume we have an oscillator that generates a 1 Hz clock signal (meaning one rising clock edge per second). We'll connect this clock to a 3-bit up-counter, and connect the counter's three outputs to a 3x8 decoder, as shown in Figure 4.56.
Figure 4.56 Light sequencer.
When the power is on, the system counts up (we don't know what the initial value of the counter was, but it doesn't really matter), wrapping around from 111 to 000. We don't need the tc output in this example.
Notice that we used a 3-bit counter with a decoder, and not an 8-bit counter, even though there were 8 outputs. An 8-bit counter would generate the sequence 00000000, 00000001, 00000010, ..., 11111110, 11111111. That sequence is not the desired sequence.
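The counter-plus-decoder light sequencer can be simulated to see why a 3-bit counter is the right choice. In this sketch the names `decoder_3x8` and `light_sequence` are invented, and it is assumed that decoder output d0 drives the rightmost bulb and d7 the leftmost.

```python
def decoder_3x8(i2, i1, i0):
    """3x8 decoder: exactly one of the outputs d0..d7 is 1, chosen by the input value."""
    index = (i2 << 2) | (i1 << 1) | i0
    return [1 if k == index else 0 for k in range(8)]      # [d0, d1, ..., d7]


def light_sequence(steps, start=0):
    """Bulb patterns for successive clock cycles, written with d7 leftmost and d0 rightmost."""
    count = start & 0b111
    frames = []
    for _ in range(steps):
        d = decoder_3x8((count >> 2) & 1, (count >> 1) & 1, count & 1)
        frames.append("".join(str(bit) for bit in reversed(d)))
        count = (count + 1) & 0b111                        # 3-bit up-counter wraps 111 -> 000
    return frames


for frame in light_sequence(9):
    print(frame)
```

Each frame has exactly one lit bulb, and the lit position moves one step left per cycle before repeating, which is the one-at-a-time pattern the example asks for.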
Counter with Parallel Load
Counters often come with the ability to initialize the count value, achieved by loading the counter's register with parallel data. Figure 4.57 shows the design of a 4-bit up-counter with parallel load. When control input ld is 1, the 2x1 mux passes load data input L to the register; when ld is 0, the mux passes the incremented value. Furthermore, we OR the counter's ld and cnt signals to generate the load signal for the register. When cnt is 1, the incremented value will be loaded. When ld is 1, the parallel load data will be loaded. Even if cnt is 0, ld=1 causes the register to be loaded. A down-counter or up/down-counter could similarly be extended to have a parallel load.
Figure 4.57 Internal design of a 4-bit up-counter with load.
Parallel load is useful when we want to generate a pulse signal that is not directly obtainable from letting a counter wrap around and pulse its tc output naturally. An N-bit counter naturally wraps around every 2^N cycles. What if we want a pulse every X cycles, where X is not a power of 2? For example, say we have a 4-bit down-counter, which normally pulses the tc output and wraps around every 16 cycles, and suppose we want to pulse every 9 cycles. We can achieve a pulse every 9 cycles by setting the load data input L to 9-1, or 8 (1000), and by connecting the tc output to the load control input ld, as shown in Figure 4.58. When the counter reaches its lowest value (0000), tc will become 1, causing the ld input to become 1. Thus, on the next clock cycle, the counter will load 1000, rather than wrapping around to 1111. (Note: the load occurs on the next cycle, not the present cycle, because tc changes to 1 after the rising clock edge, so the new value for ld doesn't get seen until the next clock edge.) The counter would thus count in the sequence 8, 7, 6, 5, 4, 3, 2, 1, 0, pulsing tc and then returning to 8. The reason we load 9-1, rather than 9, even though we want a pulse every 9 cycles, is because we must remember that 0 is included in the count sequence, just as the count from 15 down to 0 takes 16 cycles.
Figure 4.58 A counter setup that pulses tc every 9 cycles.
We could instead use an up-counter for the same purpose, but we must make the load value equal to the total cycles minus the desired cycles. So for the above example, we would use a load value of 16 - 9 = 7 (0111). The counter would count the sequence 7, 8, 9, 10, 11, 12, 13, 14, 15, pulsing tc and then returning to 7.
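The pulse-every-9-cycles setup can be simulated for both the down-counter and the up-counter variants. This is a sketch under assumptions: the function name `pulse_counter` is made up, cnt is held at 1, tc is wired to ld, and the counter is assumed to start out already holding the load value.

```python
def pulse_counter(load_value, cycles, bits=4, up=False):
    """Simulate a counter whose tc output drives its own ld input.

    Returns a list of (value, tc) pairs, one per clock cycle.
    """
    mask = (1 << bits) - 1
    terminal = mask if up else 0          # up-counter: all 1s; down-counter: all 0s
    step = 1 if up else -1
    value = load_value & mask             # assumed starting state
    history = []
    for _ in range(cycles):
        tc = 1 if value == terminal else 0
        history.append((value, tc))
        # On the next clock edge, ld=tc=1 reloads L; otherwise the counter counts.
        value = (load_value & mask) if tc else (value + step) & mask
    return history


print([v for v, _ in pulse_counter(8, 10)])             # down-counter loaded with 9-1
print([v for v, _ in pulse_counter(7, 10, up=True)])    # up-counter loaded with 16-9
```

In both cases tc is 1 once every 9 cycles, because the count sequence contains exactly 9 values including the terminal one.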
EXAMPLE 4.16 New Year's Eve countdown display
In Example 2.30, we utilized a microprocessor to output the numbers 59 down to 0, and a decoder to illuminate one of 60 lights based on that output. In this example, we'll replace the microprocessor by a down-counter with parallel load to output 59 down to 0. Suppose we have an 8-bit down-counter available, which can count from 255 down to 0. We need to load 59 and then count down. Assume the user can press a button called reset to load 59 into the counter, and then the user can move a switch countdown from the 0 position (don't count) to the 1 position (count) to begin the countdown. The system implementation is shown in Figure 4.59.
Figure 4.59 Happy New Year countdown system using a down-counter.
Notice that the tc signal is our "Happy New Year" indication. We've connected that signal to an output called fireworks, which we'll assume activates a device that ignites fireworks. Happy New Year!
EXAMPLE 4.17 1 Hz pulse generator using a 60 Hz oscillator
In the U.S., electricity to the home operates as an alternating current with a frequency of 60 Hz. Many appliances convert this signal to a 60 Hz digital signal, and then convert the 60 Hz digital signal to a 1 Hz signal, to drive a clock or other device needing to keep track of time at the granularity of seconds. Unlike Example 4.14, we can't simply use a counter of a particular bitwidth, since no basic up-counter wraps around after 60 cycles: a 5-bit counter wraps around every 32 cycles, while a 6-bit counter wraps every 64 cycles. Let's start with a 6-bit counter, which counts from 0 to 63 and then wraps around to 0. We'll add some extra logic, as shown in Figure 4.60. The extra logic should detect when the counter has counted up to 59, and should clear the counter back to 0 on the next rising clock edge rather than letting the counter continue counting to 60 and beyond. Fifty-nine as a 6-bit binary number is 111011. Thus the AND gate in Figure 4.60 detects 111011, in which case the AND gate output sets the counter clear input to 1. We assume the counter's clear input clr takes precedence over the counter's count input cnt. Since the AND output will pulse every 60 cycles and the input clock frequency is 60 Hz, this circuit converts a 60 Hz input clock into a 1 Hz output clock. A circuit that converts an input clock into a new clock with a lower frequency is known as a clock divider.
Figure 4.60 Clock divider.
Timer
A common use of a counter is as the central component within another device called a timer. A timer is a special type of counter that measures time. Measuring time is a very common task in a digital system.
One type of timer is based on a down-counter. We store a value into the counter, and wait for the terminal count (0) to be reached. If we know the counter's oscillator frequency, then we can load a value corresponding to a desired time interval. For example, suppose we want to know when one second has passed, using a counter having a clock frequency of 1 kHz. We would thus load 999 (in binary, meaning 1111100111) into the counter and enable counting. After 1 second, the counter would reach 0 and assert its terminal count output, notifying us that 1 second has passed. A timer may repeat this process automatically, using the terminal count to automatically reload the desired time value (999 in our example) into the counter. Such a timer might be used in any type of watch or clock. Our earlier three-cycles-high laser timer (from Chapter 3) could have been built using a timer component, especially if instead of wanting the laser high for three cycles, we wanted the laser high for a period of time like 1.5 seconds.
(Margin note: We load 999, rather than 1000, because we must remember that 0 is part of the count. Try counting from 9 down to 0, raising a finger each time you say a number. Notice that when you reach 0, ten fingers are up.)
Another type of timer is based on an up-counter. We reset that counter to 0, and then enable counting when some event occurs that we want to time. When the event ends, we disable the counter, after which the counter contains the number of cycles that occurred during the event. Knowing the time of one clock cycle, we multiply the number of cycles by the time of one clock cycle to obtain the total time for the event. For example, if we time an event as lasting 500 clock cycles, and the timer's oscillator frequency is 1 kHz, then the time for the event was 500 cycles * 0.001 s/cycle = 0.5 s. We illustrate this type of timer using an example.
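The arithmetic behind both timer styles is small enough to sketch directly. The function names `timer_load_value` and `event_duration_s` are hypothetical, and the sketch assumes the desired interval is a whole number of clock cycles.

```python
def timer_load_value(interval_s, clock_hz):
    """Value to load into a down-counter timer so that tc asserts after interval_s."""
    cycles = round(interval_s * clock_hz)   # assumed: the interval is a whole number of cycles
    return cycles - 1                       # subtract 1 because 0 is part of the count


def event_duration_s(cycle_count, clock_hz):
    """Elapsed time for an up-counter timer that counted cycle_count clock cycles."""
    return cycle_count / clock_hz


load = timer_load_value(1, 1000)            # one second at 1 kHz
print(load, format(load, "b"))
print(timer_load_value(1.5, 1000))          # the 1.5 second laser interval
print(event_duration_s(500, 1000))          # 500 cycles at 1 kHz
```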
EXAMPLE 4.18
Highway s peed meas uring system
Many hi ghways and freeways have ystems that measure the speed of car at various parts of the
hi ghway and upl oad Ihal speed information to a cenl ral computer. Such inforn1a tiol1 is used by law
cnrorcemcnr, traffi c planners. and radio nnd Internet traffic rcpons.
One technique ror measuring the speed of a car use two sensors embedded under the road. 3S
ilJ ustraied in Figure 4.61. \Vhen a car is over a sensor, the sensor ourputs n 1: otherwise. the sensor
outputs a O. A sensor' s output travels on underground wires to a speed-measuring computer box. some
of which are above Ihe ground and others of which are underground. The speed measurer delermines
speed by di viding Ihe distance betwcen the sensors (which is fixed and known) by the time required for
:l vehi clc to Lravel frollllhe first sensor to the second sensor. If the distance between the is 0.01
miles, and a vehicl e takes 0.5 seconds to tr3vel from the first 10 the second sensor. then the ,elucle's
peed is 0.01 miles I (0.5 seconds ( I hOll r 13600 seconds)) = 72 mile per hour. .
To measure the time between the sensors, we can construct a simple FSM that controls a 16-bit
timer, as shown in Figure 4.61. State S0 clears the timer to 0. The FSM transitions to state S1 when
a car passes over the first sensor. S1 starts the timer counting up. The FSM stays in S1 until the car
passes over the second sensor, causing a transition to state S2. S2 stops counting and computes
the time using the timer's output C. Assuming a 1 kHz clock input to the timer, each cycle
is 0.001 s, so the time would be C * 0.001 s. The speed in miles per hour then follows by dividing
the 0.01-mile sensor distance by that time and multiplying by 3600. We omit the implementation
details of the speed computation, which would most likely be implemented as software on a
microprocessor.
Figure 4.61 Measuring vehicle speeds in a highway speed measuring system.
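The speed computation that the example leaves to software can be sketched as follows. This is only an illustrative sketch, not the book's implementation: the function names and default parameters are assumptions, with the 1 kHz clock and 0.01-mile sensor spacing taken from the example.

```python
def event_time_seconds(cycle_count: int, clock_hz: float = 1000.0) -> float:
    """Convert a timer count into seconds: cycles * (seconds per cycle)."""
    return cycle_count / clock_hz


def speed_mph(cycle_count: int,
              clock_hz: float = 1000.0,
              distance_miles: float = 0.01) -> float:
    """Speed = distance / time, with the time converted from seconds to hours.

    cycle_count is the timer output C read in state S2 (hypothetical name).
    """
    if cycle_count <= 0:
        raise ValueError("the timer must have counted at least one cycle")
    # distance / (C / clock_hz / 3600), rearranged to avoid tiny intermediates
    return distance_miles * 3600.0 * clock_hz / cycle_count


if __name__ == "__main__":
    c = 500  # 500 cycles at 1 kHz is 0.5 s
    print(event_time_seconds(c), "s")
    print(speed_mph(c), "mph")
```

With C = 500, the sketch reproduces the 0.5-second, 72-miles-per-hour case worked out in the example.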
HOW DOES IT WORK? CAR SENSORS IN ROADS.
How does a highway speed sensor or a traffic light car sensor know
that a car is present in a particular lane? The main method today uses
what's called an inductive loop. A loop of wire is placed just under
the pavement; you can usually see the cuts, as in Figure 4.62(a). A
loop of wire has a particular "inductance," which is an electronics
term describing the wire's opposition to a change in electric current:
higher inductance means the wire has higher opposition to changes in
current. It turns out that placing a big hunk of metal (like a car) near
the loop of wire changes the wire's inductance. (Why? Because the
metal disrupts the magnetic field created by a changing current in the
wire, but that's getting beyond our scope.) The traffic light control
circuit keeps checking the wire's inductance (perhaps by trying to
change the current and seeing how much the current really changes in
a certain time period), and if the inductance differs from normal, the
circuit assumes a car is above the loop of wire.
Many people think that the loops seen in the pavement are scales
that measure weight. I've seen bicyclists jumping up and down on
the loops trying to get a light to change. That doesn't work, but it
sure is entertaining to watch.
Many others believe that small cylinders attached to a traffic light's
support arms, like that in Figure 4.62(b), detect vehicles. Those instead
are typically devices that detect a special encoded radio or infrared-light
signal from emergency vehicles, causing the traffic light to turn green
for the emergency vehicle (e.g., 3M's "Opticom" system). Such systems
are another example of digital systems, reducing the time needed by
emergency vehicles to reach the scene of an emergency as well as
reducing accidents involving the emergency vehicle itself proceeding
through a traffic light, thus often saving lives.
Figure 4.62 (a) Inductive loop for detecting a vehicle on a road, (b) emergency vehicle signal
sensor for changing an intersection's traffic light to green for the approaching emergency vehicle.
4.7 MULTIPLIER-ARRAY STYLE
An NxN multiplier is a datapath component that multiplies two N-bit input binary
numbers A (the multiplicand) and B (the multiplier), and outputs an (N+N)-bit result. For
example, an 8x8 multiplier multiplies two 8-bit binary numbers and outputs a 16-bit
result. Designing an NxN multiplier in two levels of logic using the standard combinational
design process will result in too complex a design, as we've already seen for
previous operations like addition and comparison. For multipliers with N greater than 4 or
so, we need a more efficient method.
We can create a reasonably sized multiplier by mimicking how we perform multiplication
by hand. Consider multiplying two 4-bit binary numbers 0110 and 0011 by hand:
     0110    (the top number is called the multiplicand)
     0011    (the bottom number is called the multiplier)
     ----    (each row below is called a partial product)
     0110    (because the rightmost bit of the multiplier is 1, and 0110*1=0110)
    0110     (because the second bit of the multiplier is 1, and 0110*1=0110)
   0000      (because the third bit of the multiplier is 0, and 0110*0=0000)
 +0000       (because the leftmost bit of the multiplier is 0, and 0110*0=0000)
 --------
 00010010    (the product is the sum of all the partial products: 18, which is 6*3)
Each partial product is easily obtained by ANDing the present multiplier bit with the
multiplicand. Thus, multiplication of two 4-bit numbers A (a3a2a1a0) and B
(b3b2b1b0) can be represented as follows:

                              a3    a2    a1    a0
x                             b3    b2    b1    b0
--------------------------------------------------
                            b0a3  b0a2  b0a1  b0a0   (pp1)
                      b1a3  b1a2  b1a1  b1a0     0   (pp2)
                b2a3  b2a2  b2a1  b2a0     0     0   (pp3)
+         b3a3  b3a2  b3a1  b3a0     0     0     0   (pp4)
--------------------------------------------------
      p7    p6    p5    p4    p3    p2    p1    p0

After generating the partial products (pp1, pp2, pp3, and pp4) by ANDing the present
multiplier bit with each multiplicand bit, we merely need to sum those partial products
together. We can use three adders of varying widths for computing that sum. The resulting
design is shown in Figure 4.63.
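The partial-product scheme described above can be mimicked in software. The following is a minimal behavioral sketch, not a gate-level model of Figure 4.63; the function names `partial_products` and `array_multiply` are hypothetical.

```python
def partial_products(a: int, b: int, n: int = 4) -> list[int]:
    """Return the n shifted partial products of an n-bit by n-bit multiply."""
    mask = (1 << n) - 1
    rows = []
    for i in range(n):
        b_i = (b >> i) & 1          # the present multiplier bit
        row = a & (mask * b_i)      # AND every multiplicand bit with b_i
        rows.append(row << i)       # pp(i+1) is shifted left by i positions
    return rows


def array_multiply(a: int, b: int, n: int = 4) -> int:
    """Sum the partial products, as the chain of adders would."""
    if not (0 <= a < (1 << n) and 0 <= b < (1 << n)):
        raise ValueError("inputs must be unsigned n-bit numbers")
    return sum(partial_products(a, b, n))


if __name__ == "__main__":
    for row in partial_products(0b0110, 0b0011):
        print(format(row, "08b"))
    print(format(array_multiply(0b0110, 0b0011), "08b"))
```

For the hand-worked case 0110 * 0011, the rows are 0110, 0110 shifted left once, and two all-zero rows, which sum to 00010010 (18).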
This design has a reasonable size, about three times bigger than a carry-ripple adder.
The design has reasonable speed. The delay consists of 1 gate-delay for generating the
partial products, plus the delay of the adders. If each adder is a carry-ripple adder, then
the 5-bit adder delay will be 5*2 = 10 gate-delays, the 6-bit adder delay will be 6*2 = 12
gate-delays, and the 7-bit adder delay will be 7*2 = 14 gate-delays. If we assume that the
total delay of the adders is simply the sum of the adder delays, then the total delay would
thus be 1 + 10 + 12 + 14 = 37 gate-delays. However, the total delay of carry-ripple adders
when chained together is actually less than their sum; see Exercise 4.15.
Figure 4.63 Internal design of a 4-bit by 4-bit array-style multiplier.
Larger multipliers, which will have an even longer chain of adders, will be
even slower. Faster multiplier designs are possible, at the expense of more gates.
4.8 SUBTRACTORS
An N-bit subtractor is a datapath component that takes two N-bit binary inputs A and B,
and outputs an N-bit result S equaling A-B.
Subtractor for Positive Numbers Only
Subtraction gets slightly more complex when we consider negative results, like 5 - 7 = -2,
because thus far we haven't discussed representation of negative numbers. For now, let's
assume we are only dealing with positive numbers, so the subtractor's inputs are positive,
and the result is always positive. This could be the case, for example, when we are
designing a system that only subtracts smaller numbers from larger numbers, such as when
compensating a sampled temperature that will always be greater than 80 using a small
compensation value that will always be less than 10.
Designing an N-bit subtractor using the standard combinational logic design process
suffers from the same exponential size growth problem as an N-bit adder. (See Section
4.3.) Instead, we can again try to mimic subtraction by hand in hardware.
Figure 4.64 shows subtraction of 4-bit binary numbers "by hand." Starting with the
first column, we see that a is less than b (0 < 1), necessitating a borrow from the next
column. The first column result is then 10 - 1 = 1 (in base ten, two minus one
equals one). The second column has a 0 for a because of the borrow by the first column,
making a < b (0 < 1), generating a borrow from the third column, which must
itself borrow from the fourth column. The result of the second column is then 10 - 1 =
1. The third column, because of the borrow generated by the second column, has an a of
1, which is not less than b, so the result of the third column is 1 - 1 = 0. The fourth column
has a=0 due to the borrow from the third column, and since b is also 0, the result is
0 - 0 = 0.
Figure 4.64 Design of a 4-bit subtractor: (a) subtraction "by hand", (b) borrow-ripple
implementation with four full-subtractors with a borrow-in input wi, and (c) block symbol.
Based on the above-described behavior, we could create the internal design of a
full-subtractor combinational component to implement the behavior of each column, with a
full-subtractor having an input wi representing a borrow by the previous column, and an
output wo representing a borrow from the next column, in addition to the inputs a and b
and the output s. (We use w's for the borrows rather than b's because b is already used
for the input; the w comes from the end of the word borrow.) We leave the design of a
full-subtractor as an exercise for the reader.
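As a way to check a full-subtractor design against the column-by-column behavior described above, here is one possible behavioral sketch. It is not the book's solution to the exercise: the equations used (s = a XOR b XOR wi, and wo = a'b + a'wi + b·wi) are the commonly used full-subtractor equations, and the function names are hypothetical.

```python
def full_subtractor(a: int, b: int, wi: int) -> tuple[int, int]:
    """One column: compute a - b - wi, returning (s, wo).

    wi is the borrow taken by the previous column; wo is the borrow this
    column takes from the next column.
    """
    s = a ^ b ^ wi
    na = 1 - a  # a' for a single bit
    wo = (na & b) | (na & wi) | (b & wi)
    return s, wo


def borrow_ripple_subtract(a: int, b: int, n: int = 4) -> tuple[int, int]:
    """Chain n full-subtractors, rippling the borrow from low to high bits."""
    result, w = 0, 0
    for i in range(n):
        s, w = full_subtractor((a >> i) & 1, (b >> i) & 1, w)
        result |= s << i
    return result, w  # a final borrow of 1 would mean that a < b


if __name__ == "__main__":
    diff, borrow = borrow_ripple_subtract(0b1010, 0b0111)
    print(format(diff, "04b"), borrow)
```

The inputs 1010 and 0111 match the borrows traced column by column in the text, giving per-column results 1, 1, 0, 0 from right to left.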
EXAMPLE 4.19 DIP-switch-based adding/subtracting calculator
In Example 4.8, we designed a simple calculator that could add two 8-bit binary numbers and
produce an 8-bit result, using DIP switches for inputs, and a register plus LEDs for output. Let's
extend that calculator to allow the user to choose among addition and subtraction operations. We'll
introduce a single-switch DIP switch that sets a signal f (for "function") as another system input.
When f=0, the calculator should add; when f=1, the calculator should subtract.
One implementation of this calculator would use an adder, a subtractor, and a multiplexor, as
in Figure 4.65. The f input chooses which component, the adder or subtractor, to pass through the
mux to the register inputs. When the user presses e, either the addition or subtraction result gets
loaded into the register and displayed at the LEDs.
This example assumes the result of a subtraction is always a positive number, never negative.
It also assumes that the result is always between 0 and 255.
Figure 4.65 8-bit DIP-switch-based adding/subtracting calculator. Input f selects
between addition and subtraction.
EXAMPLE 4.20 Color space converter: RGB to CMYK
Computer monitors, digital cameras, scanners, printers, and other electronic devices deal with
color images. Those devices treat an image as millions of tiny pixels (short for "picture
elements"), which are indivisible dots representing a tiny part of the image. Each pixel has a color, so
an image is just a collection of colored pixels. A good computer monitor may support over 10
million unique colors for each pixel. How does a monitor create each unique color for a pixel? In
a common method used in what are known as RGB monitors, the monitor has three light sources
inside: red, green, and blue. Any color of light can be created by adding specific intensities of
each of the three colors. Thus, for each pixel, the monitor shines a specific intensity of red, of
green, and of blue at that pixel's location on the monitor's screen, so that the three colors add
together to create the desired pixel color. Each subcolor (red, green, or blue) is typically
represented as an 8-bit binary number (thus each ranging from 0 to 255), meaning a color is represented
by 8+8+8=24 bits. An (R, G, B) value of (0, 0, 0) represents black. (10, 10, 10) represents a very
dark gray, while (200, 200, 200) represents a light gray. (255, 0, 0) represents red, while (100, 0,
0) represents a darker (nonintense) red. (255, 255, 255) represents white. (109, 35, 201) represents
some mixture of the three base colors. Representing color using intensity values for red, green,
and blue is known as an RGB color space.
RGB color space is great for computer monitors and certain other devices, but not the best for
some other devices, like printers. Mixing red, green, and blue ink on paper will not result in white,
but rather in black. Why? Because ink is not light; rather, ink reflects light. So red ink reflects red
light, absorbing green and blue light. Likewise, green ink absorbs red and blue light. Blue ink
absorbs red and green light. Mix all three inks together on paper, and the mixture absorbs all light,
reflecting none, thus yielding black. Printers therefore use a different color space based on the
complementary colors of red/green/blue, namely, cyan/magenta/yellow, known as a CMY color space.
Cyan ink absorbs red, reflecting green and blue (the mixture of which is cyan). Magenta ink
absorbs green light, reflecting red and blue (which is magenta). Yellow ink absorbs blue, reflecting
red and green (which is yellow).
Notice that a color printer may have three color ink cartridges, one cyan, one magenta, and
one yellow. Figure 4.66 shows ink cartridges for a particular color printer. Some printers have
a single cartridge instead of three, with that single cartridge internally containing separated
fluid compartments for the three colors.
Figure 4.66 A color printer mixes cyan, magenta, and yellow inks to create any color. The picture
shows inside a color printer having those three color cartridges on the right, labeled C, M, and Y.
Such printers may use black ink directly (the big cartridge on the left), rather than mixing the three
colors, to make grays and blacks, in order to create a better-looking black and to conserve the more
expensive color inks.
A printer must convert a received RGB image into CMY. Let's design a fast circuit to perform
that conversion. Given three 8-bit values for R, G, and B for a particular pixel, the equations for
C, M, and Y are simply:
C = 255 - R
M = 255 - G
Y = 255 - B
(255 is the maximum value of an 8-bit number). A circuit to perform such conversion can be built
using subtractors, as shown in Figure 4.67.
Figure 4.67 RGB to CMY converter.
Actually, the conversion needs to be slightly more complex. Ink isn't perfect, meaning that
mixing cyan, magenta, and yellow yields a black that doesn't look as black as you might expect.
Furthermore, colored inks are expensive compared to black ink. Therefore, color printers use black
ink whenever possible. One way to maximize use of black ink is to factor out the black from the C,
M, and Y values. In other words, a (C, M, Y) value of (250, 200, 200) can be thought of as
(200, 200, 200) plus (50, 0, 0). The (200, 200, 200), which is a light gray, can be generated using
black ink. The remaining (50, 0, 0) can be generated using a small amount of cyan, and no magenta
or yellow ink at all, thus saving precious color ink. A CMY color space extended with black is known
as a CMYK color space (the "K" comes from the last letter in the word "black"; "K" is used instead
of "B" to avoid confusion with the "B" from "blue").
Figure 4.68 RGB to CMYK converter.
An RGB to CMYK converter can thus be described as:
K = Minimum(C, M, Y)
C2 = C - K
M2 = M - K
Y2 = Y - K
where C, M, and Y are defined as earlier. We thus create the circuit in Figure 4.68 for converting an
RGB color space to a CMYK color space. We've used the RGBtoCMY component from Figure
4.67. We've also used two instances of the MIN component that we created in Example 4.12 to
compute the minimum of two numbers; using two such components computes the minimum of
three numbers. Finally, we use three more subtractors to remove the K value from the C, M, and Y
values. In a real printer, the imperfections of ink and paper require even more adjustments. A more
realistic color space converter multiplies the R, G, and B values by a series of constants, which can
be described using matrices:
| C |   | m00 m01 m02 |   | R |
| M | = | m10 m11 m12 | * | G |
| Y |   | m20 m21 m22 |   | B |
Further discussion of such a matrix-based converter is beyond the scope of this example.
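The subtractor-based conversions in this example (C = 255 - R and so on, followed by factoring out K as the minimum of C, M, and Y) can be sketched in software as below. The function names are hypothetical, and the sketch ignores the matrix-based adjustments a real printer would need.

```python
def rgb_to_cmy(r: int, g: int, b: int) -> tuple[int, int, int]:
    """Each output is one 8-bit subtractor computing 255 minus the input."""
    for v in (r, g, b):
        if not 0 <= v <= 255:
            raise ValueError("R, G, and B must be 8-bit values (0 to 255)")
    return 255 - r, 255 - g, 255 - b


def rgb_to_cmyk(r: int, g: int, b: int) -> tuple[int, int, int, int]:
    """Return (C2, M2, Y2, K) with the common black component factored out."""
    c, m, y = rgb_to_cmy(r, g, b)
    k = min(min(c, m), y)  # two 2-input MIN components yield a 3-input minimum
    return c - k, m - k, y - k, k  # three more subtractors remove K


if __name__ == "__main__":
    # RGB (5, 55, 55) corresponds to the CMY value (250, 200, 200) from the text.
    print(rgb_to_cmy(5, 55, 55))
    print(rgb_to_cmyk(5, 55, 55))
```

For the (C, M, Y) value (250, 200, 200) discussed in the text, the sketch yields K = 200 with a remaining (50, 0, 0).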
Representing Negative Numbers: Two's Complement
The subtractor design in the previous section assumed we only dealt with positive input
numbers and positive results. But in many systems, we may obtain results that are negative,
and in fact, our input values may even be negative numbers. We thus need a way to
represent negative numbers using bits.
One obvious but not very effective representation is known as signed-magnitude. In
this representation, the highest-order bit is used only to represent the number's sign, with
0 meaning positive and 1 meaning negative. The remaining low-order bits represent the
magnitude of the number. In this representation, and using 4-bit numbers, 0111 would
represent +7, while 1111 would represent -7. Thus, four bits could represent -7 to 7.
(Notice, by the way, that both 0000 and 1000 would represent 0, the former representing
0, the latter -0.) Signed-magnitude is easy for humans to understand, but doesn't lend
itself easily to the design of simple arithmetic components like adders and subtractors.
For example, if an adder's inputs use signed-magnitude representation, the adder would
have to look at the highest-order bit, and then internally perform either an addition or a
subtraction, using different circuits for each.
Instead, the most common method of representing negative numbers and performing
subtraction in a digital system actually uses a trick that allows us to use an adder to
perform subtraction. Using an adder to perform subtraction would enable us to keep our
simple adder, and to use the same component for both addition and subtraction.
The key to performing subtraction using addition lies in what are known as complements.
We'll first introduce complements in the base ten number system just so you can
familiarize yourself with the concept, but bear in mind that the
intention is to use complements in base two, not base ten.
Note: We are introducing ten's complement just for intuition purposes; we'll actually be using
two's complement.
Consider subtraction involving two single-digit base ten
numbers, say 7 - 4. The result should be 3. Let's define the
complement of a single-digit base ten number A as the number
that when added to A results in a sum of ten. So the complement
of 1 is 9, of 2 is 8, and so on. Figure 4.69 provides the
complements for the numbers 1 through 9.
The wonderful thing about a complement is that you can
use it to perform subtraction using addition, by replacing the
number being subtracted with its complement, then by adding,
and then by finally throwing away the carry. For example:
7 - 4 --> 7 + 6 = 13 --> 3 (after throwing away the carry)
Figure 4.69 Complements in base ten:
1 -> 9
2 -> 8
3 -> 7
4 -> 6
5 -> 5
6 -> 4
7 -> 3
8 -> 2
9 -> 1
We replaced 4 by its complement, 6, and then added 6 to 7 to obtain 13. Finally, we
then threw away the carry, leaving 3, which is the correct result. Thus, we performed
subtraction using addition.
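The base-ten trick above can be tried out with a few lines of code. This is just an illustrative sketch: the function names are hypothetical, and the `digits` parameter is an assumption added so the same steps can be applied to wider numbers. Note that computing the complement here still uses a subtraction.

```python
def tens_complement(value: int, digits: int = 1) -> int:
    """The number that, added to value, gives 10**digits."""
    return 10 ** digits - value


def subtract_by_adding(a: int, b: int, digits: int = 1) -> int:
    """Compute a - b (for a >= b) by adding b's complement and dropping the carry."""
    total = a + tens_complement(b, digits)  # exactly 10**digits too much
    return total % 10 ** digits             # throw away the carry digit


if __name__ == "__main__":
    print(tens_complement(4))        # complement of 4
    print(subtract_by_adding(7, 4))  # 7 + 6 = 13, drop the carry
```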
Figure 4.70 Subtracting by adding: subtracting a number (4) is the same as adding the number's
complement (6) and then dropping the carry, since by definition of the complement, the result will
be exactly 10 too much. After all, that's how the complement was defined: the number plus its
complement equals 10. Adding the complement results in an answer exactly 10 too much; dropping
the tens column gives the right answer.
A number line helps us visualize why complements work, as shown in Figure 4.70.
Complements work for any number of digits. Say we want to perform subtraction
using two two-digit base ten numbers, perhaps 55 - 30. The complement of 30 would be
the number that when added to 30 results in 100, so the complement of 30 is 70. 55 + 70
is 125. Throwing away the carry yields 25, which is the correct result for 55 - 30.
So using complements achieves subtraction using addition.
"Not so fast!" you might say. In order to determine the complement, don't we have to
perform subtraction? We know that 6 is the complement of 4 by computing 10 - 4 = 6.
We know that 70 is the complement of 30 by computing 100 - 30 = 70. So haven't we
just moved the subtraction to another step, the step of computing the complement?
Yes. Except, it turns out that in base two, we can compute the complement in a much
simpler way: just by inverting all the bits and adding 1. For example, consider
computing the complement of the 3-bit base-two number 001. The complement would be the
number that when added to 001 yields 1000; you can probably see that the complement
should be 111. Using the same method for computing the complement as we did in base
ten, we compute the two's complement of 001 as: 1000 - 001 = 111, so 111 is the
complement of 001. However, it just so happens that if we invert all the bits of 001 and
add 1, we get the same result! Inverting the bits of 001 yields 110; adding 1 yields
110 + 1 = 111, the correct complement.
Note: Two's complement can be computed simply by inverting the bits and adding 1, thus
avoiding the need for subtraction when computing a complement.
Note: The highest-order bit in two's complement acts as a sign bit: 0 means positive,
1 means negative.
Thus, to perform a subtraction, say 011 - 001, we would perform the following:
011 - 001
--> 011 + ((001)' + 1)
  = 011 + (110 + 1)
  = 011 + 111
  = 1010 (throw away the carry)
--> 010
That's the correct answer, and didn't involve any subtractions; only an invert and
additions.
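The invert-and-add-1 steps above can be checked with a short behavioral sketch. The function name and the bit-width parameter `n` are assumptions for illustration; the sketch treats its inputs as plain unsigned n-bit values with a >= b.

```python
def subtract_with_inverter_and_adder(a: int, b: int, n: int = 3) -> int:
    """Compute a - b using only a bitwise invert and additions."""
    mask = (1 << n) - 1
    inverted_b = ~b & mask           # (b)': invert all n bits
    complement = inverted_b + 1      # two's complement of b
    total = a + complement           # may produce a carry out of bit n-1
    return total & mask              # throw away the carry


if __name__ == "__main__":
    result = subtract_with_inverter_and_adder(0b011, 0b001)
    print(format(result, "03b"))
```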
We omit discussion as to why one can compute the complement in base two by
inverting the bits and adding 1; for our purposes, we just need to know that that trick
works for binary numbers.
There are actually two types of complements of a binary number. The type we've
been using above is known as the two's complement, obtained by inverting all the bits of
the binary number and adding 1. Another type is known as the one's complement, which
is obtained simply by inverting all the bits, without adding a 1. The two's complement is
much more commonly used in digital circuits and results in simpler logic.
Two's complement leads to a simple way to represent negative numbers. Say we have
four bits to represent numbers, and we want to represent both positive and negative
numbers. We can choose to represent positive numbers as 0000 to 0111 (0 to 7). Negative
numbers would be obtained by taking the two's complement of the positive numbers,
because a - b is the same as a + (-b). So -1 would be represented by taking the
two's complement of 0001, or (0001)' + 1 = 1110 + 1 = 1111. Likewise, -2 would
be (0010)' + 1 = 1101 + 1 = 1110. -3 would be (0011)' + 1 = 1100 + 1 = 1101.
And so on. -7 would be (0111)' + 1 = 1000 + 1 = 1001. Notice that the two's
complement of 0000 is 1111 + 1 = 0000 (after throwing away the carry). Two's complement
representation has only one representation of 0, namely, 0000 (unlike signed-magnitude
representation, which had two representations of 0). Also notice that we can represent -8
as 1000. So two's complement is slightly asymmetric, representing one more negative number
than positive numbers. A 4-bit two's-complement number can represent any number from -8 to +7.
Say you have 4-bit numbers and you want to store -5. -5 would be (0101)' + 1 =
1010 + 1 = 1011. Now you want to add -5 to 4 (or 0100). So we simply add: 1011 +
0100 = 1111, which is -1, the correct answer.
Note that negative numbers all have a 1 in the highest-order bit; thus, the
highest-order bit in two's complement is often referred to as the sign bit, 0 indicating a positive
number, 1 a negative number.
If you want to know the magnitude of a two's complement negative number, you can
obtain the magnitude by taking the two's complement again. So to determine what
number 1111 represents, we can take the two's complement of 1111: (1111)' + 1 =
0000 + 1 = 0001. We put a negative sign in front to yield -0001, or -1.
A quick way for humans to mentally figure out the magnitude of a negative number
in 4-bit two's complement (having a 1 in the high-order bit) is to subtract the magnitude
of the three lower bits from 8. So for 1111, the low three bits are 111 or 7, so the
magnitude is 8 - 7 = 1, which in turn means that 1111 represents -1. For an 8-bit two's
complement number, we would subtract the magnitude of the lower 7 bits from 128. So
10000111 would be -(128 - 7) = -121.
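The encoding and decoding rules above can be sketched in code as follows. The function names are hypothetical; both directions use only the invert-and-add-1 rule described in the text.

```python
def to_twos_complement(value: int, n: int = 4) -> int:
    """Encode a signed integer as an n-bit two's complement bit pattern."""
    if not -(1 << (n - 1)) <= value < (1 << (n - 1)):
        raise ValueError("value does not fit in n-bit two's complement")
    mask = (1 << n) - 1
    if value >= 0:
        return value
    return ((~(-value) & mask) + 1) & mask  # invert the magnitude's bits, add 1


def from_twos_complement(bits: int, n: int = 4) -> int:
    """Decode an n-bit two's complement bit pattern into a signed integer."""
    mask = (1 << n) - 1
    if not 0 <= bits <= mask:
        raise ValueError("bits must be an n-bit pattern")
    if (bits >> (n - 1)) & 1 == 0:   # sign bit 0: positive number
        return bits
    magnitude = (~bits & mask) + 1   # take the two's complement again
    return -magnitude


if __name__ == "__main__":
    print(format(to_twos_complement(-5), "04b"))
    print(from_twos_complement(0b1111))
    print(from_twos_complement(0b10000111, n=8))
```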
To summarize, we can represent negative numbers using two's complement representation.
Addition of two's complement numbers proceeds unmodified; we just add the
numbers. Even if one or both numbers are negative, we simply add the numbers. We
perform subtraction of A - B by taking the two's complement of B and then adding that
two's complement to A, resulting in A + (-B). We compute the two's complement of B
by simply inverting the bits of B and then adding 1.
Building a Subtractor Using an Adder and Two's Complement
With knowledge of the two's complement representation, we can now see how to subtract
using an adder. To compute A - B, we compute A + (-B), which is the
same as A + B' + 1 because -B can be computed as
B' + 1 in two's complement. Thus, to perform subtraction,
we invert B, and input a 1 to the carry-in of an
adder, as shown in Figure 4.71.
Figure 4.71 Two's complement subtractor built with an adder.
Adder/Subtractor
We can straightforwardly design an adder/subtractor component, having an input sub, such
that when sub=1, the component subtracts, but when sub=0, the component adds, as shown in
Figure 4.72(a). The N-bit 2x1 multiplexor passes B when sub=0, and passes B' when sub=1. sub
is connected to cin also, so that cin is 1 when subtracting. Actually, XORs can be used instead of
the inverters and mux, as shown in Figure 4.72(b). When sub=0, the output of the XOR equals the
other input's value. When sub=1, the output of the XOR is the inverse of the other input's value.
Figure 4.72 (a) Two's complement adder/subtractor using a mux, and (b) alternative circuit for B
using XOR gates.
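The behavior of the XOR-based adder/subtractor can be sketched as below. This is a behavioral model only, with a hypothetical function name; the XOR of every B bit with sub is modeled as a single bitwise XOR with an all-ones or all-zeros word.

```python
def adder_subtractor(a: int, b: int, sub: int, n: int = 8) -> int:
    """Return the n-bit result of a + b when sub=0, or a - b when sub=1."""
    if sub not in (0, 1):
        raise ValueError("sub must be 0 or 1")
    mask = (1 << n) - 1
    # Each XOR gate computes b_i XOR sub: B passes through when sub=0,
    # and B' is produced when sub=1.
    b_to_adder = (b ^ (mask if sub else 0)) & mask
    total = (a & mask) + b_to_adder + sub  # sub also drives the adder's cin
    return total & mask                    # the carry out is not part of the result


if __name__ == "__main__":
    print(format(adder_subtractor(0b00001111, 0b00000001, sub=0), "08b"))
    print(format(adder_subtractor(0b00001111, 0b00000001, sub=1), "08b"))
```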
EXAMPLE 4.21 DIP-switch-based adding/subtracting calculator (continued)
Let's revisit our DIP-switch-based adding/subtracting calculator of Example 4.19. Observe that at
any given time the output displays the results of either the adder or subtractor, but never both.
Thus, we really don't need both an adder and a subtractor in parallel;
instead, we can use a single adder/subtractor component. Assuming the DIP switches have been set
to 00001111 and 00000001, setting f=0 (add) versus f=1 (subtract) should result in the following
computations:
00001111 + 00000001 = 00010000
00001111 - 00000001 = 00001111 + 11111110 + 1 = 00001110
We achieve this simply by connecting f to the sub input of the adder/subtractor, as shown in
Figure 4.73.
Figure 4.73 8-bit DIP-switch-based adding/subtracting calculator, using an adder/subtractor and
two's complement number representation.
Let's consider signed numbers using two's complement. If the user is unaware that two's
complement representation is being used and the user will only be inputting positive numbers using the
DIP switches, then the user should only use the low-order 7 switches of the 8-switch DIP inputs,
leaving the eighth switch in the 0 position, meaning the user can only input numbers ranging from
0 (00000000) to 127 (01111111). The reason the user can't use the eighth bit is that in two's
complement representation, making the highest-order bit a 1 causes the number to represent a
negative number.
If the user is aware of two's complement, then the user could use the DIP switches to represent
negative numbers too, from -1 (11111111) down to -128 (10000000). Of course, the user will
need to check the leftmost LED to determine whether the output represents a positive number or a
negative number in two's complement form.
Detecting Overflow
When we perform arithmetic using binary numbers of a fixed bitwidth, sometimes the
result is wider than the fixed bitwidth, a situation known as overflow. For example,
consider adding two 4-bit binary numbers (just regular binary numbers for now, not two's
complement numbers) and storing the result as another 4-bit number. Adding 1111 +
0001 yields a result of 10000, a 5-bit number, which is bigger than the 4 bits we have
to store the result. In other words, 15 + 1 = 16, and 16 requires 5 bits to represent in
binary. We can easily detect overflow when adding two binary numbers simply by
looking at the carry-out bit of the adder: if the carry-out bit is 1, overflow has occurred.
So a 4-bit adder adding 1111 + 0001 would output 1 + 0000, where the 1 is the
carry-out, indicating overflow.
When using two's complement numbers, detecting overflow is somewhat more complicated.
Suppose again we have 4-bit numbers, but now those numbers are in two's complement form.
Consider the addition of two positive numbers, such as 0111 and 0001 in Figure 4.74(a). A 4-bit
adder would output 1000, but that is incorrect: the result of 7 + 1 should be 8, but 1000
represents -8 in two's complement. The problem is that the largest positive number we
can represent in 4-bit two's complement form is 7. Thus, when adding two positive numbers,
we can detect overflow by checking whether the most significant bit is a 1 in the result.
Figure 4.74 Two's complement overflow detection comparing sign bits: (a) when adding
two positive numbers (0111 + 0001 = 1000, overflow), (b) when adding two negative numbers
(1111 + 1000 = 0111, overflow), (c) no overflow (1000 + 0111 = 1111). If the numbers' sign bits
have the same value, which differs from the result's sign bit, overflow has occurred.
Likewise, consider the addition of two negative numbers, such as 1111 and 1000 in
Figure 4.74(b). An adder would output a sum of 0111 (and a carry out of 1). 0111 is
incorrect: -1 + -8 should be -9, but 0111 is +7. The problem is that the most negative
number we can represent with 4-bit two's complement is -8. Thus, when adding two
negative numbers, we can detect overflow by checking whether the most significant bit is a 0
in the result.
Notice that adding a positive with a negative, or a negative with a positive, can never
result in overflow. The result will always be less negative than the most negative number,
or less positive than the most positive number. For example, the extreme is the addition of
-8 + 7, which is -1. Increasing -8 or decreasing 7 in that addition still results in a number
between -8 and 7.
So detecting overflow in two's complement involves detecting that both input
numbers were positive but yielded a negative result, or that both input numbers were
negative but yielded a positive result. Restated, detecting overflow in two's complement
involves detecting that the sign bits of both inputs are the same as one another but differ
from the result's sign bit. If we call the sign bit of one input a and the sign bit of the other
input b, and the sign bit of the result r, then the following equation outputs 1 when there
is overflow:
overflow = abr' + a'b'r
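The sign-bit equation above can be exercised with a small behavioral sketch. The function name is hypothetical; inputs and result are n-bit two's complement bit patterns held in ordinary integers.

```python
def add_detect_overflow_by_sign(x: int, y: int, n: int = 4) -> tuple[int, bool]:
    """Add two n-bit two's complement patterns; flag overflow via sign bits."""
    mask = (1 << n) - 1
    result = (x + y) & mask
    a = (x >> (n - 1)) & 1       # sign bit of one input
    b = (y >> (n - 1)) & 1       # sign bit of the other input
    r = (result >> (n - 1)) & 1  # sign bit of the result
    # overflow = abr' + a'b'r
    overflow = (a & b & (1 - r)) | ((1 - a) & (1 - b) & r)
    return result, bool(overflow)


if __name__ == "__main__":
    for x, y in [(0b0111, 0b0001), (0b1111, 0b1000), (0b1000, 0b0111)]:
        result, overflow = add_detect_overflow_by_sign(x, y)
        print(format(x, "04b"), "+", format(y, "04b"), "=", format(result, "04b"), overflow)
```

The three input pairs are the cases discussed in the text: 7 + 1 and -1 + -8 overflow, while -8 + 7 does not.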
Although the circuit implementing the above overflow detection equation is quite
simple and intuitive, we can create an even simpler circuit if our adder generates a carry
out. The simpler method merely compares the carry into the sign bit column with the
carry out of the sign bit column; if the carry in and carry out differ, overflow has
occurred.
Figure 4.75 illustrates this method for several cases. In Figure 4.75(a), the carry into the
sign bit is 1, whereas the carry out is 0. Because the carry in and carry out differ, overflow
has occurred. A circuit detecting whether two bits differ is just an XOR gate, which is slightly
simpler than the circuit of the previous method. We omit discussion as to why this method
works, but looking at the cases in Figure 4.75 should help provide the intuition.
Figure 4.75 Two's complement overflow detection comparing carry into and out of the
sign bit column: (a) when adding two positive numbers, (b) when adding two negative
numbers, (c) no overflow. If the carry into the sign bit column differs from the
carry out of that column, overflow has occurred.
WHY SUCH CHEAP CALCULATORS?
Several earlier examples dealt with designing simple calculators. Cheap calculators,
costing less than a dollar, are easy to find. Calculators are even given away for free by
many companies selling something else. But a calculator internally contains a chip
implementing a digital circuit, and chips normally aren't cheap. Why are some calculators
such a bargain?
The reason is known as economy of scale, which means that products are often cheaper
when produced in large volumes. Why? Because the design and setup costs can be
amortized over larger numbers. Suppose it costs $1,000,000 to design a custom calculator
chip and to set up the chip's manufacturing (not so unreasonable a number). Design and
setup costs are often called nonrecurring engineering, or NRE, costs. If you plan to
produce and sell one such chip, then you need to add $1,000,000 to the selling price of
that chip if you want to break even (meaning to recover your design and setup costs) when
you sell the chip. If you plan to produce and sell 10 such chips, then you need to add
$1,000,000/10 = $100,000 to the selling price of each chip. If you plan to produce and
sell 1,000,000 such chips, then you need to add only $1,000,000/1,000,000 = $1 to the
selling price of each chip. And if you plan to produce and sell 10,000,000, you need to add
a mere $1,000,000/10,000,000 = $0.10 = 10 cents to the selling price of each chip. If
the actual raw materials only cost 20 cents per chip, and you add another 10 cents per chip
for profit, then I can buy the chip from you for a mere 40 cents. And I can then give away
such a calculator for free, as many companies do, as an incentive for people to buy
something else.
[Photo: inside of a calculator, with labels Display, Chip (covered), and Battery.]
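The amortization arithmetic in the sidebar can be sketched in a few lines. This is a minimal Python illustration; the function names nre_per_chip_cents and selling_price_cents are hypothetical, and amounts are kept in cents to avoid rounding surprises.

```python
def nre_per_chip_cents(nre_cost_cents, units):
    """NRE cost that each chip must carry to break even, in cents."""
    return nre_cost_cents / units


def selling_price_cents(nre_cost_cents, units, material_cents, profit_cents):
    """Per-chip price = amortized NRE + raw materials + profit, in cents."""
    return nre_per_chip_cents(nre_cost_cents, units) + material_cents + profit_cents


NRE = 1_000_000 * 100  # $1,000,000 expressed in cents

# Volumes from the sidebar: 1, 10, 1,000,000, and 10,000,000 chips
for units in (1, 10, 1_000_000, 10_000_000):
    print(units, "chips ->", nre_per_chip_cents(NRE, units) / 100, "dollars of NRE per chip")
```

With the sidebar's numbers (10,000,000 chips, 20 cents of materials, 10 cents of profit), selling_price_cents(NRE, 10_000_000, 20, 10) gives 40 cents.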
4.9 ARITHMETIC-LOGIC UNITS-ALUS
An N-bit arithmetic-logic unit (ALU) is a datapath component able to perform a variety
of arithmetic and logic operations on two N-bit wide data inputs, generating an N-bit data
output. Example arithmetic operations include addition and subtraction. Example logic
operations include AND, OR, XOR, etc. Control inputs to the ALU indicate which
particular operation to perform.
To understand the need for an ALU component, consider Example 4.22.
EXAMPLE 4.22 Multi-function calculator without using an ALU
Let's extend our earlier DIP-switch-based calculator to support eight operations, determined by a
three-switch DIP switch that provides three inputs x, y, and z to our system, as shown in Figure
4.76. For each combination of the three switches, we want to perform the operations shown in Table
4.2, on the 8-bit data inputs A and B, generating the 8-bit output on S.
TABLE 4.2 Desired calculator operations

  Inputs                                              Sample output if
  x  y  z   Operation                                 A=00001111, B=00000101
  0  0  0   S = A + B                                 S=00010100
  0  0  1   S = A - B                                 S=00001010
  0  1  0   S = A + 1                                 S=00010000
  0  1  1   S = A                                     S=00001111
  1  0  0   S = A AND B (bitwise AND)                 S=00000101
  1  0  1   S = A OR B (bitwise OR)                   S=00001111
  1  1  0   S = A XOR B (bitwise XOR)                 S=00001010
  1  1  1   S = NOT A (bitwise complement)            S=11110000
The table includes several bitwise operations (AND, OR, XOR, and complement). A bitwise
operation applies to each corresponding pair of bits of A and B separately.
We can design a circuit for our calculator as shown in Figure 4.76, using a separate datapath
component to compute each operation: we use an adder to compute the addition, a subtractor to
compute the subtraction, an incrementer to compute the increment, and so on. However, that
circuit is very inefficient with respect to the number of wires, power consumption, and delay. There
are too many wires that must be routed to all those components, and to the mux, which
will have 8*8 = 64 inputs. Furthermore, every operation is computed all the time, which wastes
power. Imagine instead that we were dealing not with 8-bit numbers, but with 32-bit numbers,
and we wanted to support not just 8 operations but 32 operations. Then we would have even
more wires (32*32 = 1024 at the mux inputs), and even more power consumption. Furthermore,
a 32x1 mux will require several levels of gates, because for practical reasons a 32-input
logic gate (inside the mux) will likely need to be implemented using several levels of smaller
logic gates.
Figure 4.76 8-bit DIP-switch-based multifunction calculator, using separate components for
each function. Every component computes its result all the time, which wastes power.
We saw in the above example that using separate components for each operation is
not efficient. To solve the problem, we observe that the calculator can only be configured
to do one operation at a time, so there is no need to compute all the operations in parallel
as was done in the example. Instead, we can create a single component (an ALU) that can
compute any of the eight operations. Such a component would be more area and power
efficient, and would have less delay because a large mux would not be needed.
Let's start with an adder as our base internal ALU design. To avoid confusion, we'll
call the inputs to the internal adder IA and IB, short for "internal A" and "internal B," to
distinguish those inputs from the external ALU inputs A and B. We start with the design
shown in Figure 4.77(a). The ALU consists of an adder, and some logic in front of the
adder's inputs. We'll call that logic an arithmetic/logic extender, or AL-extender. The
purpose of the AL-extender is to set the adder inputs based on the values of the ALU's
control inputs x, y, and z, such that the desired arithmetic or logic result appears at the
adder's output. The AL-extender actually consists of eight identical components labeled
abext, one for each pair of bits ai and bi, as shown in Figure 4.77(b). It also has a
component cinext to compute the cin bit.

Figure 4.77 Arithmetic-logic unit: (a) ALU design based on a single adder with an
arithmetic/logic extender, and (b) arithmetic/logic extender detail.
Thus, we need to design the abext and cinext components to complete the ALU
design. Consider the first four calculator operations from Table 4.2, which are all
arithmetic operations:
  When xyz=000, S=A+B. So in that case, we want IA=A, IB=B, and cin=0.
  When xyz=001, S=A-B. So we want IA=A, IB=B', and cin=1.
  When xyz=010, S=A+1. So we want IA=A, IB=0, and cin=1.
  When xyz=011, S=A. So we want IA=A, IB=0, and cin=0. Notice that A will
  pass through the adder, because A+0+0=A.
The last four ALU operations are all logical operations. We can compute the desired
operation in the abext component, and input the result to IA. We then set IB to 0 and cin
to 0, so that the value on IA passes through the adder unchanged.
One possible design of abext places an 8x1 mux in front of each output of the abext
and cinext components, with x, y, and z as the select inputs, in which case we would set
each mux data input as described above. A more efficient and faster design would create
a custom circuit for each component output. We leave the completion of the internal
design of the abext and cinext components as an exercise for the reader.
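The behavior described above can be checked with a small behavioral model. The Python below is only a sketch of the idea (an extender that chooses IA, IB, and cin, followed by a single adder); it does not show gate-level abext or cinext circuits, and the function names al_extender and alu are assumptions made for illustration.

```python
def al_extender(a, b, x, y, z, width=8):
    """Return (ia, ib, cin) for the internal adder, per the xyz control inputs."""
    mask = (1 << width) - 1
    op = (x << 2) | (y << 1) | z
    if op == 0b000:
        return a, b, 0                 # S = A + B
    if op == 0b001:
        return a, (~b) & mask, 1       # S = A - B, computed as A + B' + 1
    if op == 0b010:
        return a, 0, 1                 # S = A + 1
    if op == 0b011:
        return a, 0, 0                 # S = A
    if op == 0b100:
        return a & b, 0, 0             # S = A AND B, passed through the adder
    if op == 0b101:
        return a | b, 0, 0             # S = A OR B
    if op == 0b110:
        return a ^ b, 0, 0             # S = A XOR B
    return (~a) & mask, 0, 0           # S = NOT A


def alu(a, b, x, y, z, width=8):
    """Behavioral ALU: AL-extender followed by one adder; carry out is dropped."""
    mask = (1 << width) - 1
    ia, ib, cin = al_extender(a & mask, b & mask, x, y, z, width)
    return (ia + ib + cin) & mask
```

Using the sample inputs of Table 4.2 (A=00001111, B=00000101), alu(0b00001111, 0b00000101, 0, 0, 1) yields 00001010, matching the table's row for S = A - B.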
Example 4.23 redesigns the multifunction calculator of Example 4.22, this time
utilizing an ALU.
EXAMPLE 4.23 Multi-function calculator using an ALU
Example 4.22 built an eight-function calculator without an ALU. The result was wasted area
and power, complex wiring, and long delay. Using the above-designed ALU, the calculator could
instead be built as shown in Figure 4.78. Notice the simple and efficient design.

Figure 4.78 8-bit DIP-switch-based multifunction calculator, using an ALU.
4.10 REGISTER FILES
An MxN register file is a datapath memory component that provides efficient access to a
collection of M registers, where each register is N bits wide. To understand the need for a
register file component in building good datapaths, rather than just using M separate
registers, consider Example 4.24.
EXAMPLE 4.24 Above-mirror display system using 16 32-bit registers
Recall the above-mirror display system from Example 4.4. Four 8-bit registers were multiplexed to
an 8-bit output. Suppose instead that the system required sixteen 32-bit registers, to display more
values, each of more precision. We would therefore need a 16x1 32-bit-wide multiplexor, as shown
in Figure 4.79. From a purely digital logic perspective, the design is just fine. But in practice, that
multiplexor is very inefficient. Count the number of wires that would be fed into that multiplexor:
16x32 = 512 wires. That's a lot of wires to try to route from the registers to the multiplexor. Try
plugging 512 wires into the back of one stereo system for a hands-on demonstration. Having too many
wires in a small area is known as congestion.
Figure 4.79 Above-mirror display design, assuming sixteen 32-bit registers. The mux has too
many input wires, resulting in congestion. Also, the data lines C are fanned out to too many
registers, resulting in weak current.
Likewise, consider routing the data input to all sixteen registers. Each data input wire is being
branched into sixteen subwires. Imagine electric current being like a river of water: branching a
main river into smaller rivers will yield much less water flow in each smaller river than in
the main river. Likewise, branching a wire, known as fanout, can only be done so many times
before the branched wires' currents are too small to sufficiently control transistors. Furthermore,
low-current wires may be very slow also, so fanout can create long delays over wires too.
The fanout and congestion problems illustrated in the previous example can be solved
by observing that we never need to load more than one register at a time, and that we never
need to read more than one register at a time either. An MxN register file solves the fanout
and congestion problems by grouping the M registers into a single component, with that
component having a single N-bit wide data input, and a single N-bit wide data output. The
wiring inside the component is done carefully to handle fanout and congestion. Figure
4.80 shows a block symbol of a 16x32 register file (16 registers, each 32 bits wide).
Consider writing a value to a register in a register file. We would place the data to be
written on the input W_data. We then need a way to indicate which register we actually
want to write. Since there are 16 registers, we need four bits to specify a particular
register. Those four bits are called the register's address. We would thus write the desired
register's address on the input W_addr. For example, if we wanted to write to register 7, we
would set W_addr=0111. To indicate that we actually want to write on a particular clock
cycle (we won't want to write on every cycle), we would set the input W_en to 1. The
collection of inputs W_data, W_addr, and W_en is known as a register file's write port.
Reading is similar. We would specify the register to read on input R_addr, and set
R_en to 1. Those values would cause the register file to output the addressed register's
contents onto output R_data. R_data, R_addr, and R_en are known as a register file's read
port. The read port and write port are independent of one another. Thus, during the same
clock cycle, we can write to one register, and read from another (or the same) register.
Let's consider how to internally design a register file. For simplicity, consider a 4x32
register file, rather than the 16x32 register file described above. One internal design of a
4x32 register file is shown in Figure 4.81. Let's consider the circuitry for writing to this
register file, found in the left half of the figure. If W_en=0, the register file won't write to
any register, because the write decoder's outputs will be all 0s. If W_en=1, then the write
decoder decodes W_addr and sets to 1 the load input of exactly one register. That register
will be written on the next clock cycle with the value on W_data.
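The write path just described (an enabled 2x4 decoder driving the registers' load inputs) can be modeled behaviorally. The following Python is a minimal sketch under that reading; the names decoder_2x4 and write_clock_edge are hypothetical, and registers are modeled as plain list entries rather than flip-flops.

```python
def decoder_2x4(addr, enable):
    """2x4 decoder with enable: all outputs are 0 when enable is 0."""
    if not enable:
        return [0, 0, 0, 0]
    return [1 if i == (addr & 0b11) else 0 for i in range(4)]


def write_clock_edge(registers, w_data, w_addr, w_en):
    """Return the register contents after one clock edge.

    Only the register whose load input is 1 takes the value on w_data.
    """
    loads = decoder_2x4(w_addr, w_en)
    return [w_data if load else old for old, load in zip(registers, loads)]
```

For example, write_clock_edge([0, 0, 0, 0], 9, 3, 1) returns [0, 0, 0, 9], while the same call with w_en=0 leaves all four registers unchanged.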
[Figure 4.81: internal design of a 4x32 register file, with the 32-bit W_data input and a 2x4
write decoder on the left.]
Such components are more commonly known as "tri-state" drivers rather than
"three-state." But "tri-state" is a registered trademark of National Semiconductor Corp.,
so rather than putting the required trademark symbol after every use of the term
"tri-state," many documents use the term "three-state."
Notice the circled one-input one-output triangular component placed on the W_data line
(there would actually be 32 such components since W_data is 32 bits wide). That component
is known as a driver, sometimes called a buffer, illustrated in Figure 4.82(a). A driver's
output equals its input, but the output is a stronger (higher current) signal. Remember the
fanout problem we described in Example 4.24? A driver reduces the fanout problem. In
Figure 4.81, the W_data lines only fan out to two registers before they go through the
driver. The driver's output then fans out to only two more registers. Thus, instead of a
fanout of four, the W_data lines have a fanout of only two (actually three if you count the
driver itself). The insertion of drivers is beyond the scope of this book, and is instead a
subject for a VLSI design book or an advanced digital design book. But seeing at least one
example of the use of a driver hopefully gives you an idea of one reason why a register file
is a useful component: the component hides the complexity of fanout from a designer.

Figure 4.82 (a) driver: q=d, (b) three-state driver: when c=1, q=d; when c=0, q='Z'
(like no connection).
To understand the read circuitry, you must first understand the behavior of another
new component that we've introduced: the triangular component having two inputs and
one output. That component is known as a three-state driver or three-state buffer,
illustrated in Figure 4.82(b). When the control input c is 1, the component acts like a regular
driver: the component's output equals its input. However, when the control input c is 0,
the driver's output is neither 0 nor 1, but instead what is known as high-impedance, written
as 'Z'. High-impedance can be thought of as no connection at all between the driver's
input and output. "Three-state" means the driver has three possible output states: 0, 1,
and Z.
Let's now consider the circuitry for reading from the register file, found in the right
half of Figure 4.81. If R_en=0, the register file won't read from any register, since the
read decoder's outputs will be all 0s, meaning all the three-state drivers will output Z's,
and thus the output R_data will be high-impedance. If R_en=1, then the read decoder
decodes R_addr and sets to 1 the control input of exactly one three-state driver, which
will pass its register value through to the R_data output.
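The read path can be sketched the same way. The Python below is an illustrative model only: high-impedance is represented by the string "Z", and the names three_state_driver, bus_value, and read_port are assumptions. The contention check in bus_value is an addition for the sketch; the text's design avoids contention because the decoder enables at most one driver.

```python
Z = "Z"  # stand-in for the high-impedance state


def three_state_driver(d, c):
    """Output equals the input when c is 1, and high-impedance when c is 0."""
    return d if c == 1 else Z


def bus_value(driver_outputs):
    """Value seen on a bus shared by several three-state drivers."""
    active = [v for v in driver_outputs if v != Z]
    if not active:
        return Z                       # nobody is driving the bus
    if len(active) > 1:
        raise ValueError("bus contention: more than one driver enabled")
    return active[0]


def read_port(registers, r_addr, r_en):
    """Read decoder plus one three-state driver per register."""
    controls = [1 if (r_en == 1 and i == r_addr) else 0
                for i in range(len(registers))]
    return bus_value([three_state_driver(r, c)
                      for r, c in zip(registers, controls)])
```

For example, read_port([5, 22, 177, 9], 3, 1) returns 9, while any call with r_en=0 returns "Z".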
Be aware that each shown three-state driver actually represents a set of 32
three-state drivers, one for each of the 32 wires coming from the 32-bit register and going
to the 32-bit R_data output. All 32 drivers in a set are controlled by the same
control input.
The wires fed by the various three-state drivers are known as a bus, as
indicated in Figure 4.81. A bus is a popular alternative to a multiplexor when each mux
data input is many bits wide and/or when there are many mux data inputs, because
a bus results in less congestion.
Notice that the register file design scales well to larger numbers of registers.
The write data lines can be driven by more drivers if necessary. The read data lines
are fed from three-state drivers and thus there is no congestion at a single
multiplexor. The reader may wish to compare the register file design in Figure 4.81 with
the design in Figure 4.6, which was essentially a poor design of a register file.
Figure 4.83 provides example timing diagrams describing writing and reading of a
register file. During cycle1, we do not know the contents of the register file, so the register file's
contents are shown as "?". During cycle1, we set W_data=9 (in binary, of course),
W_addr=3, and W_en=1. Those values cause a write of 9 to register file location 3 on the
first clock edge. Notice that we had set R_en=0, so the register file outputs nothing ("Z"),
and the value we put on R_addr does not matter (the value is a "don't care", written as "X").
Figure 4.83 Writing and reading a register file. The timing diagram shows clk, W_data,
W_addr, W_en, R_data, R_addr, and R_en over six cycles, along with the register file's
contents after each clock edge.
During cycle2, we set W_data=22, W_addr=1, and W_en=1. These values cause a
write of 22 to register file location 1 on clock edge 2.
During cycle3, we set W_en=0, so then it doesn't matter to what values we set
W_data and W_addr. We also set R_addr=3 and R_en=1. Those values cause the
register file to read out the contents of register file location 3 onto R_data, causing R_data
to output 9. Notice that the reading is not synchronized to clock edges: R_data
changes soon after R_en becomes 1. Examining the design of Figure 4.81 should make
clear why reading is not synchronous: setting R_en to 1 simply enables the output
decoder to turn on one set of the three-state buffers.
During cycle4, we return R_en to 0. Note that this causes R_data to become
high-impedance again.
During cycle5, we want to simultaneously write and read the register file. We read
location 1 (which causes R_data to become 22) while writing location 2
with the value 177.
Finally, during cycle6, we want to simultaneously read and write the same register
file location. We set R_addr=3 and R_en=1, causing location 3's contents of 9 to appear
on R_data shortly after setting those values. We also set W_addr=3, W_data=555, and
W_en=1. On clock edge 6, 555 thus gets stored into location 3. Notice that soon after that
clock edge, R_data also changes to 555.
The ability to simultaneously read and write locations of a register file, even the same
location, is a widely used feature of register files. The next example makes use of that feature.
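The six cycles described above can be replayed with a small behavioral model. This Python sketch assumes writes take effect at the clock edge ending each cycle and reads are combinational; the class RegisterFile, the function run_trace, and the use of None for unknown ("?") contents are illustrative choices. The text does not state R_en during cycle2, so the sketch assumes R_en=0 there.

```python
class RegisterFile:
    """Behavioral model: synchronous write port, asynchronous read port."""

    def __init__(self, num_regs=4):
        self.regs = [None] * num_regs        # None models unknown contents ("?")

    def read(self, r_addr, r_en):
        """Combinational read: 'Z' (high-impedance) when R_en is 0."""
        return self.regs[r_addr] if r_en else "Z"

    def clock_edge(self, w_data, w_addr, w_en):
        """Write happens only at a clock edge, and only when W_en is 1."""
        if w_en:
            self.regs[w_addr] = w_data


def run_trace():
    rf = RegisterFile(4)
    # One tuple per cycle: (W_data, W_addr, W_en, R_addr, R_en).
    # Don't-care values are filled with 0.
    cycles = [
        (9,   3, 1, 0, 0),   # cycle1: write 9 to location 3
        (22,  1, 1, 0, 0),   # cycle2: write 22 to location 1 (R_en=0 assumed)
        (0,   0, 0, 3, 1),   # cycle3: read location 3
        (0,   0, 0, 3, 0),   # cycle4: R_en back to 0
        (177, 2, 1, 1, 1),   # cycle5: read location 1 while writing location 2
        (555, 3, 1, 3, 1),   # cycle6: read and write location 3
    ]
    seen = []
    for w_data, w_addr, w_en, r_addr, r_en in cycles:
        before = rf.read(r_addr, r_en)       # R_data during the cycle
        rf.clock_edge(w_data, w_addr, w_en)
        after = rf.read(r_addr, r_en)        # R_data just after the edge
        seen.append((before, after))
    return rf, seen
```

In the returned trace, cycle6 shows R_data as 9 before the clock edge and 555 just after it, which is the same-location read and write behavior described in the text.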
EXAMPLE 4.25 Above-mirror display system using a 16x32 register file
Example 4.4 used four 8-bit registers for an above-mirror display system. Example 4.24 extended
the system to use sixteen 32-bit registers, resulting in fanout and congestion problems. We can redo
that design using a register file. The design is shown in Figure 4.84. Since the system always
outputs one of the register values to the display, we tied the R_en input to 1. Notice that the writing
and reading of particular registers are independent of one another.

Figure 4.84 Above-mirror display design, using a 16x32 register file.
A register file having one read port and one write port is sometimes referred to as a
dual-ported register file. To make clear that the two ports consist of one read port and
one write port, such a register file may be referred to as follows: dual-ported (1 read, 1
write) register file.
A register file may actually have just one port, which would be used for both reading
and writing. Such a register file has only one set of data lines that can serve as inputs or
outputs, one set of address inputs, an enable input, and one more input indicating whether
we wish to write or read the register file. Such a register file is known as a single-ported
register file.
Multiported (2 Read, 1 Write) Register File. Many register files have three ports:
one write port, and two read ports. Thus, in the same clock cycle, two registers can
be read simultaneously, and another register written. Such a register file is especially
useful in a microprocessor, since a typical microprocessor instruction operates on
two registers and stores the result in a third register, like in the instruction "R0 <-
R1 + R2."
We can create a second read port in a register file by adding another set of lines,
Rb_data, Rb_addr, and Rb_en. We would introduce a second read decoder with inputs
Rb_addr and enable input Rb_en, a second set of three-state drivers, and a second bus
connected to the Rb_data output.
Other Register File Variations. Register files come in all sorts of configurations.
Typical numbers of registers in a register file range from 4 to 1024, and typical register
widths range from 8 bits to 64 bits per register, but may vary beyond those ranges.
Register files may have one port, two ports, three ports, or even more, but increasing to
many more than three ports can slow down the register file's performance and increase its
size significantly, due to the difficulty of routing all those wires around inside the register
file. Nevertheless, you'll occasionally run across register files with perhaps 3 write ports
and 3 read ports, when concurrent access is critical.
4.11 DATAPATH COMPONENT TRADEOFFS (SEE SECTION 6.4)
For each datapath component that we introduced in previous sections, we created the most
basic and easy-to-understand implementation. In this section, which physically appears in
the book as Section 6.4, we describe alternative implementations of several datapath
components. Each alternative trades off one design criterion for another; most of those
alternatives trade off larger size in exchange for less delay. One use of this book covers those
alternative implementations immediately after introducing the basic implementations
(meaning now). Another use of the book covers those alternative implementations later, after
showing how to use datapath components during register-transfer level design.
4.12 DATAPATH COMPONENT DESCRIPTION USING HARDWARE
DESCRIPTION LANGUAGES (SEE SECTION 9.4)
This section, which physically appears in the book as Section 9.4, shows how to use
HDLs to describe several datapath components. One use of the book describes such HDL
use now, while another use describes such HDL use later.
4.13 PRODUCT PROFILE: AN ULTRASOUND MACHINE
If you or someone you know has ever had a baby, then you may have seen ultrasound images
of that baby before he/she was born, like the images of a fetus' head in Figure 4.85(a).

Figure 4.85 (a) Ultrasound image of a fetus, created using an ultrasound device that is
simply placed on the mom's abdomen (b) and that forms the image by generating sound
waves and listening to the echoes. Photos courtesy of Philips Medical Systems.

That image wasn't taken by a camera somehow inserted into the uterus, but rather by
an ultrasound machine pressed against the mom's skin and pointed toward the fetus.
Ultrasound imaging is now common practice in obstetrics, mainly helping doctors to
track the fetus' progress and correct potential problems early, but also giving parents a
huge thrill when they get their first glimpse of their baby's head, hands, and little feet.
Functional Overview
This section briefly describes the key functional ideas of how ultrasound imaging works.
Digital designers don't typically work in a vacuum; instead, they apply their skills to
particular applications, and thus designers typically learn the key functional ideas underlying
those applications. We therefore introduce you to the basic ideas of ultrasound applications.
Ultrasound imaging works by sending sound waves into the body and listening to the echoes
that return. Objects like bones yield different echoes than objects like skin or fluids, so an
ultrasound machine processes the different echoes to generate images like those in Figure
4.85(a): strong echoes might be displayed as white, weak ones as black. Today's ultrasound
machines rely heavily on fast circuits to generate the sound waves, listen to the
echoes, and process the echo data to generate good quality images in real time.

Figure 4.86 Basic components of an ultrasound machine.

Figure 4.86 illustrates the basic parts of an ultrasound machine. Let's discuss each
part individually.
Transducer
A transducer converts energy from one form to another. You're certainly familiar with
one type of transducer, a stereo speaker, which converts electrical energy into sound by
changing the current in a wire, which causes a nearby magnet to move back and forth,
which pushes the air and hence creates sound. Another familiar transducer is a dynamic
microphone, which converts sound into electrical energy by letting sound waves move a
magnet, which induces current changes in a nearby wire. In an ultrasound machine, the
transducer converts electrical pulses into sound pulses, and sound pulses (the echoes) into
electrical pulses, but the transducer uses piezoelectric crystals instead of magnets.
Applying electric current to such a crystal causes the crystal to change shape rapidly, or
vibrate, thus generating sound waves, typically in the 1 to 30 Megahertz frequency
range. Humans can't hear much above 30 kilohertz; the term "ultrasound" refers to the
fact that the frequency is beyond human hearing. Inversely, sound waves (echoes) hitting
the crystal create electric current. An ultrasound machine's transducer component may
contain hundreds of such crystals, which we can think of as hundreds of transducers.
Each such transducer is considered to form a channel.
Beamformer
A beamformer electronically "focuses" and "steers" the sound beam of an array of
transducers to or from particular focal points, without actually moving any hardware like a
dish to obtain such focusing and steering.
Real designers must often learn about the domain for which they will design. Many
designers consider such learning about domains, like ultrasound, as one of the
fascinating features of the job.
To understand the idea of beamforming, we must first understand the idea of additive
sound. Consider two loud fireworks exploding at the same time, one 1 mile away from
you, and the other 2 miles away. You'll hear the closer firework after about 5 seconds,
assuming sound travels 0.2 miles/second (or 1 mile every 5 seconds), a reasonable
approximation. You'll hear the farther firework after about 10 seconds. So you'll hear
"boom ... (five seconds pass) ... boom." However, suppose instead that the closer firework
exploded 5 seconds later than the farther one. Then you'll hear both at the same time:
one big "BOOM!" That's because the two sounds add together. Now suppose there are
100 fireworks spread throughout a city, and you want all the sound from those fireworks
to reach one particular house (perhaps somebody you don't like very much) at the same
time. You can do this by exploding the closer fireworks later than the farther fireworks. If
you time everything just right, that particular house will hear a tremendously loud single
"BOOOOOM!!!!," probably rattling the house's walls pretty good, as if one huge
firework had exploded. Other houses throughout the city will instead hear a series of quieter
booms, since the timing of the explosions doesn't result in all the sounds adding at those
other houses.
Now you understand a basic principle of beamforming: If you have multiple sound
sources (fireworks in our example, transducers in an ultrasound machine) in different
locations, you can cause the sound to add together at any desired point in space, by
carefully timing the generation of sound from each source such that all the sound waves arrive
at the desired point at the same time. In other words, you can electronically focus and
steer the sound beam by introducing appropriate delays. Focusing and steering the sound
to a particular point is useful because then that point will produce a much louder echo
than all other points, so we can easily hear the echo from that point over all the echoes
from other points.
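The timing rule above (closer sources fire later) reduces to a small calculation. The Python below is a sketch under the simplifying assumption that sound travels at one constant speed along a straight line from each source; the function name transmit_delays is hypothetical.

```python
def transmit_delays(distances, speed):
    """Delay for each source so that all waves reach the focal point together.

    distances: distance from each source to the focal point
    speed:     propagation speed, in matching units
    The farthest source fires first (delay 0); closer sources wait longer.
    """
    travel_times = [d / speed for d in distances]
    longest = max(travel_times)
    return [longest - t for t in travel_times]


# Fireworks from the text: 1 mile and 2 miles away, sound at 0.2 miles/second
firework_delays = transmit_delays([1, 2], 0.2)
print(firework_delays)   # the closer firework waits about 5 seconds, the farther one 0
```

Measured in time steps rather than seconds, transmit_delays([2, 3], 1) gives a delay of one step for a source two steps from the focal point and no delay for a source three steps away.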
Figure 4.87 illustrates the concept of electronic focusing and steering, using two
sound sources to focus and steer a beam to a desired point X.

Figure 4.87 Focusing sound at a particular point using beamforming: (a) first time step:
only the bottom transducer generates sound, (b) second time step: the top transducer now
generates sound too, (c) third time step: the two sound waves add at the focal point (both
waves reach the focal point at the same time), (d) an illustration showing that the top
transducer is two time steps away from the focal point, while the bottom transducer is
three time steps away, meaning the top transducer should generate sound one time step later
than the bottom transducer.
At the first time step (Figure 4.87(a)), the bottom sound source has begun transmitting its
sound wave. After two time steps (Figure 4.87(b)), the top source has begun transmitting
its sound wave. After three time steps (Figure 4.87(c)), the waves from both sources reach
the focal point, adding together. They'll continue adding as long as the waves from both
sources are in phase with one another. We can simplify the drawing by showing only the
lines from the sources to the focal point, as shown in Figure 4.87(d).
An ultrasound machine uses this ability to electronically focus and steer sound, in
order to scan, point by point, the entire region in front of the transducers. The machine
does such scanning perhaps tens or hundreds of times per second.
For each focal point, the machine needs to listen to the echo that comes back from
whatever object is located at the focal point, to determine if that object is bone, skin,
blood, etc., utilizing the fact that each such object generates a different echo. Remember,
the echo from the focal point will be louder than echoes from other points, because the
sound adds at that point. We can use beamforming to also focus in on a particular point in
space that we want to listen to. In the same way that we generated sound pulses with
particular delays to focus the sound on a particular point, likewise, to "listen" to the sounds
from a particular point, we also want to introduce delays to the signals received by the
transducers. That's because the sounds will arrive at the closer transducers sooner than at
the farther transducers, so by using appropriate delays, we can "line up" the signals from
each transducer so that the sounds coming from the focal point all add together. This
concept is shown in Figure 4.88.
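The "line up and add" idea for listening can be sketched as a delay-and-sum over sampled signals. This Python is a simplified illustration assuming integer delays measured in time steps and one list of samples per transducer; the function name delay_and_sum is an assumption, not a term from the text.

```python
def delay_and_sum(signals, delays):
    """Shift each transducer's samples by its delay, then add them up.

    signals: list of sample lists, one per transducer (index 0 = first time step)
    delays:  integer delay, in time steps, applied to each transducer's signal
    """
    length = max(len(sig) + d for sig, d in zip(signals, delays))
    summed = [0] * length
    for sig, d in zip(signals, delays):
        for i, sample in enumerate(sig):
            summed[i + d] += sample
    return summed


# The top transducer hears the echo at time step 2, the bottom one at time step 3
top = [0, 1, 0, 0]
bottom = [0, 0, 1, 0]

without_delay = delay_and_sum([top, bottom], [0, 0])
with_delay = delay_and_sum([top, bottom], [1, 0])   # delay the top by one step
```

Without the delay the two echoes stay separate, each with amplitude 1; with the top transducer delayed by one time step, they coincide and add to amplitude 2.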
Figure 4.88 Listening to sound from a particular point using beamforming: (a) first time step,
(b) second time step: the top transducer has heard the sound first, (c) third time step: the bottom
transducer hears the sound at this time (the result without the delay), (d) delaying the top
transducer by one time step results in the waves from the focal point adding, amplifying the sound.
Note that there will certainly be echoes from other points in the region, but those
coming from the focal point will be much stronger; hence, the weaker echoes can be
filtered out.
Notice that beamforming can be used to listen to a particular point even if the sounds
coming from that point are not echoes coming back from our own sound pulses: the
sound could be coming from the object at the point itself, such as a car engine or a person
talking. Beamforming is the electronic equivalent to pointing a big parabolic dish in a
particular direction, but beamforming requires no moving parts.
Beamforming is tremendously common in a wide variety of sonar applications, such
as observing a fetus, observing a human heart, searching for oil underground, monitoring
the surroundings of a submarine, spying, etc. Beamforming is used in some hearing aids
having multiple microphones, to focus in on the source of detected speech; in that case,
the beamforming must be adaptive. Beamforming can be used in multimicrophone cell
phones to focus in on the user's voice, and can even be used in cellular telephone base
stations (using radio signals though, not sound waves) to focus a signal going to or
coming from a cell phone.
Signal Processor, Scan Converter, and Monitor
The signal processor analyzes the echo data of every point in the scanned region, by
filtering out noise (see Section 5.11 for a discussion on filtering), interpolating between
points, assigning a level of gray to each point depending on the echoes heard (echoes
corresponding to bones might be shaded as white, liquid as black, and skin as gray, for
example), and other tasks. The result is a gray-scale image of the region. The scan
converter steps through this image to generate the necessary signals for a black-and-white
monitor, and the monitor displays the image.
Digital Circuits in an Ultrasound Machine's Beamformer
Many of the control and signal processing tasks in an ultrasound machine are carried out
using software running on one or more microprocessors, typically special
microprocessors specifically designed for digital signal processing, known as digital signal
processors, or DSPs. But certain tasks are much more amenable to custom digital
circuitry, such as those in the beamformer.
Sound Generation and Echo Delay Circuits
Beamforming during the sound generation step consists of providing appropriate delays
to hundreds of transducers. Those delays vary depending on the focal point, so they can't
be built into the transducers themselves. Instead, we can place a delay circuit in front of
each transducer, as shown in Figure 4.89. For a given focal point, the DSP writes the
appropriate delay value into each delay circuit, by writing the delay value on the bus
labeled delay_out, writing the "address" on the lines labeled addr, and enabling the
decoder. The decoder will thus set the load line of one of the OutDelay components.
After writing to every such component, the DSP starts all of them simultaneously by
setting start_out to 1. Each OutDelay component will, after the specified delay, pulse
its o output, which we'll assume causes the transducer to generate sound. The DSP would
then set start_out to 0, and then listen for the echo.
Figure 4.89 Transducer output delay circuits for two channels.
We can implement the OutDelay component using a down-counter with parallel load,
as in Figure 4.90. The parallel load inputs L and ld load the down-counter with
its count value. The cnt input commences the down-counting; when the counter reaches
zero, the counter pulses tc. The data output of the counter is unused in this
implementation.
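A rough behavioral model may help make the down-counter delay concrete. The sketch below is an assumption-laden illustration, not the circuit of Figure 4.90: the class OutDelay, its method names, and the exact cycle on which the pulse appears are choices made here for clarity, and a real counter's terminal-count timing could differ by a cycle.

```python
class OutDelay:
    """Behavioral sketch of a down-counter with parallel load, used as a delay."""

    def __init__(self):
        self.count = 0
        self.done = False

    def load(self, value):
        """Parallel load: store the delay (in clock cycles) written by the DSP."""
        self.count = value
        self.done = False

    def clock(self, cnt):
        """Advance one clock cycle; return 1 on the cycle the count has reached zero."""
        if not cnt or self.done:
            return 0
        if self.count == 0:
            self.done = True  # pulse only once per load
            return 1
        self.count -= 1
        return 0

# Two hypothetical channels with delays of 1 and 3 cycles, started together.
delays = [1, 3]
units = [OutDelay() for _ in delays]
for unit, delay in zip(units, delays):
    unit.load(delay)

pulse_cycle = {}
for cycle in range(6):
    for index, unit in enumerate(units):
        if unit.clock(cnt=1):
            pulse_cycle[index] = cycle
```

In this model each channel pulses on the cycle whose index equals its loaded delay, so channels loaded with different values fire at different times even though they were all started at once.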
After the ultrasound machine sends out sound waves focused on a particular focal
point, the machine must listen to the echo coming back from that focal point. This
listening requires appropriate delays for each transducer to account for the differing
distances of each transducer from the focal point. Thus, each transducer needs another
delay circuit for delaying the received echo signal, as shown in Figure 4.91. The
EchoDelay component receives on input t the signal from the transducer, which we'll
assume has been digitized into a stream of N-bit values. The component should output
that signal on output t_delayed, delayed by the appropriate amount. The delay amount
can be written by the DSP using the component's d and ld inputs.
We can implement the EchoDelay component using a series of registers, as shown in
Figure 4.92. That implementation can delay the output signal by 0, 1, 2, or 3 clock cycles,
simply by using the appropriate select line values for the 4x1 mux. A longer register
chain, along with a larger mux, would support longer delays. The DSP configures the
delay amount by writing to the top register, which sets the 4x1 mux select lines. A more
flexible implementation of the EchoDelay component would instead use a timer
component.
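The register-chain approach can be modeled as a short shift register whose taps feed a 4x1 mux. The following Python sketch is illustrative only; the class name EchoDelay and its methods configure and clock are hypothetical stand-ins for the d/ld configuration and the clocked behavior described above.

```python
class EchoDelay:
    """Sketch of a 3-register delay chain with a 4x1 mux selecting the tap."""

    def __init__(self):
        self.regs = [0, 0, 0]  # chain registers, newest value first
        self.select = 0        # mux select value: delay of 0..3 cycles

    def configure(self, delay):
        """Model the DSP writing the delay amount (the mux select value)."""
        if not 0 <= delay <= 3:
            raise ValueError("this chain supports delays of 0 to 3 cycles")
        self.select = delay

    def clock(self, t):
        """One clock cycle: return t_delayed, then shift t into the chain."""
        taps = [t] + self.regs          # mux inputs for delays 0, 1, 2, 3
        t_delayed = taps[self.select]
        self.regs = [t] + self.regs[:2]
        return t_delayed

echo = EchoDelay()
echo.configure(2)
stream = [5, 6, 7, 8, 9]
delayed = [echo.clock(sample) for sample in stream]
```

Supporting a longer delay in this model means lengthening regs and widening the range accepted by configure, which mirrors the longer register chain and larger mux mentioned in the text.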
Figure 4.90 OutDelay circuit.
Figure 4.91 Transducer output and echo delay circuits for one channel.
Figure 4.92 EchoDelay circuit.
Summation Circuits: Adder Tree
The output of each transducer, appropriately delayed, should be summed to create a
single echo signal from the focal point, as was illustrated in Figure 4.88. That illustration
had only two transducers, and thus only one adder. What if we have 256 transducers, as
would be more likely in a real ultrasound machine? How do we add 256 values? We could
add the values in a linear way, illustrated in Figure 4.93(a) for eight values. The delay of
that circuit is roughly equal to the delay of seven adders. For 256 values, the delay would
roughly be that of 255 adders. That's a very long delay.
Figure 4.93 Adding many numbers: (a) linearly, (b) using an adder tree. Note that
both methods use seven adders.
We can do better by reorganizing how we compute the sum, using a configuration of
adders known as an adder tree. In other words, rather than computing
((((((A+B)+C)+D)+E)+F)+G)+H, depicted in Figure 4.93(a), we could instead compute
((A+B)+(C+D))+((E+F)+(G+H)), as shown in Figure 4.93(b). The answer comes out the
same and uses the same number of adders, but the latter method computes four additions
in parallel, then two additions in parallel, and then performs a last addition. The delay is
thus only that of three adders. For 256 values, the tree's first level would compute 128
additions in parallel, the second level would compute 64 additions, then 32, then 16, then
8, then 4, then 2, and finally 1 last addition. Thus, that adder tree would have eight levels,
meaning a total delay equal to eight adder delays. That's a lot faster than 255 adder
delays: about 32 times faster, in fact.
The output of the adder tree can be fed into a memory to keep track of the results for
the DSP, which may access the results sometime after they are generated.
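The adder counts and delays quoted above can be checked with a small Python model. This is a sketch for counting purposes only; the function names linear_sum and tree_sum are hypothetical, and "delay" here simply means the number of adders on the longest path.

```python
def linear_sum(values):
    """Chain of adders: returns (sum, adders on the longest path)."""
    total = values[0]
    chain_length = 0
    for value in values[1:]:
        total += value
        chain_length += 1  # each new value passes through one more adder in series
    return total, chain_length

def tree_sum(values):
    """Adder tree: returns (sum, number of levels, total number of adders)."""
    level = list(values)
    levels = adders = 0
    while len(level) > 1:
        pairs = len(level) // 2
        next_level = [level[2 * i] + level[2 * i + 1] for i in range(pairs)]
        if len(level) % 2:
            next_level.append(level[-1])  # an odd value passes through unchanged
        adders += pairs
        levels += 1
        level = next_level
    return level[0], levels, adders

eight = list(range(1, 9))
many = list(range(256))
```

For eight values, both functions use seven adders, but tree_sum reports three levels against a chain of seven. For 256 values, the tree has eight levels while the chain has 255 adders in series.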
Multipliers
We presented a greatly simplified version of beamforming above. In reality, many
other factors must be considered during beamforming. Several of those considerations
can be accounted for by multiplying each channel by specific constant values, which the
DSP again sets individually for each channel. For example, focusing on a point close to
the handheld device may require us to more heavily weight the incoming signals of
transducers near the center of the device. A channel may therefore actually include
a multiplier, as shown in Figure 4.94. The DSP could write to the register shown,
which would represent a constant by which the transducer signal would be multiplied.
Figure 4.94 Channel extended with a multiplier.
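As a small illustration of per-channel weighting, the sketch below multiplies each channel's (already delayed) sample by a constant before summing. The function name weighted_sum and the sample and weight values are hypothetical, chosen only to show center channels counting more heavily than edge channels.

```python
def weighted_sum(samples, weights):
    """Multiply each channel's sample by its constant, then add the products."""
    if len(samples) != len(weights):
        raise ValueError("one weight is needed per channel")
    return sum(sample * weight for sample, weight in zip(samples, weights))

# Hypothetical four-channel example: the two center channels are weighted
# twice as heavily as the two edge channels.
samples = [3, 3, 3, 3]
weights = [1, 2, 2, 1]
echo_value = weighted_sum(samples, weights)
```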
Our introduction of the ultrasound machine is simplified from real machines,
yet even in this simplified introduction, you can see this chapter's datapath
components in use. We used a down-counter to implement the OutDelay component, and
several registers along with muxes for the EchoDelay component. We used many adders
to sum the incoming transducer signals. And we used a multiplier to weight the
incoming signals.
Future Challenges in Ultrasound
Over the past two decades, ultrasound machines have moved from mostly analog machines
to mostly digital machines. The digital systems consist of both custom digital circuits and
software on DSPs and microprocessors, working together to create real-time images.
One of the main trends in ultrasound machines involves creating three-dimensional
(3-D) images in real time. Most ultrasound machines of the 1990s and 2000s generated
two-dimensional images, with the quality of those images (e.g., more focal points per
image) improving during those decades. In contrast to two-dimensional ultrasound,
generating 3-D images requires viewing the region of interest from different perspectives,
just like people view things from their two eyes. Such generation also requires extensive
computations to create a 3-D image from the two (or more) perspectives. The result is a
picture like that in Figure 4.95.
Figure 4.95 3-D ultrasound image of a fetus's face. Photo courtesy of Philips Medical Systems.
That's a fetus's face. Impressive, isn't it? Keep in mind that image is made solely from
sound waves bouncing into a woman's womb. Color can also be added to distinguish
among different fluids and tissues. Those computations take time, but faster processors,
coupled with clever custom digital circuits, are bringing real-time 3-D ultrasound closer
to reality.
Another trend is toward making ultrasound machines smaller and lighter, so that they
can be used in a wider variety of health care situations. Early machines were big and
heavy, with more recent ones coming on rollable carts. Some recent versions are
handheld. A related trend is making ultrasound machines cheaper, so that perhaps every
doctor could have a machine in every examination room, every ambulance could carry a
machine to help emergency personnel ascertain the extent of certain wounds, and so on.
Ultrasound is used for numerous other medical applications, such as imaging of the
heart to detect artery or valve problems. Ultrasound is also used in various other
applications, like submarine region monitoring.
4.14 CHAPTER SUMMARY
In this chapter, we began (Section 4.1) by introducing the idea of new building blocks
intended for common operations on multibit data, with those blocks being known as
datapath components, or register-transfer-level components. We then introduced a number
of datapath components, including registers, shifters, adders, comparators, counters,
multipliers, subtractors, arithmetic-logic units, and register files. For each component, we
examined two aspects: the internal design of the component, and the use of the
component in applications.
We ended (Section 4.13) by describing some principles underlying the operation of an
ultrasound machine, and showing how several of the datapath components might be used
to implement parts of such a machine. One thing you might notice is how designing a
real ultrasound machine would require some knowledge of the domain of ultrasound. The
requirement that a software programmer or digital designer have some understanding of
an application domain is quite common.
In the coming chapter, you will apply your knowledge of combinational logic design,
sequential logic design (controller design), and datapath components, to build digital
circuits that can implement very general and powerful computations.
4.15 EXERCISES
Exercises marked with an asterisk (*) represent especially challenging problems.
For exercises relating to datapath components, each problem indicates whether the
problem emphasizes the component's internal design or the component's use.
SECTION 4.2: REGISTERS
4.1 Trace the behavior of an 8-bit parallel load register with input I, output Q, and load
control input ld by completing the following timing diagram.
(Timing diagram not recovered; the signals shown include ld and Q.)
4.2 Trace the behavior of an 8-bit parallel load register with input I, output Q, load control
input ld, and synchronous clear input clr by completing the following timing diagram.
(Timing diagram not recovered; the signals shown include clr, clk, and Q.)
4.3 Design a 4-bit register with 2 control inputs s1 and s0, 4 data inputs I3, I2, I1, and I0,
and 4 data outputs Q3, Q2, Q1, and Q0. When s1s0=00, the register maintains its value.
When s1s0=01, the register loads I3..I0. When s1s0=10, the register clears itself to 0000.
When s1s0=11, the register complements itself, so, for example, 0000 would become 1111,
and 1010 would become 0101. (Component design problem.)
4.4 Repeat the previous problem, but when s1s0=11, the register reverses its bits, so 1110
would become 0111, and 1010 would become 0101. (Component design problem.)
4.5 Design an 8-bit register with 2 control inputs s1 and s0, 8 data inputs I7..I0, and 8 data
outputs Q7..Q0. s1s0=00 means maintain the present value, s1s0=01 means load, and
s1s0=10 means clear. s1s0=11 means to swap the high nibble with the low nibble (a nibble
is 4 bits), so 11110000 would become 00001111, and 11000101 would become 01011100.
(Component design problem.)
4.6 The radar gun used by a police officer is always outputting a radar signal and measuring
the speed of the cars as they pass. However, when the officer wants to ticket an individual
for speeding, the officer must save the measured speed of the car on the radar unit. Build
a system to implement a speed save feature for the radar gun. The system has an 8-bit
speed input S, an input B from the save button on the radar gun, and an 8-bit output D
that will be sent to the radar gun's speed display. (Component use problem.)
SECTION 4.3: ADDERS
4.7 Trace the values appearing at the outputs of a 3-bit carry-ripple adder for every one
full-adder delay time period, when adding 111 with 011. Assume all inputs were previously
zero for a long time.
4.8 Assuming all gates have a delay of 1 time unit, compute the longest time required to add
two numbers using an 8-bit carry-ripple adder.
4.9 Assuming AND gates have a delay of 2 time units, OR gates have a delay of 1 time unit,
and XOR gates have a delay of 3 time units, compute the longest time required to add two
numbers using an 8-bit carry-ripple adder.
4.10 Design a carry-ripple adder using carry-ripple adders. (Component use problem.)
4.11 Design an adder that computes the sum of three 8-bit numbers, using 8-bit carry-ripple
adders. (Component use problem.)
4.12 Design an adder that computes the sum of four 8-bit numbers, using 8-bit carry-ripple
adders. (Component use problem.)
4.13 Design a digital thermometer that can compensate for errors in the temperature sensing
device's output T, which is an 8-bit input to our system. The compensation amount can be
positive only, and comes to our system via inputs a, b, and c, from a 3-pin DIP switch. Our
system should output the compensated temperature on an 8-bit output U. (Component use
problem.)
4.14 Repeat the previous problem, except that the compensation amount can be positive or
negative, coming to our system via four inputs a, b, c, and d from a 4-pin DIP switch. The
compensation amount is in two's complement form (so the person setting the DIP switch
had better know that!). Design the circuit. What is the range by which the input
temperature can be compensated? (Component use problem.)
4.15 We can add three 8-bit numbers by chaining one 8-bit carry-ripple adder to the output of
another 8-bit carry-ripple adder. Assuming every gate has a delay of 1 time unit, compute
the longest delay of this three 8-bit number adder. Hint: you may have to look carefully
inside the carry-ripple adders, even inside the full-adders, to correctly compute the longest
delay from any input to any output. (Component use problem.)
SECTION 4.4: SHIFTERS
4.16 Design an 8-bit shifter that shifts its inputs two bits to the right (shifting in 0s) when the
shifter's shift control input is 1. (Component design problem.)
4.17 Design a circuit that outputs the average of four 8-bit inputs representing binary numbers
(not in two's complement form). (Component use problem.)
4.18 Design a circuit that takes an 8-bit input D representing a binary number (not in two's
complement form), and outputs two times that value. (Component use problem.)
4.19 Design a circuit that outputs nine times its 8-bit input D representing a binary number
(not in two's complement form). Hint: use a shifter and an adder. (Component use problem.)
4.20 Design a special multiplier circuit that can multiply its 16-bit input by 1, 2, 4, 8, 16, or 32,
specified by three inputs a, b, c (abc=000 means no multiply, abc=001 means multiply by 2,
abc=010 means by 4, abc=011 means by 8, abc=100 means by 16, abc=101 means by 32).
Hint: Use a predefined component described in this chapter. (Component use problem.)
4.21 Trace through the execution of the barrel shifter shown in Figure 4.42, when I=01100101,
x = 1, y = 0, z = 1. Be sure to show how the input I is shifted after each internal shifter stage.
4.22 Trace through the execution of the barrel shifter shown in Figure 4.42, when I=10011011,
x = 0, y = 1, z = 0. Be sure to show how the input I is shifted after each internal shifter stage.
4.23 Using the barrel shifter shown in Figure 4.42, what settings of the inputs x, y, and z are
required to shift the input I left by six positions?
SECTION 4.5: COMPARATORS
4.24 Trace through the execution of the 4-bit magnitude comparator shown in Figure 4.45 when
a = 15 and b = 12. Be sure to show how the comparisons propagate through the individual
comparators.
4.25 Design a comparator that determines if three 4-bit numbers are equal, by connecting 4-bit
magnitude comparators together and using additional components if necessary. (Component
use problem.)
4.26 Design a 4-bit carry-ripple style magnitude comparator that has two outputs, a greater-than
or equal-to output gte, and a less-than or equal-to output lte. Be sure to clearly show the
equations used in developing the individual 1-bit comparators and how they are connected to
form the 4-bit circuit. (Component design problem.)
4.27 Design an 8-bit magnitude comparator. (Component design problem.)
4.28 Design a circuit that outputs 1 if the circuit's 8-bit input equals 99:
(a) using an equality comparator,
(b) using gates only.
Hint: In the case of (b), you need only 1 AND gate and some inverters. (Component use
problem.)
4.29 Use magnitude comparators and logic to design a circuit that computes the minimum of
three 8-bit numbers. (Component use problem.)
4.30 Use magnitude comparators and logic to design a circuit that computes the maximum of two
16-bit numbers. (Component use problem.)
4.31 Use magnitude comparators and logic to design a circuit that outputs 1 when an 8-bit input
is between 75 and 100, inclusive. (Component use problem.)
4.32 You are to design a human body temperature alarm system for a hospital. Your system takes
an 8-bit input representing the temperature, which can range from 0 to 255. If the measured
temperature is 95 or less, you should set output A to 1. If the temperature is 96 to 104, you
should set output B to 1. If the temperature is 105 or above, you should set output C to 1.
(Component use problem.)
4.33 You are working as a weight guesser in an amusement park. Your job is to try to guess the
weight of an individual before they step on the scale. If your guess is not within ten pounds
of the individual's actual weight (higher or lower), the individual wins a prize. Build a weight
guess analyzer system that outputs whether the guess was within ten pounds. The weight
guess analyzer has an 8-bit guess input G, an 8-bit input W from the scale with the correct
weight, and a bit output C that is 1 if the guessed weight was within the defined limits of
the game. (Component use problem.)
SECTION 4.6: COUNTERS
4.34 Design a 4-bit up-counter that has two control inputs: cnt enables counting up, while clear
synchronously resets the counter to all 0s:
(a) using a parallel load register as a building block,
(b) using flip-flops and muxes directly by following the register design process of Section 4.2.
(Component design problem.)
4.35 Design a 4-bit down-counter that has three control inputs: cnt enables counting down, clear
synchronously resets the counter to all 0s, and set synchronously sets the counter to all 1s:
(a) using a parallel load register as a building block,
(b) using flip-flops and muxes directly by following the register design process of Section 4.2.
(Component design problem.)
4.36 Design a 4-bit up-counter with an additional output upper. upper outputs a 1 whenever the
counter is within the upper half of the counter's range, 8 to 15. Use a basic 4-bit up-counter as
a building block. (Component design problem.)
4.37 Design a 4-bit up/down-counter that has four control inputs: cnt_up enables counting up,
cnt_down enables counting down, clear synchronously resets the counter to all 0s, and set
synchronously sets the counter to all 1s. If both count control inputs cnt_up and cnt_down are
1, the counter will retain its current count value. Use a parallel load register as a building
block. (Component design problem.)
4.38 Design a circuit for a 4-bit decrementer. (Component design problem.)
4.39 Design an electronic turnstile system using a 64-bit counter. The input is a bit A, which is 1
for exactly one clock cycle whenever a person walks through the turnstile. The output is a
64-bit binary number. A second input B is 1 whenever a reset button is pressed, and should
reset the output to 0s. Knowing that California's Disneyland attracts about 15,000 visitors per
day, and assuming they all pass through your one turnstile, how many days would pass before
your counter would roll over? (Component use problem.)
4.40 (a) Using an up-counter with a synchronous clear control input, and extra logic, design a
circuit that outputs a 1 every 99 clock cycles.
(b) Design the counter from part (a), but use a down-counter with parallel load.
(c) What are the tradeoffs between the two designs from parts (a) and (b)?
(Component use problem.)
4.41 (a) Give the count range for the following sized up-counters: 8-bits, 12-bits, 16-bits, 20-bits,
32-bits, 40-bits, 64-bits, and 128-bits.
(b) For each size of counter in part (a), assuming a 1 Hz clock, indicate how many minutes,
hours, days, etc., the counter would count before wrapping around.
SECTION 4.7: MULTIPLIER-ARRAY STYLE
4.42 Assuming all gates have a delay of 1 time unit, which of the following designs will compute
the 8-bit multiplication A*9 faster:
(a) a circuit as designed in Exercise 4.19, or
(b) an 8-bit array style multiplier with one of its inputs connected to a constant value of nine.
4.43 Design an 8-bit array-style multiplier. (Component design problem.)
4.44 Design a more accurate version of the Celsius to Fahrenheit converter from Example 4.10.
The new conversion circuit receives a digitized temperature in Celsius as a 16-bit binary
number C and outputs the temperature in Fahrenheit as a 16-bit output F. Our more accurate
equation for calculating an approximate conversion from Celsius to Fahrenheit is:
F = C*30/16 + 32. (Component use problem.)
SECTION 4.8: SUBTRACTORS
4.45 Create the internal design of a full-subtractor. (Component design problem.)
4.46 Convert the following two's complement binary numbers to decimal numbers:
(a) 00001111
(b) 10000000
(c) 10000001
(d) 11111111
(e) 10010101
4.47 Convert the following two's complement binary numbers to decimal numbers:
(a) 01001101
(b) 00011010
(c) 11101001
(d) 10101010
(e) 11111100
4.48 Convert the following two's complement binary numbers to decimal numbers:
(a) 11100000
(b) 01111111
(c) 11110000
(d) 11000000
(e) 11100000
4.49 Convert the following 9-bit two's complement binary numbers to decimal numbers:
(a) 011111111
(b) 111111111
(c) 100000000
(d) 110000000
(e) 111111110
4.50 Convert the following decimal numbers to 8-bit two's complement binary form:
(a) 2
(b) -1
(c) -23
(d) -128
(e) 126
(f) 127
(g) 0
4.51 Convert the following decimal numbers to 8-bit two's complement binary form:
(a) 29
(b) 100
(c) 125
(d) -29
(e) -100
(f) -125
(g) -2
4.52 Convert the following decimal numbers to 8-bit two's complement binary form:
(a) 6
(b) 26
(c) -8
(d) -30
(e) -60
(f) -90
(g) -120
4.53 Convert the following decimal numbers to 9-bit two's complement binary form:
(a) 1
(b) -1
(c) -256
(d) -255
(e) 255
(f) -8
(g) -128
4.54 Using 4-bit subtractors, build a subtractor that has three 8-bit inputs, A, B, and C, and a
single 8-bit output F, where F=(A-B)-C. (Component use problem.)
4.55 You are given a digital thermometer that digitizes a temperature into a 16-bit binary number
K in Kelvin. Build a system to convert that temperature to a 16-bit Fahrenheit value. Use the
following equation to provide an approximate conversion: F = (K-273)*2+32. (Component use
problem.)
SECTION 4.9: ARITHMETIC-LOGIC UNITS-ALUS
4.56 Design an ALU with two 8-bit inputs A and B, and control inputs x, y, and z. The ALU
should support the operations described in Table 4.3. Use an 8-bit adder and an arithmetic/
logic extender. (Component design problem.)
TABLE 4.3 Desired ALU operations.
Inputs
x y z   Operation
0 0 0   S = A - B
0 0 1   S = A + B
0 1 0   S = A * 8
0 1 1   S = A / 8
1 0 0   S = A NAND B (bitwise NAND)
1 0 1   S = A XOR B (bitwise XOR)
1 1 0   S = Reverse A (bit reversal)
1 1 1   S = NOT A (bitwise complement)
4.57 Design an ALU with two 8-bit inputs A and B, and control inputs x, y, and z. The ALU
should support the operations described in Table 4.4. Use an 8-bit adder and an arithmetic/
logic extender. (Component design problem.)
TABLE 4.4 Desired ALU operations.
Inputs
x y z   Operation
0 0 0   S = A + B
0 0 1   S = A AND B (bitwise AND)
0 1 0   S = A NAND B (bitwise NAND)
0 1 1   S = A OR B (bitwise OR)
1 0 0   S = A NOR B (bitwise NOR)
1 0 1   S = A XOR B (bitwise XOR)
1 1 0   S = A XNOR B (bitwise XNOR)
1 1 1   S = NOT A (bitwise complement)
4.58 An instructor teaching Boolean algebra wants to help her students learn and understand basic
Boolean operators by providing the students with a calculator capable of performing bitwise
AND, NAND, OR, NOR, XOR, XNOR, and NOT operations. Using the ALU specified in
Exercise 4.57, build a simple logic calculator using DIP switches for input and LEDs for
output. The logic calculator should have three DIP switch inputs to select which logic
operation to perform. (Component use problem.)
SECTION 4.10: REGISTER FILES
4.59 Design an 8x32 two-port (1 read, 1 write) register file. (Component design problem.)
4.60 Design a 4x4 three-port (2 read, 1 write) register file. (Component design problem.)
4.61 Design a 10x14 register file (one read port, one write port). (Component design problem.)
4.62 *Create a speed-dial system for a telephone. Eight special buttons b0-b7 access each stored
number. The most recently dialed number exists as nine digits stored in nine 8-bit registers
R0-R8. When the phone user presses another button S simultaneously with any button b0-b7,
the most recently dialed number gets stored in the button's corresponding storage. When the
user presses a button b0-b7 by itself, the number in that button's storage gets read out and
placed on nine 8-bit outputs P0-P8. Hint: use nine register files and some extra logic.
(Component use problem.)
DESIGNER PROFILE
Roman began studying Computer Science in college due to his interest in software
development. During his undergraduate studies, his interests expanded to include digital
design and embedded systems and eventually led him to become involved in research
developing new methods to help designers quickly build large integrated circuits (ICs).
Roman continued his education through graduate studies and received his M.S. in
Computer Science, after which Roman worked both for a large company designing
integrated circuits (ICs) for consumer electronics and for a start-up company focusing on
high-performance processing.
Roman enjoys working as both a software developer and hardware engineer and believes
that "fundamentally software and hardware design are very similar, both relying on
efficiently solving difficult problems. While good problem solving skills are important,
good learning skills are also important." Contrary to what many students may believe, he
points out that "learning is a fundamental activity and skill that does not end when you
receive your degree. In order to solve problems, you often are required to learn new skills,
adopt new programming languages and tools, and determine if existing solutions will help
you solve the problems you face as an engineer." Roman points out that digital design has
changed at a rapid pace over the last few decades, requiring engineers to learn new design
techniques, learn new programming languages, such as VHDL or SystemC, and be able to
adopt new technologies to stay successful. "As the industry continues to advance at such a
rapid pace, companies do not only hire engineers for what they already know, but more so
on how well those engineers can continue to expand their knowledge and learn new
skills." He points out that "college provides students with an excellent opportunity to not
only learn the essential information and skills from their course work but also to learn
additional information on their own, possibly by learning different programming
languages, getting involved in research, or working on larger design projects."
Roman is motivated by his enjoyment of the work he does as well as being able to work
with other engineers who share his interests. "Motivation is one of the keys to success in
an engineering career. While motivation can come from many different sources, finding a
career that you are truly interested in and enjoy really helps. Co-workers are also a great
source of motivation as well as knowledge and technical advice. Working as a member of
a team that communicates well is very rewarding. You are able to motivate each other and
use your strengths along with the strengths of your co-workers to achieve goals far
beyond that which you could achieve on your own."
5
Register-Transfer Level
(RTL) Design
5.1 INTRODUCTION
In the previ ous chapters, we've defined the combinational and sequential components
needed to build di git al systems. In thi s Chapter, we'lI learn to build interesting and useful
di gi tal syslems from those components. In particular. we'lI put LOgether datapaLh compo-
nent s 10 build datapaths, and we'll use controller to control those datapaths. The
combinati on of a controll er and datapath is known as a processor. Some processors. like
Ihose in personal computers. are programmable-those processors are the focus of
Chapter 8. Other processors are custom-designed for a parti cular task. and are nOl pro-
grammable-<lesign of such custom processors is the focus of this chapter.
Di gital designers today focus largely on designing cuslom proces ors. as opposed to
designing lower-level digi tal components. We can define a custom proces or as a digital
circuit that implements a computer algori thm-a sequence of instruction that carry Out a
pani cul ar task. For example, we can define an algorithm to filler out noise from a digitized
stream of audio. and we can then create a processor to implement that algorithm. Another
algorithm might encrypt data for secure electroni c commerce purpo es. An algorithm might
compare a fingerprint to a set of 10.000 fingerprints to quickly enable a pou e officer to
detemli ne if someone is a wanted criminal. An image processing algorithm might detect a
lank in a large video image. Beamfonning. pan of the ultrasound machine example in the
previ ou chapler. can be thoughl of as another algorithm. implemented u ing the processor
design described in that chapter. In facI, several of our exanlples in the previou chapler. like
the above- mirror di splay, DIP-switch-based calculator. and color space on\'ener. an a tu-
all y be thought of as very simple proce sors implementing imple algorithms.
Processors can be designed using different design methods. The most common
method in practice today is known as register-transfer level design. Register-transfer
level design, or RTL design, actually consists of a wide variety of approaches, but in
general, a designer specifies the registers of a design, describes the possible transfers and
operations performed on input, output, or register data, and defines the control that
specifies when to transfer and operate on data.
Recall the design processes we defined for combinational logic design in Chapter 2,
and for sequential logic (controller) design in Chapter 3:
In the combinational logic design process outlined in Table 2.5, ...
1. The first step was to capture the desired behavior of the combinational logic,
with either a truth table or an equation.
2. The remaining steps were to convert the behavior to a circuit.
In the sequential logic (controller) design process in Table 3.2, ...
1. The first step was to capture the desired behavior of the sequential logic, using
a finite-state machine.
2. The remaining steps were to convert the behavior to a circuit.
It should therefore come as no surprise that:
1. The first step of an RTL design method will be to capture the desired behavior of
the processor. We'll introduce the concept of a high-level state machine for
capturing RTL behavior.
2. The remaining steps will be to convert the behavior to a circuit.
Figure 5.1 illustrates the idea that the design process can
be viewed as first capturing behavior and then converting
the behavior to structure. That process applies
regardless of whether we are performing combinational
logic design, sequential logic design, or RTL design.
In this chapter, we will introduce the RTL design
process, also known as the RTL design method. As the
process is largely creative, we will utilize numerous
examples to illustrate the process. We will also introduce
several high-level components that are useful
during RTL design, including memory components and
queue components.
5.2 RTL DESIGN METHOD
Figure 5.1 The design process: capture behavior, then convert to circuit.
RTL design is carried out using a wide variety of methods in practice, but it may be
useful to define a general method, as in Table 5.1.
TABLE 5.1 RTL design method.
Step 1. Capture a high-level state machine
    Describe the system's desired behavior as a high-level state machine.
    The state machine consists of states and transitions. The state machine
    is "high-level" because the transition conditions and the state actions
    are more than just Boolean operations on bit inputs and outputs.
Step 2. Create a datapath
    Create a datapath to carry out the data operations of the high-level
    state machine.
Step 3. Connect the datapath to a controller
    Connect the datapath to a controller block. Connect external Boolean
    inputs and outputs to the controller block.
Step 4. Derive the controller's FSM
    Convert the high-level state machine to a finite-state machine (FSM)
    for the controller, by replacing data operations with setting and reading
    of control signals to and from the datapath.
A fifth step may be necessary, in which one selects a clock frequency. Designers
seeking high performance may choose the fastest clock frequency possible, which is
limited by the longest register-to-register delay in the final circuit.
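As a rough illustration of that fifth step, the sketch below computes the fastest usable clock frequency from a set of register-to-register delays. The function name and the delay values are hypothetical, chosen only for illustration; a real design flow would obtain the delays from timing analysis of the final circuit.

```python
def max_clock_frequency_hz(register_to_register_delays_ns):
    """Return the fastest clock frequency that the slowest path still allows."""
    longest_ns = max(register_to_register_delays_ns)
    # The clock period must be at least the longest delay; frequency = 1 / period.
    return 1e9 / longest_ns


# Hypothetical path delays, in nanoseconds, between pairs of registers.
delays_ns = [2.0, 3.5, 4.0]
print(max_clock_frequency_hz(delays_ns) / 1e6, "MHz")  # → 250.0 MHz
```

Only the slowest path matters here: shortening the 2.0 ns or 3.5 ns paths would not allow a faster clock.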
Implementing the controller's FSM as a sequential circuit, as we learned in Chapter
3, would then complete the design.
Notice that the first step captures the desired behavior, while the remaining steps
convert that behavior to a circuit.
We'll first provide a small and simple example as a "preview" of the RTL design
method's steps, before we define each step in more detail.
EXAMPLE 5.1 Soda machine dispenser
We are to design a processor for a soda dispenser. A coin detector
provides our processor with a 1-bit input c that becomes 1 for
one clock cycle when a coin is detected, and an 8-bit input a
indicating the coin's value in cents. Another 8-bit input s indicates
the cost of a soda (this cost can be set by the machine
owner). Once the processor has seen coins whose value equals or
exceeds the cost of a soda, the processor should set an output bit
d to 1 for one clock cycle, causing a soda to be dispensed (this
machine has only one type of soda). The system does not give
change; any excess money is kept. Figure 5.2 provides a block
symbol of the system.
Figure 5.2 Soda dispenser block symbol.
Step 1 of our RTL design method is to capture the
desired behavior of the system. Figure 5.3 shows a
high-level state machine describing the desired
behavior. The first state, Init, sets the output d to 0
and initializes a local register tot to 0. tot will
keep track of how many cents the system has seen
so far. The state machine then enters state Wait.
(Recall from Chapter 3 that a transition with no
condition has an implicit "true" condition, and thus
transitions on the next rising clock edge.) The state machine
stays there as long as no coin is detected and the
total cents seen so far is less than the cost of a soda.
When a coin is detected, the state machine goes to
state Add, which adds the coin's value to tot, and
then returns to state Wait. Once tot is greater than
or equal to (in other words, not less than) the cost
of a soda, the state machine goes to state Disp,
which dispenses a soda by setting d to 1. The state
machine then returns to state Init.
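The behavior just described can be sketched as a small cycle-by-cycle simulation. This is only an illustrative model, not part of the design method itself: the function name is hypothetical, and the sketch assumes that input a still holds the coin's value during the cycle spent in state Add.

```python
def soda_dispenser(inputs, s):
    """Simulate the high-level state machine; return d for each clock cycle.

    inputs: list of (c, a) pairs, one per clock cycle.
    s: cost of a soda in cents.
    """
    state, tot = "Init", 0
    outputs = []
    for c, a in inputs:
        d = 0  # a bit output not assigned in a state is implicitly 0
        if state == "Init":
            tot = 0
            next_state = "Wait"
        elif state == "Wait":
            if c:
                next_state = "Add"
            elif tot < s:
                next_state = "Wait"
            else:
                next_state = "Disp"
        elif state == "Add":
            tot = (tot + a) & 0xFF  # tot is an 8-bit register
            next_state = "Wait"
        else:  # Disp
            d = 1
            next_state = "Init"
        outputs.append(d)
        state = next_state
    return outputs


# Two 25-cent coins with a 50-cent soda: d pulses once, after the second coin.
cycles = [(0, 0), (1, 25), (0, 25), (1, 25), (0, 25), (0, 0), (0, 0)]
print(soda_dispenser(cycles, s=50))  # → [0, 0, 0, 0, 0, 0, 1]
```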
Figure 5.3 Soda dispenser high-level state machine. Inputs: c (bit), a (8 bits), s (8 bits). Outputs: d (bit). Local registers: tot (8 bits).
Step 2 is to create a datapath. We'll need a local
register for tot, an adder connected to tot and a
to compute tot + a, and a comparator connected
to tot and s to compute tot < s. The
resulting datapath appears in Figure 5.4.
Step 3 is to connect the datapath to a controller. Figure 5.5 shows the connections.
Notice that the controller's inputs and outputs are all just one-bit signals.
Step 4 is to derive the controller's FSM. The FSM has the same states
and transitions as the high-level state machine, but utilizes the datapath to
perform any data operations. Figure 5.6 shows the FSM for the controller.
In the high-level state machine, state Init had a data operation of tot = 0
(tot is 8 bits wide, so that assignment of 0 is not a single-bit operation).
We replace that assignment by setting tot_clr = 1, which clears the tot
register to 0. State Wait's transitions had data operations comparing
tot < s. Now we have a comparator computing that comparison for the
controller, so the controller need only look at the result of that comparison
in the signal tot_lt_s. State Add had a data operation of tot = tot + a.
The datapath computes that addition for the controller using the adder, so
the controller merely needs to set tot_ld = 1 to cause the addition result
to be loaded into the tot register.
To complete the design, we would implement the controller's FSM as a
state register and combinational logic. Figure 5.7 shows a partial state
table for the controller, with the states encoded as Init: 00, Wait: 01,
Add: 10, and Disp: 11. To complete the controller design, we would
complete the state table, create a 2-bit state register, and create a circuit
for each of the five outputs from the table, as discussed in Chapter 3.
Appendix C provides details of completing the controller's design. That
appendix also traces through the functioning of the controller and datapath
with one another.
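To make the controller/datapath split concrete, the following sketch models the two blocks separately. The controller sees only the single-bit signals c and tot_lt_s and drives only d, tot_ld, and tot_clr, while all 8-bit data stays inside the datapath. The class and function names are hypothetical, and giving tot_clr priority over tot_ld is an assumption of this sketch (the FSM never asserts both at once).

```python
class Datapath:
    """8-bit tot register, adder, and comparator, as described in the text."""

    def __init__(self):
        self.tot = 0

    def tot_lt_s(self, s):
        return int(self.tot < s)  # comparator output, a single bit

    def clock(self, a, tot_ld, tot_clr):
        """Update the tot register on a rising clock edge."""
        if tot_clr:
            self.tot = 0
        elif tot_ld:
            self.tot = (self.tot + a) & 0xFF  # load the adder result


# Controller FSM outputs per state: (d, tot_ld, tot_clr).
OUTPUTS = {"Init": (0, 0, 1), "Wait": (0, 0, 0), "Add": (0, 1, 0), "Disp": (1, 0, 0)}


def next_state(state, c, tot_lt_s):
    if state == "Wait":
        if c:
            return "Add"
        return "Wait" if tot_lt_s else "Disp"
    return {"Init": "Wait", "Add": "Wait", "Disp": "Init"}[state]


def run(inputs, s):
    """Return the controller's d output for each clock cycle."""
    dp, state, ds = Datapath(), "Init", []
    for c, a in inputs:
        d, tot_ld, tot_clr = OUTPUTS[state]
        lt = dp.tot_lt_s(s)            # status bit seen by the controller this cycle
        ds.append(d)
        dp.clock(a, tot_ld, tot_clr)   # registers update on the rising edge
        state = next_state(state, c, lt)
    return ds


cycles = [(0, 0), (1, 25), (0, 25), (1, 25), (0, 25), (0, 0), (0, 0)]
print(run(cycles, s=50))  # → [0, 0, 0, 0, 0, 0, 1]
```

The OUTPUTS table and next_state function together play the role of the controller's state table; the Datapath class never needs to know which state the controller is in.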
Figure 5.5 Soda dispenser controller and datapath connections.
Figure 5.6 Soda dispenser controller FSM. Inputs: c, tot_lt_s (bit). Outputs: d, tot_ld, tot_clr (bit).
Figure 5.7 Soda dispenser controller's state table (partial).
        Inputs                  Outputs
State   s1 s0  c  tot_lt_s      n1 n0  d  tot_ld  tot_clr
Init    0  0   0  0             0  1   0  0       1
        0  0   0  1             0  1   0  0       1
        0  0   1  0             0  1   0  0       1
        0  0   1  1             0  1   0  0       1
Wait    0  1   0  0             1  1   0  0       0
        0  1   0  1             0  1   0  0       0
        0  1   1  0             1  0   0  0       0
        0  1   1  1             1  0   0  0       0
Add     1  0   0  0             0  1   0  1       0
        ...
Disp    1  1   0  0             0  0   1  0       0
        ...
The previous example gave a preview of the RTL design method. Notice that we
started with a high-level state machine, which wasn't just an FSM because there were
local registers declared, and because there were data operations (rather than just Boolean
operations) in the states and on the transitions. We then created a datapath to implement
those local registers and to carry out the data operations. We further needed a controller to
control that datapath. We defined the behavior of that controller to be the same as the
behavior of the high-level state machine, except the controller's FSM used datapath
control signals to carry out and evaluate the datapath operations. Finally, we could design
the controller using Chapter 3's controller design process.
We now discuss each RTL design method step in more detail, while illustrating each
step with another example.
Step 1-Creating a High-Level State Machine
A high-level state machine is a computation model similar to a finite-state machine, but
with additional features that enable the description of computations involving more than
just Boolean data.
Recall that a finite-state machine (FSM) consists of inputs, outputs, states, state
actions (a mapping of states to output values), and state transitions (a mapping of state
and inputs to next states). However, the inputs and outputs of an FSM are limited to
Boolean types, actions are limited to Boolean equations, and transition conditions are
limited to Boolean expressions. These limitations make specifying computations
involving data cumbersome, other than for just single-bit data.
Figure 5.3 showed a high-level state machine describing the behavior of a soda
dispenser processor. Notice that the state machine is not an FSM, for the several
reasons highlighted in Figure 5.8. One reason is that the state machine has inputs
that are 8-bit types, whereas FSMs only allow inputs and outputs of Boolean types
(a single bit each). Another reason is that the state machine declares a local
register tot to store intermediate data, whereas FSMs don't allow local data
storage; the only "stored" item in an FSM is the state itself. A third reason is
that the state actions and transition conditions involve data operations, like
tot = 0 (remember that tot is 8 bits wide), tot < s (there's no "<" Boolean
operator), and tot = tot + a (where the "+" is addition, not OR, and there's no addition
Boolean operator), whereas an FSM allows only Boolean equations and expressions.
Figure 5.8 Soda dispenser high-level state machine with non-FSM constructs highlighted. Inputs: c (bit), a (8 bits), s (8 bits). Outputs: d (bit). Local registers: tot (8 bits).
Therefore, a useful form of high-level state machine is an extension of an FSM in
which:
• inputs and outputs may involve data types beyond just single bits,
• local registers may be declared (of various data types), and
• actions and conditions may involve general arithmetic equations and expressions,
rather than just Boolean equations and expressions.
Such a high-level state machine is not the only possible extension to an FSM.
Many varieties of extended FSMs exist. However, we will be utilizing the above-described
extended FSM variety throughout this chapter. That particular variety of high-level state
machine is sometimes called an FSM with data, or FSMD.
We will continue to use the following conventions for high-level state machines,
which we also used for FSMs:
• Each transition is implicitly ANDed with a rising clock edge.
• Any bit output not explicitly assigned a value in a state is implicitly assigned a 0.
Note: this convention does not apply to multibit outputs.
We now provide another example of describing a system using a high-level state
machine.
EXAMPLE 5.2 Laser-based distance measurer-High-level state machine
There are countless applications that require one to accurately measure the distance of an object
from a known point. For example, road builders need to determine the length of a stretch
of road. Map makers need to accurately determine the locations and heights of hills and mountains
and the sizes of lakes. A giant crane for constructing skyrise buildings needs to accurately
determine the distance of the crane arm from the base. In all of these applications, stringing out
a tape measure to measure the distance is not very practical. A better method involves laser-based
distance measurement.
In laser-based distance measurement, a laser is pointed at the object of interest. The laser is
briefly turned on, and a timer is started. The laser light, traveling at the speed of light, travels to the
object and reflects back. A sensor detects the reflection of the laser light, causing the timer to stop.
Knowing the time T taken by the light to travel to the object and back, and knowing that the speed of
light is $3 \times 10^8$ meters/second, we can compute the distance D easily by the equation:
$2D = T \text{ seconds} \times 3 \times 10^8 \text{ meters/second}$. Laser-based distance measurement is illustrated in Figure 5.9.
Figure 5.9 Laser-based distance measurement: $2D = T \text{ sec} \times 3 \times 10^8 \text{ m/sec}$, where D is the distance to the object of interest.
Let's design a processor to control the laser and the timer and to compute distances up to 2000
meters. A block diagram of the system is shown in Figure 5.10. The system has a bit input B, which
equals 1 when the user presses a button to start the measurement. Another bit input S comes from
the sensor, and is 1 when the reflected laser is detected. A bit output L controls the laser, turning the
laser on when L is 1. Finally, an N-bit output D indicates the distance in binary, in units of meters;
we'll assume a display converts that binary number into a decimal number and displays the result
on an LCD for the user to read. D will have to be at least 11 bits, since 11 bits can represent the
numbers 0 to 2047, and we want to measure distances up to 2000 meters. Let's make D 16 bits.
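The 11-bit minimum can be checked with a couple of lines of Python. This is only a quick sanity check of the arithmetic above; the variable names are chosen for illustration.

```python
max_distance_m = 2000

# Number of bits needed to represent the largest value in unsigned binary.
bits_needed = max_distance_m.bit_length()
print(bits_needed)           # → 11
print(2 ** bits_needed - 1)  # → 2047, the largest value 11 bits can hold
```

Ten bits would only reach 1023, which is why 11 is the minimum; choosing 16 bits simply leaves headroom.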
Figure 5.10 Block diagram of the laser-based distance measurement system. Signals shown: B (from button), L (to laser), S (from sensor).
Step 1-Create a high-level state machine.
We can describe the overall control of the system using a high-level state machine. To facilitate the
creation of the state machine, we enumerate the sequence of events underlying the measurement
system:
• The system powers on. Initially, the system's laser is off and the system outputs a distance of
0 meters.
• The system should then wait for the user to initiate measurement by pressing a button, B.
• After the button is pressed, the system should turn the laser on. We'll choose to leave the
laser on for one clock cycle.
• After the laser is pulsed, the system should wait for the sensor to detect the laser's reflection.
Meanwhile, the system should count how much time passes from the time the laser was
pulsed until the reflection is sensed.
• After the reflection is detected, the system should use the amount of time passed since the
laser was pulsed to compute the distance to the object of interest. The system should then
return to waiting for the user to press the button so that a new measurement can be taken.
The above sequence guides our construction of a high-level state machine. We begin
with an initial state, which we call S0. S0's task is to ensure that when our system powers on, it
does not output an incorrect distance, and it does not turn the laser on (possibly injuring the
unsuspecting user). Specifying this behavior as a high-level state machine is straightforward
and seen in Figure 5.11. Notice that the high-level state machine differs from an FSM in that
the state's actions use a data type that is larger than one bit (namely, D is 16 bits). However, the
high-level state machine itself follows the convention that every transition is implicitly ANDed with
a rising clock edge, so the state machine only transitions during clock edges (just like for an FSM).
Note that even though the assignments L = 0 and D = 0 look the same, the assignment L = 0 assigns
a 0 bit to the one-bit output L, whereas the assignment D = 0 assigns the 16-bit binary number 0
(which is actually 0000000000000000) to the 16-bit output D. Some other notations distinguish
bit assignments from data assignments using different notations, such as enclosing a bit in single
quotes. For example, the bit assignment L = 0 could be written instead as L = '0'.
Figure 5.11 Partial high-level state machine for the measurement system: initialization. Inputs: B, S (1 bit each). Outputs: L (bit), D (16 bits). State S0: L = 0 (laser off), D = 0 (distance = 0).
After initialization, the measurement system waits for the user to press the button B, which
initiates the measurement process. When the user presses the button, B will equal 1, and the
measurement system should proceed to activate the laser. To perform the waiting, we add a state
after S0, which we call S1, shown in Figure 5.12. The shown transitions cause the state machine to
remain in state S1 while B = 0 (meaning B' is true).
When B = 1, the laser should stay on for one cycle. In other words, when B = 1, the state
machine should transition to a state that turns the laser on, followed by a state that turns the
laser off. We'll call the laser-on state S2 and the laser-off state S3. Figure 5.13 shows how S2
and S3 are connected in the high-level state machine.
In state S3, the state machine should wait until the sensor detects the laser's reflection
(S = 1). The state machine remains in S3 while S = 0. As mentioned in the earlier sequence
of events, the state machine should meanwhile count the duration between the laser being
pulsed and the laser's reflection being sensed. From the discussion of timers in Chapter 4, we
know that with a given clock period, we can measure time by counting the number of clock
cycles and multiplying that number by the clock period (time = cycles * (1/clock frequency)). Thus,
we use a local register, which we'll call Dctr, to count clock cycles. The state machine increments
Dctr as long as the state machine is waiting for the laser's reflection. (For simplicity, we ignore the
possibility that no reflection is ever detected.) We must also initialize Dctr to 0, which we choose to
do in state S1. With these modifications, our high-level state machine is seen in Figure 5.14.
Figure 5.12 Partial high-level state machine for measurement system: waiting for a button press. Inputs: B, S (1 bit each). Outputs: L (bit), D (16 bits). S0: L = 0, D = 0. S1: remain while B' (button not pressed); leave when B (button pressed).
Figure 5.13 Partial high-level state machine for measurement system: pulsing the laser for one cycle. Inputs: B, S (1 bit each). Outputs: L (bit), D (16 bits). S0: L = 0, D = 0. S1: remain while B'. S2: L = 1 (laser on). S3: L = 0 (laser off).
Figure 5.14 Partial high-level state machine for measurement system: waiting for the laser
reflection and counting clock cycles. Inputs: B, S (1 bit each). Outputs: L (bit), D (16 bits).
Local registers: Dctr (16 bits).
S0: L = 0, D = 0. Next state: S1.
S1: Dctr = 0 (reset cycle count). Remain while B'; go to S2 on B.
S2: L = 1. Next state: S3.
S3: L = 0, Dctr = Dctr + 1 (count cycles). Remain while S' (no reflection); next state on S not yet defined.
Once the reflection is detected (S = 1), our high-level state machine should compute the distance
D that is being measured. From Figure 5.9, we know that $2D = T \text{ sec} \times 3 \times 10^8 \text{ m/sec}$. We also know
that the time T in seconds is Dctr * (1/clock frequency). To simplify the system's design, let's
assume the clock frequency is $3 \times 10^8$ Hz, or 300 MHz. Since light travels $3 \times 10^8$ meters per second,
each clock cycle would thus correspond to one meter. Thus, with a 300 MHz clock, Dctr counts the
number of meters that the laser beam traveled from the measurer to the object and back to the
measurer. To count just the distance from the measurer to the object, we divide Dctr by 2 (algebraic
simplification of the equations in this paragraph verifies that D = Dctr/2). We'll perform this
calculation in a state we will call S4. Our final high-level state machine is shown in Figure 5.15.
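The algebra in this paragraph can be checked numerically. The sketch below combines the two equations (2D = T times the speed of light, and T = Dctr / clock frequency) using exact fractions; the function name is hypothetical, and the speed of light is the rounded value used in the text.

```python
from fractions import Fraction

SPEED_OF_LIGHT_M_PER_S = 300_000_000  # 3 x 10^8, the value used in the text


def distance_m(dctr, clock_hz):
    """Distance to the object, given the cycle count and the clock frequency."""
    round_trip_time_s = Fraction(dctr, clock_hz)            # T = Dctr * (1 / f)
    return round_trip_time_s * SPEED_OF_LIGHT_M_PER_S / 2   # from 2D = T * c


# With a 300 MHz clock, the result is simply Dctr / 2.
print(distance_m(100, 300_000_000))  # → 50
```

With any other clock frequency the result would need a multiplication or division by a constant other than 2, which is why the 300 MHz choice simplifies the datapath.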
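Figure 5.15 High-level state machine for measurement system: calculating the value of D.
Inputs: B, S (1 bit each). Outputs: L (bit), D (16 bits). Local registers: Dctr (16 bits).
S0: L = 0, D = 0. Next state: S1.
S1: Dctr = 0. Remain while B'; go to S2 on B.
S2: L = 1. Next state: S3.
S3: L = 0, Dctr = Dctr + 1. Remain while S'; go to S4 on S.
S4: D = Dctr/2 (calculate D). Next state: S1.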
We can summarize the behavior of the high-level state machine in Figure 5.15 as follows:
• S0 is the initial state. In state S0, the state machine initializes the laser to off by setting L = 0
and sets the output D = 0 too. The machine then transitions to S1.
• S1 clears Dctr to 0 and then waits until the button is pressed. When the button is pressed,
the machine transitions to state S2.
• S2 turns on the laser. The machine then transitions to S3.
• S3 turns off the laser and increments Dctr every clock cycle (with a 300 MHz clock, every
cycle corresponds to one meter). The machine stays in S3, incrementing Dctr during each
clock cycle, until the reflection is sensed, at which time the machine transitions to state S4.
• S4 sets the output D to the counted number of cycles divided by two, which corresponds to
the measured distance in meters. The machine then returns to state S1, which waits for the
button to be pressed again.
A real laser-based distance measurer might use a faster clock frequency in order to measure
distance with a greater precision than just 1 meter.
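The summarized behavior can also be sketched as a cycle-by-cycle simulation. This is an illustrative model only: the function name is hypothetical, and the value recorded for D is the value held at the end of each cycle.

```python
def laser_measurer(inputs):
    """Simulate the high-level state machine of Figure 5.15.

    inputs: list of (B, S) bit pairs, one per clock cycle.
    Returns a list of (L, D) pairs, one per clock cycle.
    """
    state, dctr, d = "S0", 0, 0
    trace = []
    for b, s in inputs:
        laser = 0  # bit output L is implicitly 0 unless a state sets it to 1
        if state == "S0":
            d = 0
            nxt = "S1"
        elif state == "S1":
            dctr = 0
            nxt = "S2" if b else "S1"
        elif state == "S2":
            laser = 1
            nxt = "S3"
        elif state == "S3":
            dctr = (dctr + 1) & 0xFFFF  # Dctr is a 16-bit register
            nxt = "S4" if s else "S3"
        else:  # S4
            d = dctr // 2
            nxt = "S1"
        trace.append((laser, d))
        state = nxt
    return trace


# Button pressed in cycle 1; reflection sensed during the fourth cycle spent in S3.
cycles = [(0, 0), (1, 0), (0, 0), (0, 0), (0, 0), (0, 0), (0, 1), (0, 0)]
trace = laser_measurer(cycles)
print(trace[2])   # → (1, 0)  laser on for one cycle in S2
print(trace[-1])  # → (0, 2)  four cycles counted, so D = 2 meters
```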
The high-level state machine described above is just one type of FSM variation. A different
state machine variation that was previously quite popular was called Algorithmic
State Machines, or ASMs. ASMs are similar to flowcharts, except that ASMs include a
notion of a clock that enables transitions from one state to another (a traditional flowchart
does not have an explicit clock concept). ASMs, like flowcharts, contain more "structure"
than a state machine. A state machine can transition from any state to any other state,
whereas an ASM restricts transitions in a way that causes the computation to look more
like an algorithm: an ordered sequence of instructions. An ASM uses several types of
boxes, including state boxes, condition boxes, and output boxes. ASMs typically also
allowed local data storage and data operations.
The advent of hardware description languages (see Chapter 9) seems to have largely
replaced the use of ASMs, as hardware description languages contain the constructs
supporting algorithmic structure, and much more. Thus, we do not describe ASMs further.
Step 2-Creating a Datapath
Given a high-level state machine, we want to create a datapath that can implement all the
data storage and computations on non-Boolean data types present in the high-level state
machine. Doing so will enable us to then replace the high-level state machine with an FSM
that merely controls the datapath. We can decompose the create-a-datapath step into
several substeps:
Step 2: Create a datapath
(a) Make all data inputs and outputs to be datapath inputs and outputs.
(b) Implement the data storage by adding a register component into the datapath for
every declared register in the high-level state machine. Furthermore, we typically
want to add a register component for every data output.
(c) Methodically examine each state and each transition, adding and connecting new
datapath components to implement new data computations. We add multiplexors
in front of component inputs as they become necessary in order to share a
component among multiple signals that use the same component in different states.
Sometimes we find that a component already exists (e.g., a register) but that we
need to add a new control input to that component (e.g., a clear input on a register
to set the register to 0).
A common term used to describe the adding of a component into a design is
instantiation. Thus, we say that we "instantiate a new register" rather than we "add a new
register." Using the term "instantiate" rather than "add" helps avoid possible confusion
with the use of the term "add" to mean arithmetic addition (e.g., saying "we add two
registers" could otherwise be confusing). When we instantiate a new component, we should
give that component a name that is unique from any other datapath component name. So
if we instantiate a register, we might call it "Register1." If we instantiate another register,
we might call it "Register2." Actually, we should give meaningful names whenever
possible. So we might call one register "TemperatureReg," and another register
"HumidityReg."
When we instantiate a new component, we may create additional datapath inputs
corresponding to the control inputs of the component. For example, instantiating a register
will create new datapath inputs corresponding to the register's load and clear control
inputs. We should give unique names to each new datapath control input, ideally
describing which component the input controls and the control operation performed. For
example, if we instantiate a register named Register1, we might then create two new
datapath inputs named Register1_load and Register1_clear. Likewise, we may need to utilize
control outputs of a component, like the output of a comparator, in which case we should
give those outputs unique names too.
EXAMPLE 5.3 Laser-based distance measurer-Creating a datapath
We now continue Example 5.2 by proceeding to the second step of the RTL design method.
Step 2-Create a datapath
We can follow the substeps of this step to create the datapath shown in Figure 5.17:
(a) Output D is a data output (16 bits), so we make D an output of the datapath, as shown in
Figure 5.16(i).
(b) We need a register to implement the 16-bit local register Dctr. Noting that the operations
on Dctr are clear (in state S1) and increment (in state S3), we can implement that register
by instantiating a 16-bit up-counter, as shown in Figure 5.16(ii). Furthermore, as we want
to control when the output D changes (notice that we only change D in state S4), we
instantiate a 16-bit register Dreg at the output D, as shown in Figure 5.16(iii). We extend the
Dctr counter and Dreg register control signals to be inputs to the datapath, with each
signal having a unique name, as in Figure 5.16(iv).
Figure 5.16 Partial datapath for the laser-based distance measurer: (i) 16-bit data output D, (ii) Dctr: 16-bit up-counter, (iii) Dreg: 16-bit register, (iv) datapath control inputs Dreg_clr, Dreg_ld, Dctr_clr, Dctr_cnt.
(c) Noting that S4 writes D with Dctr divided by 2, we insert a right shifter between Dctr
and Dreg to implement the divide by 2, as shown in Figure 5.17.
Figure 5.17 The datapath for the laser-based distance measurement system. Control inputs: Dreg_clr, Dreg_ld, Dctr_clr, Dctr_cnt. Data output: D.
The resulting datapath in Figure 5.17 is a very simple datapath, but a datapath nonetheless.
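The reason a right shifter can stand in for a divider here is that shifting an unsigned binary number right by one place discards the least significant bit, which is the same as integer division by 2. The following sketch checks that claim exhaustively for 16-bit values; the function name is hypothetical.

```python
def shift_right_1(value, width=16):
    """Model a 1-place logical right shifter on an unsigned `width`-bit value."""
    mask = (1 << width) - 1
    return (value & mask) >> 1


# Exhaustive check over every 16-bit value: shifting right equals dividing by 2.
assert all(shift_right_1(x) == x // 2 for x in range(1 << 16))
print(shift_right_1(4000))  # → 2000
```

Note that an odd count loses its half meter, since the shifter truncates rather than rounds.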
The previous example did not require any multiplexors, so we'll illustrate separately why
sometimes multiplexors must be instantiated. Consider the sample high-level state machine
portion shown in Figure 5.18(a). Figure 5.18(b) shows the datapath after implementing the
actions of state T0. Those actions require an adder, with the E and F registers connected to
the A and B inputs of that adder. Figure 5.18(c) shows that datapath after implementing the
actions of state T1. That state also requires an adder, but because one already exists in the
datapath, we need not instantiate another adder. However, the R and G registers must
connect to the A and B inputs of that adder, but those inputs of the adder already have
connections from E and F. We therefore need to instantiate multiplexors, as shown in Figure
5.18(d). Notice that we create unique names for each mux's control input.
Figure 5.18 Instantiating datapath muxes: (a) sample high-level state machine portion, (b) datapath
after implementing T0's actions, (c) datapath after implementing T1's actions, resulting in two
sources for each adder input, (d) datapath after instantiating muxes to handle the multiple sources.
Local registers: E, F, G, R (16 bits). Mux control inputs: add_A_s0, add_B_s0.
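The sharing of one adder between two states can be sketched as follows. The select signal names add_A_s0 and add_B_s0 come from the figure; the function names, and the choice of which register sits on mux input 0 versus input 1, are assumptions made for this illustration.

```python
def mux2(i0, i1, s0):
    """2x1 multiplexor: pass i0 when s0 is 0, pass i1 when s0 is 1."""
    return i1 if s0 else i0


def shared_adder(E, F, R, G, add_A_s0, add_B_s0, width=16):
    """One adder whose A and B inputs are each fed through a mux."""
    a = mux2(E, R, add_A_s0)  # A input: E or R
    b = mux2(F, G, add_B_s0)  # B input: F or G
    return (a + b) & ((1 << width) - 1)


# State T0 would select E and F; state T1 would select R and G.
print(shared_adder(E=1, F=2, R=10, G=20, add_A_s0=0, add_B_s0=0))  # → 3
print(shared_adder(E=1, F=2, R=10, G=20, add_A_s0=1, add_B_s0=1))  # → 30
```

The controller would set the two select lines differently in each state, which is why each mux control input needs its own unique name.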
Step 3-Connecting the Datapath to a Controller
Step 3 of the RTL design method is actually quite straightforward. We simply create a
controller block having the system's Boolean inputs and outputs, and we connect the
controller block with the datapath control inputs and outputs.
EXAMPLE 5.4 Laser-based distance measurer-Connecting the datapath to a controller
Continuing the previous example, we proceed to step 3 of the RTL design method:
Step 3-Connect the datapath to a controller.
We connect the datapath to a controller as shown in Figure 5.19. We connect the control inputs and
outputs (B, L, and S) to the controller, and the data output (D) to the datapath. We also connect the
controller to the datapath control inputs (Dreg_clr, Dreg_ld, Dctr_clr, Dctr_cnt). Normally we don't
draw the clock generator block, but we've explicitly shown the clock generator in the figure to make
clear that the generator must be exactly 300 MHz.
Figure 5.19 Controller/datapath for the laser-based distance measurer. Controller: inputs B (from button) and S (from sensor), output L (to laser). Controller-to-datapath control signals: Dreg_clr, Dreg_ld, Dctr_clr, Dctr_cnt. Datapath: 16-bit output D (to display). A 300 MHz clock drives both blocks.
Step 4-Deriving the Controller's FSM
If we created our datapath correctly, deriving an FSM for the controller is straightforward.
The FSM will have the same states and transitions as the high-level state machine.
We merely define the FSM's inputs and outputs (all will now be single bits), and replace
any data computations in the actions and conditions by the appropriate datapath control
signal values. Remember, we created the datapath specifically to carry out those
computations, and therefore we should only need to appropriately configure the datapath control
signals to implement each particular computation at the right time.
EXAMPLE 5.5 Laser-based distance measurer-Deriving the controller's FSM
We continue the previous example by going to step 4 of the RTL design method.
Step 4-Derive the controller's FSM.
The last step is to design the controller's internals. We can describe the controller's behavior by
refining our high-level state machine from Figure 5.15 into an FSM, replacing the "high-level"
actions and conditions, like Dctr = 0, by actual controller input and output signal assignments and
conditions, like Dctr_clr = 1, as shown in Figure 5.20. Notice that the FSM does not directly
indicate the computations that are happening in the datapath. For example, S4 loads Dreg with
Dctr/2, but the FSM itself only shows Dreg's load signal being activated. Thus, the overall
system behavior can be determined from the FSM only by also looking at the datapath.
Figure 5.20 FSM description of the controller for the laser-based distance measurer. The desired
action in each state is shown in italics in the bottom row; the corresponding bit signal assignment
that achieves that action is shown in bold.
Inputs: B, S. Outputs: L, Dreg_clr, Dreg_ld, Dctr_clr, Dctr_cnt.
State  L  Dreg_clr  Dreg_ld  Dctr_clr  Dctr_cnt  Desired action
S0     0  1         0        0         0         laser off, clear Dreg
S1     0  0         0        1         0         clear count
S2     1  0         0        0         0         laser on
S3     0  0         0        0         1         laser off, count up
S4     0  0         1        0         0         load Dreg with Dctr/2, stop counting
HOW DOES IT WORK?-AUTOMOTIVE ADAPTIVE CRUISE CONTROL
The early 2000s saw the advent of automobile cruise control systems that not only maintained a particular speed, but also maintained a particular distance from the car in front, thus slowing the automobile down when necessary. Such "adaptive" cruise control thus adapts to changing highway traffic. Adaptive cruise controllers must measure the distance to the car in front. One way to measure that distance is a laser-based distance measurer, with the laser and sensor placed in the front grill of the car, connected to a circuit and/or microprocessor that computes the distance. The distance is then input to the cruise control system, which determines when to speed up or slow down the automobile.
Recall from Chapter 3 that we typically follow the convention that output signals not explicitly assigned in a state are implicitly assigned 0. Following that convention, the FSM would look as in Figure 5.21. We may still choose to explicitly show the assignment of 0 (e.g., L = 0 in state S3) if that assignment is a key action of a state. The key actions of each state were bolded in Figure 5.20.
Inputs: B, S   Outputs: L, Dreg_clr, Dreg_ld, Dctr_clr, Dctr_cnt

Explicit assignments (states in order):
L = 0, Dreg_clr = 1          (laser off, clear Dreg)
Dctr_clr = 1                 (clear count)
L = 1                        (laser on)
L = 0, Dctr_cnt = 1          (laser off, count up)
Dreg_ld = 1, Dctr_cnt = 0    (load Dreg with Dctr/2, stop counting)

Figure 5.21 FSM description of the controller for the distance measurer, using the convention that FSM outputs not explicitly assigned a value in a state are implicitly assigned 0.
We would complete the design by implementing this FSM, using a 3-bit state register and combinational logic to describe the next state and output logic, as was described in Chapter 3.
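The implicit-zero convention can be illustrated with a short Python sketch that expands the sparse assignments of Figure 5.21 into the fully specified outputs of Figure 5.20. The state labels S0 through S4 and the helper name `expand` are assumptions made for illustration; the text itself only names state S3.

```python
# Controller outputs, in the order listed in Figure 5.20.
OUTPUTS = ["L", "Dreg_clr", "Dreg_ld", "Dctr_clr", "Dctr_cnt"]

# Explicit assignments per state, as in Figure 5.21.
# The labels S0..S4 are assumed names for the five states in order.
EXPLICIT = {
    "S0": {"L": 0, "Dreg_clr": 1},         # laser off, clear Dreg
    "S1": {"Dctr_clr": 1},                 # clear count
    "S2": {"L": 1},                        # laser on
    "S3": {"L": 0, "Dctr_cnt": 1},         # laser off, count up
    "S4": {"Dreg_ld": 1, "Dctr_cnt": 0},   # load Dreg with Dctr/2, stop counting
}

def expand(explicit):
    """Fill in 0 for every output that a state does not assign explicitly."""
    return {state: {out: assigned.get(out, 0) for out in OUTPUTS}
            for state, assigned in explicit.items()}

FULL = expand(EXPLICIT)

# Number of state-register bits needed to encode all the states.
STATE_BITS = (len(EXPLICIT) - 1).bit_length()

for state, values in FULL.items():
    print(state, values)
print("state register bits:", STATE_BITS)
```

With five states, the sketch reports a 3-bit state register, which matches the size stated in the text.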
5.3 RTL DESIGN EXAMPLES AND ISSUES
RTL design involves a certain amount of creativity and insight. Thus, a good way to begin to learn RTL design is perhaps through seeing several examples. We now provide additional examples of the RTL design method, through which we also explain some detailed issues.
Simple Bus Interface Design Example
EXAMPLE 5.6 Simple bus interface
Processors typically need to transfer data to and from other processors. They typically communicate such data over a bus, to reduce wire congestion problems that might otherwise occur (see Section 4.10). Suppose 16 different processors each has a 32-bit output connected to a single 32-bit bus named D. Suppose another processor, a master processor, may want to read the output of any of those 16 processors. (Let's call those 16 processors peripherals, which is a common term for a processor that is auxiliary to a master processor.) The master processor outputs a 4-bit address, A, that all the 16 peripherals can read, with each peripheral having its own unique address (0000, or 0001, or 0010, etc.). Because the master processor must always set the address lines to a value, but might not always want to read, the master processor has another output, rd, that the master processor sets to 1 when reading, and 0 when not reading. So if the master processor wants to read the value in peripheral number five, the master processor would set the address lines A to 0101, then set rd to 1. The master processor would then read the data lines D (perhaps storing the read data into a local register), and then return rd to 0. Additionally, the value on D should not change while the master processor is reading.
A block diagram of the system is shown in Figure 5.22. Such an arrangement is very similar to the arrangement in a desktop computer, where a master processor can read peripheral processor registers; peripherals might include a disk drive, a CD-ROM drive, a keyboard, a modem, etc.
We have just described what is known as a bus protocol. A bus protocol defines a sequence of actions over a set of data, address, and control lines, to carry out a data transfer over those lines from one processor to another.
A bus interface implements a bus protocol for a processor. Let's implement the bus interface for one of the peripheral processors. Figure 5.23 provides a block diagram for a peripheral divided into a main part and a bus interface part. The main part's output Q is an input to the bus interface. Let's assume the peripheral's own address is another input, called Faddr, to the bus interface. Faddr might come from a DIP switch, or perhaps another register. The bus interface also has inputs and outputs that connect to the bus signals rd, D, and A.
Step 1 of our RTL design method is to create a high-level state machine. Based on the bus protocol we defined, a peripheral's bus interface part sends data only if the address on input A matches the address on input Faddr AND the processor requests a read by setting rd to 1. While the bus interface waits for an instruction from the master processor to send data, the bus interface should not interfere with what another processor may be writing to the shared data lines, D. Thus, while waiting for a matching address and rd=1, the bus interface should drive D with no value (known as high impedance, represented as "Z").
Figure 5.22 Bus interface example.
Figure 5.23 Bus interface block diagram.
When the bus interface detects a matching address and rd=1, the bus interface should output data from the input Q (from the main part) to the output D. However, we must also ensure that D does not change while the master processor is reading from the bus interface. We can keep the value on D stable by storing Q into a local register Q1. As long as the bus interface is not sending data, the bus interface updates Q1 with the current value of Q. When the bus interface is sending data, the bus interface does not update Q1 and outputs Q1 on D, causing D to not change during a send.
We can see that the bus interface's implementation of the bus protocol can be described by a high-level state machine using two states, shown in Figure 5.24: a state in which the bus interface waits to be able to send data (WaitMyAddress) and a state in which the bus interface sends data (SendData).
Inputs: rd (bit); Q (32 bits); A, Faddr (4 bits)
Outputs: D (32 bits)
Local register: Q1 (32 bits)
WaitMyAddress: D = "Z", Q1 = Q; goes to SendData when (A = Faddr) and rd
SendData: D = Q1; returns to WaitMyAddress when rd is 0
Figure 5.24 High-level state machine of the sending half of a simple bus interface.
Figure 5.25 provides a sample timing diagram of the state machine's behavior (W for state WaitMyAddress, SD for SendData). As long as the system is in the W state, the system outputs Z on D. When rd=1 and A=Faddr, the system outputs the contents of Q1 beginning at the next clock cycle's rising edge. The system continues to output Q1 as long as rd=1. When rd returns to 0, the system returns to the WaitMyAddress state at the next rising clock edge and hence outputs Z again.
State sequence shown: W, W, SD, W, W, SD, SD, W (D carries Z in state W and Q1 in state SD)
Figure 5.25 Bus interface timing diagram.
Step 2 is to create a datapath, as shown on the right in Figure 5.26. The datapath contains an equality comparator to compare A and Faddr, a 32-bit register Q1, and a 32-bit-wide three-state driver to enable driving of D by nothing or by Q1. A, Faddr, and Q are the datapath's data inputs, and D is the only data output.
Controller FSM:
Inputs: rd, A_eq_Faddr (bit)
Outputs: Q1_ld, D_en (bit)
WaitMyAddress: D_en = 0, Q1_ld = 1
SendData: D_en = 1, Q1_ld = 0
Figure 5.26 Datapath (right) and controller FSM description (left) for the simple bus interface.
Step 3 is to connect the datapath to a controller, as shown in Figure 5.26. The controller has one external control input, rd, and also gets a control input from the datapath, A_eq_Faddr, indicating whether A equals Faddr. The controller has two control outputs to the datapath, with Q1_ld causing Q1 to be loaded with Q, and D_en controlling the three-state driver.
Step 4 is to derive the controller's FSM. We simply replace the data operations in the high-level state machine of Figure 5.24 by the appropriate control signals, as shown on the left side of Figure 5.26. We replace A=Faddr by the signal A_eq_Faddr, the actions of D="Z" and of D=Q1 by D_en=0 and D_en=1, and the action of Q1=Q by Q1_ld=1. We would then implement the FSM using a state register (in this case only 1 bit) and combinational logic.
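The behavior of the bus interface can be checked with a small cycle-by-cycle Python model. This is only a sketch under stated assumptions: the class name `BusInterface`, the helper `trace`, and the use of the string "Z" for high impedance are illustrative choices, not part of the design in the text.

```python
class BusInterface:
    """Two-state bus interface: W (WaitMyAddress) and SD (SendData)."""

    def __init__(self, faddr):
        self.faddr = faddr    # this peripheral's own 4-bit address
        self.state = "W"
        self.q1 = 0           # local register Q1 (assumed to start at 0)

    def d(self):
        """Value driven on D in the current state ("Z" models high impedance)."""
        return self.q1 if self.state == "SD" else "Z"

    def clock(self, rd, a, q):
        """Apply one rising clock edge with the given inputs rd, A, and Q."""
        a_eq_faddr = (a == self.faddr)        # datapath equality comparator
        if self.state == "W":
            self.q1 = q                       # Q1_ld = 1: track the current Q
            if a_eq_faddr and rd:
                self.state = "SD"
        else:                                 # SD: Q1_ld = 0, so Q1 holds its value
            if not rd:
                self.state = "W"

def trace(bus, stimulus):
    """Return the value on D seen in each cycle, plus the value after the last edge."""
    seen = []
    for rd, a, q in stimulus:
        seen.append(bus.d())
        bus.clock(rd, a, q)
    seen.append(bus.d())
    return seen

# Peripheral number five; Q changes every cycle, but D stays stable during the send.
result = trace(BusInterface(faddr=5), [(0, 5, 10), (1, 5, 11), (1, 5, 12), (0, 5, 13)])
print(result)
```

The printed trace shows D holding the value captured in Q1 for the whole send, even though Q keeps changing, which is the reason the text introduces the local register.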
You may have heard of several popular buses, like the PCI (Peripheral Component Interconnect) bus in a personal computer. Those are the buses that a PC "card" plugs into in a PC, like the card shown in Figure 5.27. You can see on the card the metal pads of the bus; each pad corresponds to one wire of the bus. The bus protocol for PCI is far more complex than the protocol in the above example. Hundreds of other "standard" bus protocols exist. Designers not needing to interface to other chips often define their own bus protocol for communication.
Figure 5.27 PCI card plugged into a PC's PCI slot.
ALL "="s ARE NOT EQUAL.
Figure 5.24 showed two distinct uses of the "=" symbol. In a state's actions, "=" meant "assign the value of the right side to the left side," e.g., D = Q1. On a transition, "=" meant "the left and right sides are the same," e.g., A=Faddr. Be careful not to confuse those two meanings of the "=" symbol. Some languages use different symbols to distinguish the two meanings. For example, the C language uses "=" for "assign" and "==" for "the same." VHDL uses ":=" (or "<=") for "assign", and "=" for "the same."
Video Compression: Sum-of-Absolute-Differences (SAD) Design Example
EXAMPLE 5.7 Video compression: sum of absolute differences
After a 2004 natural disaster in Indonesia, a TV news reporter reported from the scene by "camera phone." The video was smooth as long as the scene wasn't changing significantly. When the scene changed (like panning across the landscape), the video became very jerky, because the camera phone had to transmit complete pictures rather than just differences, resulting in fewer frames transmitted over the limited bandwidth of the camera phone.
Digitized video is becoming increasingly commonplace, like in the case of the increasingly popular DVD (see Section 6.7 for further information on DVDs). A straightforward digitized video consists of a sequence of digitized pictures, where each picture is known as a frame. However, such digitized video results in huge data files. Each pixel of a frame is stored as several bytes, and let's say a frame contains about a million pixels. Let's assume then that we require about 1 Mbyte per frame, and we play approximately 30 frames per second (a normal rate for a TV), so that's 1 Mbyte/frame * 30 frames/sec = 30 Mbytes/sec. One minute of video would require 60 sec * 30 Mbytes/sec = 1.8 Gbytes, and 60 minutes would require 108 Gbytes. A 2-hour movie would require over 200 Gbytes. That's a lot of data, more than can be downloaded quickly over the Internet, or stored on a DVD, which can only hold between 5 Gbytes and 15 Gbytes. In order to make practical use of digitized video with web pages, digital camcorders, cellular telephones, or even with DVDs, we need to compress those files into much smaller files. A key technique in compressing video is to recognize that successive frames often have much similarity, so instead of sending a sequence of digitized pictures, we can send one digitized picture frame (a "base" frame), followed by data describing just the difference between the base frame and the next frame. We can send just the difference data for numerous frames, before sending another base frame. Such a method results in some loss of quality, but as long as we send base frames frequently enough, the quality may be acceptable.
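The storage estimates above are simple to reproduce. The following Python sketch uses the text's round figures (1 Mbyte per frame, 30 frames per second) and assumes decimal units of 1000 Mbytes per Gbyte; the constant and function names are illustrative.

```python
MBYTES_PER_FRAME = 1       # assumed frame size, from the text
FRAMES_PER_SEC = 30        # normal rate for a TV
MBYTES_PER_GBYTE = 1000    # assumption: decimal units, matching the text's round numbers

def video_mbytes(seconds):
    """Uncompressed video size in Mbytes for the given duration."""
    return MBYTES_PER_FRAME * FRAMES_PER_SEC * seconds

one_minute = video_mbytes(60)
one_hour = video_mbytes(60 * 60)
two_hours = video_mbytes(2 * 60 * 60)

for label, mbytes in [("1 minute", one_minute), ("1 hour", one_hour), ("2 hours", two_hours)]:
    print(label, mbytes / MBYTES_PER_GBYTE, "Gbytes")
```

The two-hour figure comes out at 216 Gbytes, consistent with the text's "over 200 Gbytes."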
Of course, if there's a major change from one frame to the next (like for a change of scene, or lots of activity), we can't use the difference method. Video compression devices therefore need to quickly estimate the similarity between two successive digitized frames to determine whether frames can be sent using the difference method. A common way to determine the similarity of two frames is to compute what is known as the sum of absolute differences (SAD). For each pixel in frame 1, we compute the difference between that pixel and the corresponding pixel in frame 2. Each pixel is represented by a number, so difference means difference in numbers. Suppose we represent a pixel with a byte (real pixels are usually represented by at least three bytes), and we are comparing the pixel in the upper left of frames 1 and 2 in Figure 5.28. Frame 1's upper-left pixel has a value of 255. Frame 2's pixel, clearly the same, would have a value of 255 also, so the difference of these two pixels is 255 - 255 = 0. We might compare the next pixels of both frames in that row, finding the difference to be 0 again. And so on for all the pixels in that row for both frames, as well as the next several rows. However, when we compute the difference of the leftmost pixel of the middle row, where that black circle is located, we see that frame 1's pixel will be black, say with a value of 0. On the other hand, frame 2's corresponding pixel will be white, say with a value of 255. So the difference is 255 - 0 = 255. Likewise, somewhere in the middle of that row, we'll find another difference, this time with frame 1's pixel white (255) and frame 2's pixel black (0); the difference is again 255 - 0 = 255. Note that we only care about the difference, not which is bigger or smaller, so we are actually looking at the absolute value of the difference between frame 1 and frame 2 pixels. By summing the absolute value of the differences for every pair of pixels, we get a number representing the similarity of the two frames: 0 means identical, and bigger numbers mean less similar. If the resulting sum is below some threshold (e.g., below 1,000), we might then apply the method of sending the difference data, as in Figure 5.28(b); we don't explain how to compute the difference data here, as that is beyond the scope of this example. If the sum is above the threshold, then the difference between the blocks is too great, so we might instead send the full digitized frame for frame 2. Thus, video with similarity among frames will achieve a higher compression than video with plenty of frame-to-frame change.
(a) Digitized frame 1 (1 Mbyte), digitized frame 2 (1 Mbyte)
(b) Digitized frame 1 (1 Mbyte), difference of 2 from 1 (0.01 Mbyte)
Figure 5.28 A key principle of video compression recognizes that successive frames have much similarity: (a) sending every frame as a distinct digitized picture, (b) instead, sending a base frame and then difference data, from which the original frames can later be reconstructed. If we could do this for 10 frames, (a) would require 1 Mbyte * 10 = 10 Mbytes, while (b) (compressed) would require only 1 Mbyte + 9 * 0.01 Mbyte = 1.09 Mbytes, an almost 10x size reduction.
Actually, video compression methods compute similarity not between two entire frames, but rather between corresponding 16x16 pixel blocks; yet the idea is the same.
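As a software reference for the computation just described, here is a minimal Python sketch of the sum of absolute differences over two 16x16 blocks of one-byte pixels, flattened to 256 values. The function names and the default threshold of 1000 (taken from the text's "e.g., below 1,000") are illustrative assumptions.

```python
def sad(block1, block2):
    """Sum of absolute differences of two equal-length sequences of pixel values."""
    if len(block1) != len(block2):
        raise ValueError("blocks must have the same number of pixels")
    return sum(abs(p1 - p2) for p1, p2 in zip(block1, block2))

def similar(block1, block2, threshold=1000):
    """True if the blocks are similar enough to send difference data."""
    return sad(block1, block2) < threshold

# Two mostly white 16x16 blocks (256 pixels each, one byte per pixel).
frame1 = [255] * 256
frame2 = [255] * 256
frame1[128] = 0    # a black pixel that appears only in frame 1
frame2[136] = 0    # a black pixel that appears only in frame 2

print(sad(frame1, frame2), similar(frame1, frame2))
```

Each of the two mismatched positions contributes 255, so the sum here is 510, which is below the example threshold.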
Computing the sum of absolute differences is slow in software, so that task may be done using a custom digital circuit, while other tasks may remain in software. For example, you might find an SAD circuit inside a digital camcorder, or a cellular telephone that supports video. Let's design such a circuit. A block diagram is shown in Figure 5.29. The circuit's inputs will be a 256-byte memory A, holding the contents of a 16x16 block of pixels of frame 1, and another 256-byte memory B, holding the corresponding block of frame 2. Memories will be discussed in Section 5.6; for now, consider the memory as a register file, and ignore details of the interface to the memories. Another circuit input go tells the circuit when to begin computing. An output sad will present the result after some number of clock cycles.
(a) Block diagram: SAD component with output sad
(b) High-level state machine:
Inputs: A, B (256-byte memory); go (bit)
Output: sad (32 bits)
Local registers: sum, sad_reg (32 bits); i (9 bits)
Transition conditions shown: !go, i<256
Loop actions shown: sum = sum + abs(A[i] - B[i]); i = i + 1
Figure 5.29 Sum-of-absolute-differences (SAD) component: (a) block diagram, and (b) high-level state machine.
Step 1 of our RTL design method is to create a high-level state machine. We can describe the behavior of the SAD component using the high-level state machine shown in Figure 5.29(b). We declare the inputs, outputs, and local registers sum, i, and sad_reg. The sum register will hold the running sum of differences; we make this register 32 bits wide. The i register will be used to index into the current pixel in the block memories; i will range from 0 to 256, and therefore we'll make it 9 bits wide. sad_reg will be connected to the sad output (it's good practice to register your data outputs), so will be 32 bits wide, like the sad output. The state machine initially waits for the input go to become 1. The state machine then initializes registers sum and i to 0. The state machine then enters a loop: if i is less than 256, the state machine computes the absolute value of the difference of the two blocks' pixels indexed by i (the notation A[i] refers to the data in word i of memory A), updates the running sum, increments i, and repeats. Otherwise, the state machine loads sad_reg with the sum, which now represents the final sum, and returns to the first state to wait for the go signal to become 1 again.
One point to reemphasize is that the order of actions in a state does not impact the results, because all those actions occur simultaneously. Thus, for the state inside the loop, arranging the actions as "sum = sum + abs(A[i]-B[i]); i = i + 1" or as "i = i + 1; sum = sum + abs(A[i]-B[i])" does not impact the results. Either arrangement uses the old value of i.
Step 2 of our RTL design method is to create a datapath. We see from the high-level state machine that we'll need a subtractor, an absolute-value component (which we have not designed earlier, but is straightforward to design), an adder, and a comparison of i to 256. We build the datapath shown in Figure 5.30. The adder will be 32 bits wide, so the 8-bit input coming from the abs component will need to have 0s appended for its high 24 bits.
Step 3 is to connect the datapath to a controller block, as shown in Figure 5.30. Notice that we've defined the interface to the A and B memories, consisting of a read line, address lines, and data lines. Also note that we haven't explicitly listed the inputs and outputs of the controller, as they can be seen at the periphery of the controller's block.
Step 4 is to convert the high-level state machine to an FSM. We show the FSM on the left side of Figure 5.30. For convenience, we've shown the original high-level actions, along with their replacements by FSM actions.
Figure 5.30 SAD datapath and controller FSM.
To complete the design, we would convert the FSM to a controller implementation (a state register and combinational logic), as described in Chapter 3.
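To see the register-update semantics of the high-level state machine in action, the following Python sketch steps through it one clock edge at a time. The state labels S0 through S4 follow the state names used in this example's discussion; the function name `sad_hlsm` and the variable `total` (standing in for the sum register) are illustrative assumptions. Next values are computed from the current register values first and loaded together afterwards, mimicking a clock edge.

```python
def sad_hlsm(A, B):
    """Step the SAD high-level state machine from S1 until sad_reg has been loaded."""
    state, total, i, sad_reg = "S1", 0, 0, None    # go has just become 1
    while True:
        # Compute every next value from the current (old) register values...
        nxt_total, nxt_i, nxt_sad = total, i, sad_reg
        if state == "S1":                          # initialize sum and i
            nxt_total, nxt_i, nxt_state = 0, 0, "S2"
        elif state == "S2":                        # loop test
            nxt_state = "S3" if i < 256 else "S4"
        elif state == "S3":                        # loop body; action order is irrelevant
            nxt_i = i + 1
            nxt_total = total + abs(A[i] - B[i])   # still indexes with the old i
            nxt_state = "S2"
        else:                                      # S4: load the output register
            nxt_sad, nxt_state = total, "S0"
        # ...then load them all at once, as the clock edge would.
        state, total, i, sad_reg = nxt_state, nxt_total, nxt_i, nxt_sad
        if state == "S0":
            return sad_reg

A = [255] * 256
B = [255] * 256
A[0], B[255] = 0, 0          # differences at the first and last pixel
result = sad_hlsm(A, B)
print(result)
```

Writing "i = i + 1" before the sum update inside S3 does not change the result, because both next values are derived from the old i.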
Comparing Software and Custom Circuit Implementations
In Example 5.7, we said that the output appears after some number of clock cycles. Let's determine exactly how many cycles. After go becomes 1, our state machine will spend one cycle initializing registers in S1, then will spend two cycles in each of the 256 loop iterations (states S2 and S3), and finally one more cycle to update the output register in state S4, for a total of 1 + 2*256 + 1 = 514 cycles.
If we executed SAD in software, we would likely need more than two clock cycles per loop iteration. We would need perhaps two cycles to load internal registers, then a cycle for subtract, perhaps two cycles for absolute value, and a cycle for sum, for a total of six cycles per iteration. The custom circuit we built, at two cycles per iteration, is thus about three times faster for computing SAD, assuming equal clock frequencies.
We'll see in Section 6.5 that we could actually build a SAD circuit that is much faster.
DIGITAL VIDEO- IMAGINING THE FUTURE.
People seem to have an insatiable appetite for good quality video, and thus much attention is placed on developing fast and/or power-efficient encoders and decoders for digital video devices, like DVD players and recorders, digital video cameras, cell phones supporting digital video, video conferencing units, TVs, TV set-top boxes, etc. It's interesting to think toward the future: assuming video encoding/decoding becomes even more powerful and digital communication speeds increase, we might imagine video displays (with audio) on our walls at home or work that continually display what's happening at another home (perhaps our mom's house) or at a partner office on the other side of the country, like a virtual window to another place. Or we might imagine portable devices that enable us to continually see what someone else wearing a tiny camera sees, perhaps our child or spouse. Those developments could significantly change the way we live.
RTL Design Pitfalls and Good Practice
Pitfall: Assuming a Register Is Updated in the State in Which the Register Is Written
Perhaps the most common mistake in creating a high-level state machine is assuming that a register is updated in the state in which the register is written. Such an assumption is incorrect and can lead to unexpected behavior when the state machine reads the register in the same state, and likewise when the state machine reads the register in a transition condition leaving that state. For example, Figure 5.31(a) shows a simple high-level state machine. Examine the state machine, and then answer the following two questions:
What will be the value of Q after state A?
What will be the final state: C or D?
The answers may surprise you.
The value of Q will not be 99; Q's value will actually be unknown. The reason is illustrated by the timing diagram in Figure 5.31(b). State A configures the datapath to load a 99 into R on the next clock edge, and configures the datapath to load the value of register R into register Q on the next clock edge. When the next clock edge occurs, both those loads occur simultaneously. Q therefore gets whatever value was in R just before the next clock edge, which is unknown.
Furthermore, the final state will not be D, but will rather be C. The reason is illustrated by the timing diagram in Figure 5.31(b). State B configures the datapath to load 100 into R on the next clock edge, and configures the controller to load the next state based on the transition condition. R is 99, and therefore the transition condition R<100 is true, meaning the controller will be configured to load state C into the state register, not state D. On the next clock edge, R becomes 100, and the next state becomes C.
Local registers: R, Q (8 bits)
Figure 5.31 High-level state machine that behaves differently than some people may expect, due to reads of a register in the same state as writes to that register: (a) state machine, (b) timing diagram.
The key is to always remember that a state's actions configure the datapath and controller such that the next clock edge will load the desired values, but those values don't actually get loaded until that clock edge. Thus, any expressions in a state's actions or outgoing transition conditions will be using the previous values of registers, not the values being assigned in that state itself. By the same reasoning, all the actions of a state occur simultaneously on the next clock edge, and thus can be written in any order.
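The pitfall can be reproduced with a tiny Python model in which every action and transition condition reads the old register values and all registers load together at the clock edge. The state machine below is reconstructed from the description above (state A assigns R=99 and Q=R; state B assigns R=100 and moves to C if R<100, else to D); the function name `run_pitfall` and the use of `None` for an unknown value are assumptions for illustration.

```python
def run_pitfall():
    """Simulate the state machine described for Figure 5.31(a)."""
    R, Q = None, None                # None models an unknown power-up value
    state = "A"
    while state not in ("C", "D"):
        if state == "A":
            # R=99 and Q=R are both configured now; Q reads the OLD R.
            next_R, next_Q, next_state = 99, R, "B"
        else:                        # state B
            next_R, next_Q = 100, Q
            # The transition condition also sees the old R, which is 99.
            next_state = "C" if R < 100 else "D"
        R, Q, state = next_R, next_Q, next_state   # clock edge: all loads at once
    return state, R, Q

final_state, R, Q = run_pitfall()
print(final_state, R, Q)
```

The model ends in state C with Q still unknown, matching the two answers given in the text.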
Assuming that the designer actually wanted Q to equal 99 and the final state to be D, then a solution is to add an extra state before reading the value of a register that we assign. Figure 5.32(a) shows a new state machine in which the assignment of Q=R has been moved to state B, after R=99 has taken effect. Furthermore, the state machine has a new state, B2, that simply allows R to be updated with the new value before we read that value in the transition conditions. The timing diagram in Figure 5.32(b) shows the behavior that the designer expected.
Local registers: R, Q (8 bits)
Figure 5.32 High-level state machine that avoids reading just-assigned registers: (a) state machine, (b) timing diagram.
An alternative solution for the transition issue in this case would be to utilize comparison values that take into account that the old value is being used. So instead of comparing R to 100, the comparisons might instead compare to 99.
Avoiding this pitfall is the reason that we included state S2 in Example 5.7.
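A matching Python sketch of the corrected state machine shows the expected behavior. As before, the structure is reconstructed from the prose (Q=R moved into state B, and an added state B2 that only evaluates the transition conditions); the function name `run_fixed` and the `None` placeholder for an unknown value are illustrative assumptions.

```python
def run_fixed():
    """Simulate the corrected state machine described for Figure 5.32(a)."""
    R, Q = None, None                    # None models an unknown power-up value
    state = "A"
    while state not in ("C", "D"):
        next_R, next_Q = R, Q            # registers hold their value unless assigned
        if state == "A":
            next_R, next_state = 99, "B"
        elif state == "B":
            # R=99 has already taken effect, so Q=R captures 99.
            next_R, next_Q, next_state = 100, R, "B2"
        else:                            # B2: R has now been updated to 100
            next_state = "C" if R < 100 else "D"
        R, Q, state = next_R, next_Q, next_state   # clock edge: all loads at once
    return state, R, Q

final_state, R, Q = run_fixed()
print(final_state, R, Q)
```

With the extra state, the model finishes in state D with Q equal to 99.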
Pitfall: Reading Outputs
Another common mistake is to create a high-level state machine in which an external output is read in the state machine. Outputs can only be written and cannot be read. For example, Figure 5.33(a) shows an invalid high-level state machine; the read of P in state T is not allowed. If you wish to read an output, then create and use a local register. Figure 5.33(b) shows use of a local register R to avoid reading output P.
(a) Inputs: A, B (8 bits); Outputs: P (8 bits)
(b) Inputs: A, B (8 bits); Outputs: P (8 bits); Local register: R (8 bits)
Figure 5.33 (a) Reading an output is not allowed, (b) using a local register.
Good Design Practice: Registered Data Outputs
It's a good idea to always ensure your design has a register at every data output. Doing so prevents those outputs from displaying spurious values. For example, the state machine of Figure 5.33(b) could be implemented as a datapath in which output P is directly connected to the output of an adder, as shown in Figure 5.34(a). P will therefore output spurious values for some time after R is loaded with A, while the addition is being computed. Furthermore, if B or A changes in some other states, P will also change, but such change is likely not the intended behavior of the state machine; P should only change when we explicitly assign P in a state. Another problem is that any processor using the P output must take into account the adder when computing longest register-to-register delays to determine a circuit's critical path (see Section 5.4).
Figure 5.34 (a) P will exhibit spurious values, (b) registering P solves the problem.
Therefore, we will follow the design practice of always putting a register directly before the data output, as shown in Figure 5.34(b). Even if we don't explicitly declare the register as a local register, we always assume it is there in interpreting the high-level state machine, and we always add that register when creating the datapath. Alternatively, we can explicitly declare that register, and then assume that the output is directly connected to that register; this is the approach we took in Example 5.7, in which we declared the register sad_reg. It's good practice to not read this register; the register's only purpose is to connect to the output port.
Registering data outputs does have the potential disadvantage of delaying writes to the output port by one cycle, depending on the example.
We can consider RTL designs as falling into one of two categories: control-dominated designs and data-dominated designs.
A control-dominated design is a design whose controller contains most of the complexity of the design. When creating such a design, a designer focuses mostly on the design of the controller, meaning design effort goes mostly into defining the state behavior of the system. Once the designer has defined that state behavior, he/she can derive the datapath straightforwardly from that state behavior. A control-dominated design typically responds to external inputs in a precise amount of time, and typically has a simple datapath.
A data-dominated design is a design whose datapath contains most of the complexity of the design. When creating such a design, a designer focuses mostly on the design of the datapath, meaning design effort goes mostly into instantiating and interconnecting datapath components. Once the designer has defined the datapath, he/she can define the controller's state behavior straightforwardly. A data-dominated design typically has a lot of parallelism in its datapath, and the datapath may be large. For a data-dominated design, designers often skip the first step of our RTL design method of Table 5.1.
The laser-based distance measurer example in the previous section was an example of control-dominated design, since the complexity of the design was really in the controller, not the datapath.
The terms "control-dominated" and "data-dominated" are descriptive, and can't be used to strictly categorize designs. Some designs will exhibit properties of both types of designs. It's like the terms "introvert" and "extrovert" for describing people: while the terms are useful, people can't be strictly categorized as either introverts or extroverts, since many people are somewhere in between, or exhibit features of both categories. The example of the simple bus interface was an example of a design that has a similar amount of control and data design. The video compression SAD circuit, at least the way we designed it, was also a mix of control and data.
RTL design is very much a creative process. Two designers may come up with very different designs for the same system, following perhaps different design methods, with those designs differing in terms of performance, size, and other metrics.
FIR Filter Design Example
As our previous examples were either control-dominated or a mix of control and data, we now provide an example of a data-dominated design.
EXAMPLE 5.8 FIR filter
A digital filter takes a stream of digital inputs and generates a stream of digital outputs with some feature of the input stream removed or modified. Figure 5.35 shows a block diagram of a popular digital filter known as an FIR filter. X and Y are N bits wide each, such as 12 bits each. As a filtering example, consider the following stream of digital temperature values on X coming from a car engine temperature sensor sampled every second: 180, 180, 181, 240, 180, 181. That 240 is probably not an accurate measurement, as a car engine's temperature cannot jump 60 degrees in one second. A digital filter would remove such "noise" from the input stream, generating perhaps an output stream on Y like: 180, 180, 181, 181, 180, 181.
Figure 5.35 General block diagram of an FIR filter (digital filter with 12-bit input X, 12-bit output Y, and clock input clk).
An FIR filter (usually pronounced by saying the letters "F" "I" "R"), short for "Finite Impulse Response" filter, is a popular general digital filter design that can be used for a wide variety of filtering goals. The basic idea of an FIR filter is simple: the present output is obtained by multiplying the present input value by a constant, and adding that result to the previous input value times a constant, and adding that result to the next earlier input value times a constant, and so on. In a sense, adding to previous values in this manner results in a weighted average. We describe digital filtering and FIR filters in more detail in Section 5.11. For the purpose of this example, we merely need to know that an FIR filter can be described by the following equation:
y(t) = c0 × x(t) + c1 × x(t-1) + c2 × x(t-2)
An FIR filter with three terms, as in the above equation, is known as a 3-tap FIR filter. Real
FIR filters typically have many tens of taps; we use only three taps for the purpose of illustration.
A filter designer using an FIR filter achieves a particular filtering goal simply by choosing the FIR
filter's constants.
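To make the equation concrete, here is a minimal C sketch of a 3-tap FIR computation. The function name fir3, the use of plain int samples, and the choice to treat samples before the start of the stream as 0 are assumptions made for illustration; they are not taken from the text.

```c
#include <stddef.h>

/* Hypothetical helper: computes y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2)
   for every sample in x. Samples before the start of the stream are
   assumed to be 0 (an assumption, not stated in the text). */
void fir3(const int *x, int *y, size_t n, int c0, int c1, int c2)
{
    int xt1 = 0;   /* x(t-1) */
    int xt2 = 0;   /* x(t-2) */
    for (size_t t = 0; t < n; t++) {
        y[t] = c0 * x[t] + c1 * xt1 + c2 * xt2;
        xt2 = xt1;   /* shift the sample history by one step */
        xt1 = x[t];
    }
}
```

Changing only c0, c1, and c2 changes the filtering behavior, which is the point the paragraph above makes about choosing constants.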
We wish to design a circuit to implement an FIR filter. Because the FIR filter equation is just
data transformation and no control, let's skip Step 1 of the RTL design method and go straight to
Step 2, designing the datapath. We'll need a register for each tap to hold x(t), x(t-1), and x(t-2). On
each clock cycle, we'll want to move x(t-1) to x(t-2), to move x(t) to x(t-1), and to load x(t) with the
present input. We thus start the datapath with three registers, connected as shown in Figure 5.36.
Notice how the data moves to the right on each clock cycle, so that register xt0 holds the current
input sample, xt1 holds the previous input sample, and xt2 holds the sample before the previous
one. For the example, we'll assume data is 12 bits wide.
Figure 5.36 Beginning to build the datapath for the FIR filter: inserting and connecting the x(t),
x(t-1), and x(t-2) registers (xt0, xt1, and xt2; all data lines 12 bits wide).
Now we need another register for each tap to hold the constant value c0, c1, or c2; we'll
worry later about how those registers will be loaded. We'll also need a multiplier for each tap, to
multiply the tap's x value by the constant c value. The datapath with the constant registers and
multipliers is shown in Figure 5.37.
Figure 5.37 Extending the datapath for the FIR filter: inserting and connecting the c0, c1, and c2
registers, along with the multipliers, for each tap. For simplicity, clock connections are not shown,
and all data lines are assumed to be 12 bits wide.
The output Y is the sum of each tap's product. We can thus insert adders to compute the sum,
and we can connect that sum to the output Y, as shown in Figure 5.38.
We have completed the heart of the FIR filter datapath design. We now need to provide a
method for a user to load values into the constant registers c0, c1, and c2. Let's create another
input C to the filter, a load line CL, and a 2-bit address Ca1 and Ca0, that the filter user can use to
load a particular constant register. Ca1Ca0=00 indicates that register c0 should be loaded, 01
indicates that c1 should be loaded, and 10 indicates that c2 should be loaded. Loading of the value
on input C into the appropriate register occurs on a clock edge only when CL=1. We can
straightforwardly design the circuit for such loading using a decoder, as shown in Figure 5.39. Address
lines Ca1 and Ca0 feed into a 2x4 decoder, thus enabling the appropriate register (note that address
11 is unused). The load input CL is connected to the decoder's enable input. Note that we've also
added a register at the Y output, which is generally good design practice. Such a register
ensures the output doesn't fluctuate as intermediate products and sums are computed, and reduces
the likelihood of the user accidentally extending the critical path by connecting Y through a lot of
combinational logic before loading Y into a register.
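The register behavior described above can be sketched as a small C model in which one function call stands for one rising clock edge. The struct and function names are hypothetical, Ca is assumed to carry the 2-bit address Ca1Ca0 as an integer, and the sketch ignores the 12-bit width of the data lines, so it is an illustration of the load and shift behavior rather than a bit-accurate model.

```c
/* Hypothetical model of the registers in the FIR datapath. */
struct fir_regs {
    int xt[3];   /* xt0, xt1, xt2: current and two earlier samples */
    int c[3];    /* c0, c1, c2: tap constants */
    int yreg;    /* register at the Y output */
};

/* One rising clock edge. Every register loads from the values that
   were held before the edge, as in a real synchronous circuit. */
void fir_clock_edge(struct fir_regs *r, int X, int C, unsigned Ca, int CL)
{
    /* Combinational logic: three multipliers feeding two adders. */
    int sum = r->c[0] * r->xt[0] + r->c[1] * r->xt[1] + r->c[2] * r->xt[2];

    r->yreg = sum;          /* output register */
    r->xt[2] = r->xt[1];    /* samples move one register to the right */
    r->xt[1] = r->xt[0];
    r->xt[0] = X;

    /* 2x4 decoder enabled by CL; address 11 (3) is unused. */
    if (CL && Ca < 3)
        r->c[Ca] = C;
}
```

Computing sum before any register is updated is what keeps the model faithful to the idea that all registers load simultaneously on the clock edge.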
Figure 5.38 Computing the output Y of the FIR filter as the sum of the tap products (all data lines
are to be 12 bits wide).
Figure 5.39 Finalizing the FIR filter datapath with circuitry for loading the constant registers. We've
also added a register on the Y output, which is good practice. The critical path, the longest
register-to-register delay, is shown as a dotted line.
Our RTL design method involves two steps after designing the datapath to complete the
controller. However, this particular design does not require a controller, not even a simple one! This
example illustrated an extreme example of a data-dominated design.
Comparing Software and Custom Circuit
It is interesting to compare the performance of the hardware implementation of a 3-tap
FIR filter with a software implementation. The critical path goes from the xt and c
registers, through one multiplier, and through two adders, before reaching the Y register yreg.
HOW DOES IT WORK? VOICE QUALITY ON CELL PHONES.
Cellular telephones have become commonplace over the past decade. Cell phones operate in
environments far noisier than regular "landline" telephones, including noise from automobiles,
wind, crowds of talking people, etc. Thus, filtering out such noise is especially important in
cell phones. Your cell phone contains at least one, and probably more like several,
microprocessors and custom digital circuits. After converting the analog audio signal from the
microphone into a digital audio stream of bits, part of the job of those digital systems is to
filter out the background noise from the audio signal. Pay attention next time you talk to
someone using a cell phone in a noisy environment, and notice how much less noise you hear than
is probably actually heard by the microphone. As circuits continue to improve in speed, size,
and power, filtering will likely improve further. Some state-of-the-art phones may even use two
microphones, coupled with beamforming techniques (see Section 4.13), to focus in on a user's voice.
For hardware implementation, let's assume that the adder has a 2 ns delay. Let's also
assume that chaining the adders together results in the delays adding, so that two adders
chained together have a delay of 4 ns (detailed analysis of the internal gates of the adders
could show the delay to actually be slightly less). Let's assume the multiplier has a 20 ns
delay. Then the critical path, or longest register-to-register delay (to be discussed further
in Section 5.4), would be from c0 to yreg, going through the multiplier and two adders as
shown in Figure 5.39. That path's length would be 20+4 = 24 ns. Note that the path from
c1 to yreg would be equally long, but not longer. A critical path of 24 ns means the
datapath could be clocked at a frequency of 1 / 24 ns = 42 MHz. In other words, a new sample
could appear at X every 24 ns, and new output would appear at Y every 24 ns.
Now let's consider the hardware performance of a larger sized filter: a 100-tap FIR
filter rather than a 3-tap filter. The main performance difference is that we'll need to add 100
values rather than just three. Recall from Section 4.13 that an adder tree is a fast way to add
many values. One hundred values will require a tree with seven levels: 50 additions, then
25, then 13 (roughly), then 7, then 4, then 2, then 1. So the total delay would be 20 ns (for
the multiplier) plus seven adder-delays (7*2ns = 14ns), for a total delay of 34 ns.
For a software implementation, we'll assume 10 ns per instruction. Assume each
multiplication or addition would require two instructions. A 100-tap filter would need
approximately 100 multiplications and 100 additions, so the total time would be (100
multiplications * 2 instr/mult + 100 additions * 2 instr/add) * 10 ns per instruction = 4000 ns.
In other words, the hardware implementation would be over 100 times faster (4000 ns /
34 ns) than the software implementation. A hardware implementation could therefore process
100 times more data than a software implementation, resulting in much better filtering.
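The delay arithmetic above can be checked with a short C sketch. The function names are hypothetical; the 20 ns multiplier, 2 ns adder, 2 instructions per operation, and 10 ns per instruction are the assumptions stated in the text.

```c
/* Levels in a balanced adder tree that sums n values.
   For n = 100 the counts per level are 50, 25, 13, 7, 4, 2, 1. */
unsigned adder_tree_levels(unsigned n)
{
    unsigned levels = 0;
    while (n > 1) {
        n = (n + 1) / 2;   /* each level roughly halves the value count */
        levels++;
    }
    return levels;
}

/* Register-to-register delay: one multiplier followed by the adder tree. */
unsigned hw_delay_ns(unsigned taps, unsigned mult_ns, unsigned add_ns)
{
    return mult_ns + adder_tree_levels(taps) * add_ns;
}

/* Software time: one multiplication and one addition per tap. */
unsigned sw_time_ns(unsigned taps, unsigned instr_per_op, unsigned ns_per_instr)
{
    return (taps * instr_per_op + taps * instr_per_op) * ns_per_instr;
}
```

With these assumptions, hw_delay_ns(3, 20, 2) gives the 24 ns of the 3-tap filter, hw_delay_ns(100, 20, 2) gives 34 ns, and sw_time_ns(100, 2, 10) gives 4000 ns.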
5.4 DETERMINING CLOCK FREQUENCY
RTL design produces a processor, consisting of a datapath and a controller. Inside the
datapath and controller are registers, and registers require a clock signal. A clock signal
must have a particular frequency. The frequency will determine how fast the system will
execute its specified task. Obviously, a lower frequency will result in slower execution,
while a higher frequency will result in a faster execution. Conversely, a larger
period is slower, while a smaller period is faster.
Designers of digital circuits often (but not always) want their systems to execute as
fast as possible. However, a designer cannot choose an arbitrarily high clock frequency
(meaning an arbitrarily small period). Consider, for example, the simple circuit in Figure
5.40, in which registers a and b feed through an adder into register c. The adder has a
delay of 2 ns, meaning that when the adder's inputs change, the adder's outputs will not
be stable until after 2 ns; before 2 ns, the adder's outputs will have spurious values (see
Section 4.3 for a description of spurious values appearing at an adder's outputs). If the
designer chooses a clock period of 10 ns, the circuit should work fine. Shortening the
period to 5 ns will speed the execution. But shortening the period to 1 ns will result in
incorrect circuit behavior. One clock cycle might load new values into registers a and b.
The next clock cycle will load register c 1 ns later (as well as a and b), but the output of the
adder won't be stable until 2 ns have passed. The value loaded into register c will thus be
some spurious value that has no useful meaning, and will not be the sum of a and b.
Figure 5.40 Longest path is 2 ns.
Thus, a designer must be careful not to set the clock frequency too high. To determine the
highest possible frequency, a designer must analyze the entire circuit, and find the longest
path delay from any register to any other register, or from any circuit input to any register.
The longest register-to-register or input-to-register delay in a circuit is known as the
circuit's critical path. Designers then choose a clock whose period is longer than the
circuit's critical path.
Figure 5.41 illustrates a circuit with at least four possible paths from any register to
any other register:
• One path starts at register a, goes through the adder, and ends at register c. This
  path's delay is 2 ns.
• Another path starts at register a, goes through the adder, through the multiplier, and
  ends at register d. This path's delay is 2 ns + 5 ns = 7 ns.
• Another path starts at register b, goes through the adder, through the multiplier, and
  ends at register d. This path's delay is also 2 ns + 5 ns = 7 ns.
• The last path starts at register b, goes through the multiplier, and ends at
  register d. This path's delay is 5 ns.
Figure 5.41 Determining the critical path: Max(2, 7, 7, 5) = 7 ns.
The longest path is thus 7 ns (there are actually two such paths). Thus, the clock
period must be at least 7 ns.
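Finding the critical path is a maximum over all path delays, and the highest usable clock frequency follows from it. The sketch below shows that arithmetic in C; the function names are hypothetical, and the path delays must be supplied by the caller (here they would be the 2, 7, 7, and 5 ns values listed above).

```c
#include <stddef.h>

/* Returns the longest of n path delays, in ns (0 if n is 0). */
double critical_path_ns(const double *delay_ns, size_t n)
{
    double longest = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (delay_ns[i] > longest)
            longest = delay_ns[i];
    }
    return longest;
}

/* Highest clock frequency, in MHz, for a given critical path in ns.
   A period of 1 ns corresponds to 1000 MHz. */
double max_clock_mhz(double critical_ns)
{
    return 1000.0 / critical_ns;
}
```

For the four paths of Figure 5.41 the critical path is 7 ns, which bounds the clock at roughly 143 MHz before wire delays, setup times, and design margin are considered.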
The above analysis assumes that the only delay between registers is caused by logic delays.
In reality, wires also have a delay. In the 1980s and 1990s, the delay of logic dominated over
the delay of wires; wire delays were often negligible. But in modern chip technologies, the
delay of wires may equal or even exceed the delay of logic, and thus wire delays cannot be
ignored. Wire delays add to a path's length just as logic delays do. Figure 5.42 illustrates a
path length calculation with wire delays included.
Figure 5.42 Longest path is 3 ns considering wire delays.
Furthermore, the above analysis does not consider setup times for the registers. Recall from
Section 3.5 that flip-flop inputs (and hence register inputs) must be stable for a specified
amount of time before a clock edge. The setup time adds to the path length.
Even considering wire delays and setup times, designers typically choose a clock
period that is still longer than the critical path by an amount depending on how
conservative the designer wants to be with respect to ensuring the circuit works under a variety of
operating conditions. Certain conditions can change the delay of circuit components,
conditions like very high temperature, very low temperature, vibration, age, etc. Generally,
the longer the period beyond the critical path, the more conservative the design. For
example, we might determine that the critical path is 7 ns, but we might choose a clock
period of 10 ns, or even 15 ns, the latter being quite conservative.
If low power is a design goal, then a designer might choose an even longer period,
such as 100 ns, to reduce circuit power. Why reducing the clock frequency reduces power
will be discussed in Section 6.6.
When analyzing a processor (controller and datapath) to find the critical path, a
designer must be aware that register-to-register paths exist not just within the datapath
(Figure 5.43(a)), but also within the controller (Figure 5.43(b)), between the controller
and datapath (Figure 5.43(c)), and even between the processor and external components.
The number of possible paths in a circuit can be quite large. Consider a circuit with N
registers that has paths from every register to every other register. Then there are N*N
possible register-to-register paths. For example, if N is 3 and the three registers are named A,
B, and C, then the possible paths are: A->A, A->B, A->C, B->A, B->B, B->C,
C->A, C->B, C->C, for 3*3 = 9 possible paths. For N=50, there may be up to 2500
possible paths. Because of the large number of possible paths, automated tools can be of great
assistance. Timing analysis tools automatically analyze all paths to determine the longest
path, and may also ensure that setup and hold times are satisfied throughout the circuit.
Figure 5.43 Critical paths throughout a circuit: (a) within a datapath, (b) within a controller,
(c) between a controller and datapath.
CONSERVATIVE CHIP MAKERS, AND PC OVERCLOCKING.
Chip makers usually publish their chips' maximum clocking frequency somewhat lower than the
real maximum, perhaps 10%, 20%, or even 30% lower. Such conservatism reduces the chances that
the chip will fail in unanticipated situations, such as extremes of hot or cold weather, or
slight variations in the chip manufacturing process. Many personal computer enthusiasts have
taken advantage of such conservatism by "overclocking" their PCs, meaning to set the clock
frequency higher than a chip's published maximum, by changing the PC's BIOS (basic input/output
system) settings. Numerous web sites post stories on the successes and failures of trying to
overclock nearly every PC processor; the norm is about 10% higher than the published maximum.
Now, I don't recommend overclocking (for one, you might damage the microprocessor due to
overheating), but it is interesting to see the presence of conservatism in design.
5.5 BEHAVIORAL-LEVEL DESIGN: C TO GATES (OPTIONAL)
As transistors per chip continue to increase and hence designers build more complex digital
systems that use those additional transistors, digital system behavior becomes increasingly
difficult to understand. Frequently, a designer building a new digital system finds it useful to
first describe the desired system behavior using a programming language, like C, C++, or
Java, in order to first get that desired behavior correct. (Alternatively, the designer may use
the high-level programming constructs in a hardware description language, like VHDL or
Verilog, to first get the desired behavior correct.) Then, the designer converts that
programming language description to an RTL design, by following the RTL design method that
usually starts with a high-level state machine RTL description. Converting a system's
programming language description to an RTL description is known as behavioral-level design.
We'll introduce behavioral-level design using an example.
EXAMPLE 5.9 Sum-of-absolute-differences in C for video compression
Recall Example 5.7, in which we created a sum-of-absolute-differences component. In that
example, we started with a high-level state machine, but that state machine wasn't very easy to
understand. We can more easily describe the computation of the sum of absolute differences using
C code, as shown in Figure 5.44.
int SAD (byte A[256], byte B[256]) // not quite C syntax
{
   uint sum; short uint i;
   sum = 0;
   i = 0;
   while (i < 256) {
      sum = sum + abs(A[i] - B[i]);
      i = i + 1;
   }
   return (sum);
}
Figure 5.44 C program description of a sum-of-absolute-differences computation; the C program
may be easier to develop and easier to understand than a state machine.
That code is much easier to understand for most people than the high-level state machine in
Figure 5.29. Thus for some designs, C code (or something similar) is the most natural starting point.
To begin the RTL design method, we could convert this code to a high-level state machine, like that
in Figure 5.29, and then proceed to complete the RTL design method and hence design the circuit.
It is instructive to define a structured method for converting C code to a high-level state
machine. Defining such a method makes clear to us that C code can be automatically
compiled to either software on a programmable processor, or to a custom digital circuit. We
point out that most designers that start with C code and then continue with RTL design do
not necessarily follow a particular method in performing such conversion. However,
automated tools do follow a method having some similarities to the one we now describe. We
also point out that the conversion method will sometimes result in "extra" states that you
might notice could be combined with other states; these extra states would be combined by
a later optimization step, though we'll combine some of them as we follow the method.
We consider three types of statements in C code: assignment statements, while
loops, and condition statements (if-then-else). We provide high-level state machine
templates for each such statement.
An assignment statement in C translates simply into a state in a state machine, with the
state's actions carrying out the assignment, as shown in Figure 5.45.
Figure 5.45 Template for assignment statement (target = expression;).
An if-then statement in C translates into a state that checks the condition of the if
statement, and branches to the states for the then part if the condition is true, otherwise
branching past those states to an end state, as shown in Figure 5.46.
Figure 5.46 Template for if-then statement (if (cond) { // then stmts }).
We can translate an if-then-else statement in C into a similar state machine with a state
that checks the condition of the if statement, but this time branching to states for the
else part if the if condition is false, as shown in Figure 5.47.
Figure 5.47 Template for if-then-else statement (if (cond) { // then stmts } else { // else stmts }).
The else part commonly contains another if statement, as C programmers may have multiple
else if parts in a region of code.
Finally, a while loop statement in C translates into states similar to an if-then statement,
except that after executing the while's statements, the state machine branches back to
the condition check, rather than to the end state, as shown in Figure 5.48. Only when the
condition is false can we reach the end state.
Figure 5.48 Template for while loop statement (while (cond) { // while stmts }).
Given these simple templates, we can convert a wide variety of C programs to high-level state
machines, from which we already know how to create circuit designs following our RTL design
method.
EXAMPLE 5.10 Converting an if-then-else statement to a state machine
We are given the C-like code shown in Figure 5.49(a), which computes the maximum of two data
inputs X and Y. We can translate that code to a state machine by first translating the if-then-else
statement to states using the method of Figure 5.47, as shown in Figure 5.49(b). We then translate the
then statements to states, and then the else statements, yielding the final state machine in Figure 5.49(c).
Inputs: uint X, Y
Outputs: uint Max
if (X > Y) {
   Max = X;
}
else {
   Max = Y;
}
(a)
Figure 5.49 Behavioral-level design starting from C code: (a) C code for computing the max of two
numbers, (b) translating the if-then-else statement to a high-level state machine, (c) translating the
then and else statements to states. From the state machine in (c), we could use our RTL design
method to complete the design. Note: max can be implemented more efficiently; we use max here
to provide an easy-to-understand example.
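One way to see what the state machine of this example does is to simulate it in C, with one loop iteration standing for one clock cycle. This is only an illustrative sketch: the state names (S_COND, S_THEN, S_ELSE, S_END) and the function name are hypothetical and do not come from the figure.

```c
/* Hypothetical state names mirroring the if-then-else template:
   a condition-check state, a then state, an else state, an end state. */
enum max_state { S_COND, S_THEN, S_ELSE, S_END };

unsigned max_fsm(unsigned X, unsigned Y)
{
    enum max_state s = S_COND;
    unsigned Max = 0;

    while (s != S_END) {            /* each iteration models one clock cycle */
        switch (s) {
        case S_COND:                /* check the if condition and branch */
            s = (X > Y) ? S_THEN : S_ELSE;
            break;
        case S_THEN:                /* then part: one assignment state */
            Max = X;
            s = S_END;
            break;
        case S_ELSE:                /* else part: one assignment state */
            Max = Y;
            s = S_END;
            break;
        default:
            s = S_END;
            break;
        }
    }
    return Max;
}
```

Each case corresponds to one state of the template, and each assignment to s corresponds to one transition.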
EXAMPLE 5.11 SAD C code to high-level state machine conversion
We wish to convert the C program description of the sum-of-absolute-differences example of
Example 5.9 to a high-level state machine. The code is shown in Figure 5.50(a), written as an
infinite loop rather than a procedure call, and using an input "go" to indicate when the system should
compute the SAD. The "while (1)" statement, after some optimization, translates just to a transition
from the last state back to the first state, so we'll hold off on adding that transition until we have
formed the rest of the state machine. We begin with the statement "while (!go);" which based on the
template approach translates to the states shown in Figure 5.50(b). Since the loop has no statements
in the loop body, we can simplify the loop's states as shown in Figure 5.50(c). Figure 5.50(c) also
shows the states for the next two statements, which are assignment statements. Since those two
assignments could be done simultaneously, we merge the two states into one, as shown in Figure
5.50(d). We then translate the next while loop, using the while loop template, to the states shown in
Figure 5.50(e). We fill in the states for the while loop's statements in Figure 5.50(f), merging the
assignment statements' states into one state since the assignments can be done simultaneously.
Figure 5.50(g) shows the state for the last statement of the C code, which assigns sad=sum.
We eliminate obviously unnecessary empty states, and add a transition from the last state to
the first state to account for the entire code being enclosed in a "while (1)" loop.
Inputs: byte A[256], B[256]
        bit go;
Output: int sad
main()
{
   uint sum; short uint i;
   while (1) {
      while (!go);
      sum = 0;
      i = 0;
      while (i < 256) {
         sum = sum + abs(A[i] - B[i]);
         i = i + 1;
      }
      sad = sum;
   }
}
(a)
Figure 5.50 Behavioral-level design of the sum-of-absolute-differences code: (a) original C
code, written as an infinite loop, (b) translating the statement "while (!go);" to a state machine,
(c) simplified states for "while (!go);" and states for the assignment statements that follow,
(d) merging the two assignment states into one, (e) inserting the template for the next while loop,
(f) inserting the states for that while loop, merging two assignment statements into one, (g) the final
high-level state machine, with the "while (1)" included by transitioning from the last state to
the first state, and with obviously unnecessary states removed.
Notice the similarity between our final high-level state machine in Figure 5.50(g) and the
high-level state machine we designed from scratch in Figure 5.29.
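The final state machine of this example can be approximated in C to check that it computes the same result as the original loop. The sketch below is an assumption-laden illustration: the state names are hypothetical, go is modeled as already being 1, unsigned char stands in for the byte type, and the function returns after sad is assigned once instead of transitioning back to the first state as the hardware would.

```c
#include <stdlib.h>   /* abs */

/* Hypothetical names for the five states of the final state machine. */
enum sad_state { S_WAIT, S_INIT, S_LOOP, S_BODY, S_DONE };

unsigned sad_fsm(const unsigned char A[256], const unsigned char B[256])
{
    enum sad_state s = S_WAIT;
    unsigned sum = 0, i = 0, sad = 0;
    int go = 1;          /* assumption: the go input is already asserted */
    int finished = 0;

    while (!finished) {  /* each iteration models one clock cycle */
        switch (s) {
        case S_WAIT:     /* while (!go); */
            if (go) s = S_INIT;
            break;
        case S_INIT:     /* sum = 0 and i = 0, merged into one state */
            sum = 0;
            i = 0;
            s = S_LOOP;
            break;
        case S_LOOP:     /* while condition check */
            s = (i < 256) ? S_BODY : S_DONE;
            break;
        case S_BODY:     /* both loop-body assignments, merged */
            sum = sum + (unsigned)abs((int)A[i] - (int)B[i]);
            i = i + 1;
            s = S_LOOP;
            break;
        case S_DONE:     /* sad = sum; hardware would return to S_WAIT */
            sad = sum;
            finished = 1;
            break;
        }
    }
    return sad;
}
```

The merged states S_INIT and S_BODY each perform two assignments in a single cycle, which is the simplification described for Figure 5.50(d) and (f).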
We will need to map the C data types to bits at some point. For example, the C code declares
i to be a short unsigned integer, which typically means 16 bits. So we could declare i to be 16 bits
in the high-level state machine. Or, knowing the range of i to be 0 to 256, we could instead define
i to be 9 bits (C doesn't have a 9-bit wide data type).
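The 9-bit figure follows from the largest value i must hold. A small C helper, hypothetical and shown only to make the arithmetic explicit, computes the number of bits needed for an unsigned range of 0 to max_value:

```c
/* Number of bits needed to represent every value from 0 to max_value
   as an unsigned binary number (at least 1 bit). */
unsigned bits_needed(unsigned max_value)
{
    unsigned bits = 1;
    while (max_value > 1) {
        max_value >>= 1;   /* drop one binary digit */
        bits++;
    }
    return bits;
}
```

Because i reaches 256 when the loop exits, bits_needed(256) is 9, whereas a counter that only reached 255 would fit in 8 bits.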
We could then proceed to design a controller and datapath from this state machine, as was
done in Figure 5.30. Thus, we can translate C code to a circuit using a straightforward automatable
method.
Through the previous examples, you have seen how C code can be converted to a
custom digital circuit using methods that are fully automatable.
General C code can contain additional types of statements, some of which can be
easily translated to states. For example, a for loop can be translated to states by first
transforming the for loop into a while loop. A switch statement can be translated by first
translating the switch statement to if-then-else statements.
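The two rewrites mentioned above can be illustrated with ordinary C. The functions below are hypothetical examples, not code from the text: each pair computes the same result, with the second member using only the statement types (assignment, while, if-then-else) for which templates were given.

```c
/* A for loop ... */
int sum_for(const int *v, int n)
{
    int s = 0;
    for (int i = 0; i < n; i++)
        s = s + v[i];
    return s;
}

/* ... rewritten as initialization, a while loop, and an update. */
int sum_while(const int *v, int n)
{
    int s = 0;
    int i = 0;
    while (i < n) {
        s = s + v[i];
        i = i + 1;
    }
    return s;
}

/* A switch statement ... */
int op_switch(int op, int a, int b)
{
    switch (op) {
    case 0:  return a + b;
    case 1:  return a - b;
    default: return 0;
    }
}

/* ... rewritten as a chain of if-then-else statements. */
int op_if(int op, int a, int b)
{
    if (op == 0)
        return a + b;
    else if (op == 1)
        return a - b;
    else
        return 0;
}
```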
Some C constructs pose problems for converting to a circuit, though. For example,
pointers and recursion are not easy to translate. Thus, tools that automate behavioral
design from C code typically impose restrictions on the allowable C code that can be
handled by the tool. Such restrictions are known as subsetting the language.
While we have emphasized C code in this section, obviously any similar language,
such as C++, Java, VHDL, Verilog, etc., can be converted to custom digital circuits, with
appropriate language subsetting.
5.6 MEMORY COMPONENTS
Register-transfer level design involves instantiating and connecting datapath components to
form datapaths, controlled by controllers. RTL design often utilizes some additional
components outside the datapath and controller.
One such component is a memory. An MxN memory is a memory component able to store M data
items of N bits each. Each data item in a memory is known as a word. Figure 5.51 depicts the
storage available in an MxN memory.
We can generally categorize memories into two groups: RAM memory, which can be written to
and read from, and ROM memory, which can only be read from. However, as we shall see, the
distinction between the two categories is blurring due to new technologies.
Figure 5.51 Logical view of a memory (an MxN memory holds M words, N bits wide each).
Random Access Memory (RAM)
A RAM is logically the same as a register file (see Section 4.10): both components are
memories whose words (each of which can be thought of as a register) can be individually
read and written using address inputs. The differences between a RAM and a register file are:
• The size of M: We typically refer to smaller memories (from 4 to 512 or perhaps
  even 1024 words or so) as register files, and larger memories as RAMs.
• The bit storage implementation: For large numbers of words, a compact
  implementation becomes increasingly important. Thus, a RAM typically uses a very
  compact implementation for bit storage, rather than using a flip-flop.
• The memory's physical shape: For large numbers of words, the physical shape of
  the memory's implementation becomes important. A tall rectangular shape will
  have some short wires and some long wires, whereas a square shape will have all
  medium length wires. A RAM therefore typically has a square shape, to reduce
  the memory's critical path. Reads are performed by first reading out an entire row
  of words, and then selecting the appropriate word (column) out of that row.
There's no clear-cut border between what defines a register file and what defines a
RAM. Smaller memories (typically) tend to be called register files, and larger memories
tend to be called RAMs. But you'll often see the terms used quite interchangeably.
A typical RAM is single-ported. Some RAMs are dual-ported. Adding more ports to
RAMs is much less common than to register files, because a RAM's larger size makes the
delay and size overhead of extra ports much more costly. Nevertheless, a
RAM can have an arbitrary number of read ports and write ports, just like a register file.
Figure 5.52 shows a block diagram for a 1024x32 single-port RAM (M=1024, N=32). data is a
32-bit wide set of data lines that can serve either as input lines during writes or as output
lines during reads. addr is a 10-bit input serving as the address lines during reads or writes.
rw is a 1-bit control input that indicates whether the present operation should be a read or a
write (e.g., rw = 0 means read, rw = 1 means write). en is a 1-bit control input that enables the
RAM for reading or writing; if we don't want to read or write during a particular clock cycle,
we set en to 0 to prevent a read or write (regardless of the value of rw).
Figure 5.52 1024x32 RAM block symbol.
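The interface just described can be modeled behaviorally in C. The sketch below is a hypothetical software model, not a description of any real RAM component: the struct and function names are invented, rw = 1 is taken to mean write (following the "e.g." in the text), and one function call stands for one clock cycle. The single data pointer plays the role of the bidirectional data lines.

```c
#include <stdint.h>

#define RAM_WORDS 1024u   /* M = 1024 words, so addr is 10 bits */

struct ram {
    uint32_t word[RAM_WORDS];   /* N = 32 bits per word */
};

/* Models one clock cycle of a single-port RAM.
   On a write (rw = 1), *data is the input; on a read (rw = 0),
   *data receives the output. With en = 0 nothing happens. */
void ram_cycle(struct ram *m, unsigned addr, int rw, int en, uint32_t *data)
{
    if (!en)
        return;                   /* disabled: no read and no write */

    addr &= RAM_WORDS - 1u;       /* keep only the 10 address bits */
    if (rw)
        m->word[addr] = *data;    /* write */
    else
        *data = m->word[addr];    /* read */
}
```

A caller would set *data and call with rw = 1 to store a word, then call with rw = 0 on the same address to read it back.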
WHY IS IT CALLED "RANDOM ACCESS" MEMORY?
In the early days of digital design, RAMs did not exist. If you had information you wanted
your digital circuit to store, you stored it on a magnetic drum, or a magnetic tape. Tape drives
(and drum drives too) had to spin the tape to get the head, which could read or write onto the
tape, above the desired memory location. If the head was currently above location 900, and you
wanted to write to location 999, the tape would have to spin past 901, 902, ... 998, until
location 999 was under the head. In other words, the tape was accessed sequentially. When RAM
was first released, its most appealing feature was that any "random" address could be accessed
in the same amount of time as any other address, regardless of the previously read address.
That's because there is no "head" used to access a RAM, and no spinning of tapes or drums.
Thus, the term "random access" memory was used, and has stuck to this day.
Figure 5.53 shows the logical internal structure of an MxN RAM. By "logical" structure,
we mean that we can think of the structure being implemented in that way, although
a real physical implementation may possess a different actual structure. (As an analogy, a
logical structure of a telephone includes a microphone and a speaker connected to a
phone line, although real physical telephones vary tremendously in their implementations,
including handheld devices, headsets, wireless connections, built-in answering
machines, etc.) The main part of the RAM structure is the grid of bit storage blocks, also
known as cells. A collection of N cells forms a word, and there are M words. The address
inputs feed into a decoder, each output of which enables all the cells in one word
corresponding to the present address values. The enable input en can disable the decoder and
prevent any word from being enabled. The read/write control input rw also connects to
every cell to control whether the cell will be written with wdata, or read out to rdata. The
data lines are connected through one word's cell to the next word's cell, so each cell must
be designed to only output its contents when enabled and thus output nothing when
disabled, to avoid interfering with another cell's output.
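The decoder's role in this structure can be sketched in C for a small memory. Both helpers are hypothetical: address_bits computes the number of address inputs A for M words (assuming M is at most 2^31), and word_enables returns a one-hot bit mask in which bit k set means word k is enabled (assuming addr is less than 64 so the mask fits in one value).

```c
/* Number of address bits A needed to select one of M words,
   that is, the smallest A with 2^A >= M. Assumes M <= 2^31. */
unsigned address_bits(unsigned M)
{
    unsigned A = 0;
    while ((1u << A) < M)
        A++;
    return A;
}

/* Decoder model: returns a one-hot mask with only the addressed
   word's enable bit set, or 0 when en = 0 disables the decoder.
   Assumes addr < 64. */
unsigned long long word_enables(unsigned addr, int en)
{
    return en ? (1ULL << addr) : 0ULL;
}
```

For M = 1024 the sketch gives A = 10, matching the 10-bit addr input of the 1024x32 RAM, and at most one enable bit is ever set, so only one word drives the shared data lines at a time.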
Figure 5.53 Logical internal structure of a RAM (the address input is A bits wide, where
A = log2 M; the read data outputs are rdata(N-1) through rdata0).
Notice that the RAM in Figure 5.53 has the same inputs and outputs as the RAM block diagram
in Figure 5.52, except that the RAM in Figure 5.53 has separate write and read data lines whereas
Figure 5.52 has a single set of data lines (a single port). Figure 5.54 shows how the separate
lines might be combined inside a RAM having just a single set of data lines.
Figure 5.54 RAM data input/output for a single port.
Bit Storage in a RAM
Compared to a register file, the key feature of RAM is its compactness. Recall from
Chapter 3 that we implemented a bit storage block using a D flip-flop. Because RAMs
store large numbers of bits, RAMs utilize a bit storage block that is more compact than a
flip-flop. We thus discuss briefly the internal design of the bit storage blocks inside two
popular types of RAM: static RAM and dynamic RAM. However, be forewarned that
the internal design of those blocks involves electronics issues beyond the scope of this
book, and instead is within the scope of textbooks on VLSI or advanced digital design.
Fortunately, a RAM component hides the complexity of its internal electronics by using a
memory controller, and thus a digital designer's interaction with a RAM remains as
discussed in the previous section.
Static RAM
Static RAM uses a bit storage block involving two inverters connected in a loop, as shown in Figure 5.55. A bit d will go through the bottom inverter to become d', then back through the top inverter to become d again; thus, the bit is stored in the inverter loop. Notice that this bit storage block has an extra line, data', passing through it, compared with the "logical" RAM structure in Figure 5.53.
To write a bit into this inverter loop, we set the data line to the value of the desired bit, and data' to the complement. So to store a 1, the memory controller sets data=1 and data'=0, as shown in Figure 5.56. (To store a 0, the controller would have set data=0 and data'=1.) The controller then sets enable=1, which turns on both shown transistors. The data and data' values thus appear in the inverter loop as shown (overwriting whatever value was there before). Fully understanding why this circuit works involves electrical details beyond the scope of this discussion.
Figure 5.55 SRAM cell.
Figure 5.56 Writing a 1 to an SRAM cell.
Reading the stored bit can be done by first setting the data and data' lines both to 1 (an act known as precharging), and then by setting enable to 1. One of the enabled transistors will have a 0 at one end, causing the precharged 1 on the data or data' line to drop to a voltage slightly less than a regular logic 1. Both the data and data' lines connect to a special circuit called a sense amplifier that detects whether the voltage on data is slightly higher than on data', meaning logic 1 is stored, or whether the voltage on data' is slightly higher than on data, meaning logic 0 is stored. Again, details of the electronics are beyond the scope of this discussion.
Notice that the bit storage block of Figure 5.57 utilizes six transistors: two inside each of the two inverters, and two transistors outside the inverters. Six transistors are fewer than needed inside a D flip-flop. A tradeoff is that special circuitry must be used to read a bit stored in this bit storage block, whereas a D flip-flop outputs regular logic values directly. Such special circuitry slows the access time of the stored bit.
(Margin note: DRAM chips first appeared in the early 1970s, and could hold only a thousand bits. Modern DRAMs can hold many billions of bits.)
RAM based on a six-transistor bit storage block, or similar such block, is known as a static RAM, or SRAM. A static RAM maintains the stored bit as long as power is supplied to the transistors. Except, of course, when the block is being written, the stored bit does not change; it is static (not changing).
Dynamic RAM
An alternative popular bit storage block used in RAM has only a single transistor per block. Such a block utilizes a (relatively large) capacitor at the output of the transistor, as shown in Figure 5.58(a).
Writing can occur when enable is 1: data=1 will charge the top plate of the capacitor to a 1, while data=0 will make it 0. When enable is returned to 0, a 1 on the top plate will begin to discharge across to the bottom plate of the capacitor and on to ground. (Why? Because that's what a capacitor does.) However, the capacitor is intentionally designed to be relatively large, so that the discharge takes a long time, during which time the bit d is effectively stored in the capacitor. Figure 5.58(b) provides a timing diagram illustrating the charge and discharge of the capacitor.
Reading can be done by first setting data to a voltage midway between 0 and 1, and then setting enable to 1. The value stored in the capacitor will alter the voltage on the data line, and that altered voltage can be sensed by special circuits connected to the data line that amplify the sensed value to either a logic 1 or a logic 0.
Figure 5.57 Reading an SRAM.
Figure 5.58 DRAM bit storage: (a) bit storage block, (b) discharge.
It turns out that reading the charge stored in the capacitor discharges the capacitor. Thus, the RAM must immediately write the read bit back to the bit storage block after reading the block. The RAM must contain a memory controller that automatically performs such a write back.
Because a bit stored in the capacitor gradually discharges to ground, the RAM must refresh every bit storage block before the bits completely discharge and hence the stored bit is lost. To refresh a bit storage block, the RAM must read the block and then write the read bit back to the block. Such refreshing may be done every few microseconds. The RAM must include a built-in memory controller that automatically performs these refreshes.
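The interplay of leakage, destructive reads, and refreshing can be illustrated with a toy Python model. The numbers below (a 0.5 sensing threshold and a cell that keeps 90% of its charge per time step) are assumptions chosen only to make the behavior visible; they do not describe any real DRAM, and the class and function names are hypothetical.

```python
class DramCell:
    """Toy model of one capacitor-based bit: a stored 1 is a charge that leaks away."""
    FULL = 1.0
    THRESHOLD = 0.5  # assumed sensing threshold
    KEEP = 0.9       # assumed fraction of charge kept per time step

    def __init__(self):
        self.charge = 0.0

    def write(self, bit):
        self.charge = self.FULL if bit else 0.0

    def tick(self):
        self.charge *= self.KEEP  # gradual discharge toward ground

    def read(self):
        bit = 1 if self.charge > self.THRESHOLD else 0
        self.charge = 0.0  # reading discharges the capacitor
        self.write(bit)    # so the sensed bit is written back at once
        return bit

    def refresh(self):
        self.read()        # a refresh is a read followed by the write back


def stored_bit_after(ticks, refresh_every=None):
    """Store a 1, let time pass, and report what is finally read back."""
    cell = DramCell()
    cell.write(1)
    for t in range(1, ticks + 1):
        cell.tick()
        if refresh_every and t % refresh_every == 0:
            cell.refresh()
    return cell.read()


print(stored_bit_after(10))                   # 0: the bit leaked away
print(stored_bit_after(10, refresh_every=5))  # 1: refreshing preserved it
```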
Note that the RAM may be busy refreshing itself at a time that we wish to read the RAM. Furthermore, every read must be followed by an automatic write. Thus, RAM based on one-transistor plus capacitor technology may be slower to access.
Because the stored bit changes (discharges) even when power is supplied and we are not writing the bit storage block, RAM based on the one-transistor plus capacitor bit storage block is known as dynamic RAM, or DRAM.
Compared to SRAM, DRAM is even more compact, requiring only one transistor per bit storage block rather than six transistors. The tradeoff is that DRAM requires refreshing, which ultimately slows the access time. Another tradeoff, not alluded to above, is that creating the relatively large capacitor in a DRAM requires a special chip fabrication process, and thus incorporating DRAM with regular logic can be costly. In the 1990s, incorporating DRAM with regular logic on the same chip was nearly unheard of. Technology advancements, however, have led to DRAM and logic appearing on the same chip in more and more cases.
Figure 5.59 graphically depicts the compactness advantages of SRAM over register files, and DRAM over SRAM, for storing the same number of bits.
Figure 5.59 Depiction of compactness benefits of SRAM and DRAM (not to scale): an MxN memory implemented as a register file, as an SRAM, and as a DRAM.
Using a RAM
Figure 5.60 shows timing diagrams describing how to write and read the RAM of Figure 5.52. The timing diagram in Figure 5.60 shows how to write 500 and 999 into locations 9 and 13 during clock edges 1 and 2, respectively. The next cycle shows how to read location 9 of the RAM, by setting addr=9 and rw=0 (meaning read). Shortly after rw becomes 0, data becomes 500 (the value we had previously stored in location 9). Notice that we had to disable our writing of data first (by setting it to Z), so as not to interfere with the data being read from the RAM. Also notice that this RAM's read functionality is asynchronous.
Figure 5.60 Reading and writing a RAM: (a) timing diagrams, (b) setup, hold, and access times. (In the timing diagram, rw=1 means write; after the two writes, RAM[9] equals 500 and RAM[13] equals 999.)
The delay between our setting the rw line to read and the read data stabilizing at the data output is known as the RAM's access time or read time.
We now provide an example of using a RAM in an RTL design.
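Before turning to that example, the write and read sequence of Figure 5.60 can be mimicked with a small behavioral model in Python. This is a sketch under assumptions: the class name Ram, its method names, and the 1024x32 size are hypothetical, and timing (setup, hold, access time) is not modeled. Writes happen only on a clock edge, while reads are asynchronous.

```python
class Ram:
    """Behavioral sketch of a single-port RAM with en and rw control inputs."""

    def __init__(self, num_words, word_bits):
        self.mask = (1 << word_bits) - 1
        self.words = [0] * num_words

    def clock_edge(self, addr, data, rw, en=1):
        """Synchronous write: takes effect on a clock edge when en=1 and rw=1."""
        if en == 1 and rw == 1:
            self.words[addr] = data & self.mask

    def read(self, addr, rw, en=1):
        """Asynchronous read: returns the word when en=1 and rw=0, else None (Z)."""
        if en == 1 and rw == 0:
            return self.words[addr]
        return None


ram = Ram(1024, 32)                      # assumed size
ram.clock_edge(addr=9, data=500, rw=1)   # clock edge 1
ram.clock_edge(addr=13, data=999, rw=1)  # clock edge 2
print(ram.read(addr=9, rw=0))            # 500
print(ram.read(addr=13, rw=0))           # 999
```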
EXAMPLE 5.12 Digital sound recorder using a RAM
Let's design a system that can record sound, and can play that recorded sound. Such a recorder is found in various toys, in telephone answering machines, in cell phone announcements, and numerous other devices. We'll need an analog-to-digital converter to digitize the sound, a RAM to store the digitized sound, a digital-to-analog converter to output the digitized sound, and a processor to control both converters and the RAM. Figure 5.61 shows a block diagram of the system.
Figure 5.61 Utilizing a RAM in a digital sound recorder system (microphone, 4096x16 RAM, speaker).
To store digitized sound, the processor block can implement the high-level state machine segment shown in Figure 5.62. The machine first initializes its internal address counter a to 0 in state S. Next, in state T, the machine loads a value into the analog-to-digital converter to cause a new analog sample to be digitized, and sets the three-state buffer to pass that digitized value to the RAM's data lines. That state also sets the RAM address to the counter a's value, and sets the control lines to enable writing. The machine then transitions to state U, whose transitions check the value of a against 4095. That state also increments a. The machine returns to state T and hence continues writing samples in successive memory addresses as long as the memory is not yet filled (a < 4095). Notice that the comparison is with 4095, not 4096. This is because the action in state U of a = a + 1 does not occur until the next clock edge, so the comparison of a < 4095 on state U's outgoing transitions uses the old value of a, not the incremented value (see the Section 5.3 discussion of common pitfalls).
To play back the stored digitized sound, the processor block can implement the high-level state machine segment shown in Figure 5.63. After initializing the counter a in state V, the machine enters state W. State W disables the three-state buffer, to avoid interfering with the RAM's output data that will appear during RAM reads. That state also sets the RAM address lines, and sets the RAM control lines to enable reading. Read data will thus appear on the data lines. The next state X loads a value into the digital-to-analog converter, to convert the data just read from RAM to the analog signal. That state also increments the counter a. The machine returns to state W to continue reading, until the entire memory has been read.
Figure 5.62 State machine for storing digitized sound in RAM.
Figure 5.63 State machine for playing sound from the RAM.
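The recording state machine, and in particular the reason its transition compares a with 4095 rather than 4096, can be checked with a short Python simulation. This is a sketch, not the processor's actual design: the function name record and the stand-in sample source are assumptions, and the converter and three-state buffer are reduced to a single list write.

```python
from itertools import count


def record(samples, depth=4096):
    """Simulate states S, T, and U of the recorder; samples is any iterator."""
    ram = [0] * depth
    a = 0                 # state S: initialize the address counter
    state = "T"
    while state != "done":
        if state == "T":
            ram[a] = next(samples)  # write a freshly digitized sample at address a
            state = "U"
        else:  # state U
            old_a = a     # the transition sees the value before the clock edge
            a = a + 1     # the increment takes effect at the next clock edge
            state = "T" if old_a < depth - 1 else "done"
    return ram, a


ram, final_a = record(count(100))  # stand-in samples 100, 101, 102, ...
print(ram[0], ram[4095], final_a)  # 100 4195 4096
```

Every address from 0 through 4095 is written exactly once. Comparing the old value against 4096 instead would send the machine back to state T with a equal to 4096, one address past the end of the memory.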
Read-Only Memory (ROM)
A Read-Only Memory (ROM) is a memory that can be read from, but not written to. Because of being read only, the bit-storage mechanism in a ROM can be made to have several advantages over a RAM, including:
Compactness: A ROM's bit storage may be even smaller than a RAM's.
Nonvolatility: A ROM's bit storage maintains its contents even after the power supply to the ROM is shut off; when turned back on, the ROM's contents can be read again. In contrast, a RAM loses its contents when power is shut off. A memory that loses its contents when power is shut off is known as volatile, while a memory that maintains its contents without power is known as nonvolatile.
Speed: A ROM may be faster to read than a RAM, especially than a DRAM.
Low power: A ROM does not consume power to maintain its contents, in contrast to a RAM. Thus, a ROM consumes less power than a RAM.
Therefore, when the data stored in a memory will not change, we might choose to store that data in a ROM to gain the above advantages.
Figure 5.64 shows a block symbol of a 1024x32 ROM. The logical internal structure of an MxN ROM is shown in Figure 5.65. Notice that the internal structure is very similar to the internal structure of a RAM shown in Figure 5.53. Bit storage blocks forming a word are enabled by a decoder output, with the decoder input being the address. However, because a ROM can only be read and cannot be written, there is no need for a rw input control to specify read versus write, nor for wdata inputs to provide data being written. Also, because no synchronous writes occur in a ROM, the ROM does not have a clock input. In fact, not only is a ROM an asynchronous component, but a ROM can be thought of as a combinational component (when we only read from the ROM; we'll see variations later).
Figure 5.64 1024x32 ROM block symbol.
Some readers might at this point be wondering how we write the initial contents of a ROM that we then can only read. After all, if we can't write the contents of a ROM at all, then the ROM is really of no use to us. Obviously, there must be a way to write the contents of a ROM, but in ROM terminology, the writing of the initial contents of a ROM is known as ROM programming. ROM types differ in their bit storage block implementations, which in turn causes differences in the methods used for ROM programming. We now describe several popular bit storage block implementations for ROMs.
ROM Types
Figure 5.65 Logical internal structure of a ROM. (Labels: address inputs addr0, addr1, ..., addr(A-1) and en feed the decoder; let A = log2 M; each decoder output is a word enable; data outputs are data(N-1), data(N-2), ..., data0.)
Mask-programmed ROM
Figure 5.66 illustrates the bit storage cell for a mask-programmed ROM. A mask-programmed ROM has its contents programmed when the chip is manufactured, by directly wiring 1s to cells that should store a 1, and 0s to cells that should store a 0. Recall that a "1" is actually a higher-than-zero voltage coming from one of several power input pins to a chip; thus, wiring a 1 means wiring the power input pin directly to the cell. Likewise, wiring a 0 to a cell means wiring the ground pin directly to the cell. Be aware that Figure 5.66 presents a logical view of a mask-programmed ROM cell; the actual physical design of such cells may be somewhat different. For example, a common design strings several vertical cells together to form a large NOR-like logic gate. We leave details for more advanced textbooks on CMOS circuit design.
Figure 5.66 Mask-programmed ROM cells: left cell programmed with 1, right cell with 0.
Wires are placed onto chips during manufacturing by using a combination of light-sensitive chemicals and light passed through lenses and "masks" that block the light from reaching regions of the chemicals. (See Chapter 7 for further details.) Hence the term "mask" in mask-programmed ROM.
Mask-programmed ROM has the best compactness of any ROM type, but the contents of the ROM must be known during chip manufacturing. This ROM type is best suited for high-volume well-established products in which compactness or very low cost is critical, and in which programming of the ROM will never be done after the ROM's chip is manufactured.
Fuse-Based Programmable ROM: One-Time Programmable (OTP) ROM
Figure 5.67 illustrates the bit storage cell for a fuse-based ROM. A fuse-based ROM uses a fuse in each cell. A fuse is an electrical component that initially conducts from one end to the other just like a wire, but whose connection from one end to the other can be destroyed ("blown") by passing a higher-than-normal current through the fuse. A blown fuse does not conduct and is instead an open circuit (no connection). In the figure, the cell on the left has its fuse intact, so when the cell is enabled, a 1 appears on the data line. The cell on the right has its fuse blown, so when the cell is enabled, nothing appears on the data line (special electronics will be necessary to convert that nothing to a logic 0).
Figure 5.67 Fuse-based ROM cells: left cell programmed with 1, right cell with 0.
A fuse-based ROM is manufactured with all fuses intact, so the initially stored contents are all 1s. A user of this ROM can program the contents by connecting the ROM to a special device, known as a programmer, that provides higher-than-normal currents to only those fuses in cells that should store 0s. Because a user can program the contents of this ROM, the ROM is known as a programmable ROM, or PROM.
A blown fuse cannot be changed back to its initial conducting form. Thus, a fuse-based ROM can only be programmed once. Fuse-based ROMs are therefore also known as one-time programmable (OTP) ROMs.
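The one-way nature of fuse programming can be captured in a few lines of Python. This is a toy model under assumptions: the class name FusePROM and the 4-word by 8-bit size are hypothetical. Programming can only turn 1s into 0s, which a bitwise AND with the stored word expresses directly.

```python
class FusePROM:
    """Toy model of a fuse-based PROM: an intact fuse reads 1, a blown fuse reads 0."""

    def __init__(self, num_words, word_bits):
        self.mask = (1 << word_bits) - 1
        self.words = [self.mask] * num_words  # manufactured with all fuses intact

    def program(self, addr, value):
        """Blow the fuse of every bit that is 0 in value; blown fuses stay blown."""
        self.words[addr] &= value & self.mask

    def read(self, addr):
        return self.words[addr]


prom = FusePROM(4, 8)        # assumed size
print(prom.read(0))          # 255: all 1s initially
prom.program(0, 0b10100101)
print(prom.read(0))          # 165
prom.program(0, 0b11110000)  # a second attempt can only blow more fuses
print(prom.read(0))          # 160, not 240: the earlier 0s cannot be undone
```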
Erasable PROM: EPROM
Figure 5.68 depicts a logical view of an erasable PROM cell. An erasable PROM, or EPROM, cell uses a special type of transistor, having what is known as a floating gate, in each cell. The details of a floating gate transistor are beyond the scope of this section, but briefly, a floating gate transistor has a special gate in which electrons can be "trapped." A transistor with electrons trapped in its gate stays in the nonconducting situation, and thus is programmed to store a 0. Otherwise, the cell is considered to store a 1. Special electronic circuitry converts the sensed current on the data line to a logic 1 or 0.
Figure 5.68 EPROM cells: left cell programmed with 1, right cell (with trapped electrons) programmed with 0.
An EPROM cell initially has no electrons trapped in any floating gate transistors, so the initially stored contents are all 1s. A programmer device applies higher-than-normal voltage to those transistors in cells that should store 0s. That high voltage causes electrons to tunnel through a small insulator into the floating gate region. When the voltage is removed, the electrons do not have enough energy to tunnel back, and thus are trapped as shown in the right cell of Figure 5.68.
The electrons can be freed by exposing the electrons to ultraviolet (UV) light of a particular wavelength. The UV light energizes the electrons such that they tunnel back through the small insulator, thus escaping the floating gate region. Exposing an EPROM chip to UV light therefore "erases" all the stored 0s, restoring the chip to having all 1s as contents, after which the EPROM can be programmed again. Hence the term "erasable" PROM. Such a chip can typically be erased and reprogrammed about ten thousand times or more, and can retain its contents without power for ten years or more. Because a chip usually appears inside a black package that doesn't pass light, a chip with an EPROM requires a window in that package through which UV light can pass, as shown in Figure 5.69.
Figure 5.69 The "window" in the package of a microprocessor that uses an EPROM to store programs.
EEPROM and Flash Memory
An electrically erasable PROM, or EEPROM, utilizes the EPROM programming method of using high voltage to trap electrons in a floating gate transistor. However, unlike an EPROM that requires UV light to free the electrons and hence erase the PROM, an EEPROM uses another high voltage to free the electrons. EEPROMs thus avoid the need for placing the chip under UV light.
Because EEPROMs use voltages for erasing, those voltages can be applied to specific cells only. Thus, while EPROMs must be erased in their entirety, EEPROMs can be erased one word at a time. Thus, we can erase and reprogram certain words in an EEPROM without changing the contents of other words.
Some EEPROMs require a special programmer device for programming. However, most modern EEPROMs do not require special voltages to be applied to the pins, and also include internal memory controllers that manage the programming process. Thus, we can reprogram an EEPROM's contents (or part of its contents) without ever removing the chip from the system that the EEPROM serves; such an EEPROM is known as being in-system programmable. Most such devices can therefore be read and written in a manner very similar to a RAM.
Figure 5.70 shows a block diagram of an EEPROM. Notice that the data lines are bidirectional, just as was the case for RAM. The EEPROM has a control input write: write=0 indicates a read operation (when en=1), while write=1 indicates that the data on the data lines should be programmed into the word at the address specified by the address lines. Programming a word into an EEPROM takes time, though, perhaps several, dozens, hundreds, or even thousands of clock cycles. Therefore, an EEPROM may have a control output busy to indicate that programming is not yet complete. While the device is busy, the EEPROM user should not try writing to a different word. Fortunately, most EEPROMs will load the data to be programmed and the address into internal registers, freeing the circuit that is writing the EEPROM from having to hold these values constant during programming.
Figure 5.70 1024x32 EEPROM block symbol.
Modern EEPROMs can be programmed tens of thousands to millions of times or more, and can retain their contents for tens to one hundred years or more without power.
While erasing one word at a time is fine for some applications that utilize EEPROM, other applications need to erase large blocks of memory quickly; for example, a digital camera application would need to erase a block of memory corresponding to an entire picture. Flash memory is a type of EEPROM in which all the words within a large block of memory can be erased very quickly, perhaps simultaneously, rather than one word at a time. A flash memory may be completely erased by setting an erase control input to 1. Many flash memories also allow only a specific region, known as a block or sector, to be erased while other regions are left untouched.
Using a ROM
We now provide examples of using a ROM in RTL designs.
EXAMPLE 5.13 Talking doll using a ROM
We wish to design a doll that speaks the message "Nice to meet you" whenever the doll's right arm is moved. A block diagram of the system is shown in Figure 5.71. A vibration sensor in the doll's right arm has an output v that is 1 when vibration is sensed. A processor detects the vibration and should then output a digitized version of the "Nice to meet you" message to a digital-to-analog converter attached to a speaker. The "Nice to meet you" message will be the prerecorded voice of a professional actress. Because that message will not change for the lifetime of the doll product, we can store that message in a ROM.
Figure 5.71 Utilizing a ROM in a talking doll system (vibration sensor, 4096x16 ROM, speaker).
Figure 5.72 shows a high-level state machine segment that plays the message after detecting vibration. The machine starts in state S, initializing the address counter a to 0, and waiting for vibration to be sensed. When vibration is sensed, the machine goes to state T, which reads the current location. The machine moves on to state U, which loads the digital-to-analog converter with the read value from ROM, increments a, and proceeds back to T as long as a hasn't reached 4095 (remember that the transition from U uses the value of a before the increment, so compares to 4095, not to 4096).
Because this doll's message will never change, we choose to use a mask-programmed ROM or an OTP ROM. We might utilize OTP ROM during prototyping or during initial sales of the doll, and then produce mask-programmed ROM versions during high-volume production of the doll.
EXAMPLE 5.14 Digital telephone answering machine using a flash memory
We are to design the outgoing announcement part of a telephone answering machine (e.g., "We're not home right now, leave a message."). That announcement should be stored digitally, should be recordable by the machine owner any number of times, and should be saved even if power is removed from the answering machine. Recording begins immediately after the owner presses a record button, which sets a signal rec to 1. Because we must be able to record the announcement, we cannot use a mask-programmed ROM or OTP ROM. Because removing power should not cause the announcement to be lost, we cannot use a RAM. Thus, we might choose an EEPROM or a flash memory.
We'll use a flash memory, shown in Figure 5.73. Notice the flash memory has the same interface as a RAM, except that the flash memory has an extra input erase. Asserting erase on this particular flash memory clears the contents of the entire flash. While the flash memory is erasing itself, the flash sets an output busy to 1, during which time we cannot write to the flash memory.
Figure 5.73 Utilizing a flash memory in a digital answering machine (4096x16 flash).
Figure 5.74 shows a high-level state machine segment for recording the announcement. The state machine segment begins when the record button is pressed. State S activates the erase of the flash memory (er=1), and then state T waits for the erasing to complete (bu'). Such erasing should occur in just a few milliseconds, so we shouldn't miss any of the spoken announcement. The state machine then transitions to state U, which copies a digitized sample from the analog-to-digital converter to the flash memory, writing to the current address a. State U also increments a. The next state (V) checks to see if the memory is filled with samples by checking if a < 4096, returning to state U until the memory is filled.
Figure 5.74 State machine for storing digitized sound in a flash memory.
Notice that, unlike Examples 5.12 and 5.13, this state machine increments a before the state that checks for the last address (state V), so V's transitions use 4096, not 4095. We show this version just for variety. The version in Example 5.12 may be slightly better because that version requires that a, and the comparator, only be 12 bits wide (to represent 0 to 4095) rather than 13 bits wide (to represent 0 to 4096).
This state machine assumes that writes to the flash occur in one clock cycle. Some flash memories require more time for writes, asserting their busy output until the write has completed. For such a flash, we would need to add a state between states U and V, similar to the state T between S and U. To prevent missing sound samples while waiting, we might want to first save the entire sound sample in a 4096x16 RAM, and then copy the entire RAM contents to the flash.
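The erase, wait, and write sequence of this example can be sketched as a small Python simulation. Everything below is an illustrative assumption rather than the behavior of a real flash part: the class name Flash, the three-cycle erase time, the erased value 0xFFFF, and the choice to start a at 0 are all hypothetical, and writes are taken to complete in one clock cycle as the example assumes.

```python
ERASED = 0xFFFF  # assumed contents of an erased 16-bit word


class Flash:
    """Toy flash memory with an erase input and a busy output."""

    def __init__(self, depth, erase_cycles=3):
        self.words = [0] * depth
        self.busy = 0
        self._left = 0
        self._erase_cycles = erase_cycles

    def clock(self, erase=0, write=0, addr=0, data=0):
        if self.busy:                     # erasing: ignore inputs, count down
            self._left -= 1
            if self._left == 0:
                self.words = [ERASED] * len(self.words)
                self.busy = 0
        elif erase:
            self.busy, self._left = 1, self._erase_cycles
        elif write:
            self.words[addr] = data


def record_announcement(flash, samples, depth):
    """Simulate states S, T, U, and V; one loop iteration is one clock cycle."""
    state, a = "S", 0
    while state != "done":
        if state == "S":                  # er=1
            flash.clock(erase=1)
            state = "T"
        elif state == "T":                # wait for busy to return to 0
            done_erasing = not flash.busy
            flash.clock()
            if done_erasing:
                state = "U"
        elif state == "U":                # write a sample, then increment a
            flash.clock(write=1, addr=a, data=next(samples))
            a += 1
            state = "V"
        else:                             # state V sees the incremented a
            flash.clock()
            state = "U" if a < depth else "done"
    return a


flash = Flash(8)  # a small depth keeps the trace short
final_a = record_announcement(flash, iter(range(10, 18)), 8)
print(flash.words)   # [10, 11, 12, 13, 14, 15, 16, 17]
print(final_a)       # 8
print((4095).bit_length(), (4096).bit_length())  # 12 13
```

The last line restates the width argument above: a counter that must reach 4096 needs 13 bits, while one that stops at 4095 needs only 12.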
The Blurring of the Distinction between RAM and ROM
Notice that EEPROM and flash ROM blur the distinction between RAM and ROM. Many modern EEPROM devices are writable just like a RAM, having nearly the same interface, with the only difference being longer write times to an EEPROM than to a RAM. However, the difference between those times is shrinking each year.
Further blurring the distinction are nonvolatile RAM (NVRAM) devices, which are RAM devices that retain their contents even without power. Unlike ROM, NVRAM write times are just as fast as regular RAM, typically one clock cycle. One type of NVRAM simply includes an SRAM with a built-in battery, with the battery able to supply power to the SRAM for perhaps ten years or more. Another type of NVRAM includes both an SRAM and an EEPROM; the NVRAM controller automatically backs up the SRAM's contents into the EEPROM, typically just at the time when power is being removed. Furthermore, extensive research and development into new bit storage technologies are leading to NVRAMs that are even closer to RAM in terms of performance and density while being nonvolatile. One such technology is known as MAGRAM, short for magnetic RAM, which uses magnetism to store bits, having access times similar to DRAM, but without the need for refreshing, and with nonvolatility.
Thus, digital designers have a tremendous variety of memory types available to them, with those types differing in their cost, performance, size, nonvolatility, ease of use, write time, duration of data retention, and other factors.
5.7 QUEUES (FIFOs)
Sometimes our data storage needs specifically require that we read items in the same order that we wrote them, and that reading removes the item from the list. For example, a busy restaurant may maintain a waiting list of customers: the host writes customer names to the rear of the list, but when a table becomes available, the host reads the next customer's name from the front of the list and removes that name from the list. Thus, the first customer written to the list is the first customer read from the list. A queue is a list that is written at the rear of the list but read from the beginning of the list, with a read also removing the read item from the list, as illustrated in Figure 5.75. The common term for a queue in American English is a "line"; for example, you stand in a line at the grocery store, with people entering the rear of the line, and being served from the front of the line. In British English, the word queue is used directly in everyday language (which sometimes confuses Americans who visit other English-speaking countries). Because the first item written into the list will be the first item read out of the list, a queue is known as being first-in first-out (FIFO). As such, sometimes queues are called FIFO queues, although that term is redundant because a queue is by definition first-in first-out. The term FIFO itself is often used to refer to a queue. The term buffer is also sometimes used. A write to a queue is sometimes called a push or enqueue, and a read is sometimes called a pop or dequeue.
Figure 5.75 Conceptual view of a queue: items are written to the back of the queue, and read (and removed) from the front of the queue.
We can implement a queue using a memory: either a register file or a RAM, depending on the queue size needed. When using a memory, the front and rear will move to different memory locations as the queue is written and read, as illustrated in Figure 5.76. The figure shows an initially empty eight-word queue with front and rear both set to memory address 0. The first action on the queue is a write of item A, which goes to the rear (address 0), and the rear increments to address 1. The next action is a write of item B, which goes to the rear (address 1), and the rear increments to 2. The next action is a read, which comes from the front (address 0) and thus reads out item A, and the front increments to 1.
Subsequent reads and writes continue likewise, except that when the rear or front reaches 7, its next value should be 0, not 8. In other words, the memory can be thought of as a circle, as shown in Figure 5.77.
Figure 5.76 Writing and reading a queue implemented in a memory causes the front (f) and rear (r) to move.
Two conditions of a queue are of interest:
Empty: there are no items in the queue. This condition can be detected as front = rear, as seen in the topmost queue of Figure 5.76.
Full: there is no more room to add items to the queue, meaning there are N items in a queue of size N. This comes about when the rear wraps around and catches back up to the front, meaning front = rear.
Figure 5.77 Implementing a queue in a memory treats the memory as a circle.
Unfortunately, notice that the conditions detecting the queue being empty and the queue being full are the same: the front address equals the rear address. One way to tell the two conditions apart is to keep track of whether a write or a read preceded the front and rear addresses becoming equal.
In many uses of a queue, the circuit writing the queue operates independently from the circuit reading the queue. Thus, a queue implemented with a memory may use a two-port memory having separate read and write ports.
We can implement an 8-word queue using an 8-word two-port register file and additional
components, as depicted in Figure 5.78. A 3-bit up-counter maintains the front address,
while another 3-bit up-counter maintains the rear address. Notice that these counters
will naturally wrap around from 7 to 0, or from 0 to 7, as desired when treating the
memory as a circle. An equality comparator detects whether the front counter equals the
rear counter. A controller writes the write data to the register file and increments the
rear counter during a write, reads the read data from the register file and increments
the front counter during a read, and determines whether the queue is full or empty based
on the equality comparison as well as whether the previous operation was a write or a
read. We omit further description of the queue's controller, but it can be built by
starting with an FSM.
Figure 5.78 Architecture of an 8-word 16-bit queue.
A user of the queue should never read an empty queue or write a full queue.
Depending on the controller design, such an action might just be ignored or might put the
queue into a misleading internal state (e.g., the front and rear addresses may cross over).
Most queues come with one or more additional control outputs that indicate whether
the queue is half full, or perhaps 80% full.
Queues are commonplace in digital systems. Some examples include:
A computer keyboard writes the pressed keys into a queue and meanwhile
requests that the computer read the queued keys. You might at some time have
typed faster than your computer was reading the keys, in which case your additional
keystrokes were ignored, and you may have even heard beeps each time
you pressed additional keys, indicating the queue was full.
A digital video camera may write recently captured video frames into a queue,
and concurrently may read those frames from the queue, compress them, and store
them on tape or another medium.
A computer printer may store print jobs in a queue while those jobs are waiting to
be printed.
A modem stores incoming data in a queue and requests a computer to read that
data. Likewise, the modem writes outgoing data received from the computer into
a queue and then sends that data out over the modem's outgoing medium.
A computer network router receives data packets from an input port and writes
those packets into a queue. Meanwhile, the router reads the packets from the
queue, analyzes the address information in the packet, and then sends the packet
along one of several output ports.
EXAMPLE 5.15 Using a queue
Show the internal state of an 8-word queue, and popped data values, after each of the
following sequences of pushes and pops, assuming an initially empty queue:
1. Push 9, 5, 8, 5, 7, 2, and 3
2. Pop
3. Push 6
4. Push 3
5. Push 4
6. Pop
Figure 5.79 shows the queue's internal states. After the first sequence of seven pushes
(step 1), we see that the rear address points to address 7. The pop (step 2) reads from
the front address of 0, returning data of 9. The front address increments to 1. Note that
although the queue might still contain the value of 9 in address 0, that 9 is no longer
accessible during proper queue operation, and thus is essentially gone. The push of 6
(step 3) increments the rear address, which wraps around from 7 to 0. The push of 3
(step 4) increments the rear address to 1, which now equals the front address, meaning
the queue is now full. If a pop were to occur now, it would read the value 5. But instead,
a push of 4 occurs (step 5). This push should not have been performed, because the queue
is full. Thus, this push puts the queue into an erroneous state, and we cannot predict
the behavior of any subsequent pushes or pops.
Figure 5.79 Example pushes and pops of a queue. (In the figure, step 5 is marked
"ERROR! Pushing a full queue results in unknown state.")
A queue could of course come with some error-tolerance behavior built in, perhaps
ignoring pushes when full, or perhaps returning some particular value (like 0) if popped
when empty.
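The queue behavior described in this section can be sketched in software. The following is only a behavioral model, not the controller design, and its names (CircularQueue, last_was_write) are our own assumptions. It keeps front and rear addresses that wrap around, records whether the previous operation was a write in order to tell full from empty, and (as one possible error-tolerance policy) ignores a push when full and returns None on a pop when empty.

```python
class CircularQueue:
    """Behavioral sketch of an N-word queue held in a memory treated as a circle."""

    def __init__(self, size=8):
        self.size = size
        self.mem = [None] * size
        self.front = 0               # address of the next word to read
        self.rear = 0                # address of the next word to write
        self.last_was_write = False  # distinguishes full from empty when front == rear

    def is_empty(self):
        return self.front == self.rear and not self.last_was_write

    def is_full(self):
        return self.front == self.rear and self.last_was_write

    def push(self, item):
        if self.is_full():
            return False             # assumed policy: ignore a push when full
        self.mem[self.rear] = item
        self.rear = (self.rear + 1) % self.size   # 7 wraps around to 0
        self.last_was_write = True
        return True

    def pop(self):
        if self.is_empty():
            return None              # assumed policy: return None when empty
        item = self.mem[self.front]
        self.front = (self.front + 1) % self.size
        self.last_was_write = False
        return item


# Replay the steps of Example 5.15.
q = CircularQueue(8)
for value in (9, 5, 8, 5, 7, 2, 3):
    q.push(value)                    # step 1: rear ends at address 7
first_pop = q.pop()                  # step 2: reads address 0
q.push(6)                            # step 3: rear wraps from 7 to 0
q.push(3)                            # step 4: rear becomes 1, equal to front
full_after_step4 = q.is_full()
push4_accepted = q.push(4)           # step 5: ignored under the assumed policy
second_pop = q.pop()                 # step 6
print(first_pop, full_after_step4, push4_accepted, second_pop)
```

With this policy the push in step 5 leaves the queue unchanged, so the pop in step 6 behaves as if step 5 had never occurred. A queue without such protection gives no such guarantee.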
5.8 HIERARCHY-A KEY DESIGN CONCEPT
Managing Complexity
Throughout this book, we have been utilizing a powerful design concept known as
hierarchy. Hierarchy in general is defined as an organization with a few "things" at the
top, and each thing possibly consisting of several other things. Perhaps the most widely
known type of hierarchy involves a country. At the top is a country, which consists of many
states or provinces, each of which in turn consists of many cities. A hierarchy involving a
country, provinces, and cities is shown in Figure 5.80. That figure shows all three levels
of the hierarchy: country, provinces, and cities.
Figure 5.81 shows the same country, but this time showing only the top two levels of
hierarchy: country and provinces. Indeed, most maps of a country only show these top two
levels (possibly showing key cities in each province/state, but certainly not all the
cities); showing all the cities also makes the map far too detailed and cluttered. A map
of a province/state, however, might then show all the cities within that state. Thus, we
see that hierarchy plays an important role in understanding countries (or at least their
maps).
Figure 5.80 Three-level hierarchy example: a country, made up of provinces, each made up
of cities.
Likewise, hierarchy plays an important role in digital design. In Chapter 2, we
introduced the most fundamental component in digital systems, the transistor. In
Chapters 2 and 3, we introduced several basic components composed from transistors,
like AND gates, OR gates, and NOT gates, and then some slightly more complex components
composed from gates: multiplexers, decoders, flip-flops, etc. In Chapter 4, we composed
the basic components into a higher level of components, datapath components, like
registers, adders, ALUs, multipliers, etc. In Chapter 5, we introduced components composed
of datapath components, including controllers, datapaths, processors (made up of
controllers and datapaths), memories, and queues.
Figure 5.81 Hierarchy showing just the top two levels.
Use of hierarchy enables us to manage complex designs. Imagine trying to comprehend
the design of Figure 5.30 at the level of logic gates; that design likely consists of
several thousand logic gates. Humans can't comprehend several thousand things at once.
But they can comprehend a few dozen things. As the number of things grows beyond a
few dozen, we therefore group those things into a new thing, to manage the complexity.
However, hierarchy alone is not sufficient. We must also associate an understandable
meaning to the higher-level things we create, a task known as abstraction.
Abstraction
Hierarchy may not only involve grouping things into a larger thing, but may also involve
associating a higher-level behavior to that larger thing. So when we grouped transistors to
form an AND gate, we didn't just say that an AND gate was a group of transistors.
Rather, we associated a specific behavior with the AND gate, with that behavior describing
the behavior of the group of transistors in an easily understandable way. Likewise, when
we grouped logic gates into a 32-bit adder, we didn't just say that an adder was a group
of logic gates. Rather, we associated a specific understandable behavior with the adder: a
32-bit adder adds two 32-bit numbers.
Associating higher-level behavior with a component to hide the complex inner details
of that component is a process known as abstraction.
Abstraction frees a designer from having to remember, or even understand, the low-level
details of a component. Knowing that an adder adds two numbers, a designer can
use an adder in a design. The designer need not worry about whether the adder internally
is implemented using a carry-ripple design, or using some complicated design that is
perhaps faster but larger. Instead, the designer just needs to know the delay of the adder
and the size of the adder, which are further abstractions.
Composing a Larger Component from Smaller Versions of the Same Component
A common design task is to create a larger version of a component from smaller versions
of the same component. For example, suppose you have 3-input AND gates available to you,
but you need a 9-input AND gate. You could compose several 3-input AND gates to
form a 9-input AND gate, as shown in Figure 5.82. You could compose OR gates into a
larger OR gate, and XOR gates into larger XOR gates, similarly. Some compositions might
require more than two levels: composing an 8-input AND from 2-input ANDs requires
four 2-input ANDs in the first level, two 2-input ANDs in the second level, and a
2-input AND in the third level.
Figure 5.82 Composing a 9-input AND gate from 3-input AND gates.
Some compositions might end up with extra inputs that must be hardwired to 0 or 1. An
8-input AND built from 3-input ANDs would look similar to Figure 5.82, but with the
bottom input of the bottom AND gate hardwired to 1. After trying a few examples of
composing AND gates into larger ones, you can come up with a general rule to compose any
size AND gates into a larger gate: fill the first level with (the largest available) AND
gates until the sum of their inputs equals or exceeds the desired number of inputs, then
fill the second level similarly (feeding first-level outputs to the second-level gates),
and so on until a level has just one gate (that's the last level). Connect any
unused AND gate inputs to 1. Composing NAND, NOR, or XNOR gates into larger gates
of the same kind would require a few more gates to maintain the same behavior.
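The general rule above can be sketched as a short calculation. This is a minimal illustration under a simplifying assumption: only one gate size is available, rather than a mix of sizes. The function name and_tree is hypothetical. It reports, for each level, how many gates are needed and how many unused inputs must be hardwired to 1.

```python
import math


def and_tree(n_inputs, gate_size):
    """Return [(gates, unused_inputs), ...] per level, first level first.

    Assumes every gate has exactly gate_size inputs (gate_size >= 2).
    """
    if gate_size < 2 or n_inputs < 1:
        raise ValueError("need gate_size >= 2 and n_inputs >= 1")
    levels = []
    signals = n_inputs                          # signals feeding the current level
    while True:
        gates = math.ceil(signals / gate_size)  # fill the level with gates
        unused = gates * gate_size - signals    # inputs to hardwire to 1
        levels.append((gates, unused))
        if gates == 1:                          # a level with one gate is the last level
            break
        signals = gates                         # this level's outputs feed the next level
    return levels


print(and_tree(9, 3))   # 9-input AND from 3-input ANDs
print(and_tree(8, 2))   # 8-input AND from 2-input ANDs
print(and_tree(8, 3))   # 8-input AND from 3-input ANDs
```

The three calls correspond to the three cases discussed in the text: two levels with no unused inputs, three levels, and two levels with one input hardwired to 1.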
Multiplexers can also be composed together to form a larger multiplexer. For
example, suppose you had 4x1 and 2x1 muxes available, but you needed an 8x1 mux. You
could compose the smaller muxes into an 8x1 mux as shown in Figure 5.83. Notice that
s2 selects among groups i0-i3 and i4-i7, while s1 and s0 select one input from the group.
You can check that select line values pass the appropriate input through. For example,
s2s1s0 = 000 passes i0, s2s1s0 = 100 passes i4, and s2s1s0 = 111 passes i7.
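The composition can be checked with a small behavioral sketch. The function names mux2, mux4, and mux8 are our own, and inputs are modeled as a Python list rather than as wires. Two 4x1 muxes share s1 and s0, and a 2x1 mux controlled by s2 chooses between their outputs.

```python
def mux2(inputs, s0):
    """2x1 mux: pass inputs[0] when s0 = 0, inputs[1] when s0 = 1."""
    return inputs[s0]


def mux4(inputs, s1, s0):
    """4x1 mux: pass the input whose index is the binary number s1s0."""
    return inputs[2 * s1 + s0]


def mux8(inputs, s2, s1, s0):
    """8x1 mux composed from two 4x1 muxes and one 2x1 mux."""
    low = mux4(inputs[0:4], s1, s0)    # selects among i0-i3
    high = mux4(inputs[4:8], s1, s0)   # selects among i4-i7
    return mux2([low, high], s2)       # s2 selects the group


names = ["i0", "i1", "i2", "i3", "i4", "i5", "i6", "i7"]
print(mux8(names, 0, 0, 0), mux8(names, 1, 0, 0), mux8(names, 1, 1, 1))
```

Trying all eight select values confirms that the composed mux passes input number 4*s2 + 2*s1 + s0.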
Figure 5.83 An 8x1 mux composed from 4x1 and 2x1 muxes.
One particularly commonly occurring composition problem is that of creating a larger
memory from smaller ones. The larger memory may have wider words, may have more words,
or both.
For example, suppose you have available a large number of 1024x8 ROMs, but you want a
1024x32 ROM. Composing the smaller ROMs into the larger one is straightforward, and
shown in Figure 5.84. We'll need four 1024x8 ROMs to obtain 32 bits per word. We
connect the 10 address inputs to all four ROMs. Likewise, we connect the enable input to
all four ROMs. We group the four 8-bit outputs into our desired 32-bit output. Thus, each
ROM stores one byte of the 32-bit word. Reading a location, say location 99, results in
four simultaneous reads, of the byte at location 99 of each ROM.
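A small behavioral sketch of this widening follows. The ROM contents are made-up placeholder data, and the choice of which ROM supplies the most significant byte is our assumption. The text only requires that each ROM store one byte of each word.

```python
WORDS = 1024

# Hypothetical contents: ROM k holds byte k of every 32-bit word.
roms = [[(addr * 4 + k) & 0xFF for addr in range(WORDS)] for k in range(4)]


def read_1024x32(addr, enable=1):
    """Read one 32-bit word by reading the same address in all four 1024x8 ROMs."""
    if not enable:
        return None                          # all four ROMs share the enable input
    word = 0
    for rom in roms:                         # same 10-bit address goes to every ROM
        word = (word << 8) | rom[addr]       # roms[0] is the most significant byte (assumed)
    return word


print(hex(read_1024x32(99)))
```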
Figure 5.84 Composing a 1024x32 ROM from 1024x8 ROMs.
As another example using ROM, suppose you again have 1024x8 ROMs available,
but this time you need a 2048x8 ROM. So you have an extra address line because you
have twice as many words to address. Figure 5.85 shows how to use two 1024x8 ROMs
to create a 2048x8 ROM. The top ROM represents the top half of the memory (1024
words), and the bottom ROM the bottom half (1024 words). We use the 11th address line
(a10) to enable either the top ROM or the bottom ROM; the other 10 bits represent the
offset into the ROM. That 11th bit feeds into a 1x2 decoder, whose outputs feed into the
ROM enables. Figure 5.86 uses a table of addresses to show how the 11th bit selects
among the two smaller ROMs.
Actually, we could use any bit to select between the top ROM and bottom ROM. Designers
commonly use the lowest-order bit (a0) to select. The top ROM would thus
represent all evenly addressed words, the bottom ROM all oddly addressed words.
Finally, since only one ROM will be active at any time, we can tie together the output
data lines to form our 8-bit output, as shown in Figure 5.85.
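Both selection schemes can be sketched behaviorally. The data below is placeholder content, the function names are hypothetical, and we assume that address bit a10 = 0 selects the first ROM (the text does not say which decoder output enables which ROM). In each scheme one address bit picks the ROM and the remaining ten bits form the offset.

```python
WORDS = 1024
data = [(7 * a) & 0xFF for a in range(2 * WORDS)]   # desired 2048x8 contents (made up)


def read_high_select(rom0, rom1, addr):
    """a10 drives the 1x2 decoder; a9..a0 are the offset into the enabled ROM."""
    a10 = (addr >> 10) & 1
    offset = addr & 0x3FF
    return rom1[offset] if a10 else rom0[offset]   # only the enabled ROM drives the output


def read_low_select(even_rom, odd_rom, addr):
    """a0 selects the ROM; a10..a1 are the offset."""
    a0 = addr & 1
    offset = addr >> 1
    return odd_rom[offset] if a0 else even_rom[offset]


# Split the same contents across two 1024x8 ROMs in the two ways.
half0, half1 = data[:WORDS], data[WORDS:]
evens, odds = data[0::2], data[1::2]
print(read_high_select(half0, half1, 1500), read_low_select(evens, odds, 1500))
```

Either split returns the same word for every address. The schemes differ only in which words each physical ROM must hold.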
As a final example using ROM, suppose you needed a 4096x32 ROM, but had only
1024x8 ROMs available. In this situation, we need both to create more words and to
create wider words. The approach is straightforward: first, create a 4096x8 ROM by using
4 ROMs one on top of the other and by feeding the top two address lines to a 2x4 decoder
to select the appropriate ROM, and then second, widen the ROM by adding 3 more ROMs to
each row.
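Putting the two steps together gives a 4-by-4 grid of sixteen 1024x8 ROMs. The sketch below is a behavioral model with made-up contents. The grid layout, the byte order within a row, and the names decoder_2x4 and read_4096x32 are assumptions for illustration.

```python
WORDS = 1024

# Desired 4096x32 contents (arbitrary placeholder data).
target = [(a * 2654435761) & 0xFFFFFFFF for a in range(4 * WORDS)]

# grid[r][c] is the 1024x8 ROM in row r, column c; column 0 holds the top byte (assumed).
grid = [[[(target[r * WORDS + o] >> (8 * (3 - c))) & 0xFF for o in range(WORDS)]
         for c in range(4)]
        for r in range(4)]


def decoder_2x4(a11, a10):
    """2x4 decoder: exactly one of the four outputs is 1."""
    selected = 2 * a11 + a10
    return [int(i == selected) for i in range(4)]


def read_4096x32(addr):
    enables = decoder_2x4((addr >> 11) & 1, (addr >> 10) & 1)  # top two address lines
    offset = addr & 0x3FF                                      # remaining ten lines
    row = grid[enables.index(1)]                               # the one enabled row
    word = 0
    for rom in row:                                            # four bytes form one word
        word = (word << 8) | rom[offset]
    return word


print(hex(read_4096x32(3000)), hex(target[3000]))
```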
Most of the datapath components we introduced in Chapter 4 can be composed into larger
versions of the same type of component.
Figure 5.85 Composing a 2048x8 ROM from 1024x8 ROMs.
a10  a9 a8 ... a0
0    0000000000
0    0000000001
0    0000000010
...
0    1111111110
0    1111111111
1    0000000000
1    0000000001
1    0000000010
...
1    1111111110
1    1111111111
Figure 5.86 When composing a 2048x8 ROM from two 1024x8 ROMs, we can use the highest
address bit to choose among the two ROMs; the remaining address bits offset into the
chosen ROM.
5.9 RTL DESIGN OPTIMIZATIONS AND TRADEOFFS (SEE SECTION 6.5)
Previous sections in this chapter described how to perform register-transfer level
design to create processors consisting of a controller and a datapath. This section,
which physically appears in the book as Section 6.5, describes how to create processors
that are better optimized, or that trade off one feature for another (e.g., size for
performance). One use of this book covers such RTL optimizations and tradeoffs
immediately after introducing RTL design, meaning now. Another use introduces
them later.
5.10 RTL DESIGN USING HARDWARE DESCRIPTION LANGUAGES (SEE
SECTION 9.5)
This section, which physically appears in the book as Section 9.5, describes use of HDLs
during RTL design. One use of this book describes such HDL use immediately after
introducing RTL design (meaning now). Another use describes use of HDLs later.
5.11 PRODUCT PROFILE: CELL PHONE
A cell phone, short for cellular telephone and also known as a mobile phone, is a portable
wireless telephone that can be used to make phone calls while moving about a city. Cell
phones have made it possible to communicate with distant people nearly anytime and
anywhere. Before cell phones, most telephones were tied to physical places like a home
or an office. Some cities supported a radio-based mobile telephone system using a
powerful central antenna somewhere in the city, perhaps atop a tall building. Because radio
frequencies are scarce and thus carefully doled out by governments, such a radio
telephone system could only use perhaps tens or a hundred different radio frequencies, and
thus could not support large numbers of users. Those few users therefore paid a very high
fee for the service, limiting such mobile telephone use to a few wealthy individuals and to
key government officials. Those users had to be within a certain radius of the main
antenna, measured in tens of miles, to receive service, and that service typically didn't
work in another city.
Cells and Basestations
Cell phone popularity exploded in the 1990s, growing from a few million users
to hundreds of millions of users in that decade (even though the first cell phone
call was made way back in 1973, by Martin Cooper of Motorola, the inventor
of the cell phone), and today it is hard for many people to remember life before
cell phones. The basic technical idea behind cell phones divides a city into
numerous smaller regions, known as cells (hence the term cell phone).
Figure 5.87 shows a city divided into three cells. A typical city might actually
be divided into dozens, hundreds, or even thousands of cells. Each cell has its
own radio antenna and equipment in the center, known as a basestation. Each basestation
can use dozens or hundreds of different radio frequencies. Each basestation antenna only
needs to transmit radio signals powerful enough to reach the basestation's cell area.
Thus, nonadjacent cells can reuse the same frequencies, so the radio frequencies allowed
for mobile phones can be shared by more than one phone at one time. Hence, far more
users can be supported, leading to reduced costs per user. Figure 5.87 illustrates that
phone1 in cell A can use the same radio frequency as phone2 in cell C, because the radio
signals from cell A don't reach cell C. Supporting more users means greatly reduced cost
per user, and more basestations means service in more areas than just major cities.
Figure 5.87 Phone1 in cell A can use the same radio frequency as phone2 in cell C,
increasing the number of possible mobile phone users in a city.
Figure 5.88(a) shows a typical basestation antenna. The basestation's equipment may be
in a small building or commonly in a small box near the base of the antenna. The antenna
shown actually supports antennas from two different cellular service providers, one set
on the top and one set just under, on the same pole. Land for the poles is expensive,
which is why providers share, or sometimes find existing tall structures on which to
mount the antennas, like buildings, park light posts, and other interesting places
(e.g., Figure 5.88(b)). Some providers try to disguise their antennas to make them more
soothing to the eye, as in Figure 5.88(c); the entire tree in the picture is artificial.
Figure 5.88 Basestations found in various locations.
All the basestations of a service provider connect to a central switching office of a
city. The switching office not only links the cellular phone system to the regular
"landline" phone system, but also assigns phone calls to specific radio frequencies, and
handles switching among cells of a phone moving between cells.
How Cellular Phone Calls Work
Suppose you are holding phone1 in cell A of Figure 5.87. When you turn on the cell phone,
the phone listens for a signal from a basestation on a control frequency, which is a special
radio frequency used for communicating commands (rather than voice data) between the
basestation and cell phone. If the phone finds no such signal, the phone reports a
"No Service" error. If the phone finds the signal from basestation A, the phone then
transmits its own identification (ID) number to basestation A. Every cell phone has its own
unique ID number. (Actually, there is a nonvolatile memory card inside each phone that has
that ID number. A phone user can potentially switch cards among phones, or have multiple
cards for the phone, switching cards to change phone numbers.) Basestation A communicates
this ID number to the central switching office's computer, and thus the service
provider's computer database now records that your phone is in cell A. Your phone
intermittently sends a control signal to remind the switching office of the phone's presence.
If somebody then calls your cell phone's number, the call may come in over the regular
phone system, which goes to the switching office. The switching office computer database
indicates that your phone is in cell A. In one type of cell phone technology, the switching
office computer assigns a specific radio frequency supported by basestation A to the call.
Actually, the computer assigns two frequencies, one for talking and one for listening, so
that talking and listening can occur simultaneously on a cell phone. Let's call that
frequency pair a channel. The computer then tells your phone to carry out the call over the
assigned channel, and your phone rings. Of course, it could happen that there are so many
phones already involved with calls in cell A that basestation A has no available
frequencies. In that case, the caller may hear a message indicating that the user is
unavailable.
Placing a call proceeds similarly, but your cell phone initiates the call, ultimately
resulting in assigned radio frequencies again (or a "system busy" message if no
frequencies are presently available).
Suppose that your phone is presently carrying out a call with basestation A, and that
you are moving through cell A toward cell B in Figure 5.87. Basestation A will see your
signal weakening, while basestation B will see your signal strengthening, and the two
basestations transmit this information to the switching office. At some point the
switching office computer will decide to switch your call from basestation A to
basestation B. The computer assigns a new channel for the call in cell B (remember, adjacent
cells use different sets of frequencies to avoid interference), and sends your phone a
command (through basestation A, of course) to switch to a new channel. Your phone
switches to the new channel and thus begins communicating with basestation B. Such
switching may occur dozens of times while a car drives through a city during a phone
call, and is transparent to the phone user. Sometimes the switching fails, perhaps if the
new cell has no available frequencies, resulting in a "dropped" call.
Inside a Cell Phone
Basic Components
A cell phone requires sophisticated digital circuitry to carry out calls. Figure 5.89 shows
the insides of a typical basic cell phone. The printed-circuit boards include several chips
implementing digital circuits. One of those circuits performs analog-to-digital conversion
of a voice (or other sound) to a signal stream of 0s and 1s, and another performs
digital-to-analog conversion of a received digital stream back to an analog signal. Some
of the circuits, typically software on a microprocessor, execute tasks that manage the
various features of the phone, such as the menu system, address book, games, etc. Note
that any data that you save on your cell phone (e.g., an address book, customized ring
tones, game high score information, etc.) will likely be stored on a flash memory, whose
nonvolatility ensures the data stays saved in memory even if the battery dies or is
removed. Another important task involves responding to commands from the switching office.
Another task carried out by the digital circuits is filtering. One type of filtering
removes the carrier radio signal from the incoming radio frequency. Another type of
filtering removes noise from the digitized audio stream coming from the microphone, before
transmitting that stream on the outgoing radio frequency. Let's examine filtering in more
detail.
Figure 5.89 Inside a cell phone: (a) handset, (b) battery and ID card on left, keypad and
display in center, digital circuitry on a printed-circuit board on right, (c) the two sides
of the printed-circuit board, showing several digital chip packages mounted on the board.
Filtering, and FIR Filters
Filtering is perhaps the most common task performed in digital signal processing. Digital
signal processing operates on a stream of digital data that comes from digitizing an input
signal, such as an audio, video, or radio signal. Such streams of data are found in
countless electronic devices, such as CD players, cell phones, heart monitors, ultrasound
machines, radios, engine controllers, etc. Filtering a data stream is the task of removing
particular aspects of the input signal, and outputting a new signal without those aspects.
A common filtering goal is to remove noise from a signal. You've certainly heard
noise in audio signals. It's that hissing sound that's so annoying on your stereo, cell
phone, or cordless phone. You've also likely adjusted a filter to reduce that noise, when
you adjusted the "treble" control of your stereo (though that filter may have been
implemented using analog methods rather than digital). Noise can appear in any type of
signal, not just audio. Noise might come from an imperfect transmitting device, an
imperfect listening device (e.g., a cheap microphone), background noise (e.g., freeway
sounds coming into your cell phone), electrical interference from other electric
appliances, etc. Noise typically appears in a signal as random jumps from a smooth signal.
Another common filtering goal is to remove a carrier frequency from a signal. A
carrier frequency is a signal added to a main signal for the purpose of transmitting that
main signal. For example, a radio station might broadcast a radio signal at 102.7 MHz.
102.7 MHz is the carrier frequency. The carrier signal may be a sine wave of a particular
frequency (e.g., 102.7 MHz) that is added to the main signal, where the main signal is the
music signal itself. A receiving device locks on to the carrier frequency, and then
filters out the carrier signal, leaving the main signal.
An FIR filter (usually pronounced by saying the letters "F" "I" "R"), short for "Finite
Impulse Response," is a very general filter design that can be used for a huge variety of
filtering goals. The basic idea of an FIR filter is very simple: multiply the present input
value by a constant, and add that result to the previous input value times a constant, and
add that result to the next earlier input value times a constant, and so on. A designer
using an FIR filter achieves a particular filtering goal simply by choosing the FIR
filter's constants. Mathematically, an FIR filter can be described as follows:
$y(t) = c0 \times x(t) + c1 \times x(t-1) + c2 \times x(t-2) + c3 \times x(t-3) + c4 \times x(t-4)$
t is the present time step, x is the input signal, and y is the output signal. Each term
(e.g., c0*x(t)) is called a tap. So the above equation represents a 5-tap FIR filter.
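The equation translates directly into a few lines of software. The following is a minimal sketch, not the circuit of Example 5.8. The function name fir is our own, and we assume that input values before the first sample are 0.

```python
def fir(samples, coeffs):
    """Compute y(t) = c0*x(t) + c1*x(t-1) + ... for every time step t.

    Assumption: x(t) is taken to be 0 for t < 0.
    """
    output = []
    for t in range(len(samples)):
        acc = 0.0
        for k, c in enumerate(coeffs):   # one multiply-add per tap
            if t - k >= 0:
                acc += c * samples[t - k]
        output.append(acc)
    return output


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(fir(x, [0.2, 0.2, 0.2, 0.2, 0.2]))   # 5-tap filter with equal constants
```

The number of taps is simply the length of the coefficient list, so the same function serves for 5-tap, 7-tap, or larger filters.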
Let's see some examples of the versatility of an FIR filter. Assume we have a 5-tap
FIR filter. For starters, to simply pass a signal through the filter unchanged, we set c0
to 1, and we set c1=c2=c3=c4=0. To amplify an input signal, we can set c0 to a number
larger than 1, perhaps setting c0 to 2. To create a smoothing filter that outputs the
average of the present value and the past four input values, we can simply set all the
constants to equivalent values that add to 1, namely, c0=c1=c2=c3=c4=0.2. The results of
such a filter applied to a noisy input signal are shown in Figure 5.90. To smooth and
amplify, we can set all constants to equivalent values that add to something greater than
1, for example, c0=c1=c2=c3=c4=1, resulting in 5x amplification. To create a smoothing
filter that only includes the previous two rather than four input values, we simply set c3
and c4 to 0. We see that we can build all the above different filters just by changing the
constant values of an FIR filter. The FIR filter is indeed quite versatile.
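These constant choices can be tried out in a short sketch. The fir function below is our own illustrative helper, assuming inputs before time 0 are 0. The "noise" is a deterministic alternating jump of plus or minus 0.3, used as a stand-in for random noise so that the result is repeatable. The signal period of 100 samples is also an arbitrary choice.

```python
import math


def fir(samples, coeffs):
    """y(t) = sum of coeffs[k] * x(t-k), with x(t) assumed 0 for t < 0."""
    return [sum(c * samples[t - k] for k, c in enumerate(coeffs) if t - k >= 0)
            for t in range(len(samples))]


N = 400
original = [math.sin(2 * math.pi * t / 100) for t in range(N)]   # smooth sine wave
noisy = [original[t] + (0.3 if t % 2 == 0 else -0.3) for t in range(N)]

passed = fir(noisy, [1, 0, 0, 0, 0])               # pass through unchanged
amplified = fir(noisy, [2, 0, 0, 0, 0])            # amplify by 2
smoothed = fir(noisy, [0.2, 0.2, 0.2, 0.2, 0.2])   # average of 5 most recent inputs

worst_noisy = max(abs(n - o) for n, o in zip(noisy, original))
worst_smoothed = max(abs(s - o) for s, o in zip(smoothed, original))
print(worst_noisy, worst_smoothed)
```

In this sketch the smoothed output stays closer to the original sine wave than the noisy input does, even though averaging also delays the output by about two samples.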
Figure 5.90 5-tap FIR filter with c0=c1=c2=c3=c4=0.2 applied to a noisy signal (plotted
series: original, noisy, fir_avg_out). The original signal is a sine wave. The noisy signal
has random jumps. The FIR output is much smoother than the noisy signal, approaching the
original signal. Notice that the FIR output is slightly shifted to the right, meaning the
output is slightly delayed in time (probably a tiny fraction of a second delayed). Such
slight shifting is usually not important to a particular application.
That versatility extends even further. We can actually filter out a carrier frequency
using an FIR filter, by setting the coefficients to different values, carefully chosen to
filter out a particular frequency. Figure 5.91 shows a main signal, in1, that we want to
transmit. We can add that to a carrier signal, in2, to obtain the composite signal,
in_total. The signal in_total is the signal that would be transmitted by a radio station,
for example, with in1 being the signal of the music, and in2 the carrier signal.
Now say a stereo receiver receives that composite signal, and needs to filter out the
carrier signal, so the music signal can be sent to the stereo speakers. To determine how to
filter out the carrier signal, look carefully at the samples (the small filled squares in
Figure 5.91) of that carrier signal. Notice that the sampling rate is such that if we take
any sample, and add it to a sample from three time steps back, we get 0. That's because for
a positive point, three samples earlier was a negative point of the same magnitude. For a
negative point, three samples earlier was a positive point of the same magnitude. And for
a zero point, three samples earlier was also a zero point. Likewise, adding a carrier signal
sample to a sample three steps later also adds to zero. So to filter out the carrier
signal, we can add each sample to a sample three time steps back. Or we can add each sample
to 1/2 times a sample three steps back, plus 1/2 times a sample three steps ahead. We can
achieve this using a 7-tap FIR filter with the following seven coefficients: 0.5, 0, 0, 1,
0, 0, 0.5. Since that sums to 2, we can scale the coefficients to add to 1, as follows:
0.25, 0, 0, 0.5, 0, 0, 0.25. Applying such a 7-tap FIR filter to the composite signal
results in the FIR output shown in Figure 5.92. The main signal is restored. We should
point out that we chose the main signal such that this example would come out very nicely;
other signals might not be restored so perfectly. But the example demonstrates the basic
idea.
Figure 5.91 Adding a main signal, in1, to a carrier signal, in2, resulting in a composite
signal in_total.
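The cancellation can be reproduced numerically. In the sketch below the signals are our own assumptions, not the data behind Figure 5.91: the carrier in2 is a sine wave with a period of 6 samples, so that any sample plus the sample three steps away is 0, and the main signal in1 is a much slower sine wave. The fir helper is illustrative and treats inputs before time 0 as 0.

```python
import math


def fir(samples, coeffs):
    """y(t) = sum of coeffs[k] * x(t-k), with x(t) assumed 0 for t < 0."""
    return [sum(c * samples[t - k] for k, c in enumerate(coeffs) if t - k >= 0)
            for t in range(len(samples))]


N = 240
in1 = [math.sin(2 * math.pi * t / 120) for t in range(N)]   # main signal (assumed)
in2 = [math.sin(2 * math.pi * t / 6) for t in range(N)]     # carrier, period 6 samples
in_total = [a + b for a, b in zip(in1, in2)]

coeffs = [0.25, 0, 0, 0.5, 0, 0, 0.25]                      # scaled 7-tap coefficients
fir_out = fir(in_total, coeffs)

# After the first 6 samples the filter is "full"; compare with in1 delayed by 3 steps.
worst = max(abs(fir_out[t] - in1[t - 3]) for t in range(6, N))
print(worst)
```

The output tracks the main signal delayed by three samples, with a small attenuation because the slow main signal also changes slightly across the seven taps.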
Figure 5.92 Filtering out the carrier signal using a 7-tap FIR filter with constants 0.25,
0, 0, 0.5, 0, 0, 0.25 (plotted series: in_total, fir_out). The slight delay in the output
signal typically poses no problem.
While 5-tap and 7-tap FIR filters can certainly be found in practice, many FIR filters may
contain tens or hundreds of taps. FIR filters can certainly be implemented using software
(and often are), but many applications require that the hundreds of multiplications and
additions for every sample be executed faster than is possible in software, leading to
custom digital circuit implementations. Example 5.8 illustrated the design of a circuit
for an FIR filter.
Many types of filters exist other than FIR filters. Digital signal filtering is part of a
larger field known as digital signal processing, or DSP. DSP has a rich mathematical
foundation and is a field of study in itself. Advanced filtering methods are what make cell
phone conversations as clear as they are today.
5.12 CHAPTER SUMMARY
In this chapter, we described (Section 5.1) that much digital design today involves designing processor-level components, and that such design is done at what is called the register-transfer level (RTL). We introduced (Section 5.2) a four-step RTL design method for converting RTL behavior to a processor implementation, with that implementation consisting of a datapath controlled by a controller. The RTL design method made use of the datapath components defined in Chapter 4, and the controller design process defined in Chapter 3, which built on the combinational design process of Chapter 2. We provided several examples of RTL design (Section 5.3), while pointing out several pitfalls and good design practices, and discussing the characteristics of control- versus data-dominated designs. We discussed (Section 5.4) how to set a circuit's clock frequency based on the circuit's critical path. We demonstrated (Section 5.5) how a sequential program, like a C program, could conceptually be converted to gates using some straightforward transformations that transform the C into RTL behavior, which as we know can then be converted to gates using the four-step RTL design method. That demonstration should make it clear that a digital system's functionality can be implemented as either software on a microprocessor or a custom digital circuit (or even as both). The differences among software and custom circuit implementations are not related to what each can implement; they can both implement any functionality. The differences are instead related to design metrics like system performance, power consumption, size, cost, design time, and so on. Modern digital designers must therefore be comfortable migrating functionality between software on a microprocessor and custom digital circuits, in order to obtain the best overall implementation with respect to constraints on design metrics. We introduced (Section 5.6) several memory components commonly used in RTL design, including RAM and ROM components. We also introduced (Section 5.7) a queue component that can be useful during RTL design. We took a moment to discuss (Section 5.8) a general technique that we've been using throughout the book, hierarchy, which helps a designer to manage complexity.
In Chapters 1 through 5, we have emphasized straightforward design methods for increasingly complex systems, but we have not emphasized how to design those systems well. Improving on our designs will be the focus of the next chapter.
5.13 EXERCISES
Any problems noted with an asterisk (*) represent especially challenging problems.
SECTION 5.2: RTL DESIGN METHOD
5.1 (a) Create a high-level state machine that describes the following system behavior. The system has an 8-bit input A, a single-bit input d, and a 32-bit output S. On every clock cycle, if d=1, the system should add A to a running sum and output that sum on S. If d=0, the system should instead subtract. Ignore issues of overflow and underflow. Don't forget to include an initialization state. Hint: Declare and use an internal register to keep the sum.
(b) Add a 1-bit input rst to the system. When rst=1, the system should clear its sum back to 0.
5.2 Create a high-level state machine for a simple data encryption/decryption device. If a bit-input b is 1, the device stores the data from a 32-bit input I as what is known as an offset value. If b is 0 and another bit-input e is 1, then the device "encrypts" its input I by adding the stored offset value to I, and outputs this encrypted value over a 32-bit output J. If instead another bit-input d is 1, the device should "decrypt" the data on I by subtracting the offset value before outputting the decrypted value over J. Be sure to explicitly handle all possible combinations of the three input bits.
5.3 Create a high-level state machine for a digital bath-water controller. The system has a 3-bit input ratio indicating the desired ratio of cold water to hot water, and a bit input on indicating that the water should flow. The system has two 4-bit outputs hflow and cflow, indicating the hot water flow rate and the cold water flow rate. The sum of these two rates should equal 16. Your high-level state machine should determine the output values for hflow and cflow such that the ratio of hot water to cold water is as close as possible to the desired ratio, while the total flow is always 16. Hint: As there are only 8 possible ratios, a reasonable solution may use one state for each ratio.
5.4 Create a high-level state machine that initializes a 16x32 register file's contents to all 0s, beginning the initialization when an input rst is 1.
5.5 (a) Create a high-level state machine that adds each register of one 128x8 register file to the corresponding registers of another 128x8 register file, storing the results in a third 128x8 register file. The system should only begin the addition when a bit-input add is 1, and should not perform the addition again until it has finished adding (only adding again if add is 1).
(b) Extend this system to either add or subtract, using an additional bit-input op, where op = 1 means add, and op = 0 means subtract.
5.6 Design a high-level state machine for a 4-bit up-counter with count control input cnt, count clear input clr, and a terminal count output tc. Use the RTL design method of Table 5.1 to convert the high-level state machine to a controller and a datapath. Use a register and incrementer in the datapath, not a counter itself. Design the controller down to a state register and logic gates.
5.7 Compare the up-counter designed in Exercise 5.6 with the up-counter design shown in Figure 4.48.
5.8 Create a datapath for the high-level state machine in Figure 5.93.
[Figure 5.93 Sample high-level state machine. Inputs: A, B, C (16 bits); go, rst (bit). Outputs: S (16 bits). Local registers: sum. Labels recoverable from the diagram: sum<5096, (sum<5096)', sum = sum+C.]
5.9 Starting with the soda machine dispenser design described in Example 5.1, create a block diagram and high-level state machine for a soda machine dispenser that has a choice of two soda types, and that also provides change to the consumer. A coin detector provides the circuit with a 1-bit input c that becomes 1 for one clock cycle when a coin is detected, and an 8-bit input a indicating the coin's value in cents. Two 8-bit inputs s1 and s2 indicate the cost of the two soda choices. The user's soda selection is controlled by two buttons b1 and b2 that when pushed will output 1 for one clock cycle. If the user has inserted enough change for their selection, the circuit should set either output bit d1 or d2 to 1 for one clock cycle, causing the selected soda to be dispensed. The soda dispenser circuit should also set an output bit cr to 1 for one clock cycle if change is required, and should output the amount of change required using an 8-bit output ca. Use the RTL design method shown in Table 5.1 to convert the high-level state machine to a controller and a datapath. Design the datapath to structure, but design the controller to the point of an FSM only, as was done in Figure 5.26.
5.10 (a) Use the RTL design method of Table 5.1 to convert the high-level state machine in Figure 5.94 to a controller and a datapath. Design the datapath to structure, but design the controller to an FSM only, as was done in Figure 5.26.
(b) Design the controller's FSM down to structure.
[Figure 5.94 High-level state machine of bus interface with bus wait signal. Inputs: start (bit), data (8 bits), addr (8 bits), w_wait (bit). Outputs: w_data (8 bits), w_addr (8 bits), w_wr (bit). Labels recoverable from the diagram: w_wr=1, w_addr=addr.]
5.11 Create an FSM that interfaces with the datapath in Figure 5.95. The FSM should use the datapath to compute the average value of the 16 32-bit elements of any array A. Array A is stored in a memory, with the first element at address [illegible], the second at address [illegible], and so on. Assume that putting a new value onto the address lines M_addr causes the memory to almost immediately output the read data on the M_data lines. Ignore the possibility of overflow.
Figure 5.95 Datapath for computing the average of 16 elements of an array.
5.12 Using the RTL design method shown in Table 5.1, create an RTL design of a reaction timer circuit that measures the time elapsed between the illumination of a light and the pressing of a button by a user. The reaction timer has three inputs, a clock input clk, a reset input rst, and a button input B, and three outputs, a light enable output len, a 10-bit reaction time output rtime, and a slow output indicating that the user was not fast enough. The reaction timer works as follows. On reset, the reaction timer waits for 10 seconds before illuminating the light by setting len to 1. The reaction timer then measures the length of time in milliseconds before the user presses the button B, outputting the time as a 12-bit binary number on rtime. If the user did not press the button within 2 seconds (2000 milliseconds), the reaction timer will set the output slow to 1 and output 2000 on rtime. Assume your clock input has a frequency of 1 kHz. Hint: This is a control-dominated RTL design problem. Design the datapath to structure, but design the controller to an FSM only, as was done in Figure 5.26.
5.13 Use the RTL design method shown in Table 5.1 to convert the high-level state machine in Figure 5.74 to a controller and a datapath. Design the datapath to structure, but design the controller to the point of an FSM only, as was done in Figure 5.26.
SECTION 5.3: RTL DESIGN EXAMPLES AND ISSUES
For the following problems, design the datapath to structure, but design the controller to an FSM only, as done in Figure 5.26.
5.14 Using the RTL design method shown in Table 5.1, create an RTL design that computes the sum of all positive numbers within a 512-word register file A consisting of 32-bit numbers stored in two's complement form.
5.15 Using the RTL design method shown in Table 5.1, create an RTL design that computes the sum of all positive numbers from a set of 16 separate 32-bit registers storing numbers in two's complement form. Make the design as fast as possible by performing as many computations concurrently as possible. Hint: This is a data-dominated design.
5.16 Using the RTL design method shown in Table 5.1, create an RTL design that outputs the maximum value found within a register file A consisting of 64 32-bit numbers.
5.17 Using the RTL design method shown in Table 5.1, create an RTL design that outputs a warning signal whenever the average temperature over the past four samples exceeds a user-defined value. The circuit has a 32-bit input CT indicating the current temperature reading, a 32-bit input WT indicating the user-specified temperature at which the warning should be enabled, and a button input clr that will disable the warning. When the average temperature exceeds the user-specified warning level, the circuit should assert the output W to enable the warning. The warning output should remain high until the clr button is pressed. Hint: You can use a right shift to implement the divide within your datapath.
5.18 Using the RTL design method shown in Table 5.1, create an RTL design for a digital filter that outputs the average of the current 32-bit input and the previous 32-bit sample. Hint: You can use a right shift to implement the divide within your datapath.
SECTION 5.4: DETERMINING CLOCK FREQUENCY
5.19 Assuming an inverter has a delay of 1 ns, all other gates have a delay of 2 ns, and wires have a delay of 1 ns, determine the critical path for the full-adder circuit shown in Figure 4.31.
5.20 Assuming an inverter has a delay of 1 ns, all other gates have a delay of 2 ns, and wires have a delay of 1 ns, determine the critical path for the 3x8 decoder of Figure 2.50.
5.21 Assuming an inverter has a delay of 1 ns, all other gates have a delay of 2 ns, and wires have a delay of 1 ns, determine the critical path for a 4x1 multiplexer.
5.22 Assuming an inverter has a delay of 1 ns, and all other gates have a delay of 2 ns, determine the critical path for an 8-bit carry-ripple adder:
(a) assuming wires have no delay.
(b) assuming wires have a delay of 1 ns.
5.23 (a) Convert the laser-based distance measurer's FSM, shown in Figure 5.21, to a state register and logic.
(b) Assuming all gates have a delay of 2 ns and the 16-bit up-counter has a delay of 5 ns, and wires have no delay, determine the critical path for the laser-based distance measurer.
(c) Calculate the corresponding maximum clock frequency for the circuit.
SECTION 5.5: BEHAVIORAL-LEVEL DESIGN: C TO GATES (OPTIONAL)
5.24 Convert the following C-like code, which calculates the greatest common divisor (GCD) of the two 8-bit numbers a and b, into a high-level state machine.
Inputs: byte a, byte b, bit go
Outputs: byte gcd, bit done
GCD:
while(1) {
   while(!go);
   done = 0;
   while ( a != b ) {
      if( a > b ) {
         a = a - b;
      }
      else {
         b = b - a;
      }
   }
   gcd = a;
   done = 1;
}
5.25 Use the RTL design method shown in Table 5.1 to convert the high-level state machine you created in Exercise 5.24 to a controller and a datapath. Design the datapath to structure, but design the controller to the point of an FSM only.
5.26 Convert the following C-like code, which calculates the maximum difference between any two numbers within an array A consisting of 256 8-bit values, into a high-level state machine.
Inputs: byte a[256], bit go
Outputs: byte max_diff, bit done
MAX_DIFF:
while(1) {
   while(!go);
   done = 0;
   i = 0;
   max = 0;
   min = 255; // largest 8-bit value
   while( i < 256 ) {
      if( a[i] < min ) {
         min = a[i];
      }
      if( a[i] > max ) {
         max = a[i];
      }
      i = i + 1;
   }
   max_diff = max - min;
   done = 1;
}
5.27 Use the RTL design method shown in Table 5.1 to convert the high-level state machine you created in Exercise 5.26 to a controller and a datapath. Design the datapath to structure, but design the controller to the point of an FSM only.
5.28 Convert the following C-like code, which calculates the number of times the value b is found within an array A consisting of 256 8-bit values, into a high-level state machine.
Inputs: byte a[256], byte b, bit go
Outputs: byte freq, bit done
FREQUENCY:
while(1) {
   while(!go);
   done = 0;
   i = 0;
   freq = 0;
   while( i < 256 ) {
      if( a[i] == b ) {
         freq = freq + 1;
      }
      i = i + 1; // advance to the next element so the loop terminates
   }
   done = 1;
}
5.29 Use the RTL design method shown in Table 5.1 to convert the high-level state machine you created in Exercise 5.28 to a controller and a datapath. Design the datapath to structure, but design the controller to the point of an FSM only.
5.30 Develop a template for converting a do { } while loop of the following form to a high-level state machine.
do {
   // do while statements
} while (cond);
5.31 * Convert the while ( a != b ) loop within the C code description of Exercise 5.24 into a do { } while loop as described in Exercise 5.30. Using the do { } while loop template you created in Exercise 5.30, convert the revised C code into a high-level state machine. Use the RTL design method shown in Table 5.1 to convert the high-level state machine you created in the previous problem to a controller and a datapath. Design the datapath to structure, but design the controller to the point of an FSM only.
5.32 Develop a template for converting a for() loop of the following form to a high-level state machine.
for (i=start; i<cond; i++) {
   // for statements
}
5.33 * Convert the while ( a != b ) loop within the C code description of Exercise 5.24 to a for() loop as described in Exercise 5.32. Using the for() loop template you created in Exercise 5.32, convert the revised C code into a high-level state machine. Use the RTL design method shown in Table 5.1 to convert the high-level state machine you created in the previous problem to a controller and a datapath. Design the datapath to structure, but design the controller to the point of an FSM only.
5.34 * Convert the while ( i < 256 ) loop within the C code description of Exercise 5.26 to a for() loop as described in Exercise 5.32. Using the for() loop template you created in Exercise 5.32, convert the revised C-like code into a high-level state machine. Use the RTL design method shown in Table 5.1 to convert the high-level state machine you created in the previous problem to a controller and a datapath. Design the datapath to structure, but design the controller to the point of an FSM only.
5.35 Compare the time required to execute the following computation using a custom circuit versus using software. Assume a gate has a delay of 1 ns. Assume a microprocessor executes one instruction every 5 ns. Assume that n = 10 and m = 5. Estimates are acceptable: you need not design the circuit, or determine exactly how many software instructions will execute.
for (i = 0; i < n; i++) {
   s = 0;
   for (j = 0; j < m; j++)
      s = s + c[i]*x[i + j];
   y[i] = s;
}
SECTION 5.6: MEMORY COMPONENTS
5.36 Calculate the approximate number of DRAM bit storage cells that will fit on an IC with a capacity of 10 million transistors.
5.37 Calculate the approximate number of SRAM bit storage cells that will fit on an IC with a capacity of 10 million transistors.
5.38 Summarize the main differences between DRAM and SRAM memories.
5.39 Draw a complete logic internal structure for a 4x2 DRAM (four words, 2 bits each), clearly labeling all internal components and connections.
5.40 Draw a complete logic internal structure for a 4x2 SRAM (four words, 2 bits each), clearly labeling all internal components and connections.
5.41 * Design an SRAM memory cell with a reset input that when enabled will set the memory cell's contents to 0.
SECTION: READ-ONLY MEMORY (ROM)
5.42 Summarize the main differences between EPROM and EEPROM memories.
5.43 Summarize the main differences between EEPROM and flash memories.
SECTION 5.7: QUEUES (FIFOS)
5.44 For an 8-word queue, show the queue's internal state and provide the value of popped data for the following sequences of pushes and pops: (1) push A, B, C, D, E, (2) pop, (3) pop, (4) push U, V, W, X, Y, (5) pop, (6) push Z, (7) pop, (8) pop, (9) pop.
5.45 Create an FSM describing the queue controller of Figure 5.78. Pay careful attention to correctly setting the full and empty outputs.
5.46 Create an FSM describing the queue controller of Figure 5.78, but with error-preventing behavior that ignores any pushes when the queue is full, and ignores pops of an empty queue (outputting 0).
SECTION 5.8: HIERARCHY - A KEY DESIGN CONCEPT
5.47 Compose a 20-input AND gate from 2-input AND gates.
5.48 Compose a 16x1 mux from 2x1 muxes.
5.49 Compose a 4x16 decoder with enable from 2x4 decoders with enable.
5.50 Compose a 1024x8 RAM using only 512x8 RAMs.
5.51 Compose a 512x8 RAM using only 512x4 RAMs.
5.52 Compose a 1024x8 ROM using only 512x4 ROMs.
5.53 Compose a 2048x8 ROM using only 256x8 ROMs.
5.54 Compose a 1024x16 RAM using only 512x8 RAMs.
5.55 Compose a 1024x12 RAM using 512x8 and 512x4 RAMs.
5.56 Compose a 640x12 RAM using only 128x4 RAMs.
5.57 * Write a program that takes a parameter N, and automatically builds an N-input AND gate from 2-input AND gates. Your program merely needs to indicate how many 2-input AND gates exist in each level, from which we could easily determine the connections.
Chi-Kai started college as an engineering major, and became a Computer Science major due to his developing interests in algorithms and in networks. After graduating, he worked for a Silicon Valley startup company that made chips for computer networking. His first task was to help simulate those chips before the chips were built. For over 10 years now, he has worked on multiple generations of networking devices that buffer, schedule, and switch ATM network cells and Internet Protocol packets. "The chips required to implement networking devices are complex components that must all work together almost perfectly to provide the building blocks of telecommunication and data networks. Each generation of devices becomes successively more complex."
When asked what skills are necessary for his job, Chi-Kai says "More and more, breadth of one's skill set matters more than depth. Being an effective chip engineer requires the ability to understand chip architecture (the big picture), to design logic, to verify logic, and to bring up the silicon in the lab. All these parts of the design cycle interplay more and more. To be truly effective at one particular area requires hands-on knowledge of the others as well. Also, each requires very different skills. For example, verification requires good software programming ability, while bring up requires knowing how to use a logic analyzer: good hardware skills."
High-end chips, like those involved in networking, are quite costly, and require careful design. "The software design process and the chip design process are fundamentally different. Software can afford to have bugs because patches can be applied. Silicon is a different story. The one-time expenses to spin a chip are on the order of $500,000. If there is a show-stopping bug, you may need to spend another $500,000. This constraint means the verification approach taken is quite different: effectively, there can be no bugs." At the same time, these chips must be designed quickly to beat competitors to the market, making the job "extremely challenging and exciting."
One of the biggest surprises Chi-Kai encountered in his job is the "incredible importance of good communication skills." Chi-Kai has worked in teams ranging from 10 people to 30 people, and some chips require teams of over 100 people. "Technically outstanding engineers are useless unless they know how to collaborate with others and disseminate their knowledge. Chips are only getting more complex; individual blocks of code in a given chip have the same complexity as an entire chip only a few years ago. To architect, design, and implement logic in hardware requires the ability to convey complexity." Furthermore, Chi-Kai points out that "just like any social entity, there are politics involved. For example, people are worried about aspiration for promotion, financial gain, and job security. In this greater context, the team still must work together to deliver a chip." So, contrary to the conceptions many people have of engineers, engineers must have excellent people skills, in addition to strong technical skills. Engineering is a social discipline.
6 Optimizations and Tradeoffs
6.1 INTRODUCTION
The previous chapters described how to design digital circuits using straightforward techniques. This chapter will describe how to design better circuits. For our purposes, better means circuits that are smaller, faster, or consume less power. Real-world design may involve additional criteria.
[Figure 6.1 panels: (a) F1 = wxy + wxy', 16 transistors, 2 gate-delays; (b) F2 = wx, 4 transistors, 1 gate-delay; (c) plot of size (transistors) versus delay (gate-delays) for F1 and F2.]
Figure 6.1 A circuit transformation that improves both size and delay, that is, an optimization: (a) original circuit, (b) optimized circuit, (c) plot of size and delay of each circuit.
Consider the circuit for the equation involving F1 shown in Figure 6.1(a). The circuit's size, assuming two transistors per gate input (and ignoring inverters for simplicity), is 8 * 2 = 16 transistors. The circuit's delay, which is the longest path from any input to the output, is two gate-delays. We could algebraically transform the equation into that for F2, shown in Figure 6.1(b). F2 represents the same function as F1, but requires only four transistors (instead of 16) and has a delay of only one gate-delay (instead of two). The transformation improved both size and delay, as shown in Figure 6.1(c). When we perform transformations that improve all criteria of interest to us, we have performed an optimization.
Now consider the circuit for a different function, implementing the equation for G1 in Figure 6.2(a). The circuit's size (assuming 2 transistors per gate input) is 14 transistors, and the circuit's delay is two gate-delays. We could algebraically transform the equation into that shown for G2 in Figure 6.2(b), which results in a circuit having only 12 transistors. However, the reduction in transistors comes at the expense of a longer delay of three gate-delays, as shown in Figure 6.2(c). Which circuit is better, that for G1 or for G2? The answer depends on whether the size or the delay criterion is more important to us. When we improve one criterion at the expense of another criterion of interest to us, we have performed a tradeoff.
A tradeoff improves some criteria at the expense of other criteria of interest to us. An optimization improves all criteria of interest to us, or improves some of those criteria without worsening the others.
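The size and delay figures for G1 and G2 can be reproduced mechanically. The sketch below is only an illustration under the text's assumptions (two transistors per gate input, inverters ignored, one gate-delay per gate level); the nested-tuple circuit representation and the function names are hypothetical.

```python
from itertools import product

# A circuit is a nested tuple ("AND" | "OR", input, input, ...);
# a plain string is a circuit input.
G1 = ("OR", ("AND", "w", "x"), ("AND", "w", "y"), "z")   # G1 = wx + wy + z
G2 = ("OR", ("AND", "w", ("OR", "x", "y")), "z")         # G2 = w(x+y) + z

def transistors(c):
    """Two transistors per gate input, summed over every gate."""
    if isinstance(c, str):
        return 0
    return 2 * (len(c) - 1) + sum(transistors(i) for i in c[1:])

def delay(c):
    """Longest input-to-output path, counted in gate-delays."""
    if isinstance(c, str):
        return 0
    return 1 + max(delay(i) for i in c[1:])

def evaluate(c, env):
    """Evaluate the circuit for one assignment of input values."""
    if isinstance(c, str):
        return env[c]
    vals = [evaluate(i, env) for i in c[1:]]
    return all(vals) if c[0] == "AND" else any(vals)

def equivalent(c1, c2, names):
    """Compare two circuits over every combination of input values."""
    for bits in product([False, True], repeat=len(names)):
        env = dict(zip(names, bits))
        if evaluate(c1, env) != evaluate(c2, env):
            return False
    return True

print(transistors(G1), delay(G1))   # size and delay of G1
print(transistors(G2), delay(G2))   # size and delay of G2
print(equivalent(G1, G2, ["w", "x", "y", "z"]))
```

Under these assumptions the sketch reports 14 transistors and 2 gate-delays for G1, 12 transistors and 3 gate-delays for G2, and confirms that both circuits compute the same function.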
[Figure 6.2 panels: (a) G1 = wx + wy + z, 14 transistors, 2 gate-delays; (b) G2 = w(x+y) + z, 12 transistors, 3 gate-delays; (c) plot of size (transistors) versus delay (gate-delays) for G1 and G2.]
Figure 6.2 A circuit transformation that improves size but worsens delay, that is, a tradeoff: (a) original circuit, (b) transformed circuit, (c) plot of size and delay of each circuit.
You likely perform optimizations and tradeoffs every day. Perhaps you regularly commute by car from one city to another via a particular route. You might be interested in two criteria: commute time and safety. Other criteria, such as scenery along the route, may not be of interest to you. If you choose a new route that improves both commute time and safety, you have optimized your commute. If you instead choose a route that improves safety at the expense of increased commute time, you have made a tradeoff (and perhaps a wise one at that).
Figure 6.3 illustrates optimizations versus tradeoffs for three different starting designs, with the criteria of delay and size, smaller being better for each criterion. Obviously, we prefer optimizations over tradeoffs, since optimizations improve both criteria (or at least improve one criterion without worsening another, as shown by the horizontal and vertical arrows on the left side of the figure). But we can't always improve one criterion without worsening another. For example, if a car designer wants to improve a car's fuel efficiency, the designer may have to make the car smaller, which is a tradeoff among the criteria of fuel efficiency and comfort.
Figure 6.3 (a) Optimizations, versus (b) tradeoffs. [Both panels plot size against delay.]
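The distinction between an optimization and a tradeoff can be stated as a small rule. The following is a minimal sketch, assuming every criterion is expressed as a number where smaller is better; the function name classify and the "neither" label for changes that improve nothing are assumptions, not terms from the text.

```python
def classify(before, after):
    """Compare two designs given as tuples of criteria (smaller is better)."""
    improved = any(a < b for a, b in zip(after, before))
    worsened = any(a > b for a, b in zip(after, before))
    if improved and not worsened:
        return "optimization"
    if improved and worsened:
        return "tradeoff"
    return "neither"

# (size in transistors, delay in gate-delays)
print(classify((16, 2), (4, 1)))    # F1 -> F2 from Figure 6.1
print(classify((14, 2), (12, 3)))   # G1 -> G2 from Figure 6.2
```

With the numbers from Figures 6.1 and 6.2, the first call reports an optimization and the second a tradeoff. A change that improves one criterion and leaves the other unchanged also counts as an optimization, matching the horizontal and vertical arrows of Figure 6.3(a).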
Some general criteria commonly of interest to digital system designers include:
Performance: a measure of execution time for a computation on the system.
Size: a measure of the number of transistors, or silicon area, of a digital system.
Power: a measure of the energy consumed per second of a system, relating to both the heat generated by the system and to the battery energy consumed by computations.
Dozens of other criteria exist.
Optimizations and tradeoffs can be made throughout nearly all stages of digital design. This chapter describes some common optimizations and tradeoffs for some common criteria, at various stages of digital design.
6.2 COMBINATIONAL LOGIC OPTIMIZATIONS AND TRADEOFFS
In Chapter 2, we described how to design combinational logic, namely, how to convert desired combinational behavior into a circuit of gates. There are optimization and tradeoff methods we can apply to make those circuits better.
Two-Level Size Optimization Using Algebraic Methods
In the 1970s/1980s, when transistors were costly (e.g., cents each), logic minimization meant size minimization, which dominated digital design. Today's cheaper transistors (e.g., 0.0001 cents each) make optimizations of other criteria equally or more critical.
Implementing a Boolean function using only two levels of gates (a level of AND gates followed by one OR gate) usually results in a circuit having minimum delay. Recall from Chapter 2 that any Boolean equation can be written in sum-of-products form, simply by "multiplying out" the equation; for example, xy(w+z) = xyw + xyz. Thus, any Boolean function can be implemented using two levels of gates, simply by converting its equation to sum-of-products form and then using AND gates for the products followed by an OR gate for the sum.
A popular optimization is to minimize the number of transistors of a two-level logic circuit implementation of a Boolean function. Such optimization is traditionally called two-level logic optimization, or sometimes two-level logic minimization. We'll refer to it as two-level logic size optimization, to distinguish such optimization from the increasingly popular optimizations of performance and power, as well as other possible optimizations.
To optimize size, we need a method to determine the number of transistors for a given circuit. We'll use a simple method for determining the number of transistors:
We'll assume every logic gate input requires two transistors. So a 3-input logic gate (whether an AND, OR, NAND, or NOR) would require 3 * 2 = 6 transistors. The circuits inside logic gates shown in Section 2.4 should clarify why we assume two transistors per gate input.
We'll ignore inverters when determining the number of transistors, for simplicity.
We can view the problem of two-level logic size optimization algebraically as the problem of minimizing the number of literals and terms of a Boolean equation that is in sum-of-products form. The reason we can view the problem algebraically is because, recall from Section 2.4, we can translate a sum-of-products Boolean equation directly to a circuit using a level of AND gates followed by an OR gate. For example, the equation F = wxy + wxy' from Figure 6.1(a) has six literals, w, x, y, w, x, and y', and two terms, wxy and wxy', for a total of 6 + 2 = 8 literals and terms. Each literal and each term translates approximately to a gate input in a circuit, as shown in Figure 6.1(a): the literals translate to AND gate inputs, and the terms to OR gate inputs. The circuit thus has 3 + 3 + 2 = 8 gate inputs. With two transistors per gate input, the circuit has 8 * 2 = 16 transistors. We can minimize the number of literals and terms algebraically: F = wxy + wxy' = wx(y+y') = wx, which has only two literals, w and x, resulting in 2 gate inputs, or 2 * 2 = 4 transistors, as shown in Figure 6.1(b). (Note that a one-term equation doesn't require an OR gate.)
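The counting rule above is easy to automate. The following is a minimal sketch, assuming an equation is written as a string of single-letter variables with ' marking a complemented literal and + separating terms; the function names gate_inputs and transistor_count are hypothetical, and the sketch makes no special case for a term consisting of a single literal.

```python
def gate_inputs(sop):
    """Count literals plus terms of a sum-of-products string such as "wxy + wxy'"."""
    terms = [t.replace(" ", "") for t in sop.split("+")]
    # Every letter is one literal, that is, one AND gate input.
    literals = sum(ch.isalpha() for t in terms for ch in t)
    # Every term is one OR gate input, unless there is only one term,
    # in which case no OR gate is needed.
    or_inputs = len(terms) if len(terms) > 1 else 0
    return literals + or_inputs

def transistor_count(sop):
    """Two transistors per gate input, ignoring inverters."""
    return 2 * gate_inputs(sop)

print(transistor_count("wxy + wxy'"))   # the equation of Figure 6.1(a)
print(transistor_count("wx"))           # the equation of Figure 6.1(b)
```

For the two equations of Figure 6.1 the sketch gives 16 and 4 transistors, the same values obtained by hand in the text.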
EXAMPLE 6.1 Two-level logic size optimization using algebraic methods
Minimize the number of literals and terms in a two-level implementation of the equation:
F = xyz + xyz' + x'y'z' + x'y'z
Let's minimize using algebraic transformations:
F = xy(z + z') + x'y'(z + z')
F = xy*1 + x'y'*1
F = xy + x'y'
There doesn't seem to be any further minimization we can perform. Thus, we've reduced the circuit from 12 literals and 4 terms (meaning 12 + 4 = 16 gate inputs, or 32 transistors), down to only 4 literals and 2 terms (meaning 4 + 2 = 6 gate inputs, or 12 transistors).
The previous example showed the most common algebraic transformation used to simplify a Boolean equation in sum-of-products form, a transformation that generally can be written as:
ab + ab' = a(b+b') = a*1 = a
Let's call this transformation combining terms to eliminate a variable. More formally, this transformation is known as the uniting theorem. In the previous example, we applied this transformation twice, once with xy being a and z being b, and a second time with x'y' being a and z being b.
Sometimes we need to duplicate a term in order to increase opportunities for com-
bining terms to eliminate a variable. as illustrated in the next example.
EXAMPLE 6.2 Reusing a term during two-level logic size optimization
Minimize the number of literals and terms in a two-level implementation of the equation:
F = x'y'z' + x'y'z + x'yz
You might notice two opportunities to combine terms to eliminate a variable:
1: x'y'z' + x'y'z = x'y'
2: x'y'z + x'yz = x'z
Notice that the term x'y'z appears in both opportunities, but that term only appears once in the
original equation. We'll therefore first replicate the term in the original equation (such replication
doesn't change the function, because a = a + a) so that we can use the term twice when combining
terms to eliminate a variable, as follows:
F = x'y'z' + x'y'z + x'yz
F = x'y'z' + x'y'z + x'y'z + x'yz
F = x'y'(z+z') + x'z(y'+y)
F = x'y' + x'z
After we have combined terms to eliminate a variable, the resulting term might also
be combinable with other terms to eliminate a variable, as shown in the following
example.
6 Optimizations and Tradeoffs
EXAMPLE 6.3 Repeatedly combining terms to eliminate a variable
Minimize the number of literals and terms in a two-level implementation of the equation:
G = xy'z' + xy'z + xyz + xyz'
We can combine the first two terms to eliminate a variable, and the last two terms also:
G = xy'(z'+z) + xy(z+z')
G = xy' + xy
We can combine the two remaining terms to eliminate a variable:
G = xy' + xy
G = x(y'+y)
G = x
In the previous examples, how did we "see" the opportunities to combine terms to
eliminate a variable? The examples' original equations happened to be written in a way
that made seeing the opportunities easy: terms that could be combined were side by
side. Suppose instead the equation in Example 6.1 had been written as:
F = x'y'z + xyz + xyz' + x'y'z'
That's the same function, but the terms appear in a different order. We might see that
the middle two terms can be combined:
F = x'y'z + xyz + xyz' + x'y'z'
F = x'y'z + xy(z+z') + x'y'z'
F = x'y'z + xy + x'y'z'
But then we might not see that the left and right terms can be combined. We therefore
might stop minimizing, thinking that we had obtained a fully minimized equation.
There is a visual method to help us see opportunities to combine terms to eliminate a
variable, a method we now describe.
A Visual Method for Two-Level Size Optimization: K-Maps
Karnaugh Maps, or K-maps for short, are a visual method intended to assist humans to
algebraically minimize Boolean equations having a few (two to four) variables. They
actually are not commonly used any longer in design practice, but nevertheless, they are a
very effective means for understanding the basic optimization methods underlying today's
automated tools. A K-map is essentially a graphical representation of a truth table, meaning a
K-map is yet another way to represent a function (the other ways including an equation,
truth table, and circuit). The idea underlying a K-map is to graphically place minterms
adjacent to one another if those minterms differ in one variable only, so that we can actually
"see" the opportunity for combining terms to eliminate a variable.
Three-Variable K-Maps
Figure 6.4 shows a K-map for the equation:
F = x'y'z + xyz + xyz' + x'y'z'
which is the equation from Example 6.1 but with terms appearing in a different order.
The map has eight cells, one for each possible combination of variable values. Let's
examine the cells in the top row. The upper-left cell corresponds to xyz=000, meaning
x'y'z'. The next cell to the right corresponds to xyz=001, meaning x'y'z. The next cell
to the right corresponds to xyz=011, meaning x'yz. And the rightmost top cell
corresponds to xyz=010, meaning x'yz'. Notice that the ordering of those top cells is
not in increasing binary order. Instead, the order is 000, 001, 011, 010, rather than
000, 001, 010, 011. The ordering is such that adjacent cells differ in exactly one
variable. For example, the cells for x'y'z (001) and x'yz (011) are adjacent, and differ
in exactly one variable, namely, y. Likewise, the cells for x'y'z' and xy'z' are
adjacent, and differ only in variable x. The map is also assumed to have its left and
right edges adjacent, so the rightmost top cell (010) is adjacent to the leftmost top
cell (000); note that those cells too differ in exactly one variable. Adjacent means
abutted either horizontally or vertically, but not diagonally, because diagonal cells
differ in more than one variable. Adjacent bottom row cells also differ in exactly one
variable. And cells in a column also differ in exactly one variable.

(Margin note: In a K-map, adjacent cells differ in exactly one variable. K-maps enable
us to see opportunities to combine terms to eliminate a variable.)

  F        yz
    x    00  01  11  10
    0     1   1   0   0
    1     0   0   1   1
Figure 6.4 Three-variable K-map. The upper-left cell corresponds to xyz=000, or x'y'z'.
Notice that the column labels are not in binary order. Treat the left and right edges as
adjacent too.
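The column ordering 00, 01, 11, 10 is a Gray code. As a minimal sketch (the helper names are hypothetical, and minterms are identified by their binary value with x as the leftmost bit), the ordering and the adjacency property, including the wraparound between the left and right edges, can be checked as follows:

```python
def gray_codes(num_bits):
    """Return the num_bits-wide codes in Gray order, e.g. ['00', '01', '11', '10']."""
    return [format(i ^ (i >> 1), f"0{num_bits}b") for i in range(2 ** num_bits)]

def differ_in_one(a, b):
    """True if the bit strings a and b differ in exactly one position."""
    return sum(p != q for p, q in zip(a, b)) == 1

def kmap_rows(on_set, row_bits, col_bits):
    """Lay out the minterm numbers in on_set as rows of 0/1 values in K-map order."""
    rows = []
    for r in gray_codes(row_bits):
        rows.append([1 if int(r + c, 2) in on_set else 0 for c in gray_codes(col_bits)])
    return rows

cols = gray_codes(2)
print(cols)  # ['00', '01', '11', '10']
# Every column differs from its right neighbor in one bit, including the wraparound.
print(all(differ_in_one(cols[i], cols[(i + 1) % 4]) for i in range(4)))  # True
# F = x'y'z + xyz + xyz' + x'y'z' has on-set minterms 1, 7, 6, and 0 (x is the row bit).
for row in kmap_rows({0, 1, 6, 7}, 1, 2):
    print(row)  # [1, 1, 0, 0] then [0, 0, 1, 1]
```

The two printed rows match the cell contents of the map in Figure 6.4.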
We can represent a Boolean function as a K-map by placing 1s in the cells
corresponding to the function's minterms. So for the equation F above, we place a 1 in cells
corresponding to minterms x'y'z, xyz, xyz', and x'y'z', as shown in Figure 6.4. We
place 0s in the remaining cells. Notice that a K-map is just another representation of a
truth table. Rather than showing the output for every possible combination of inputs using
a table, a K-map uses a graphical map. Therefore, a K-map is yet another representation
of a Boolean function, and in fact is another standard representation.
The usefulness of a K-map for size minimization is that, because the map is designed
such that adjacent cells differ in exactly one variable, we know that two adjacent 1s
in a K-map indicate that we can combine the two minterms to eliminate a variable. In
other words, a K-map lets us easily see when we can combine two terms to eliminate a
variable. We indicate such combining by drawing a circle around two adjacent 1s, and
then we show the resulting term after the differing variable is removed. We illustrate in
the following example.
EXAMPLE 6.4 Two-level logic size optimization using a K-map
Minimize the number of literals and terms in a two-level implementation of the equation:
F = xyz + xyz' + x'y'z' + x'y'z
Note that this is the same equation as in Example 6.1. We create a K-map representing the
function, shown in Figure 6.5. We see adjacent 1s at the upper left of the map, so we
circle those 1s to yield the term x'y'. In other words, the circle is a shorthand notation
for x'y'z' + x'y'z = x'y'. Likewise, we see adjacent 1s at the bottom right of the map, so
we draw a circle representing xyz + xyz' = xy. Thus, F = x'y' + xy.

  F        yz
    x    00  01  11  10
    0     1   1   0   0
    1     0   0   1   1
Figure 6.5 Minimizing a three-variable function using a K-map. Circles: x'y' (top-left pair
of 1s) and xy (bottom-right pair of 1s).

(Margin note: Always draw the largest circles possible to cover the 1s in a K-map.)
Recall from Example 6.3 that sometimes terms can be repeatedly combined to eliminate
a variable, resulting in even fewer terms and literals. We can redo that example
using a different order of simplifications as follows:
G = xy'z' + xy'z + xyz + xyz'
G = x(y'z' + y'z + yz + yz')
G = x(y'(z'+z) + y(z+z'))
G = x(y'+y)
G = x
Notice that the second line above ANDs x with the OR of all possible combinations
of variables y and z. Obviously, one of those combinations of y and z will be true for any
values of y and z, and thus the subexpression in parentheses will always evaluate to 1, as
we algebraically affirmed in the latter lines above.
K-maps also help us graphically see this situation. In addition to helping us see when
we can combine two minterms to eliminate a variable, K-maps give us a graphical way to
see when we can combine four minterms to eliminate two variables. We merely need to
look for four adjacent cells, where the cells form either a rectangle or a square (but not a
shape like an "L"). Those four cells will have one variable the same, and all possible
combinations of the other two variables. Figure 6.6 shows the earlier function G as a
three-variable K-map. The map has four adjacent 1s in the bottom row. The four minterms
corresponding to those 1s are xy'z', xy'z, xyz, and xyz'; note that x is the same in all
four minterms, while all four combinations of y and z appear in those minterms. We draw
a circle around the bottom four 1s to represent the simplification of G shown in the
equations above. The result is G = x. In other words, the circle is a shorthand notation
for the algebraic simplification of G shown in the five equations above.

  G        yz
    x    00  01  11  10
    0     0   0   0   0
    1     1   1   1   1
Figure 6.6 Four adjacent 1s. One circle around the bottom row yields the term x.
Note that we could have drawn circles around the left two 1s and the right two 1s of
the K-map, as shown in Figure 6.7, resulting in G = xy' + xy. Clearly, G can be further
simplified to x(y'+y) = x. Thus, we should always draw the biggest circle possible, in
order to best minimize the equation.

  G        yz
    x    00  01  11  10
    0     0   0   0   0
    1     1   1   1   1
Figure 6.7 Nonoptimal circles. Circles: xy' (left pair of 1s) and xy (right pair of 1s).

As another example of four adjacent 1s, consider the equation:
H = x'y'z + x'yz + xy'z + xyz
Figure 6.8 shows the K-map for that equation's function. Circling the four adjacent
1s yields the minimized equation, H = z.
  H        yz
    x    00  01  11  10
    0     0   1   1   0
    1     0   1   1   0
Figure 6.8 Four adjacent 1s. One circle around the middle two columns yields the term z.

Sometimes, we need to draw circles that include the same 1 twice. That's okay. For
example, consider the equation:
I = x'y'z + xy'z' + xy'z + xyz + xyz'
Figure 6.9 shows the K-map for that equation's function. We can draw a circle around
the bottom four 1s to reduce those four minterms to just x. But that leaves the single 1
in the top row, corresponding to minterm x'y'z. We have to include that minterm in the
minimized equation, since if we left that minterm out, we would be changing the function.
We could include the minterm itself, yielding I = x + x'y'z. But that's not minimized,
because the original equation included minterm xy'z, and xy'z + x'y'z = (x+x')y'z = y'z.
On the K-map, we draw a circle around that top 1 that also includes the 1 in the cell
below. The minimized function is thus I = x + y'z.

  I        yz
    x    00  01  11  10
    0     0   1   0   0
    1     1   1   1   1
Figure 6.9 Circling a 1 twice. Circles: y'z (the 01 column) and x (the bottom row).

(Margin note: It's OK to cover a 1 more than once to minimize multiple terms.)
(Margin note: Draw the fewest circles possible, to minimize the number of terms.)
It's OK to include a 1 twice; that doesn't change the function. Think about it: the
function doesn't change if we duplicate a minterm (don't forget, a = a + a), and
duplicating a minterm can allow for more optimization. In other words:
I = x'y'z + xy'z' + xy'z + xyz + xyz'
I = x'y'z + xy'z + xy'z' + xy'z + xyz + xyz'
I = (x'y'z + xy'z) + (xy'z' + xy'z + xyz + xyz')
I = (y'z) + (x)
We duplicated a minterm, which resulted in better optimization.
On the other hand, there's no reason to circle 1s more than once if the 1s are already
included in a minimized term. For example, the K-map for the equation:
J = x'y'z' + x'y'z + xy'z + xyz
appears in Figure 6.10. There's no reason to draw the circle resulting in the term y'z.
The other two circles cover all the 1s, meaning those two circles' terms cause the
equation to output 1 for all the required input combinations. The third circle just
results in an extra term without changing the function. Thus, we not only want to draw
the largest circles possible to cover all the 1s, but we also want to draw the fewest
circles.

  J        yz
    x    00  01  11  10
    0     1   1   0   0
    1     0   1   1   0
Figure 6.10 K-map for J. Circles: x'y' and xz; a third circle for y'z is unneeded.

We mentioned earlier that the left and right sides of a K-map are adjacent. Thus, we
can draw circles that wrap around the sides of a K-map. For example, the K-map for the
equation:
K = xy'z' + xyz' + x'y'z
appears in Figure 6.11. The two bottom-row cells with 1s are adjacent, since the left and
right sides of the map are adjacent, and we can therefore draw one circle that covers
both, resulting in the term xz'.
Sometimes a 1 does not have any adjacent 1s. In that case, we simply circle the single 1,
resulting in a term that is a minterm. The term x'y'z in Figure 6.11 is an example of such
a term.
A circle in a three-variable K-map must involve one cell, two adjacent cells, four
adjacent cells, or eight adjacent cells. A circle cannot involve only three, five, six, or
seven cells. The reason is that the circle must represent algebraic transformations that
eliminate variables appearing in all possible combinations, since those variables can be
factored out and then combined to a 1. Three adjacent cells don't have all combinations of
two variables; one combination is missing. Thus, the circle in Figure 6.12 would not be
valid, since it corresponds to xy'z' + xy'z + xyz, which doesn't simplify down to one
term. To cover that function, we would need two circles, one around the left pair of 1s,
the other around the right pair.
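The rule for legal circles can be sketched as a small check. This is an illustrative helper, not a procedure from the text: circle_term is a hypothetical name, cells are given as minterm numbers, and the variable names are assumed to be x, y, z (or w, x, y, z for four variables) with the first variable as the leftmost bit.

```python
def circle_term(minterms, num_vars):
    """Return the product term for a group of minterm numbers, or None if the
    group is not a legal circle (hypothetical helper, not from the text)."""
    cells = [format(m, f"0{num_vars}b") for m in set(minterms)]
    # Variables whose value is the same in every cell stay in the term.
    fixed = [i for i in range(num_vars) if len({c[i] for c in cells}) == 1]
    free = num_vars - len(fixed)
    # A legal circle holds every combination of the changing variables: 2**free cells.
    if len(cells) != 2 ** free:
        return None
    names = "xyz" if num_vars == 3 else "wxyz"  # assumed variable names
    term = "".join(names[i] + ("" if cells[0][i] == "1" else "'") for i in fixed)
    return term or "1"

print(circle_term([4, 5, 6, 7], 3))  # x
print(circle_term([4, 6], 3))        # xz'
print(circle_term([4, 5, 7], 3))     # None
```

The last call is the three-cell group xy'z' + xy'z + xyz discussed above: x is the only unchanging variable, but only three of the four combinations of y and z are present, so no single term results.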
If all the cells in a K-map have 1s, like for the function E in Figure 6.13, then we
would have eight adjacent 1s. We can draw a circle around those eight cells. Since that
circle represents the ORing of all possible combinations of the function's three
variables, and since obviously one of those combinations will be true for any combination
of input values, the equation would minimize to just E = 1.
Whenever in doubt as to whether a circle is valid, just remember that the circle
represents a shorthand for algebraic transformations that combine terms to eliminate a
variable. A circle must represent a set of terms for which all possible combinations of
some variables appear while other variables are identical in all terms. The changing
variables can be eliminated, resulting in a single term without those variables.

  K        yz
    x    00  01  11  10
    0     0   1   0   0
    1     1   0   0   1
Figure 6.11 Sides are adjacent. Circles: x'y'z (single cell) and xz' (wrapping around the
left and right edges of the bottom row).

  F        yz
    x    00  01  11  10
    0     0   0   0   0
    1     1   1   1   0
Figure 6.12 Invalid circle (a single circle around the three 1s).

  E        yz
    x    00  01  11  10
    0     1   1   1   1
    1     1   1   1   1
Figure 6.13 Eight adjacent 1s.
Four-Variable K-Maps
K-maps are also useful for minimizing four-variable Boolean functions. Figure 6.14 shows
a four-variable K-map for the following equation:
F = w'xy'z' + w'xy'z + w'x'yz + w'xyz + wxyz + wx'yz

  F        yz
   wx    00  01  11  10
   00     0   0   1   0
   01     1   1   1   0
   11     0   0   1   0
   10     0   0   1   0
Figure 6.14 Four-variable K-map. Circles: w'xy' and yz.

Again, notice that every adjacent pair of cells differs by exactly one variable. The left
and right sides of the map are considered adjacent, and the top and bottom edges of the
map are also adjacent; note that the left and right cells differ by only one variable, as
do the top and bottom cells.
We cover the 1s in the map with the two circles shown in Figure 6.14, resulting in
the terms w'xy' and yz, so the minimized equation is F = w'xy' + yz.
A circle covering eight adjacent cells would represent all combinations of three
variables, so algebraic manipulation would eliminate all three variables and yield one
term. For example, the function in Figure 6.15 simplifies to a single term, z, as shown.
Legal-sized circles in a four-variable K-map are one, two, four, eight, or sixteen
adjacent cells. Circling all sixteen cells yields a function that equals 1.

  G        yz
   wx    00  01  11  10
   00     0   1   1   0
   01     0   1   1   0
   11     0   1   1   0
   10     0   1   1   0
Figure 6.15 Eight adjacent cells. One circle around the middle two columns yields the term z.

Larger K-Maps
K-maps for five and six variables have been proposed, but are rather cumbersome to use
effectively. Thus, we do not discuss them further.
K-maps for two variables also exist, as shown in Figure 6.16. However, they aren't
particularly useful, because two-variable functions are very easy to minimize
algebraically.
Figure 6.16 Two-variable K-map.

Using a K-Map
Given any Boolean function of three or four variables, the following method summarizes
how to use a K-map to minimize the function:
1. Convert the function's equation into sum-of-minterms form.
2. Place a 1 in the appropriate K-map cell for each minterm.
3. Cover all the 1s by drawing the minimum number of largest circles such that
every 1 is included at least once, and write the corresponding term.
4. OR all the resulting terms to create the minimized function.
The first step, converting to sum-of-minterms form, can be done algebraically, as was
done in Chapter 2. Alternatively, many people find it easier to combine steps 1 and 2 by
converting the function's equation to sum-of-products form (where each term is not
necessarily a minterm), and then filling in the 1s on the K-map corresponding to each term.
For example, consider the four-variable function:
F = w'xz + yz + w'xy'z'
The term w'xz corresponds to the two lightly shaded cells in Figure 6.17, so we put
1s in those cells. The term yz corresponds to the entire dark-shaded column in the figure.
The term w'xy'z' corresponds to the single unshaded cell shown on the left with a 1.
Minimization would proceed by covering the 1s with circles and ORing all the terms. The
function in Figure 6.17 is identical to the function in Figure 6.14, for which we obtained
the minimized equation: F = w'xy' + yz.
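Filling in a K-map term by term amounts to expanding each product term into the minterms it covers. A minimal sketch of that expansion follows; the names term_minterms and on_set are hypothetical, terms are strings with ' marking a complement, and minterm numbers assume the variable order w, x, y, z with w as the most significant bit.

```python
from itertools import product

VARS = "wxyz"  # variable order assumed to match the K-map row and column labels

def term_minterms(term):
    """Return the set of minterm numbers covered by a product term like "w'xz"."""
    required = {}
    for i, ch in enumerate(term):
        if ch in VARS:
            complemented = term[i + 1:i + 2] == "'"
            required[ch] = 0 if complemented else 1
    covered = set()
    for bits in product((0, 1), repeat=len(VARS)):
        if all(bits[VARS.index(v)] == val for v, val in required.items()):
            covered.add(int("".join(map(str, bits)), 2))
    return covered

def on_set(sop):
    """Union of the minterms covered by each term of a sum-of-products string."""
    return set().union(*(term_minterms(t.strip()) for t in sop.split("+")))

print(sorted(term_minterms("w'xz")))             # [5, 7]
print(sorted(on_set("w'xz + yz + w'xy'z'")))     # [3, 4, 5, 7, 11, 15]
print(on_set("w'xz + yz + w'xy'z'") == on_set("w'xy' + yz"))  # True
```

The last line confirms that the sum-of-products form above and the minimized equation F = w'xy' + yz place 1s in the same six cells.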
  F        yz
   wx    00  01  11  10
   00     0   0   1   0
   01     1   1   1   0
   11     0   0   1   0
   10     0   0   1   0
Figure 6.17 w'xz and yz terms.

EXAMPLE 6.5 Two-level logic size optimization using a three-variable K-map
Minimize the following equation:
G = a + a'b'c' + b*(c' + bc')
Let's begin by converting the equation to sum-of-products:
G = a + a'b'c' + bc' + bc'
We place 1s in a three-variable K-map corresponding to each term, as in Figure 6.18. The
bottom row corresponds to the term a, the top left cell to term a'b'c', and the right
column to the term bc' (which appears twice in the equation).

  G        bc
    a    00  01  11  10
    0     1   0   0   1
    1     1   1   1   1
Figure 6.18 Terms on the K-map.

We then cover the 1s using the two circles shown in Figure 6.19. ORing the resulting
terms yields the minimized equation G = a + c'.
Figure 6.19 A cover. (Same map as Figure 6.18, with circles for a and c'.)

EXAMPLE 6.6 Two-level logic size optimization using a four-variable K-map
Minimize the following equation:
H = a'b'(cd' + c'd') + ab'c'd' + ab'cd' + a'bd + a'bcd'
Converting to sum-of-products form yields:
H = a'b'cd' + a'b'c'd' + ab'c'd' + ab'cd' + a'bd + a'bcd'
We fill in the 1s corresponding to each term, resulting in the K-map shown in Figure 6.20.
The term a'bd corresponds to the two cells whose 1s are in italics. All the other terms
are minterms and thus correspond to one cell.

  H        cd
   ab    00  01  11  10
   00     1   0   0   1
   01     0   1   1   1
   11     0   0   0   0
   10     1   0   0   1
Figure 6.20 K-map example. Circles: b'd' (four corners), a'bc, and a'bd.

We cover the 1s using circles as shown. One "circle" covers the four corners, resulting in
the term b'd'. That circle may look strange, but remember that the top and bottom cells
are adjacent, and the left and right cells are adjacent. Another circle results in the
term a'bd, and a third circle in the term a'bc. The minimized two-level equation is
therefore:
H = b'd' + a'bc + a'bd
Note the bolded 1 in Figure 6.20 (the cell for a'bcd'). We covered that 1 by drawing a
circle that included the 1 to the left, yielding the term a'bc. Alternatively, we could
have drawn a circle that included the 1 above, yielding the term a'cd', resulting in the
minimized equation:
H = b'd' + a'cd' + a'bd
Not only does that equation represent the same function as the previous equation, that
equation would also require the same number of transistors as the previous equation.
Thus, we see that there may be multiple minimized equations that are equally good.
Don't Care Input Combinations
Sometimes, we are guaranteed that certain input combinations of a Boolean function can
never appear. For those combinations, we don't care whether the function outputs a 1 or
a 0, because the function will never actually see those input values; the output for those
inputs just doesn't matter. As an intuitive example, if you became ruler of the world,
would you live in a palace or a castle? Your answer (the output) doesn't matter, because
the input (you becoming ruler of the world) simply won't happen.
Thus, when given a don't care input combination, we can choose whether to output a 1
or a 0 for each input combination, such that we obtain the best minimization possible. We
can choose whatever output yields the best minimization, because the output for those don't
care input combinations doesn't matter, as those combinations simply won't happen.
Algebraically, we can use don't care terms by introducing them into an equation
during algebraic minimization to create the opportunity to combine terms to eliminate a
variable. As a simple example, consider a function F = xy'z', for which we are for
some reason guaranteed that the terms x'y'z' and xy'z can each never evaluate to 1.
We notice that adding the first don't care term to the equation would result in
xy'z' + x'y'z' = (x+x')y'z' = y'z'. Thus, introducing that don't care term x'y'z'
into the equation yields a minimization benefit. However, introducing the second don't
care term does not yield such a benefit, so we choose not to introduce that term.
In a K-map, don't care input combinations can be easily handled by placing an X in a
K-map for each don't care minterm. We don't have to cover the Xs with circles, but we can
cover some Xs if that helps us draw bigger circles while covering the 1s, meaning fewer
literals will appear in the term corresponding to the circle. For the above example, we
would draw the K-map shown in Figure 6.21, having one 1 corresponding to xy'z', when the
function must output 1, and having two Xs corresponding to x'y'z' and xy'z, when the
function may output 1 if that helps us minimize the function. Drawing a single circle
results in the minimized equation F = y'z'. (Be careful in this discussion not to confuse
the uppercase X, corresponding to a don't care, with the lowercase x, corresponding to a
variable.)

  F        yz
    x    00  01  11  10
    0     X   0   0   0
    1     1   X   0   0
Figure 6.21 Map with don't cares. One circle around the 00 column yields y'z'.

Remember, don't cares don't have to be covered. The cover in Figure 6.22 gives an example
of a wasteful use of don't cares. The circle covering the bottom X, yielding term xy', is
not needed. That term is not wrong, because we don't care whether the output is 1 or 0
when xy'z evaluates to 1. But that term would result in a larger circuit, because the
resulting equation is F = y'z' + xy'. Since we don't care, why not make the output 0 when
xy'z is 1, and thus obtain a smaller circuit?

Figure 6.22 Wasteful use of Xs. (Same map as Figure 6.21, with circles y'z' and an
unneeded xy'.)
EXAMPLE 6.7 Two-level logic size minimization with don't cares on a K-map
Minimize the following equation:
F = a'bc' + abc' + a'b'c
given that terms a'bc and abc are don't cares. Intuitively, those don't cares mean that bc
can never be 11.
We begin by creating the 3-variable K-map in Figure 6.23. We place 1s in the three cells
for the function's minterms. We then place Xs in the two cells for the don't cares. We can
cover the upper-left 1 using a circle that includes an X. Likewise, including the two Xs
in a circle covers the two 1s on the right with a bigger circle. The resulting minimized
equation is F = a'c + b.

  F        bc
    a    00  01  11  10
    0     0   1   X   1
    1     0   0   X   1
Figure 6.23 Using don't cares. Circles: a'c and b.

Without don't cares, the equation would have minimized to F = a'b'c + bc'. Assuming two
transistors per gate input and ignoring inverters, the equation minimized without don't
cares would require (3+2+2) * 2 = 14 transistors (3 gate inputs for the first AND gate, 2
for the second AND gate, and 2 for the OR gate, times 2 transistors per gate input). In
contrast, the equation minimized with don't cares requires only (2 + 0 + 2) * 2 = 8
transistors.

EXAMPLE 6.8 Don't care input combinations in a sliding switch example
Consider a sliding switch, shown in Figure 6.24, that can be in one of five positions,
with three outputs x, y, and z indicating the position in binary. So xyz can take on the
values of 001, 010, 011, 100, and 101. The other values for xyz are not possible, namely,
the values 000, 110, and 111 (or x'y'z', xyz', and xyz). We wish to design combinational
logic, with x, y, and z inputs, that outputs 1 if the switch is in position 2, 3, or 4,
corresponding to xyz values of 010, 011, or 100.
Figure 6.24 Sliding switch example. (The switch outputs x, y, z feed a "2,3,4 detector"
with output G.)
A Boolean equation describing the desired logic is: G = x'yz' + x'yz + xy'z'. We can
minimize the equation using a K-map, as shown in Figure 6.25. The minimized equation that
results is: G = xy'z' + x'y.

  G        yz
    x    00  01  11  10
    0     0   0   1   1
    1     1   0   0   0
Figure 6.25 Without don't cares. Circles: x'y and xy'z'.

However, if we use don't cares, we can obtain a simpler minimized equation. In
particular, we know that none of the three minterms x'y'z', xyz', and xyz can ever be
true, because the switch can only be in one of the above-stated five positions. So it
doesn't matter whether we output a 1 or a 0 for those three other minterms. We can include
these don't care input combinations as Xs on the K-map, as shown in Figure 6.26. When
covering the 1s in the top right, we can now draw a larger circle, resulting in the term
y. When covering the 1 at the bottom left, we can draw a larger circle also, resulting in
the term z'. Although we ended up covering all the Xs in this example, recall that we do
not have to cover the Xs; we only use them if they help us cover the 1s with larger
circles. The minimized equation that results is: G = y + z'.

  G        yz
    x    00  01  11  10
    0     X   0   1   1
    1     1   0   X   X
Figure 6.26 With don't cares. Circles: y and z'.

That minimized equation using don't cares looks a lot different than the minimized
equation without don't cares. But keep in mind the circuit still works the same. For
example, if the switch is in position 1, then xyz will be 001, so G = y + z' evaluates to
0, as desired.
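A minimization that uses don't cares can be checked by comparing outputs only on the inputs that can actually occur. The sketch below does that for the sliding switch; the function name check_with_dont_cares is hypothetical, and minterm numbers assume x is the most significant bit.

```python
def check_with_dont_cares(g, on_set, dont_cares, num_vars=3):
    """True if g outputs 1 on every on-set input and 0 on every off-set input.
    Inputs listed in dont_cares are skipped, since either output is acceptable."""
    for m in range(2 ** num_vars):
        if m in dont_cares:
            continue
        bits = [(m >> (num_vars - 1 - i)) & 1 for i in range(num_vars)]
        if g(*bits) != (1 if m in on_set else 0):
            return False
    return True

def g_min(x, y, z):
    return y | (1 - z)            # G = y + z'

# Switch positions 2, 3, 4 are the on-set; 000, 110, 111 cannot occur.
print(check_with_dont_cares(g_min, {2, 3, 4}, {0, 6, 7}))   # True
# Treating every input as possible, the same equation is no longer correct.
print(check_with_dont_cares(g_min, {2, 3, 4}, set()))       # False
```

The second call fails because G = y + z' outputs 1 for input 000, which is harmless only as long as that input truly never appears.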
Don't cares must be used with caution. We must balance the criteria of size with
other criteria, like reliable, error-tolerant, and safe circuits, when deciding whether to
use don't cares. We must ask ourselves: is it ever possible that the don't care input
combination might occur, even if in an error situation? And if it is possible, then do we
really not care at all what our circuit outputs in that situation? Often, we really do
care, and will want to ensure our circuit outputs a particular value. For example, in the
sliding switch example above, perhaps temporary values could appear at the xyz outputs as
the switch is being moved. We might therefore want to ensure we output 0 for the don't
care values.
Several common situations lead to don't cares. Sometimes don't cares come from
physical limits on the inputs; a switch can't be in two positions at once, for example. If
you've read Chapter 3, then you may realize that another common situation in which don't
cares may appear is in controller design, when a controller uses a state register that can
represent more states than the controller requires. For example, a controller with 17
states may use a 5-bit state register, meaning that 15 of the 32 possible states of the
state register would be unutilized. Those 15 states could be treated as don't cares
(although to be safe, we might actually want to transition back to an initial state if we
ever enter one of those 15 unused states due to noise or some other error). If you've read
Chapter 5, then you may realize that another common situation where don't cares arise is
in a controller controlling a datapath. If we aren't reading or writing to a particular
memory or register file in a given state, then we don't care what address appears at the
memory or register file during that state. Likewise, if a mux feeds into a register and we
aren't loading the register in a given state, then we really don't care which mux data
input passes through the mux during that state. If we aren't going to load the output of
an ALU into a register in a given state, then we really don't care what function the ALU
computes during that state.
Automating Two-Level Logic Size Optimization
Visual Use of K-Maps Is Rather Limited
Although the visual K-map method is helpful in two-level optimization of three- and
four-variable functions, the visual method is unmanageable for functions with many more
variables. One problem is that we can't effectively visualize maps beyond 5 or 6
variables. Another problem is that humans make mistakes, and might not draw the biggest
circle possible on a K-map. Furthermore, the order in which a designer begins covering
1s may result in a function that has more terms than would have been obtained using a
different order. For example, consider the function shown in the K-map of Figure 6.27(a).
Starting from the left, a designer might first draw the circle yielding the term y'z',
then the circle yielding x'y', then the circle yielding yz, and finally the circle
yielding xy, for a total of four terms. The K-map in Figure 6.27(b) shows an alternative
cover. After drawing the circle yielding the term y'z', the designer draws the circle
yielding x'z, and then the circle yielding xy. The alternative cover uses only three terms
instead of four.
           yz
    x    00  01  11  10
    0     1   1   1   0
    1     1   0   1   1
(a) Circles: y'z', x'y', yz, xy.    (b) Circles: y'z', x'z, xy.
Figure 6.27 A cover is not necessarily optimal: (a) a four-term cover, and (b) a three-term
cover of the same function.
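That the two covers of Figure 6.27 describe the same function can be confirmed by comparing their on-sets. In this sketch the terms are those read from the text's description of the figure (an assumption, since the complement marks are reconstructed from a degraded scan), and the helper name on_set is hypothetical.

```python
from itertools import product

def on_set(f, num_vars=3):
    """Return the set of input tuples for which f outputs 1."""
    return {bits for bits in product((0, 1), repeat=num_vars) if f(*bits)}

def four_term(x, y, z):
    nx, ny, nz = 1 - x, 1 - y, 1 - z
    return (ny & nz) | (nx & ny) | (y & z) | (x & y)   # y'z' + x'y' + yz + xy

def three_term(x, y, z):
    nx, ny, nz = 1 - x, 1 - y, 1 - z
    return (ny & nz) | (nx & z) | (x & y)              # y'z' + x'z + xy

print(on_set(four_term) == on_set(three_term))  # True
print(len(on_set(three_term)))                  # 6
```

Both covers turn on the same six cells, so the three-term cover is simply the smaller description of the same function.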
Concepts Underlying Automated Two-Level Size Optimization
Because of the above-mentioned problems, two-level logic size optimization is done
primarily using automated computer-based tools executing heuristics or exact algorithms.
A heuristic is a problem solving method that usually yields a good solution, which is
close to the optimal, but not necessarily optimal. An exact algorithm, or just algorithm,
is a problem solving method that yields the optimal solution. An optimal solution is as
good as or better than any other possible solution, with respect to the criteria of
interest to us.
We first define some concepts underlying heuristics and exact algorithms for two-level
logic size optimization. We will illustrate those concepts graphically on K-maps, but
such illustration is only intended to provide the reader with an intuition of the
concepts; automated tools do not use K-maps.
Recall that a function can be written as a sum-of-minterms equation. A minterm is a product term that includes all the function's variables exactly once, in either true or complemented form. The on-set of a function is the set of minterms that define when the function should evaluate to 1 (i.e., when the function is "on"). For the function F in Figure 6.28, the on-set is {x'y'z, xyz, xyz'}. The off-set of a function is all the remaining minterms. For the function in Figure 6.28, the off-set is {x'y'z', x'yz', x'yz, xy'z', xy'z}. Using compact minterm notation (see Section 2.6), the on-set is {1, 6, 7}, and the off-set is {0, 2, 3, 4, 5}.

Figure 6.28 Implicants.
An implicant is a product term that may include fewer than all the function's variables, but is a term that only evaluates to 1 if the function should evaluate to 1. In other words, an implicant of a function is a term that should evaluate to 1 for a particular set of variable values only if at least one of the function's on-set minterms evaluates to 1 for those variable values. For example, the function F = x'y'z + xyz' + xyz has four implicants: x'y'z, xyz', xyz, and xy. Graphically, an implicant is any legal (but not necessarily the biggest possible) circle on a K-map, as shown in Figure 6.28. All on-set minterms are obviously implicants, but not all implicants are minterms.
We say that the implicant xy covers minterms xyz' and xyz of function F. Graphically, an implicant's circle encircles the 1s of the covered minterms. Intuitively, we know that we can replace the covered minterms by the covering implicant and still obtain the same function. In other words, we can replace xyz' + xyz by xy. A set of implicants that covers the on-set of a function (and covers no other minterms) is known as a cover of the function. For the above function, one function cover is x'y'z + xyz + xyz'; another cover is x'y'z + xy; yet another cover is x'y'z + xyz + xyz' + xy.
Removing a variable from a term is known as expanding the term, which is the same as expanding the size of a circle on a K-map. For example, for the function in Figure 6.28, expanding the term xyz to the term xy (by eliminating z) results in an implicant of the function. Expanding the term xyz' to xy also results in an implicant (the same one). But expanding xyz to xz (by eliminating y) does not result in an implicant: xz covers minterm xy'z, which is not in the function's on-set.
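The implicant test and the expand operation described above can be sketched in a few lines of Python. This is only an illustrative sketch: the dictionary representation of a term and the helper names `covered`, `is_implicant`, and `expand` are assumptions made for this example, not notation from the text. The on-set is that of the function in Figure 6.28.

```python
from itertools import product

VARS = ("x", "y", "z")
ON_SET = {1, 6, 7}  # x'y'z, xyz', xyz in compact minterm notation

def covered(term):
    """Return the minterm numbers covered by a term.

    A term maps a variable name to 1 (true form) or 0 (complemented form);
    a variable missing from the dict has been eliminated from the term.
    """
    result = set()
    for bits in product((0, 1), repeat=len(VARS)):
        assignment = dict(zip(VARS, bits))
        if all(assignment[v] == val for v, val in term.items()):
            result.add(int("".join(str(b) for b in bits), 2))
    return result

def is_implicant(term):
    """A term is an implicant if it covers only on-set minterms."""
    return covered(term).issubset(ON_SET)

def expand(term, var):
    """Expand a term by eliminating one variable."""
    return {v: val for v, val in term.items() if v != var}

xyz = {"x": 1, "y": 1, "z": 1}
xy = expand(xyz, "z")
xz = expand(xyz, "y")
print(sorted(covered(xy)), is_implicant(xy))  # [6, 7] True
print(sorted(covered(xz)), is_implicant(xz))  # [5, 7] False
```

The second line of output reflects the point made in the text: xz covers minterm 5 (xy'z), which is in the off-set, so xz is not an implicant.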
A prime implicant of a function is an implicant with the property that if any variable were eliminated from the implicant, the result would be a term covering a minterm not in the function's on-set. Graphically, a prime implicant corresponds to circles that are the largest possible; enlarging the circle further would result in covering 0s, which changes the function. In Figure 6.28, x'y'z and xy are prime implicants. Removing any variable from implicant x'y'z, say z, would result in a term (x'y') that covers a minterm that is not in the on-set: x'y' covers x'y'z', for example, which is not in the function's on-set. Likewise, removing x' or y' from that term would cover a minterm not in the function's on-set. Removing any variable from implicant xy, say y, would result in a term (x) that covers minterms not in the on-set. On the other hand, xyz is not a prime implicant, because z can be removed from that implicant without changing the function, since xy covers minterms xyz and xyz', both of which are in the on-set. Likewise, xyz' is not a prime implicant, because z' can be removed. There is no reason to cover a function with anything other than prime implicants, since a prime implicant achieves the same function with fewer literals than nonprime implicants (which is why we always draw the biggest circles possible in K-maps).
An essential prime implicant is a prime implicant that is the only prime implicant that covers a particular minterm in the function's on-set. Graphically, an essential prime implicant is the only circle (the largest possible, of course, since the circle must represent a prime implicant) that covers a particular 1. In Figure 6.28, x'y'z is an essential prime implicant, as is xy, because each is the only prime implicant covering a particular 1. A nonessential prime implicant is a prime implicant whose covered minterms are also covered by one or more other prime implicants. Figure 6.29 shows a different function that has four prime implicants, only two of which are essential. x'y' is an essential prime implicant because it is the only prime implicant that covers minterm x'y'z'. xy is an essential prime implicant because it is the only prime implicant that covers minterm xyz'. y'z is a nonessential prime implicant because both of its covered minterms are covered by other prime implicants (those other prime implicants may or may not be essential prime implicants). Likewise, xz is not essential.
The importance of essential prime implicants is that we know that we must include all essential prime implicants in a function's cover, otherwise there would be some minterms that could not be covered. We may or may not need to include nonessential prime implicants to completely cover the function, but we must include all essential prime implicants.

[K-map of function G showing prime implicants x'y' (essential), y'z (not essential), xz (not essential), and xy (essential).]
Figure 6.29 Essential prime implicants.
Given the notion of prime implicants and essential prime implicants, a simple approach for two-level logic optimization is given in Table 6.1.

TABLE 6.1 Approach for automated two-level logic size optimization.
Step  Description
1  Determine prime implicants. For every minterm in the function's on-set, maximally expand the term (meaning eliminate literals from the term) such that the term still only covers minterms in the on-set (like drawing the biggest circle possible around each 1 in a K-map). Repeat for each minterm. If don't cares exist, use them to maximally expand minterms into prime implicants (like using X's to create the biggest circles for a given 1 in a K-map).
2  Add essential prime implicants to the function's cover. Find any minterms covered by only one prime implicant (i.e., by an essential prime implicant). Add those prime implicants to the cover, and mark the minterms covered by those implicants as already covered.
3  Cover remaining minterms with nonessential prime implicants. Cover the remaining minterms using the minimal number of remaining prime implicants.
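The three steps of Table 6.1 can be sketched in Python as follows. This is a brute-force illustration under stated assumptions, not how a production tool works: terms are written as strings over '0', '1', and '-' (a convention chosen for this example), the on-set {0, 1, 5, 6, 7} is the function G of Figure 6.29 as inferred from the prime implicants named in the text, and step 3 simply tries all subsets of the remaining prime implicants, smallest first.

```python
from itertools import combinations, product

N = 3                      # variables x, y, z
ON_SET = {0, 1, 5, 6, 7}   # function G, as inferred from the text

def covers(term):
    """Minterm numbers covered by a term such as '1-1' ('-' = eliminated)."""
    return {m for m in range(2 ** N)
            if all(t in ("-", b) for t, b in zip(term, format(m, f"0{N}b")))}

def union(terms):
    """All minterms covered by a collection of terms."""
    return set().union(*(covers(t) for t in terms))

def strictly_inside(a, b):
    """True if term a covers a proper subset of what term b covers."""
    return covers(a) != covers(b) and covers(a).issubset(covers(b))

# Step 1: prime implicants = implicants not strictly inside another implicant.
implicants = ["".join(t) for t in product("01-", repeat=N)
              if covers("".join(t)).issubset(ON_SET)]
primes = [t for t in implicants
          if not any(strictly_inside(t, u) for u in implicants)]

# Step 2: a prime implicant is essential if it alone covers some on-set minterm.
essential = []
for m in sorted(ON_SET):
    covering = [t for t in primes if m in covers(t)]
    if len(covering) == 1 and covering[0] not in essential:
        essential.append(covering[0])

# Step 3: exhaustively pick the fewest remaining primes for what is left.
remaining = ON_SET - union(essential)
rest = [t for t in primes if t not in essential]
extra = next(combo for k in range(len(rest) + 1)
             for combo in combinations(rest, k)
             if remaining.issubset(union(combo)))
cover = essential + list(extra)

def to_text(term, names="xyz"):
    """Render '00-' as x'y' and so on."""
    return "".join(n + ("'" if t == "0" else "")
                   for n, t in zip(names, term) if t != "-")

print([to_text(t) for t in primes])
print(" + ".join(to_text(t) for t in cover))
```

Because step 3 here tries every possibility, this sketch behaves as an exact method; it is only practical for very small functions.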
The first two steps are exact. The last step is a bit tricky. How do we choose which prime implicants to use to cover the remaining minterms? Recall the example of Figure 6.27, in which the cover in Figure 6.27(a) used two prime implicants to cover the two 1s that would be left after adding essential prime implicants, while the cover in Figure 6.27(b) used only one prime implicant to cover those remaining two 1s. When there are only two possibilities, we can try each possibility and pick the one with the fewest prime implicants in the final cover. But what if there were millions, or billions, of possibilities? We may not have enough compute time to try all those possibilities. For large functions with hundreds of minterms and thousands of prime implicants, there may indeed be millions of possible covers to consider in the last step.
If an approach tries all such possibilities, the approach is an exact algorithm. If an approach just tries a few such possibilities, the overall two-level size optimization approach may be a heuristic (unless the approach can guarantee that the ignored possibilities couldn't possibly be part of an optimal solution).
We'll demonstrate the approach for automated two-level logic size optimization with the following example.

EXAMPLE 6.9 Two-level logic size optimization with the approach of Table 6.1, illustrated on a K-map
Figure 6.30 shows a K-map for the function from Figure 6.27, for which we saw that different covers yielded different numbers of terms. The first step is to determine all prime implicants, shown in the top part of the figure. For each 1, draw every possible circle involving adjacent 1s, ensuring that each circle is the largest possible.
The second step is to add essential prime implicants to the function's cover. Notice that the 1 corresponding to minterm x'yz (the top right 1) is covered only by one prime implicant, namely, x'z. We know we'll need to use that prime implicant, so we'll include prime implicant x'z in the cover. Also notice that the 1 corresponding to minterm xyz' (the bottom right 1) is only covered by one prime implicant, namely, xz', so we'll include that prime implicant in the cover too. We mark all the 1s covered by these essential prime implicants, noted by italicized 1s in the figure.
The last step is to cover the remaining 1s using the fewest number of prime implicants. There is only one 1 uncovered, and that 1 is covered by two prime implicants. We can choose either prime implicant for the cover; let's choose y'z'. Thus, the final cover is:
F = x'z + xz' + y'z'
This example uses a K-map merely to illustrate to the reader the steps occurring within an automated tool. Such a tool does not use K-maps internally, but rather other means of representing the terms of a function.
Figure 6.30 Illustration of two-level optimization: (a) all prime implicants, (b) including essential prime implicants in the cover, (c) covering remaining 1s.

Automated Two-Level Logic Size Optimization Using the Quine-McCluskey Method
The most well-known, and in fact the original, approach for automated two-level logic size optimization is the Quine-McCluskey method, sometimes called the tabular method.
The first step of this method finds all prime implicants. The step starts with the function's minterms; if we are minimizing a three-variable function, then we might call these three-variable terms. To find all the prime implicants, the method first compares each three-variable term with every other three-variable term, and if two terms are found that differ by only one variable (that variable appears in true form in one term and complemented form in the other), the method adds a new term (without the differing variable) to a new set of two-variable terms. For example, xyz' and xyz differ by one variable, z, resulting in a new term xy being added to the two-variable set. Once done comparing all three-variable terms, the method compares pairs of two-variable terms for terms that differ by only one variable, resulting in a set of one-variable terms. One-variable terms can then be compared for terms that differ by one variable, but if such terms are found, then the function evaluates simply to 1. Actually, not all terms in a set need to be compared; only those terms whose number of uncomplemented literals differs by one need to be compared. For example, x'yz' and xyz need not be compared, because the number of uncomplemented literals differs by two, not one, and thus the terms can't be simplified to a new term by eliminating a variable. If at any time in this step a term cannot be combined with any other term, we mark that term as a prime implicant. Thus, after this step, all marked terms represent all prime implicants. The method thus provides an approach for finding prime implicants that is more efficient than just maximally expanding every term.
The second step is to add all the essential prime implicants to the cover, and to mark as "covered" all minterms covered by those prime implicants.
The final step is to cover all remaining uncovered minterms by selecting the fewest remaining prime implicants to cover those minterms. Trying all the possibilities results in a version of the Quine-McCluskey method that is an exact algorithm. Trying just a subset may result in a heuristic.
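The first step of the method (finding all prime implicants by repeatedly combining terms) might be sketched as below. This is a minimal illustration, not the tabular layout itself: the string encoding of terms ('1', '0', and '-' for an eliminated variable) and the function names `combine` and `prime_implicants` are assumptions for this example, and for brevity the sketch compares every pair of terms in a set rather than only those whose number of uncomplemented literals differs by one.

```python
from itertools import combinations

def combine(a, b):
    """Merge two terms that differ in exactly one variable, else return None."""
    diff = [i for i, (p, q) in enumerate(zip(a, b)) if p != q]
    if len(diff) == 1 and "-" not in (a[diff[0]], b[diff[0]]):
        i = diff[0]
        return a[:i] + "-" + a[i + 1:]
    return None

def prime_implicants(minterms, n):
    """Return the prime implicants of an n-variable function given its on-set."""
    terms = {format(m, f"0{n}b") for m in minterms}
    primes = set()
    while terms:
        merged, used = set(), set()
        for a, b in combinations(sorted(terms), 2):
            c = combine(a, b)
            if c is not None:
                merged.add(c)
                used.update((a, b))
        # A term that combined with no other term is marked as prime.
        primes |= terms - used
        terms = merged
    return primes

# Function of Figure 6.28: on-set {1, 6, 7}, i.e. x'y'z, xyz', xyz.
print(sorted(prime_implicants({1, 6, 7}, 3)))  # ['001', '11-'], i.e. x'y'z and xy
```

Here xyz' ('110') and xyz ('111') combine into xy ('11-'), while x'y'z ('001') combines with nothing and is marked prime, matching the prime implicants the text identifies for that function.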
Methods That Enumerate All Minterms or Compute All Prime Implicants May Be Inefficient
The Quine-McCluskey method works reasonably for functions with perhaps tens of variables. However, for larger functions, just listing all the minterms could result in a huge amount of data. A function of 10 variables could have up to 2^10 minterms, which is 1024 minterms and is fairly reasonable. But a function of 32 variables could have up to 2^32 minterms, or up to about four billion minterms. Representing those minterms in a table requires prohibitive computer memory. And comparing those minterms with other minterms could require on the order of (four billion)^2 computations, or about 1.6 x 10^19 computations (thousands of quadrillions, where a quadrillion is a thousand times a trillion). Even a computer performing 10 billion computations per second would require over a billion seconds to perform all those computations, or roughly 50 years. And for 64 variables, the numbers go up to 2^64 possible minterms, or about 18 quintillion minterms, and on the order of 10^38 computations, which is far beyond any feasible amount of computation. Functions with 100 inputs, which are not that uncommon, would require an absurd amount of memory and computation. Even computing all prime implicants, without first listing all minterms, is computationally prohibitive for many modern-sized functions.
Iterative Heuristic for Two-Level Logic Size Optimization
Because enumerating all minterms of a function, or even just all prime implicants, is prohibitive in terms of computer memory and computation time for functions with many variables, most automated tools use methods that instead just iteratively transform the original function's equation, in an attempt to find improvements to the equation. Iterative improvement means repeatedly making small changes to an existing solution until we decide to stop, perhaps because we can't find a better solution, or perhaps because the tool has run for enough time. As an example of making small changes to an existing solution, consider the equation:
F = abcdefgh + abcdefgh' + jklmnop
Clearly, we can reduce this equation simply by combining the first two terms and removing variable h, resulting in F = abcdefg + jklmnop. However, enumerating the minterms, as required in the earlier-described size optimization methods, would have resulted in roughly 1000 minterms and then millions of computations to find the prime implicants, but such enumeration and computation are obviously not necessary to minimize this equation.
Modern automated logic optimization tools therefore don't try to enumerate all the minterms for functions with many variables. Instead, those tools start with a given sum-of-products equation of the function, like the description for F above. Those tools then try to transform the equation little by little into a better equation, meaning an equation with fewer terms and/or fewer literals. Those tools repeat, or iterate, until they find no further improvement or until some maximum time allocated for the tool's execution has expired.
Heuristics for such two-level logic optimization in modern tools can be quite complex. However, a simple heuristic that is reasonably effective uses repeated application of the expand operation. The expand operation means to remove a literal from a term and then check whether the new term is legal. Removing a literal makes that term cover more minterms, like drawing a bigger circle on a K-map, thus the name "expand." For example, consider the function F = x'z + xy'z + xyz. We might try to expand the term x'z by removing x', or by removing z. Note that expanding a term reduces the number of literals; the concept that expanding a term reduces the number of literals in a term may take a while for you to get used to. Thinking of K-map circles may help, as shown in Figure 6.31: the bigger the circle, the fewer the resulting literals. An expansion is legal if the new term covers only minterms in the function's on-set, or equivalently, does not cover a minterm in the function's off-set. In other words, an expansion is legal if the new term is still an implicant of the function. Figure 6.31(a) shows that expanding term x'z to z for the given function is legal, as the expanded term covers only 1s, whereas Figure 6.31(b) shows that expanding x'z to x' is not legal, as the expanded term covers at least one 0. If an expansion is legal, we replace the original term by the expanded term, and we look for and remove any other term covered by the expanded term. In Figure 6.31(a), the expanded term z covers terms xy'z and xyz, so both those latter terms can be removed.

Figure 6.31 Expansions of term x'z in the function F = x'z + xy'z + xyz: (a) legal, (b) not legal (because the expanded term covers 0s).

Note that we illustrated the expand operation on a K-map merely to aid in understanding the intuition of the operation; K-maps are nowhere to be found in heuristic two-level logic size minimization tools.
As another example, for the earlier introduced function:
F = abcdefgh + abcdefgh' + jklmnop
We might start by trying to expand the first term, abcdefgh. One expansion of that term is bcdefgh (i.e., we removed the literal a). However, that term covers the term a'bcdefgh, which covers minterms that are not in the function's on-set, so that expansion is not legal. We might try other expansions, finding them not legal, until we come across the expansion to abcdefg (i.e., we removed the literal h). That term strictly covers abcdefgh and abcdefgh', both of which are clearly implicants because they appear in the original function, and thus the new term must also be an implicant. Therefore, we replace the first term by the expanded term:
F = abcdefg + abcdefgh' + jklmnop
and we also remove the second term, since that term is covered by the expanded term:
F = abcdefg + jklmnop
Thus, using just the expand operation, we have improved the equation.
EXAMPLE 6.10 Iterative heuristic two-level logic size optimization using expand
Minimize the following equation, which was also minimized in Example 6.4, using repeated application of the expand operation:
F = xyz + xyz' + x'y'z' + x'y'z
In other words, the on-set consists of the minterms {7, 6, 0, 1}, and so the off-set consists of the minterms {2, 3, 4, 5}.
Let's expand the terms from left to right, so we'll start with xyz. We can try to expand xyz to xy. Is that a legal expansion? xy covers minterms xyz' (minterm 6) and xyz (minterm 7), both in the on-set. Thus, the expansion is legal, so we replace xyz by xy, yielding the new equation:
F = xy + xyz' + x'y'z' + x'y'z
We also look for implicants covered by the new implicant xy. xyz' is covered by xy, so we eliminate xyz', yielding:
F = xy + x'y'z' + x'y'z
Let's continue trying to expand that first term. We can try expanding it from xy to x. The term x covers minterms xy'z' (minterm 4), xy'z (minterm 5), xyz' (minterm 6), and xyz (minterm 7). The term x thus covers minterms 4 and 5, which are not in the on-set, but instead in the off-set. Thus, that expansion is not legal. We can also try expanding xy to y, but we'll find again that the expansion is not legal.
We might then try to expand the next term, x'y'z'. Let's try expanding it to x'y'. That term covers minterms x'y'z' (minterm 0) and x'y'z (minterm 1), both in the on-set, so the expansion is legal. We thus replace the term by the expanded one:
F = xy + x'y' + x'y'z
We check for other terms covered by the expanded term, and find that x'y'z is covered by x'y', so we remove x'y'z, leaving:
F = xy + x'y'
We can try expanding the term x'y' further, but will find that both possible expansions (x' or y') are not legal. Thus, the above equation represents the minimized equation. Notice that this happens to be the same result as we obtained when we minimized the initial equation in Example 6.4.
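The expand-based heuristic of this example can be sketched in Python as follows. The sketch is illustrative only: the string encoding of terms and the names `covers`, `expand_heuristic`, and `to_text` are assumptions for this example, the on-set is enumerated explicitly (which, as noted in the text, real tools avoid), and literals are tried for removal in the fixed order x, y, z rather than in the order used in the walkthrough.

```python
N = 3
NAMES = "xyz"

def covers(term):
    """Minterm numbers covered by a term such as '11-' ('-' = removed literal)."""
    return {m for m in range(2 ** N)
            if all(t in ("-", b) for t, b in zip(term, format(m, f"0{N}b")))}

def expand_heuristic(terms, on_set):
    """Expand each term as far as legally possible, dropping covered terms."""
    done, todo = [], list(terms)
    while todo:
        term = todo.pop(0)
        changed = True
        while changed:
            changed = False
            for pos in range(N):
                if term[pos] == "-":
                    continue
                candidate = term[:pos] + "-" + term[pos + 1:]
                # Legal expansion: the new term covers only on-set minterms.
                if covers(candidate).issubset(on_set):
                    term, changed = candidate, True
                    break
        # Remove any other term covered by the expanded term.
        done = [t for t in done if not covers(t).issubset(covers(term))]
        todo = [t for t in todo if not covers(t).issubset(covers(term))]
        done.append(term)
    return done

def to_text(term):
    return "".join(n + ("'" if t == "0" else "")
                   for n, t in zip(NAMES, term) if t != "-")

# F = xyz + xyz' + x'y'z' + x'y'z, on-set {7, 6, 0, 1}
result = expand_heuristic(["111", "110", "000", "001"], {7, 6, 0, 1})
print(" + ".join(to_text(t) for t in result))  # xy + x'y'
```

As in the example, the first term expands to xy (absorbing xyz') and the third expands to x'y' (absorbing x'y'z); no further expansion is legal.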
Even though the heuristic based on expand happened to generate the optimally minimized equation in the previous example, there is no guarantee the results from the heuristic will always be optimal.
More advanced heuristics utilize additional operations beyond just the expand operation. One such operation is the reduce operation, which can be thought of as the opposite of expand. The reduce operation takes a term and tries to add a literal to the term, checking that the equation with the new term still covers the function. Adding a literal to a term is like reducing the size of a circle on a K-map. Adding a literal to a term reduces the number of minterms covered by the term, hence the name reduce. Another operation is irredundant, which tries to remove a term entirely, checking that the new equation still covers the function. If so, the removed term was "redundant," hence the name irredundant. Heuristics may iterate among the expand, reduce, irredundant, and other operations, such as in the following heuristic: Try 10 random expansion operations, then 5 random reduce operations, then 2 irredundant operations, and then repeat (iterate) the whole sequence until no improvement occurs from one iteration to the next. Modern two-level size optimization tools differ largely in their ordering of operations and their number of iterations.
Recall that we said that modern heuristics don't enumerate all of a function's minterms, yet in the previous example we did enumerate all the minterms; actually, we were given the minterms in the initial equation. When we don't initially know the minterms, many advanced methods exist to efficiently represent a function's on-set and off-set without enumerating the minterms in those sets, and also to quickly check if a term covers terms in the off-set. Those methods are beyond the scope of this book, and are instead the subject of textbooks on digital design synthesis. But hopefully you now get the basic idea of heuristic two-level minimization.
One of the original tools that performed automated heuristic as well as exact two-level logic optimization was called Espresso, developed at the University of California, Berkeley. The algorithms and heuristics in Espresso formed the basis of many modern commercial logic optimization tools.
Multilevel Logic Optimization: Performance and Size Tradeoffs
We have thus far discussed two-level logic size optimization. However, in practice, we may not need the speed of two levels of logic. We may be willing to use three, four, or more levels of logic if those additional levels reduce the amount of required logic. As a simple example, consider the equation:
F1 = ab + acd + ace
This equation can't be minimized. The resulting two-level circuit is shown in Figure 6.32(a).
We could, however, algebraically manipulate the equation as follows:
F2 = ab + ac(d + e) = a(b + c(d + e))
That equation can be implemented with the circuit shown in Figure 6.32(b). The multilevel logic implementation results in fewer transistors but at the cost of more gate-delays, as illustrated in Figure 6.32(c). The multilevel implementation thus represents a tradeoff compared to the two-level implementation.

[Figure 6.32 panels: (a) circuit for F1 = ab + acd + ace; (b) circuit for F2 = a(b+c(d+e)), 16 transistors, 4 gate-delays; (c) plot of size (transistors) versus delay (gate-delays) for F1 and F2.]
Figure 6.32 Using multilevel logic to trade off performance and size: (a) two-level circuit, (b) multilevel circuit with fewer transistors, (c) illustration of the size versus delay tradeoff. Numbers inside gates represent transistor counts.
Automated heuristics for multilevel logic optimization iteratively transform the initial function's equation, much like for two-level logic optimization, optimizing one of the criteria at the expense of another.
EXAMPLE 6.11 Multilevel logic optimization
Minimize the following function's circuit size, at the expense of perhaps slower performance, using algebraic manipulation. Plot the tradeoff of the initial and size-optimized circuits with respect to size and delay.
F1 = abcd + abcef
The circuit corresponding to this equation is shown in Figure 6.33(a). The circuit requires 22 transistors and has a delay of 2 gate-delays.

[Figure 6.33 panels: (a) circuit for F1 = abcd + abcef, 22 transistors, 2 gate-delays; (b) circuit for F2 = abc(d + ef), 18 transistors, 3 gate-delays; (c) plot of size (transistors) versus delay (gate-delays) for F1 and F2.]
Figure 6.33 Multilevel logic to trade off performance and size: (a) two-level circuit, (b) multilevel circuit with fewer transistors, (c) tradeoff of size versus delay. Numbers inside gates represent transistor counts.
We can algebraically manipulate the equation by factoring out the abc term from the two terms, as follows:
F2 = abcd + abcef = abc(d + ef)
The circuit for that equation is shown in Figure 6.33(b). The circuit requires only 18 transistors, but has a longer delay of 3 gate-delays. The plot in Figure 6.33(c) shows the size and performance for each design.
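Factoring changes a circuit's structure but must not change its function. A quick way to convince ourselves of that is to compare truth tables, as in the following Python sketch. The helper name `equivalent` and the function names are assumptions for this example; the sketch checks only functional equivalence and does not model transistor counts or gate-delays.

```python
from itertools import product

def equivalent(f, g, n):
    """True if two n-input Boolean functions agree on every input combination."""
    return all(f(*v) == g(*v) for v in product((False, True), repeat=n))

# Earlier example: F1 = ab + acd + ace and F2 = a(b + c(d + e))
def f1(a, b, c, d, e):
    return (a and b) or (a and c and d) or (a and c and e)

def f2(a, b, c, d, e):
    return a and (b or (c and (d or e)))

# Example 6.11: F1 = abcd + abcef and F2 = abc(d + ef)
def g1(a, b, c, d, e, f):
    return (a and b and c and d) or (a and b and c and e and f)

def g2(a, b, c, d, e, f):
    return a and b and c and (d or (e and f))

print(equivalent(f1, f2, 5))  # True
print(equivalent(g1, g2, 6))  # True
```

Exhaustive comparison like this is only feasible for small numbers of inputs, for the same reason that enumerating minterms is.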
EXAMPLE 6.12 Reducing noncritical path size with multilevel logic
Use multilevel logic to reduce the size of the circuit in Figure 6.34(a), without extending the circuit's delay. Note that the circuit initially has 26 transistors. Furthermore, the longest delay from any input to the output is three gate-delays. That delay occurs through the path shown by the dashed line in the figure. The longest path through a circuit is the circuit's critical path.

[Figure 6.34 panels: (a) circuit for F1 = (a+b)c + dfg + efg, 26 transistors, 3 gate-delays; (b) circuit for F2 = (a+b)c + (d+e)fg, 22 transistors, 3 gate-delays; (c) plot of size (transistors) versus delay (gate-delays) for F1 and F2.]
Figure 6.34 Multilevel optimization that reduces size without increasing delay, by altering a noncritical path: (a) original circuit, (b) new circuit with fewer transistors but same delay, (c) illustration of the size optimization with no tradeoff of delay.

The other paths through the circuit are only two gate-delays. Thus, if we reduce the size of the logic for the noncritical paths and extend those paths to three gate-delays, we would not have extended the overall delay of the circuit. We focus on the noncritical parts of the equation for F1 in Figure 6.34(a); the equation has its noncritical parts italicized. We can algebraically modify the noncritical parts by factoring out the term fg, resulting in the new equation and circuit shown in Figure 6.34(b). One of the modified paths is now also three gate-delays, so we now have two equally long critical paths, both having three gate-delays. The resulting circuit has only 22 transistors compared to 26 in the original circuit, yet still has the same delay of three gate-delays, as illustrated in Figure 6.34(c). Overall, we've performed a size optimization with no penalty in performance.
Generally, multilevel logic optimization uses factoring (e.g., abc + abd = ab(c+d)) to reduce the number of gates.
Multilevel logic optimization is probably more commonly used today than two-level logic optimization. Multilevel logic optimization is also extensively used by automatic tools that map circuits to FPGAs. FPGAs are discussed in a later chapter.
6.3 SEQUENTIAL LOGIC OPTIMIZATIONS AND TRADEOFFS
State Reduction
In Chapter 3, we described the design of sequential logic, namely, of controllers. When creating the FSM, and converting the FSM to a state register and logic, we can apply some optimizations and tradeoffs.
State reduction, also known as state minimization, is an optimization that reduces the number of FSM states without changing the FSM's behavior. By reducing the number of states, we may reduce the size of the required state register that implements the FSM, thus reducing circuit size.
Reducing the number of states is possible when an FSM contains states that are equivalent to one another. For example, consider the FSM of Figure 6.35(a), having input x and output y. Examination reveals that states S2 and S3 appear to be the same as states S0 and S1. Regardless of whether we start in S0 or S2, the outputs will be identical. For example, if we start in S0 and the input sequence for four clock edges is 1, 1, 0, 0, the state sequence will be S0, S1, S1, S2, S2, so the output sequence will be 0, 1, 1, 0, 0. If instead we start in S2, the same input sequence will result in a state sequence of S2, S3, S3, S0, S0, so the output sequence will again be 0, 1, 1, 0, 0. In fact, if we tried all possible input sequences, we would find that the output sequence starting from state S0 would be identical to the output sequence starting from state S2. States S0 and S2 are thus equivalent. Likewise, states S1 and S3 are equivalent for the same reason. Thus, we can redraw the FSM as in Figure 6.35(b). The FSMs in Figure 6.35(a) and (b) have exactly the same behavior: for any sequence of inputs, the two FSMs provide exactly the same sequence of outputs. If we encapsulate the FSM as a box as in Figure 6.35(c), the outside world cannot distinguish between the two FSMs based on the outputs.

Figure 6.35 Eliminating redundant states: (a) original FSM, (b) equivalent FSM with fewer states, (c) the FSMs are indistinguishable from the outside, providing identical output behavior for any input sequence (e.g., if x = 1, 1, 0, 0, then y = 0, 1, 1, 0, 0).

Two states are equivalent if:
1. they assign the same values to outputs, AND
2. for all possible sequences of inputs, the FSM outputs will be the same starting from either state.
For large FSMs, visual inspection cannot guarantee that we've removed all redundant states; a more systematic approach is needed, which we now introduce.
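The claim that S0 and S2 behave identically can be checked by simulation. The Python sketch below is illustrative only: the transition table `NEXT` and output table `OUT` are transcribed from the state and output sequences described in the text for Figure 6.35(a), so they should be read as an assumption about the figure, and the function name `run` is made up for this example.

```python
from itertools import product

# Assumed transitions of the FSM in Figure 6.35(a): (state, x) -> next state
NEXT = {
    ("S0", 0): "S0", ("S0", 1): "S1",
    ("S1", 0): "S2", ("S1", 1): "S1",
    ("S2", 0): "S2", ("S2", 1): "S3",
    ("S3", 0): "S0", ("S3", 1): "S3",
}
OUT = {"S0": 0, "S1": 1, "S2": 0, "S3": 1}  # output y in each state

def run(start, inputs):
    """Return the output sequence, starting with the start state's output."""
    state = start
    outputs = [OUT[state]]
    for x in inputs:
        state = NEXT[(state, x)]
        outputs.append(OUT[state])
    return outputs

print(run("S0", [1, 1, 0, 0]))  # [0, 1, 1, 0, 0]
print(run("S2", [1, 1, 0, 0]))  # [0, 1, 1, 0, 0]

# Try every input sequence up to length 8 from both start states.
same = all(run("S0", seq) == run("S2", seq)
           for n in range(9) for seq in product((0, 1), repeat=n))
print(same)  # True
```

Testing sequences up to a fixed length is evidence rather than proof of equivalence, which is one reason a systematic method is preferable.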
Implication Tables
Intuitively, we know that two states cannot be equivalent if they produce different outputs for the same sequence of inputs. Consider the FSM in Figure 6.36, which is almost identical to the FSM in Figure 6.35 with a slight modification: in state S2, the FSM now outputs y = 1 instead of y = 0. States S0 and S2 therefore clearly are not equivalent, because they have different output values. States S1 and S3 produce the same output, but when we transition from either state to the corresponding next state, the output differs. For example, if the FSM is in state S1 and x becomes 0, the next state (S2) outputs y = 1, but if the FSM had started in S3, the next state (S0) would output y = 0. Thus, S1 and S3 cannot be equivalent, because the same input sequence results in a different output sequence.

Figure 6.36 A variant of the FSM in Figure 6.35. States S0 and S2 cannot be equivalent because they output different values, and states S1 and S3 can't be equivalent because they have nonequivalent next states for the same input values.

If two states' outputs are not equivalent, the two states clearly are not equivalent. Furthermore, if two states' next states are not equivalent for a given input value, then the two states are also not equivalent. Using these concepts of nonequivalent states, Table 6.2 describes an algorithm for reducing an FSM's number of states.
TABLE 6.2 Algorithm for state reduction.
Step  Description
1  Mark state pairs having different outputs as nonequivalent. States having different outputs obviously cannot be equivalent.
2  For each unmarked state pair, write the next state pairs for the same input values.
3  For each unmarked state pair, mark state pairs having nonequivalent next state pairs as nonequivalent. Repeat this step until no change occurs, or until all states are marked. States with nonequivalent next states for the same input values can't be equivalent. Each time through this step is called a pass.
4  Merge remaining state pairs. Remaining state pairs must be equivalent.
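Steps 1 through 3 of Table 6.2 can be sketched in Python as below. This is a minimal illustration under stated assumptions: the function name `equivalent_pairs` and the dictionary encoding of the FSM are made up for this example, the transition table is the one inferred from the text's description of Figure 6.35(a), and step 4 (actually merging the states) is left out, so the function only reports which pairs remain unmarked.

```python
from itertools import combinations

def equivalent_pairs(states, inputs, nxt, out):
    """Return the state pairs left unmarked by steps 1-3 of the algorithm."""
    pairs = list(combinations(sorted(states), 2))
    # Step 1: mark pairs whose outputs differ.
    marked = {p for p in pairs if out[p[0]] != out[p[1]]}
    # Step 2: record the next state pairs of each unmarked pair, per input value.
    implied = {p: {tuple(sorted((nxt[(p[0], x)], nxt[(p[1], x)]))) for x in inputs}
               for p in pairs if p not in marked}
    # Step 3: repeat passes, marking pairs whose next state pairs are marked.
    changed = True
    while changed:
        changed = False
        for p, next_pairs in implied.items():
            if p not in marked and any(q in marked for q in next_pairs):
                marked.add(p)
                changed = True
    return [p for p in pairs if p not in marked]

NEXT = {
    ("S0", 0): "S0", ("S0", 1): "S1",
    ("S1", 0): "S2", ("S1", 1): "S1",
    ("S2", 0): "S2", ("S2", 1): "S3",
    ("S3", 0): "S0", ("S3", 1): "S3",
}
STATES = ["S0", "S1", "S2", "S3"]

# FSM of Figure 6.35(a): outputs y = 0, 1, 0, 1
fig_6_35 = equivalent_pairs(STATES, (0, 1), NEXT, {"S0": 0, "S1": 1, "S2": 0, "S3": 1})
print(fig_6_35)  # [('S0', 'S2'), ('S1', 'S3')]

# Variant of Figure 6.36: state S2 outputs y = 1 instead
fig_6_36 = equivalent_pairs(STATES, (0, 1), NEXT, {"S0": 0, "S1": 1, "S2": 1, "S3": 1})
print(fig_6_36)  # []
```

For the variant FSM, the pair (S1, S3) gets marked during a pass because its next state pair for x = 0, (S0, S2), was already marked in step 1, mirroring the reasoning given for Figure 6.36.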
When comparing all possible pairs of states by hand, using a graphical table ensures that we don't miss any pairs. Consider the FSM of Figure 6.35(a). The FSM has 4 states, therefore there are 4^2 = 16 possible state pairs. Figure 6.37(a) shows those possible pairs graphically in a table, with the states listed along the row and column headings. Each cell corresponds to a state pair. We can simplify table size by removing redundant cells (e.g., row S0, column S1 is the same as row S1, column S0) and removing meaningless cells along the diagonal of the table (state S0 is obviously equivalent to state S0). The reduced table is shown in Figure 6.37(b).
Figure 6.37 Table of state pairs: (a) original table comparing all pairs, with redundant and diagonal cells indicated, (b) smaller table with only unique and relevant pairs, (c) table after initial filling in with state information.

Figure 6.37(c) steps through the state reduction algorithm of Table 6.2 for the FSM of Figure 6.35(a).
Step 1 involves looking at every table cell and marking that cell with a large "X" if the states for that cell have different outputs. We refer to such cells as being marked. The first state pair (S1,S0) is not equivalent because S0 outputs y=0, while S1 outputs y=1. We then look at state pairs (S2,S0), (S2,S1), and so on, and finally (S3,S2), marking state pairs having different outputs, resulting in the Xs shown in Figure 6.37(c).

Step 2 involves writing the next state pairs for each remaining unmarked cell. There are two unmarked cells:

(S2,S0) (circled in Figure 6.37(c)): When x=1, state S2's next state is S3, while state S0's next state is S1 (we see this by looking at the FSM in Figure 6.35(a)). Thus, we write "(S3,S1)" in that cell (the order doesn't matter), meaning that for states S2 and S0 to be equivalent, S3 and S1 must be equivalent. We then consider the case when input x=0, in which case the next states are S2 and S0, so we write "(S2,S0)" in that cell also.

(S3,S1): When x=0, the next states are S0 and S2, so we write (S0,S2) in the cell. For x=1, we write (S3,S1) in the cell.

Step 3 involves marking as nonequivalent any unmarked cells whose next state pairs are already marked as nonequivalent. Looking at cell (S2,S0), the next state pair (S3,S1) is not marked, nor is next state pair (S2,S0) (which happens to be the current cell), so we can't mark this cell. Likewise, for cell (S3,S1), the next state pair (S0,S2) is not marked, nor is the next state pair (S3,S1), so we can't mark this cell.

Because we made a pass through step 3 without any changes, we don't repeat step 3, and instead move on to step 4.

Step 4 involves declaring the unmarked state pairs as equivalent, so S2 and S0 are equivalent, and S3 and S1 are equivalent. To finalize step 4 of the algorithm, we combine the equivalent states in the FSM. After combining states S2 and S0, and combining states S3 and S1, we obtain the FSM in Figure 6.35(b).

The method we have just employed is known as the implication table method for state reduction.
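The four steps of Table 6.2 can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `equivalent_pairs` and the transition table are assumptions, chosen to be consistent with the next state pairs quoted in the walkthrough above rather than copied from Figure 6.35(a).

```python
from itertools import combinations

def equivalent_pairs(states, inputs, output, next_state):
    """Implication table method: return the state pairs left unmarked."""
    pairs = [frozenset(p) for p in combinations(states, 2)]
    # Step 1: mark pairs whose outputs differ.
    marked = {p for p in pairs if len({output[s] for s in p}) == 2}
    # Steps 2 and 3: repeat passes until a pass marks nothing new.
    changed = True
    while changed:
        changed = False
        for p in pairs:
            if p in marked:
                continue
            a, b = tuple(p)
            for x in inputs:
                nxt = frozenset((next_state[a][x], next_state[b][x]))
                # A pair of identical states is trivially equivalent.
                if len(nxt) == 2 and nxt in marked:
                    marked.add(p)
                    changed = True
                    break
    # Step 4: whatever remains unmarked is equivalent.
    return {p for p in pairs if p not in marked}

# Assumed 4-state Moore FSM (hypothetical stand-in for Figure 6.35(a)).
output = {"S0": 0, "S1": 1, "S2": 0, "S3": 1}
next_state = {
    "S0": {0: "S0", 1: "S1"},
    "S1": {0: "S2", 1: "S1"},
    "S2": {0: "S2", 1: "S3"},
    "S3": {0: "S0", 1: "S3"},
}
result = equivalent_pairs(["S0", "S1", "S2", "S3"], [0, 1], output, next_state)
print(sorted(sorted(p) for p in result))
```

For this assumed FSM the unmarked pairs are (S0,S2) and (S1,S3), matching the outcome of the hand walkthrough.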
Naturally, not every FSM can have its number of states reduced. For example, let's use the implication table method on the FSM in Figure 6.36. With 4 states, the FSM's implication table will be the same size as the previous example, as shown in Figure 6.38(a). Step 1 marks state pairs with different outputs, shown in Figure 6.38(a). Step 2 lists, for each unmarked cell, the next state pairs for identical input values, as also shown in Figure 6.38(a).

In step 3's first pass, we first examine the cell for state pair (S2,S1). Naturally, the next state pair (S2,S2) is equivalent. The next state pair (S3,S1) is unmarked, so we cannot mark (S2,S1). We then examine the cell for pair (S3,S1), and find that the next pair (S0,S2) has its cell marked. This tells us that S3 and S1 cannot be equivalent (because they could transition to nonequivalent states for the same input value), so we mark the cell for (S3,S1). Similarly, we mark (S3,S2) since its first next state pair, (S0,S2), has its cell marked. Completing step 3's first pass results in the table of Figure 6.38(b).
Figure 6.38 Implication table for FSM in Figure 6.36: (a) table after initial setup and steps 1 and 2, (b) after step 3's first pass through the table, (c) after step 3's second and final pass through the table. Unmarked cells in (a) list these next state pairs: (S2,S1): (S2,S2), (S3,S1); (S3,S1): (S0,S2), (S3,S1); (S3,S2): (S0,S2), (S3,S3).
Because the table changed during the first pass (we marked two state pairs), we must make a second pass, because changes in the table may affect state pairs that we already looked at and left unmarked. In the second pass, we again look at state pair (S2,S1). Naturally, the next state pair (S2,S2) is equivalent. The next state pair (S3,S1), however, is now marked, and therefore we mark (S2,S1).

With all pairs in the table marked, as seen in Figure 6.38(c), we can conclude that no states in the FSM are equivalent, and thus we leave the FSM unchanged.
We now provide another example of state reduction.

EXAMPLE 6.13 Minimizing states in an FSM using an implication table
Consider the FSM in Figure 6.39(a). Unlike previous examples, this FSM has 5 states, resulting in more possible state pairs than in previous examples. The first task in minimizing the FSM's states is to construct an implication table so we can compare every state with each other as a state pair.
Figure 6.39 FSM needing state reduction (Inputs: x; Outputs: y): (a) original FSM, (b) implication table after steps 1 and 2.
In step 1 of our state reduction algorithm, we mark with an X the state pairs that are not equivalent because their outputs differ, as shown in Figure 6.39(b).
In step 2, we write in all the next state pairs for unmarked cells of the implication table, as shown in Figure 6.39(b). Since there are only two combinations of inputs (either x=0 or x=1), each unmarked cell will have two next state pairs.

In step 3's first pass, we mark each state pair if one of their next state pairs is marked. During our pass through the table, we will examine four state pairs. Starting with (S2,S1), we see that both of its next state pairs are unmarked. Looking at (S3,S0), we see one of its next state pairs, (S3,S2), is marked, so we mark (S3,S0)'s cell. We also mark (S4,S0) because its next state pair (S4,S2) is marked. We leave (S4,S3) unmarked as both of its next state pairs are unmarked, thus completing the pass. Figure 6.40(a) reflects the results of our first pass through the implication table.

Because we marked new state pairs in the first pass, we conduct a second pass through step 3. During that pass, we find no new cells to mark, leaving the table unchanged. We thus move on to step 4.

In step 4, we declare the unmarked state pair (S2,S1) as equivalent, and the unmarked state pair (S4,S3) as equivalent. We combine states S2 and S1, and we combine states S4 and S3, resulting in the new FSM shown in Figure 6.40(b). Note that the two transitions with conditions x and x' from S0 could be replaced by one transition with no conditions.
Figure 6.40 Implication table and minimized FSM (Inputs: x; Outputs: y): (a) implication table after first pass, (b) minimized state machine with states S1 and S2 combined, and S3 and S4 combined.
In this example, by reducing the number of states from 5 down to 3, we have reduced the minimum state register size from 3 bits down to 2 bits, perhaps reducing circuit size.
Sometimes equivalent states may overlap. For example, assume that for some FSM with states {T0, T1, T2, T3, T4}, you find that state pairs (T0,T1), (T1,T2) and (T2,T0) are equivalent. How do you deal with the overlapping equivalencies? The answer is simple: the three states T0, T1, and T2 can be combined into a single state.
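Merging overlapping equivalent pairs amounts to grouping states into equivalence classes. A minimal sketch using a union-find structure follows; the helper name `merge_equivalent` is hypothetical and introduced only for illustration.

```python
def merge_equivalent(states, pairs):
    """Group states into classes given a list of equivalent pairs."""
    parent = {s: s for s in states}

    def find(s):
        # Follow parent links to the class representative.
        while parent[s] != s:
            parent[s] = parent[parent[s]]  # path halving
            s = parent[s]
        return s

    for a, b in pairs:
        parent[find(a)] = find(b)

    groups = {}
    for s in states:
        groups.setdefault(find(s), []).append(s)
    return sorted(groups.values())

states = ["T0", "T1", "T2", "T3", "T4"]
groups = merge_equivalent(states, [("T0", "T1"), ("T1", "T2"), ("T2", "T0")])
print(groups)
```

Each resulting group becomes one state in the reduced FSM, so the five states here collapse to three.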
The implication table method is suitable for hand-optimizing small FSMs such as those introduced in the previous examples, but can quickly become unwieldy for FSMs with more states. Consider the 15-state FSM in Figure 6.41. The reduced implication table would require 14 rows and 14 columns, and 105 state pairs. With two combinations of inputs (namely, a = 0 or a = 1), each state pair would have two next state pairs, and, in the worst case, we would need to check 105*2 = 210 next state pairs during our first pass alone. What if the same FSM had four inputs (say, a, b, c, and d) instead of one? With four inputs, there would be 2^4 = 16 combinations of inputs (i.e., a'b'c'd', a'b'c'd, a'b'cd', ..., abcd) and up to 16 next state pairs in each cell in the implication table. If instead the FSM had, say, 100 states (a reasonable number), the implication table would have on the order of 100*100 = 10,000 state pairs.
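The growth described above is easy to tabulate. The sketch below counts unique state pairs as "n choose 2" and input combinations as 2 raised to the number of input bits; the function name `implication_table_size` is an assumption made for this illustration.

```python
from math import comb

def implication_table_size(num_states, num_input_bits):
    """Return (state pairs, input combinations, worst-case next state pairs)."""
    pairs = comb(num_states, 2)          # cells in the reduced table
    combos = 2 ** num_input_bits         # next state pairs per cell
    return pairs, combos, pairs * combos

print(implication_table_size(15, 1))    # the 15-state, one-input case
print(implication_table_size(15, 4))    # same FSM with four inputs
print(implication_table_size(100, 1))   # 100 states: thousands of cells
```

The 100-state case has 4950 unique pairs in the reduced table, which is the same order of magnitude as the 100*100 estimate in the text.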
Figure 6.41 A 15-state FSM (Inputs: x; Outputs: z).

State reduction is therefore typically performed using automated tools. For smaller FSMs, the tools may implement the implication table method. For larger FSMs, the tools may need to resort to heuristics to avoid inordinately large table sizes and numbers of next state pairs.

Even when we reduce the number of states, we are not guaranteed that such state reduction actually reduces the size of the resulting logic. One reason is that reducing the states might not reduce the number of required state-register bits: reducing the states from 15 down to 12 does not reduce the minimum state register size, which is four bits in either case. Another reason is that, even if the state reduction reduces the state register size, the combinational logic size could possibly increase with a smaller state register, due to the logic having to decode the state bits. Thus, automated state reduction tools may need to actually implement the combinational logic before and after state reduction, to determine if state reduction ultimately yields improvements for a particular FSM.

State Encoding
State encoding is the task of assigning a unique bit representation for each state in an FSM. Some state encodings may optimize the resulting controller circuit by reducing circuit size, or may trade off size and performance in the circuit. We now discuss several methods for state encoding.
Alternative Minimum-Bitwidth Binary Encodings
Previously, we assigned a unique binary encoding to each state in an FSM using the fewest number of bits possible, representing a minimum-bitwidth binary encoding. If there were four states, we used two bits. If there were five, six, seven, or eight states, we used three bits. The encoding represented the state in the controller's state register. There are many ways to map minimum-bitwidth binary encodings to a set of states. Say we are given four states, A, B, C, and D. One encoding is A:00, B:01, C:10, D:11. Another encoding is A:01, B:10, C:11, D:00. In fact, there are 4*3*2*1 = 4! = 24 possible encodings into two bits (4 encoding choices for the first state, 3 for the next state, 2 for the next, and 1 for the last state). For eight states, there are 8!, or over 40,000, possible encodings into three bits. For N states, there are N! (N factorial) possible encodings, a huge number for any N greater than 10 or so. One encoding may result in less combinational logic than another encoding. Automated tools may try several different encodings (but not all N! encodings) to reduce combinational logic in the controller.
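The counting argument can be checked by brute force for the four-state case. This sketch simply enumerates every assignment of the four two-bit codes to states A through D; the variable names are illustrative only.

```python
from itertools import permutations
from math import factorial

codes = ["00", "01", "10", "11"]
# Each permutation of the codes is one possible state encoding.
encodings = [dict(zip("ABCD", p)) for p in permutations(codes)]

print(len(encodings))   # number of two-bit encodings of four states
print(encodings[0])     # the straightforward encoding A:00 ... D:11
print(factorial(8))     # three-bit encodings of eight states
```

Enumeration like this is only practical for a handful of states, which is why tools sample a few encodings rather than trying all N! of them.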
EXAMPLE 6.14 Alternative binary encoding for three-cycles-high laser timer
In Example 3.7, we encoded states using a straightforward binary encoding, starting with 00, then 01, then 10, and then 11. The resulting design had 15 gate inputs (ignoring inverters). We can try instead the alternative binary encoding shown in Figure 6.42.

Figure 6.42 Laser timer state diagram with alternative binary state encoding.

Table 6.3 provides the state table for the new encoding, showing the differences from the original encoding. From the state table, we obtain the following equations for the three combinational logic outputs of the controller:

x = s1 + s0    (note from the table that x=1 if s1=1 or s0=1)

n1 = s1's0b' + s1's0b + s1s0b' + s1s0b
n1 = s1's0 + s1s0
n1 = s0

n0 = s1's0'b + s1's0b + s1's0b'
n0 = s1's0'b + s1's0b + s1's0b + s1's0b'
n0 = s1'b(s0' + s0) + s1's0(b + b')
n0 = s1'b + s1's0

The resulting circuit would have only 8 gate inputs: 2 for x, 0 for n1 (n1 is connected to s0 directly with a wire), and 4 + 2 for n0. The 8 gate inputs is significantly less than the 15 gate inputs needed for the binary encoding of Example 3.7. This encoding reduces size without any increase in delay, thus representing an optimization.
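The simplified equations can be checked against every row of Table 6.3 mechanically. The sketch below transcribes the table rows as (s1, s0, b) mapped to (x, n1, n0); the function name `controller` is an assumption for illustration.

```python
# Rows of Table 6.3: (s1, s0, b) -> (x, n1, n0)
TABLE_6_3 = {
    (0, 0, 0): (0, 0, 0), (0, 0, 1): (0, 0, 1),   # Off
    (0, 1, 0): (1, 1, 1), (0, 1, 1): (1, 1, 1),   # On1
    (1, 1, 0): (1, 1, 0), (1, 1, 1): (1, 1, 0),   # On2
    (1, 0, 0): (1, 0, 0), (1, 0, 1): (1, 0, 0),   # On3
}

def controller(s1, s0, b):
    """Minimized equations: x = s1 + s0, n1 = s0, n0 = s1'b + s1's0."""
    x = s1 | s0
    n1 = s0
    n0 = ((1 - s1) & b) | ((1 - s1) & s0)
    return x, n1, n0

mismatches = [row for row, outs in TABLE_6_3.items() if controller(*row) != outs]
print(mismatches)  # an empty list means the equations match the table
```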
TABLE 6.3 State table for laser timer controller with alternative encoding.
         Inputs        Outputs
         s1  s0  b     x   n1  n0
Off      0   0   0     0   0   0
         0   0   1     0   0   1
On1      0   1   0     1   1   1
         0   1   1     1   1   1
On2      1   1   0     1   1   0
         1   1   1     1   1   0
On3      1   0   0     1   0   0
         1   0   1     1   0   0

One-Hot Encoding
There is no requirement that we encode a set of states using the fewest number of bits. For example, we could encode four states A, B, C, and D using three bits instead of just two bits, such as A:000, B:011, C:110, D:111. Using more bits requires a larger state register, but possibly less logic. A popular encoding scheme is called one-hot encoding, wherein we use the same number of bits for encoding as there are states, and each bit corresponds to exactly one state. For example, a one-hot encoding of four states A, B, C, and D uses four bits, such as A:0001, B:0010, C:0100, D:1000. The main advantage of one-hot encoding is speed: because the state can be detected from just one bit and thus need not be decoded using an AND gate, the controller's next state and output logic may involve fewer gates and/or gates with fewer inputs, resulting in a shorter delay.
EXAMPLE 6.15 One-hot encoding example
Consider the simple FSM in Figure 6.43, which repeatedly generates the output sequence 0, 1, 1, 1, 0, 1, 1, 1, etc. A straightforward minimal binary encoding is shown, which is then crossed out and replaced with a one-hot encoding.

Figure 6.43 FSM that outputs the given sequence (Inputs: none; Outputs: x). State A has x=0; states B, C, and D have x=1.

The binary encoding results in the state table shown in Table 6.4. The resulting equations are:

n1 = s1's0 + s1s0'
n0 = s0'
x = s1 + s0

TABLE 6.4 State table using binary encoding.
       Inputs     Outputs
       s1  s0     n1  n0  x
A      0   0      0   1   0
B      0   1      1   0   1
C      1   0      1   1   1
D      1   1      0   0   1

The one-hot encoding results in the state table shown in Table 6.5. The resulting equations are:

n3 = s2
n2 = s1
n1 = s0
n0 = s3
x = s3 + s2 + s1

TABLE 6.5 State table using one-hot encoding.
       Inputs             Outputs
       s3  s2  s1  s0     n3  n2  n1  n0  x
A      0   0   0   1      0   0   1   0   0
B      0   0   1   0      0   1   0   0   1
C      0   1   0   0      1   0   0   0   1
D      1   0   0   0      0   0   0   1   1

Figure 6.44 shows the resulting circuits for each encoding. The binary encoding yields more gates, but more importantly, requires two levels of logic. The one-hot encoding in this example requires only one level of logic. Notice that the logic to generate the next state is just wires in this example (other examples may require some logic). Figure 6.44(c) illustrates that the one-hot encoding has less delay, meaning we could use a faster clock frequency for that circuit.

Figure 6.44 One-hot encoding can reduce delay: (a) minimum binary encoding, (b) one-hot encoding, (c) though total sizes may be roughly equal (one-hot encoding uses fewer gates but more flip-flops), one-hot yields a shorter critical path. Panel (c) plots size against delay (gate-delays).
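Both sets of equations from Example 6.15 can be simulated to confirm that they generate the same output sequence. This is a cycle-by-cycle sketch only; the function names `binary_step`, `one_hot_step`, and `run` are assumptions introduced for illustration.

```python
def binary_step(s1, s0):
    """Binary encoding: n1 = s1's0 + s1s0', n0 = s0', x = s1 + s0."""
    x = s1 | s0
    n1 = ((1 - s1) & s0) | (s1 & (1 - s0))
    n0 = 1 - s0
    return (n1, n0), x

def one_hot_step(s3, s2, s1, s0):
    """One-hot encoding: next state is a rotation, x = s3 + s2 + s1."""
    x = s3 | s2 | s1
    return (s2, s1, s0, s3), x   # (n3, n2, n1, n0): wires only

def run(step, state, cycles):
    """Clock the controller and collect the output x of each cycle."""
    outputs = []
    for _ in range(cycles):
        state, x = step(*state)
        outputs.append(x)
    return outputs

binary_outputs = run(binary_step, (0, 0), 8)          # start in A = 00
one_hot_outputs = run(one_hot_step, (0, 0, 0, 1), 8)  # start in A = 0001
print(binary_outputs)
print(one_hot_outputs)
```

The one-hot next-state function contains no gates at all, only a reordering of bits, which is the source of its shorter critical path in this example.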
EXAMPLE 6.16 Three-cycles-high laser timer using one-hot encoding
In Example 3.7, we encoded states using a straightforward binary encoding, with 00, then 01, then 10, and then 11. Here, we'll perform a one-hot encoding of the four states, requiring four bits, as shown in Figure 6.45.

Table 6.6 shows a state table for the FSM of Figure 6.45, using the one-hot encoding of the states. We don't show all possible rows, since the table would be too large.

The final step is to design the combinational logic. Deriving equations for each output directly from the table (assuming all other input combinations are don't-cares), and minimizing those equations algebraically, results in the following:

x = s3 + s2 + s1
n3 = s2
n2 = s1
n1 = s0*b
n0 = s0*b' + s3

This circuit would require 3+0+0+2+(2+2) = 9 gate inputs. Thus, the circuit has fewer gate inputs than the original binary encoding's 15 gate inputs, but one must also consider that a one-hot encoding uses more flip-flops.
Figure 6.45 One-hot encoding of laser timer.

TABLE 6.6 State table for laser timer controller with one-hot encoding.
         Inputs                 Outputs
         s3  s2  s1  s0  b      x   n3  n2  n1  n0
Off      0   0   0   1   0      0   0   0   0   1
         0   0   0   1   1      0   0   0   1   0
On1      0   0   1   0   0      1   0   1   0   0
         0   0   1   0   1      1   0   1   0   0
On2      0   1   0   0   0      1   1   0   0   0
         0   1   0   0   1      1   1   0   0   0
On3      1   0   0   0   0      1   0   0   0   1
         1   0   0   0   1      1   0   0   0   1
More importantly, the circuit with one-hot encoding is slightly faster. The critical path for that circuit is n0 = s0*b' + s3. The critical path for the circuit with regular binary encoding is n0 = s1's0'b + s1s0'. The regular binary encoded circuit requires a 3-input AND gate feeding into a 2-input OR gate, whereas the one-hot encoded circuit has a 2-input AND gate feeding into a 2-input OR gate. Because a 2-input AND gate actually has slightly less delay than a 3-input AND gate, the one-hot encoded circuit has a shorter critical path.
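The one-hot equations of Example 6.16 can be exercised with a short simulation. The sketch below presses the button for one cycle and records x in each cycle; the function name `laser_step` and the chosen button sequence are assumptions for illustration.

```python
def laser_step(state, b):
    """One clock cycle of the one-hot laser timer controller."""
    s3, s2, s1, s0 = state
    x = s3 | s2 | s1                 # x = s3 + s2 + s1
    n3 = s2                          # n3 = s2
    n2 = s1                          # n2 = s1
    n1 = s0 & b                      # n1 = s0*b
    n0 = (s0 & (1 - b)) | s3         # n0 = s0*b' + s3
    return (n3, n2, n1, n0), x

state = (0, 0, 0, 1)                 # Off
xs = []
for b in [0, 1, 0, 0, 0, 0]:         # button high during the second cycle
    state, x = laser_step(state, b)
    xs.append(x)
print(xs)
```

The output stays high for exactly three cycles after the button press and then returns to 0, which is the three-cycles-high behavior the controller is meant to provide.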
For examples with more states, the critical path reductions from one-hot encoding may be even greater, and reductions in logic size may also be more pronounced. At some point, of course, one-hot encoding results in too big of a state register. For example, an FSM with 1000 states would require a 10-bit state register for a binary encoding, but would require a 1000-bit state register for a one-hot encoding, which is probably too big to consider. In such cases, we might consider encodings using a number of bits in between that for a binary encoding and that for a one-hot encoding.
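The register widths quoted in this section follow from two simple formulas: a minimum-bitwidth binary encoding needs the ceiling of log2(N) bits, while a one-hot encoding needs N bits. A small sketch, with hypothetical helper names:

```python
def binary_bits(num_states):
    """Minimum state register width for a binary encoding."""
    # (n - 1).bit_length() equals ceil(log2(n)) for n of 2 or more.
    return max(1, (num_states - 1).bit_length())

def one_hot_bits(num_states):
    """State register width for a one-hot encoding."""
    return num_states

for n in (4, 5, 12, 15, 1000):
    print(n, binary_bits(n), one_hot_bits(n))
```

Note that 12 and 15 states both need a 4-bit binary register, which is why reducing 15 states to 12 does not shrink the register.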
Output Encoding
Some problem descriptions require us to generate a particular sequence of values on a set of outputs. For example, a problem might require us to repeatedly output the following sequence on a pair of outputs x and y: 00, 11, 10, 01. We can capture the behavior using the FSM with four states, A, B, C, and D, as shown in Figure 6.46. A straightforward binary encoding for those states would be: A:00, B:01, C:10, D:11, as shown in Figure 6.46. If we design a controller for this system, we'll have a two-bit state register, logic to determine the next state, and logic to generate the output from the present state. But might it make more sense to use a state encoding that is identical to the output values in each state? If we use such an encoding, then we will still have a two-bit state register, and we will still have logic to generate the next state, but we won't have logic to generate the output from the present state. Instead, each output will simply be connected by a wire to a bit in the state register, thus reducing the required number of logic gates.

Figure 6.46 FSM for given sequence (Inputs: none; Outputs: x, y). States A, B, C, and D output xy=00, xy=11, xy=10, and xy=01, respectively.

If an FSM has at least as many outputs as needed for a binary encoding, and if each state has a unique output combination, then we may consider using a state's output combination as the state's encoding. Such an encoding may reduce the amount of logic required, by eliminating the need for logic to generate the outputs from the present state encoding; that logic is reduced to just wires.

Output encoding requires that the system have at least as many outputs as it has bits in a minimal binary encoding, otherwise the outputs can't represent enough encodings to uniquely identify each state. Furthermore, we can't use output encoding if the desired output sequence contains the same output values in two different states, since every state's encoding must be unique. For example, if we wish to repeatedly generate the sequence 00, 11, 01, 11, we cannot use output encoding, because if we did, then two states would have the same encoding. Even in such a situation, though, we might try to output encode as many states as possible.

EXAMPLE 6.17 Sequence generator using output encoding
Example 3.10 involved design of a sequence generator, in which we were to generate the sequence 0001, 0011, 1100, 1000 on a set of four outputs, as shown in Figure 6.47. In that example, we encoded the states using a two-bit binary encoding, with A being 00, B being 01, C being 10, and D being 11. In this example, we'll instead use output encoding. The outputs have enough bits, four, whereas we need at least two bits to encode the four states. The sequence also has a different output combination for each state. Thus, we can consider output encoding for this example.

Figure 6.47 Sequence generator FSM (Inputs: none; Outputs: w, x, y, z). The four states output wxyz=0001, wxyz=0011, wxyz=1100, and wxyz=1000.
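The two conditions for output encoding (enough output bits, and a unique output combination per state) can be checked mechanically. This is a minimal sketch; the function name `can_output_encode` and the dictionary layout are assumptions for illustration.

```python
def can_output_encode(state_outputs):
    """state_outputs maps each state name to its output bits as a string."""
    n = len(state_outputs)
    min_bits = max(1, (n - 1).bit_length())   # bits in a minimal binary encoding
    values = list(state_outputs.values())
    enough_bits = all(len(v) >= min_bits for v in values)
    unique = len(set(values)) == n            # no two states share an output
    return enough_bits and unique

pair_sequence = {"A": "00", "B": "11", "C": "10", "D": "01"}
repeated_sequence = {"A": "00", "B": "11", "C": "01", "D": "11"}
generator = {"A": "0001", "B": "0011", "C": "1100", "D": "1000"}

print(can_output_encode(pair_sequence))      # sequence 00, 11, 10, 01
print(can_output_encode(repeated_sequence))  # sequence 00, 11, 01, 11
print(can_output_encode(generator))          # sequence of Example 6.17
```

Only the sequence that repeats the value 11 fails the check, because two states would receive the same encoding.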
Table 6.7 shows a partial state table for the sequence generator, using an output encoding. Notice that the outputs themselves, w, x, y, and z, don't need to appear in the table, because they will be the same as s3, s2, s1, and s0. We use a partial table to avoid having to show all 16 rows, and we assume all unshown rows represent don't cares.

TABLE 6.7 Partial state table for sequence generator controller using output encoding.
       Inputs             Outputs
       s3  s2  s1  s0     n3  n2  n1  n0
A      0   0   0   1      0   0   1   1
B      0   0   1   1      1   1   0   0
C      1   1   0   0      1   0   0   0
D      1   0   0   0      0   0   0   1

From the table, we derive equations for each output:

n3 = s1 + s2
n2 = s1
n1 = s1's0
n0 = s1's0 + s3s2'

We obtained those equations by looking at all the 1s for a particular output, and visually determining a minimal input equation that would generate those 1s and 0s for the other shown column entries (all other output values, not shown, are don't cares).

Figure 6.48 shows the final circuit. Notice that there is no output logic; the outputs w, x, y, and z connect directly to the state register.

Figure 6.48 Sequence generator controller with output encoding.

Compared to the circuit obtained in Example 3.10 using a binary encoding, the output encoded circuit in Figure 6.48 actually appears to use more transistors. In other examples, an output encoded circuit might use fewer transistors.

Whether one-hot encoding, binary encoding, output encoding, or some variation thereof results in the fewest transistors or a shorter critical path depends on the example itself. Thus, modern tools may try a variety of different encodings for a given problem to see which works best.
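The next-state equations of Example 6.17 can be checked by simulation: with output encoding, the state register contents are the outputs wxyz themselves. The function name `next_state` below is an assumption for illustration.

```python
def next_state(s3, s2, s1, s0):
    """Next-state logic of the output-encoded sequence generator."""
    n3 = s1 | s2                                  # n3 = s1 + s2
    n2 = s1                                       # n2 = s1
    n1 = (1 - s1) & s0                            # n1 = s1's0
    n0 = ((1 - s1) & s0) | (s3 & (1 - s2))        # n0 = s1's0 + s3s2'
    return (n3, n2, n1, n0)

state = (0, 0, 0, 1)          # state A, which is also wxyz = 0001
sequence = []
for _ in range(8):
    # The state bits drive w, x, y, z directly; there is no output logic.
    sequence.append("".join(str(bit) for bit in state))
    state = next_state(*state)
print(sequence)
```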
Moore versus Mealy FSMs
Basic Mealy Architecture
The FSMs described in this book have thus far all been a type of FSM known as a Moore FSM. A Moore FSM is an FSM whose outputs are a function of the FSM's state. An alternative type of FSM is a Mealy FSM. A Mealy FSM is an FSM whose outputs are a function of the FSM's state and inputs. Sometimes a Mealy FSM results in fewer states than a Moore FSM, representing an optimization. Sometimes those fewer states come at the expense of timing issues that must be handled, representing a tradeoff.
Recall the standard controller architecture of Figure 3.48, reproduced in Figure 6.49. The architecture shows one block of combinational logic, responsible for converting the present state and external inputs into the next state and external outputs.

Figure 6.49 Standard controller architecture, general view.

Because a Moore FSM's outputs are solely a function of the present state (and not the external inputs), we can refine the architecture to have two combinational logic blocks: the next-state logic block converts the present state and external inputs into a next state, and the output logic block converts the present state (but not the external inputs) into external outputs, as shown in Figure 6.50(a).

In contrast, a Mealy FSM's outputs are a function of both the present state and the external inputs. Thus, the output logic block for a Mealy FSM takes both the present state and the external FSM inputs as input, rather than just the present state, as shown in Figure 6.50(b). The next-state logic is the same as for a Moore FSM, taking as input both the present state and the external FSM inputs.

Figure 6.50 Controller architectures for: (a) a Moore FSM, (b) a Mealy FSM.
Graphically, the FSM output assignments of a Mealy FSM would be listed with each transition, rather than each state, because each transition represents a present state and a particular input value. Figure 6.51 shows a two-state Mealy FSM with an input b and an output x. When in state S0 and b=0, the FSM outputs x=0 and stays in state S0, as indicated by the transition labeled "b'/x=0". When in state S0 and b=1, the FSM outputs x=1 and goes to state S1. We use the "/" symbol simply to separate the transition's input conditions from the output assignments; the "/" does not mean "divide" here. Because the transition from S1 to S0 is taken no matter what the input value, we list the transition as simply "/x=0," meaning there's no input condition, but there is an output assignment.

Figure 6.51 A Mealy FSM associates outputs with transitions (Inputs: b; Outputs: x).
Mealy FSMs May Have Fewer States
The seemingly minor difference between a Mealy and a Moore FSM, namely, that a Mealy FSM's output is a function of the state and the current inputs, can lead to fewer states for some behaviors when implemented as a Mealy machine. For example, consider the simple soda dispenser controller FSM in Figure 6.52(a). Setting d=1 dispenses a soda. The FSM starts in state Init, which sets d=0 and sets an output clear=1, which we assume clears a device keeping count of the amount of money deposited into the soda dispenser machine. The FSM transitions to state Wait, where the FSM waits to be informed, through the enough input, that enough money has been deposited. Once enough money has been deposited, the FSM transitions to state Disp, which dispenses a soda by setting output d=1, and the FSM then returns to state Init. (Readers who have read Chapter 5 may notice this example is a simplified version of Example 5.1; familiarity with that example is not required, though, for the present discussion.)
Figure 6.52 FSMs for soda dispenser controller (Inputs: enough (bit); Outputs: d, clear (bit)), each with a timing diagram of clk, enough, the state, and the outputs: (a) Moore FSM has actions in states, (b) Mealy FSM has actions on transitions, resulting in this case in fewer states.
Figure 6.52(b) shows a Mealy FSM for the same controller. The initial state Init has no actions itself, but rather has a conditionless transition to state Wait that has the initialization actions d=0 and clear=1. In state Wait, a transition with condition enough' returns to state Wait without any actions listed. Another transition with condition enough has the action d=1, and takes the FSM back to the Init state. Notice that the Mealy FSM does not need the Disp state to set d=1; that action occurs on a transition. Thus, we were able to create a Mealy FSM with fewer states than the Moore FSM.
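The difference between the two soda dispenser FSMs can be seen in a cycle-level simulation. This sketch samples the enough input once per clock cycle, so it shows only which cycle d is asserted in, not behavior within a cycle; the function names `moore` and `mealy` and the input sequence are assumptions for illustration.

```python
def moore(enough_seq):
    """Three-state Moore FSM: d depends only on the present state."""
    state, trace = "Init", []
    for enough in enough_seq:
        d = 1 if state == "Disp" else 0
        trace.append((state, d))
        if state == "Init":
            state = "Wait"
        elif state == "Wait":
            state = "Disp" if enough else "Wait"
        else:
            state = "Init"
    return trace

def mealy(enough_seq):
    """Two-state Mealy FSM: d depends on the state and the input."""
    state, trace = "Init", []
    for enough in enough_seq:
        if state == "Init":
            d, nxt = 0, "Wait"                      # transition "/d=0, clear=1"
        else:
            d = 1 if enough else 0                  # transition "enough/d=1"
            nxt = "Init" if enough else "Wait"
        trace.append((state, d))
        state = nxt
    return trace

enough_seq = [0, 0, 1, 0, 0]
moore_d = [d for _, d in moore(enough_seq)]
mealy_d = [d for _, d in mealy(enough_seq)]
print(moore_d)
print(mealy_d)
```

The Mealy version asserts d in the same cycle that enough is high, while the Moore version asserts d one cycle later from its extra Disp state.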
The Mealy state diagram in Figure 6.52(b) uses a convention similar to the convention we used for Moore FSMs (Section 3.4), namely, that any outputs not explicitly assigned on a transition are implicitly assigned a 0. As with Moore FSMs, we still list an assignment to 0 explicitly if the assignment is key to the FSM's behavior (such as the assignment of d=0 in Figure 6.52(b)).

EXAMPLE 6.18 Beeping wristwatch FSM using a Mealy machine
Create an FSM for a wristwatch that can display one of four registers by setting two outputs s1 and s0, which control a 4x1 multiplexer that passes one of the four registers through. The four registers correspond to the watch's present time (s1s0=00), the alarm setting (01), the date (10), and a stopwatch (11). The FSM should sequence to the next register, in the order listed above, each time a button b is pressed (assume b is synchronized with the clock so as to be high for only 1 clock cycle on each unique button press). The FSM should set an output p to 1 each time the button is pressed, causing an audible beep to sound.
Figure 6.53 FSM for a wristwatch with beeping behavior (p=1) when button is pressed (b=1) (Inputs: b; Outputs: s1, s0, p): (a) Mealy, (b) Moore.
Figure 6.53(a) shows a Mealy FSM describing the desired behavior. Notice that the FSM easily captures the beeping behavior simply by setting p=1 on the transitions that correspond to button presses. In the Moore FSM of Figure 6.53(b), we had to add an extra state in between each pair of states in Figure 6.53(a), with each extra state having the action p=1 and having a conditionless transition to the next state.

Notice that the Mealy FSM has fewer states than the Moore machine. A drawback is that we aren't guaranteed a beep will last at least one clock cycle, due to timing issues that we will now describe.
Timing Issues with Mealy FSMs
Mealy FSM outputs are not synchronized with clock edges, but rather can change in between clock edges if an input changes. For example, consider the timing diagram shown in Figure 6.52(a) for the soda dispenser's Moore FSM. Note that the output d becomes 1 not right after the input enough became 1, but rather on the first clock edge after enough became 1. In contrast, the timing diagram for the Mealy FSM in Figure 6.52(b) shows that the output d becomes 1 right after the input enough becomes 1.

Moore outputs are synchronized with the clock: in particular, Moore outputs only change on entering a new state, which means Moore outputs only change slightly after a rising clock edge loads a new state into the state register. In contrast, Mealy outputs can change not just on entering a new state, but also any time an input changes, because Mealy outputs are a function of both the state and the inputs. We took advantage of this fact to eliminate the Disp state from the soda dispenser's Mealy FSM in Figure 6.52(b). Notice, however, in the timing diagram that the d output of the Mealy FSM does not stay 1 for a complete clock cycle. If we are unsure as to whether d's high time is long enough, we could include a Disp state in the Mealy FSM. That state would have a single transition, with no condition and with action d=1, pointing back to state Init. In that case, d would be 1 for longer than one clock cycle (but less than two cycles).
The Mealy FSM feature of outputs being a function of states and inputs, which enables the reduction in number of states in some cases, also has an undesirable characteristic: the outputs may glitch if the inputs glitch in between clock cycles. A designer using a Mealy FSM should determine whether such glitching could pose a problem in a particular circuit. One solution to the glitching is to insert flip-flops between an asynchronous Mealy FSM's inputs and the FSM logic, or between the FSM logic and the outputs. Such flip-flops make the Mealy FSM synchronous, and the outputs will change at predictable intervals. Of course, such flip-flops introduce a one clock cycle delay.
Implementing a Mealy FSM
We create a controller implementing a Mealy FSM in nearly the identical way that we created a
controller for Moore FSMs in Section 3.4, using the method of Table 3.2. The only difference is
that when we create a state table, the FSM outputs' values for all the rows of a particular state
won't necessarily be identical. For example, Table 6.8 shows a state table for the Mealy FSM of
Figure 6.52(b). Notice that the output d should be 0 in state Wait (s0=1) if enough=0, but should
be 1 if enough=1. In contrast, in a Moore state table, an output's values were identical
within a given state. Given the state table of Table 6.8, we would proceed to implement the
combinational logic in the same manner as described in Section 3.4.

TABLE 6.8 Mealy state table for soda dispenser
          Inputs          Outputs
State     s0   enough     n0   d   clear
Init      0    0          1    0   1
          0    1          1    0   1
Wait      1    0          1    0   0
          1    1          0    1   0
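
As a small illustration of reading a Mealy state table, the sketch below encodes the rows of Table 6.8 as a lookup and compares them against output equations read off the table by inspection. The names `MEALY_TABLE`, `mealy_step`, and `derived_equations` are ours, and the equations are our own derivation from the four rows, not equations given in the text.

```python
# Table 6.8 as a lookup: (s0, enough) -> (n0, d, clear).
# State encoding from the table: Init is s0=0, Wait is s0=1.
MEALY_TABLE = {
    (0, 0): (1, 0, 1),
    (0, 1): (1, 0, 1),
    (1, 0): (1, 0, 0),
    (1, 1): (0, 1, 0),
}


def mealy_step(s0, enough):
    """Return (n0, d, clear) for the current state bit and input."""
    return MEALY_TABLE[(s0, enough)]


def derived_equations(s0, enough):
    """Equations read off the table by inspection (an assumption, not from the text)."""
    n0 = 1 - (s0 & enough)   # n0 = (s0 * enough)'
    d = s0 & enough          # d depends on the input, not only on the state
    clear = 1 - s0           # clear = s0'
    return n0, d, clear


if __name__ == "__main__":
    for s0 in (0, 1):
        for enough in (0, 1):
            print(s0, enough, mealy_step(s0, enough), derived_equations(s0, enough))
```

Note that d differs between the two Wait rows, which is exactly the property that distinguishes a Mealy state table from a Moore one.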
Combining Moore and Mealy FSMs
Designers often utilize FSMs that are a combination of Moore and Mealy types. Such a
combination allows the designer to specify some actions in states, and others on
transitions. Such a combination provides the reduced number of states advantage of a Mealy
FSM, yet avoids having to replicate a state's actions on every outgoing transition of a
state. This simplification is really just a convenience to a designer describing the FSM;
the underlying implementation will be the same as for the Mealy FSM having replicated
actions on a state's outgoing transitions.
EXAMPLE 6.19 Beeping wristwatch FSM using a combined Moore/Mealy machine
Figure 6.54 shows a combined Moore/Mealy FSM state diagram describing the
beeping wristwatch of Example 6.18. The FSM has the same number of states as
did the Mealy FSM in Figure 6.53(a), because the FSM still associates the beep
behavior p=1 with transitions, avoiding the need for extra states to describe the
beep. But the combined FSM state diagram is easier to comprehend than the
Mealy FSM state diagram, because the assignments to s1s0 are associated with
each state, and not duplicated on every outgoing transition.

Figure 6.54 Combining Moore and Mealy FSMs yields a simpler wristwatch FSM.
(Inputs: b; Outputs: s1, s0, p.)
6.4 DATAPATH COMPONENT TRADEOFFS
Faster Adders
In Chapter 4, we created several components that are useful in datapaths. In that chapter, we
described the most basic, easy to understand versions of those components. In this section, we
describe methods to build faster or smaller versions of some of those components.

Adding two numbers is an extremely common operation in digital circuits, so it makes sense
for us to try to create an adder that is faster than a carry-ripple adder. Recall that a
carry-ripple adder requires that the carry bits ripple through all the full-adders before all
outputs are correct. The longest path through the circuit, shown in Figure 6.55, is known
as the circuit's critical path. Since each full-adder has a delay of two gate-delays,
a 4-bit carry-ripple adder has a delay of 4 * 2 = 8 gate-delays. A 32-bit carry-ripple
adder's delay is 32 * 2 = 64 gate-delays. That's rather slow, but the nice thing about a
carry-ripple adder is that it doesn't require very many gates. If a full-adder uses 5 gates,
then a 4-bit carry-ripple adder requires only 4 * 5 = 20 gates, and a 32-bit carry-ripple
adder would only require 32 * 5 = 160 gates.
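
The carry-ripple behavior and its cost figures can be sketched in a few lines. This is a minimal bit-level model, assuming (as the text does) 2 gate-delays and 5 gates per full-adder; the function names are ours.

```python
def full_adder(a, b, c):
    """One full-adder: s = a xor b xor c, co = bc + ac + ab."""
    s = a ^ b ^ c
    co = (b & c) | (a & c) | (a & b)
    return s, co


def carry_ripple_add(a, b, n, cin=0):
    """Add two n-bit integers by rippling the carry through n full-adders."""
    carry = cin
    total = 0
    for i in range(n):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
    return total, carry


def carry_ripple_cost(n, fa_delay=2, fa_gates=5):
    """Return (gate-delays, gates) for an n-bit carry-ripple adder."""
    return n * fa_delay, n * fa_gates


if __name__ == "__main__":
    print(carry_ripple_add(0b0110, 0b0011, 4))
    print(carry_ripple_cost(4), carry_ripple_cost(32))
```

The loop makes the critical path visible: each iteration needs the carry produced by the previous one, so the delay grows linearly with the number of bits.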
Figure 6.55 4-bit carry-ripple adder, with the longest path (the critical path) shown.

We would like to build an adder that is much closer to the delay of just a few gates,
perhaps about 5 or 6 gate-delays, at the possible expense of more gates.
Two-Level Logic Adder
One obvious way to create a faster adder at the expense of more gates is to use our
earlier-defined two-level combinational logic design process. An adder designed using
two levels of logic has a delay of only two gate-delays. That's certainly fast. But recall
from an earlier figure that building an N-bit adder using two levels of logic results in
excessively large circuits as N increases beyond 8 or so. To be sure you get this point, let's
restate the previous sentence slightly:
Building an N-bit adder using two levels of logic results in shockingly large circuits as N
increases beyond 8 or so.
For example, we estimated (in an earlier chapter) that a two-level 16-bit adder would
require about 2 million transistors, and that a two-level 32-bit adder would require about
100 billion transistors.
On the other hand, building a 4-bit adder using two levels of logic results in a
big, but reasonably sized adder: about 100 gates, as was shown earlier. We could
build a larger adder by cascading such fast 4-bit adders together. Say we wanted an
8-bit adder. We could build this by cascading two fast 4-bit adders together, as shown in
Figure 6.56. If each 4-bit adder is built from two levels of logic, then each 4-bit adder has a
delay of 2 gate-delays. The 4-bit adder on the right takes 2 gate-delays to generate the sum
and carry-out bits, after which the 4-bit adder on the left takes another 2 gate-delays to
generate its outputs, resulting in a total delay of 2 + 2 = 4 gate-delays. For a 32-bit adder
built from eight 4-bit adders, the delay would be 8 * 2 = 16 gate-delays, and the size would be
about 8 * 100 gates = 800 gates. That's much better than the 32 * 2 = 64 gate-delays of a
carry-ripple adder, though the improved speed comes at the expense of more gates than the
32 * 5 = 160 gates of the carry-ripple adder. Which design is better? The answer depends
on your requirements: the design using two-level logic 4-bit adders is better if you
require more speed and can afford the extra gates, whereas the design using carry-ripple
4-bit adders is better if you don't need the speed or can't afford the extra gates. It's a
tradeoff.

Figure 6.56 8-bit adder built from two fast 4-bit adders.
Carry-Lookahead Adder
A carry-lookahead adder improves on the speed of a carry-ripple adder, but without using as
many gates as a two-level logic adder. The basic idea is to "look ahead" into lower stages to
determine whether a carry will be created in the present stage. This lookahead concept is
very elegant and generalizes to other problems. We will therefore spend some time
introducing the intuition underlying lookahead. Consider the addition of two 4-bit numbers
shown in Figure 6.57(b), with the carries in each column labeled c0, c1, c2, c3, and c4.

Figure 6.57 Adding two binary numbers by a naive inefficient carry-lookahead scheme: each stage
looks at all earlier bits and computes whether the carry-in bit to that stage would be a 1. The
longest delay is stage 3, which has 2 logic levels for the lookahead and 2 for the full-adder,
for a total delay of only four gate-delays.
A Naive Inefficient Carry-Lookahead Scheme. One simple but not very efficient way
of carry-lookahead is as follows. Recall that the output equations for a full-adder having
inputs a, b, and c, and outputs co and s, are:
    s  = a xor b xor c
    co = bc + ac + ab
So we know that the equations for c1, c2, and c3 in a 4-bit adder will be:
    c1 = co0 = b0c0 + a0c0 + a0b0
    c2 = co1 = b1c1 + a1c1 + a1b1
    c3 = co2 = b2c2 + a2c2 + a2b2
In other words, the equation for the carry-in of a particular stage is the same as the
equation for the carry-out of the previous stage.
We can substitute the equation for c1 into c2's equation, resulting in:
    c2 = b1c1 + a1c1 + a1b1
    c2 = b1(b0c0 + a0c0 + a0b0) + a1(b0c0 + a0c0 + a0b0) + a1b1
    c2 = b1b0c0 + b1a0c0 + b1a0b0 + a1b0c0 + a1a0c0 + a1a0b0 + a1b1
We can then substitute the equation for c2 into c3's equation, resulting in:
    c3 = b2c2 + a2c2 + a2b2
    c3 = b2(b1b0c0 + b1a0c0 + b1a0b0 + a1b0c0 + a1a0c0 +
         a1a0b0 + a1b1) + a2(b1b0c0 + b1a0c0 + b1a0b0
         + a1b0c0 + a1a0c0 + a1a0b0 + a1b1) + a2b2
    c3 = b2b1b0c0 + b2b1a0c0 + b2b1a0b0 + b2a1b0c0 +
         b2a1a0c0 + b2a1a0b0 + b2a1b1 + a2b1b0c0
         + a2b1a0c0 + a2b1a0b0 + a2a1b0c0 + a2a1a0c0
         + a2a1a0b0 + a2a1b1 + a2b2
We'll omit the equation for c4, in order to save a few pages of paper.
We could create each stage with the needed inputs, and include a lookahead logic
component implementing the above equations, as shown in Figure 6.57(c). Notice that
there is no rippling of carry bits from stage to stage: each stage computes its own
carry-in bit by "looking ahead" to the values of the previous stages.
While the above demonstrates the basic idea of carry-lookahead, the scheme is not
very efficient. c1 requires 4 gates, c2 requires 8 gates, and c3 requires 16 gates, with
each gate requiring more inputs in each stage. If we count gate inputs, c1 requires 9 gate
inputs, c2 requires 27 gate inputs, and c3 requires 71 gate inputs. Building a larger
adder, say an 8-bit adder, using this lookahead scheme would thus likely result in
excessively large size. While the presented scheme is therefore not practical, it serves to
introduce the basic idea of carry-lookahead: by having each stage look ahead at the
inputs to the previous stages and compute for itself whether that stage's carry-in bit
should be 1, rather than waiting for the carry-in bit to ripple from previous stages, we get
a four-bit adder with a delay of only 4 gate-delays.
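
The gate and gate-input counts quoted for the naive scheme can be reproduced by carrying out the substitution mechanically. The sketch below is our own helper code, not part of the text: it builds the sum-of-products terms for each carry, counts one AND gate per term plus one OR gate, and checks the expanded equations against a rippled carry.

```python
from itertools import product


def naive_carry_terms(stage):
    """Sum-of-products terms for the carry into `stage` (c1, c2, c3, ...).

    Each term is a tuple of literal names that are ANDed together; the
    terms are ORed. Built by repeatedly substituting
    c(i+1) = b_i*c_i + a_i*c_i + a_i*b_i, starting from c0.
    """
    terms = [("c0",)]
    for i in range(stage):
        a, b = f"a{i}", f"b{i}"
        terms = [(b,) + t for t in terms] + [(a,) + t for t in terms] + [(a, b)]
    return terms


def gate_and_input_counts(terms):
    """One AND gate per term plus one OR gate; inputs counted on every gate."""
    gates = len(terms) + 1
    inputs = sum(len(t) for t in terms) + len(terms)
    return gates, inputs


def evaluate(terms, values):
    """Evaluate the sum-of-products for a dict of literal values."""
    return int(any(all(values[name] for name in t) for t in terms))


def matches_ripple(stage):
    """Exhaustively compare the expanded equation with a rippled carry."""
    names = [f"{x}{i}" for i in range(stage) for x in "ab"] + ["c0"]
    for bits in product((0, 1), repeat=len(names)):
        v = dict(zip(names, bits))
        c = v["c0"]
        for i in range(stage):
            a, b = v[f"a{i}"], v[f"b{i}"]
            c = (b & c) | (a & c) | (a & b)
        if evaluate(naive_carry_terms(stage), v) != c:
            return False
    return True


if __name__ == "__main__":
    for stage in (1, 2, 3):
        print(stage, gate_and_input_counts(naive_carry_terms(stage)), matches_ripple(stage))
```

Each substitution roughly doubles the number of terms, which is why the scheme does not scale to wider adders.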
An Efficient Carry-Lookahead Scheme. A more efficient carry-lookahead scheme is
as follows. Consider again the addition of two 4-bit numbers A and B, shown in Figure
6.58(a). Suppose that we add each column's two operand bits (e.g., a0 + b0) using a
half-adder, ignoring the carry-in bit of that column. The resulting half-adder outputs
(carry-out and sum) give us some useful information about the carry for the next stage. In
particular:

If the addition of a0 with b0 results in a carry-out of 1, then we know for sure
that c1 will be 1, regardless of whether c0 is a 1 or 0. Why? Because considering
adding a0+b0+c0, then 1+1+0=10, and 1+1+1=11 (the "+" represents add
here, not OR); both cases generate a carry-out of 1. Recall that a half-adder
computes its carry-out as ab.

If the addition of a0 with b0 results in a sum of 1, then c1 will be 1 only if c0 is
1. In particular, considering a0+b0+c0, then 1+0+1=10 and 0+1+1=10. Recall
that a half-adder computes its sum as a xor b.

In other words, c1 will be 1 if a0b0=1, OR if a0 xor b0 = 1 AND c0=1. So
we get the following equations for the carry bits:
    c1 = a0b0 + (a0 xor b0)c0
    c2 = a1b1 + (a1 xor b1)c1
    c3 = a2b2 + (a2 xor b2)c2
    c4 = a3b3 + (a3 xor b3)c3
Figure 6.58 Adding two binary numbers using a fast carry-lookahead scheme: (a) idea of using
propagate and generate terms (if a0b0 = 1 then c1 = 1, call this G: Generate; if a0 xor b0 = 1
then c1 = 1 if c0 = 1, call this P: Propagate), (b) computing the propagate and generate terms
and providing them to the carry-lookahead logic, (c) using the propagate and generate terms to
quickly compute the carries for each column:
    c1   = G0 + P0c0
    c2   = G1 + P1G0 + P1P0c0
    c3   = G2 + P2G1 + P2P1G0 + P2P1P0c0
    cout = G3 + P3G2 + P3P2G1 + P3P2P1G0 + P3P2P1P0c0
The correspondence between c1 in figures (b) and (c) is shown by two circles connected by a
line; similar correspondences exist for c2 and c3.
Let's include a half-adder in each stage to add the two operand bits for that column, as
shown in Figure 6.58(b). Each half-adder outputs a carry-out bit (which is ab) and a sum bit
(which is a xor b). Note in the figure that for a given column, we just need to XOR the
half-adder's sum output with the column's carry-in bit to compute that column's sum bit,
because the sum bit for a column is just a xor b xor c (see Section 4.3, page 188).

Why those names? When a0b0=1, we know we should generate a 1 for c1. When a0 xor b0=1,
we know we should propagate the c0 value as the value of c1, meaning c1 should equal c0.

Let's rename the carry-out output of the half-adder generate, symbolized as G, so G0
means a0b0, G1 means a1b1, G2 means a2b2, and G3 means a3b3. Let's also rename
the sum output of the half-adder as propagate, so P0 means a0 xor b0, P1 means
a1 xor b1, P2 means a2 xor b2, and P3 means a3 xor b3. In short:
    Gi = aibi (generate)
    Pi = ai xor bi (propagate)
When we perform carry-lookahead, rather than looking directly at the operand bits of
previous stages as we did in the naive lookahead scheme (e.g., stage 1 looking at a0 and
b0), let's look instead at the half-adder outputs of the previous stage (e.g., stage 1 looks
at G0 and P0). Why? Because the lookahead logic will turn out to be simpler than in the
naive lookahead scheme.
We can therefore rewrite our equations for each carry bit as follows:
    c1 = G0 + P0c0
    c2 = G1 + P1c1
    c3 = G2 + P2c2
    c4 = G3 + P3c3
Substituting like we did for the naive scheme, we get the following carry-lookahead
equations:
    c1 = G0 + P0c0
    c2 = G1 + P1c1 = G1 + P1(G0 + P0c0)
    c2 = G1 + P1G0 + P1P0c0
    c3 = G2 + P2c2 = G2 + P2(G1 + P1G0 + P1P0c0)
    c3 = G2 + P2G1 + P2P1G0 + P2P1P0c0
    c4 = G3 + P3G2 + P3P2G1 + P3P2P1G0 + P3P2P1P0c0
Remember, the P and G symbols represent simple terms: Gi = ai * bi, Pi = ai xor bi.
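
To see the lookahead equations at work, here is a minimal sketch of a 4-bit carry-lookahead addition in which every carry is computed directly from the G and P terms and c0, with no carry passed from stage to stage. The function name `cla4_add` is ours.

```python
def cla4_add(a, b, c0=0):
    """Add two 4-bit integers using the carry-lookahead equations."""
    A = [(a >> i) & 1 for i in range(4)]
    B = [(b >> i) & 1 for i in range(4)]
    G = [A[i] & B[i] for i in range(4)]   # generate:  Gi = ai * bi
    P = [A[i] ^ B[i] for i in range(4)]   # propagate: Pi = ai xor bi

    # Each carry depends only on G, P, and c0 (two levels of logic).
    c1 = G[0] | (P[0] & c0)
    c2 = G[1] | (P[1] & G[0]) | (P[1] & P[0] & c0)
    c3 = G[2] | (P[2] & G[1]) | (P[2] & P[1] & G[0]) | (P[2] & P[1] & P[0] & c0)
    c4 = (G[3] | (P[3] & G[2]) | (P[3] & P[2] & G[1])
          | (P[3] & P[2] & P[1] & G[0]) | (P[3] & P[2] & P[1] & P[0] & c0))

    carries = [c0, c1, c2, c3]
    # The sum bit of each column is Pi xor ci, i.e., a xor b xor c.
    total = sum((P[i] ^ carries[i]) << i for i in range(4))
    return total, c4


if __name__ == "__main__":
    print(cla4_add(0b0110, 0b0011))
    print(cla4_add(0b1111, 0b0001))
```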
Figure 6.58(c) shows the circuits implementing the carry-lookahead equations for
computing each stage's carry.
Figure 6.59 shows a high-level view of the carry-lookahead adder's design from
Figure 6.58(b) and (c). The four blocks on the top are responsible for generating the sum,
propagate, and generate bits; let's call those "SPG blocks," and you'll recall from Figure
6.58(b) that each SPG block consists of just three gates. The 4-bit carry-lookahead logic
uses the propagate and generate bits to precompute the carry bits for high-order stages,
using only two levels of gates.

Figure 6.59 High-level view of a 4-bit carry-lookahead adder.

The complete 4-bit carry-lookahead adder requires only 26 gates (4*3=12 gates for
the nonlookahead logic, and then 2+3+4+5=14 gates for the lookahead logic).
The delay of this 4-bit adder is only 4 gate-delays: 1 gate through the half-adder, 2
gates through the carry-lookahead logic, and 1 to finally generate the sum bit (we can see
those gates in Figure 6.58(b) and (c)). An 8-bit adder built using the same carry-lookahead
scheme would still have a delay of only 4 gate-delays, but would require 68 gates
(8*3=24 gates for the nonlookahead logic, and 2+3+4+5+6+7+8+9=44 gates for the
lookahead logic). A 16-bit carry-lookahead adder would still have a delay of 4 gate-delays,
but would require 200 gates (16*3=48 gates for the nonlookahead logic, and
2+3+4+5+6+7+8+9+10+11+12+13+14+15+16+17=152 gates for the lookahead logic). A
32-bit carry-lookahead adder would have a delay of 4 gate-delays, but would require 656
gates (32*3=96 gates for the nonlookahead logic, and 152+18+19+20+21+22+23+24+25
+26+27+28+29+30+31+32+33=560 gates for the lookahead logic).
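
The gate totals above follow one pattern: three gates per SPG block, plus lookahead logic whose stage i needs i+1 gates. A short sketch (the function name is ours) makes the arithmetic easy to recheck; for the 8-bit case it gives 24 + 44 = 68 gates.

```python
def cla_gate_count(n):
    """Gate count of a single-level n-bit carry-lookahead adder.

    Counts 3 gates per SPG block and 2 + 3 + ... + (n + 1) gates of
    lookahead logic, following the counting used in the text.
    """
    spg_gates = 3 * n
    lookahead_gates = sum(range(2, n + 2))
    return spg_gates + lookahead_gates


if __name__ == "__main__":
    for n in (4, 8, 16, 32):
        print(n, cla_gate_count(n))
```

The lookahead term grows quadratically with n, and this count still ignores how wide each individual gate becomes.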
Unfortunately, there are problems that make the size and delay of large carry-lookahead
adders less attractive. First, the above analysis counts gates, but not gate inputs, whereas
gate inputs better tell us the number of transistors needed. Notice in Figure 6.58 that the
gates keep getting larger in higher stages. For example, stage 3 has a 4-input OR gate and
4-input AND gate, while stage 4 has a 5-input OR gate and 5-input AND gate, as highlighted
in Figure 6.60. Stage 32 of a 32-bit carry-lookahead adder would have 33-input OR and AND
gates, along with other large gates. Since gates with more inputs need more
transistors, then in terms of transistors, the carry-lookahead design is actually quite large.
Furthermore, those huge gates would not have the same delay as a 2-input AND or OR
gate. Such huge gates are typically built using a tree of smaller gates, so we would have
more gate-delays.

Figure 6.60 Gate size problem.
Hierarchical Carry-Lookahead Adders. Building a 4-bit or even 8-bit carry-lookahead
adder using the previous section's method may be reasonable with respect to gate sizes,
but larger carry-lookahead adders begin to involve gates with too many inputs.
We can build a larger adder by connecting smaller adders in a carry-ripple manner. For
example, suppose we have 4-bit carry-lookahead adders available. We can build a 16-bit
adder by connecting four 4-bit carry-lookahead adders, as shown in Figure 6.61. If each
4-bit carry-lookahead adder had a 4-gate-delay, then the total delay of the 16-bit adder
would be 4 + 4 + 4 + 4 = 16 gate-delays. Compare this to the delay of a 16-bit carry-ripple
adder: if each full-adder has a two gate-delay, then a 16-bit carry-ripple adder would have
a delay of 16*2 = 32 gate-delays. Thus, the 16-bit adder built from four carry-lookahead
adders connected in a carry-ripple manner is twice as fast as the 16-bit carry-ripple adder.
(Actually, careful observation of Figure 6.58 reveals that the carry-out of a four-bit
carry-lookahead adder would be generated in three gate-delays rather than four, resulting in
even faster operation of the 16-bit adder built from four carry-lookahead adders; but for
simplicity, let's not look inside the components for such detailed timing analysis.) Sixteen
gate-delays is good, but can we do better? Can we avoid having to wait for the carries to ripple
from the lower-order 4-bit adders to the higher-order adders?

Figure 6.61 16-bit adder implemented using four 4-bit adders connected in a carry-ripple manner.
In fact, avoiding the rippling is exactly what we did in developing the 4-bit
carry-lookahead adder itself. Thus, we can repeat the same carry-lookahead process outside of the
4-bit adders, to quickly provide the carry-in value to the higher-order 4-bit adders. To
accomplish this, we add another 4-bit carry-lookahead logic block outside the four 4-bit
adders, as shown in Figure 6.62. The carry-lookahead logic block has exactly the same
internal design as was shown in Figure 6.58(c). Notice that the lookahead logic needs
propagate (P) and generate (G) signals from each adder block. Previously, each input block
output the P and G signals just by ANDing and XORing the block's ai and bi input bits.
However, in Figure 6.62, each block is a 4-bit carry-lookahead adder. We therefore must
modify the internal design of a 4-bit carry-lookahead adder to output P and G signals, so
that those adders can be used with a second-level carry-lookahead generator.

Figure 6.62 16-bit adder implemented using four CLA 4-bit adders and a second level of lookahead.
Thus, let's extend the 4-bit carry-lookahead logic of Figure 6.58 to output P and G
signals. The equations for the P and G outputs of a 4-bit carry-lookahead adder can be
written as follows:
    P = P3P2P1P0
    G = G3 + P3G2 + P3P2G1 + P3P2P1G0
To understand these equations, recall that propagate meant that the output carry for a
column should equal the input carry of the column (hence propagating the carry through
the column). For that to be the case for the carry-in and carry-out of a 4-bit adder, the first
stage of the 4-bit adder must propagate its input carry to its output carry, the second stage
must propagate its input carry to its output carry, and so on for the third and fourth stages.
In other words, each internal propagate signal must be 1, hence the equation P =
P3P2P1P0.
Likewise, recall that generate meant that the output carry of a column should be a 1
(hence generating a carry of 1). Generate should thus be 1 if the first stage generates
a carry (G0) and all the higher stages propagate the carry through (P3P2P1), yielding
the term P3P2P1G0. Generate should also be a 1 if the second stage generates a carry
and all higher stages propagate the carry through, yielding the term P3P2G1. Likewise
for the third stage, whose term is P3G2. Finally, generate should be 1 if the fourth stage
generates a carry, represented as G3. ORing all four of these terms yields the equation
G = G3 + P3G2 + P3P2G1 + P3P2P1G0.
We then revise the 4-bit carry-lookahead logic of Figure 6.58(c) to include
two additional gates in stage four, one AND gate to compute P = P3P2P1P0, and one
OR gate to compute G = G3 + P3G2 + P3P2G1 + P3P2P1G0 (note that stage four
already has AND gates for each term, so we need only add an OR gate to OR the terms).
For conciseness, we omit a figure showing these two new gates.
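
The two-level arrangement can be modeled by reusing one lookahead function at both levels: first on the bit-level G and P terms inside each 4-bit block, then on the block-level P and G outputs. The following is a sketch under our own naming (`lookahead4`, `cla16_add`), not a gate-accurate description of Figure 6.62.

```python
def lookahead4(G, P, c0):
    """4-bit carry-lookahead logic, extended with block P and G outputs.

    Returns ([c0, c1, c2, c3], c4, block_p, block_g).
    """
    c1 = G[0] | (P[0] & c0)
    c2 = G[1] | (P[1] & G[0]) | (P[1] & P[0] & c0)
    c3 = G[2] | (P[2] & G[1]) | (P[2] & P[1] & G[0]) | (P[2] & P[1] & P[0] & c0)
    block_p = P[3] & P[2] & P[1] & P[0]
    block_g = G[3] | (P[3] & G[2]) | (P[3] & P[2] & G[1]) | (P[3] & P[2] & P[1] & G[0])
    c4 = block_g | (block_p & c0)
    return [c0, c1, c2, c3], c4, block_p, block_g


def cla16_add(a, b, c0=0):
    """16-bit add using four 4-bit CLA blocks and a second lookahead level."""
    blocks = []
    for k in range(4):
        A = [(a >> (4 * k + i)) & 1 for i in range(4)]
        B = [(b >> (4 * k + i)) & 1 for i in range(4)]
        G = [x & y for x, y in zip(A, B)]
        P = [x ^ y for x, y in zip(A, B)]
        blocks.append((G, P))

    # Block P and G do not depend on any carry-in, so they are ready early.
    block_P = [lookahead4(G, P, 0)[2] for G, P in blocks]
    block_G = [lookahead4(G, P, 0)[3] for G, P in blocks]

    # Second level: the same lookahead logic yields each block's carry-in.
    block_cin, cout, _, _ = lookahead4(block_G, block_P, c0)

    total = 0
    for k, (G, P) in enumerate(blocks):
        carries, _, _, _ = lookahead4(G, P, block_cin[k])
        for i in range(4):
            total |= (P[i] ^ carries[i]) << (4 * k + i)
    return total, cout


if __name__ == "__main__":
    print(cla16_add(0xFFFF, 0x0001))
    print(cla16_add(12345, 23456))
```

No carry ripples between the 4-bit blocks here: every block's carry-in comes straight from the second-level lookahead.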
We can introduce additional levels of 4-bit carry-lookahead generators to create even
larger adders. Figure 6.63 illustrates a high-level view of a 32-bit adder built using 32
SPG blocks and three levels of 4-bit carry-lookahead logic. Notice that the carry-lookahead
logic forms a tree. Total delay for the 32-bit adder is only two gate-delays for the
SPG blocks, and two gate-delays for each level of carry-lookahead (CLA) logic, for a
total of 2+2+2+2 = 8 gate-delays. (Actually, closer examination of gate delays within
each component would demonstrate that the total delay of the 32-bit adder is actually less
than 8 gate-delays.) Carry-lookahead adders built from multiple levels of carry-lookahead
logic are known as multilevel or hierarchical carry-lookahead adders.

Figure 6.63 View of multilevel carry-lookahead, showing tree structure, which enables fast
addition with reasonable numbers and sizes of gates. Each level adds only two gate-delays.

In summary, the carry-lookahead approach results in faster additions of large binary
numbers (more than 8 bits or so) than a carry-ripple approach, at the expense of more gates.
However, by clever hierarchical design, the carry-lookahead gate size is kept reasonable.
Carry-Select Adders
Another way to build a larger adder from smaller adders is known as carry-select.
Consider building an 8-bit adder from 4-bit adders. A carry-select approach uses two 4-bit
adders for the high-order four bits, which we've labeled HI4_1 and HI4_0 in Figure 6.64.
HI4_1 assumes the carry-in will be 1, while HI4_0 assumes the carry-in will be 0, so both
generate stable output at the same time that LO4 generates stable output: after 4
gate-delays (assuming the 4-bit adder has a delay of four gate-delays). We use the LO4
carry-out value to select among HI4_1 or HI4_0, using a 5-bit-wide 2x1 multiplexer, hence
the term carry-select adder.

Figure 6.64 8-bit carry-select adder implemented using three 4-bit adders.
The delay of a 2x1 mux is 2 gate-delays, so the total delay of the 8-bit adder is 4
gate-delays for HI4_1 and HI4_0 to generate correct sum bits (LO4 executes in parallel), plus
2 gate-delays for the mux (whose select line is ready after only 3 gate-delays), for a
total of 6 gate-delays. Compared with a carry-lookahead implementation using two 4-bit
adders, we've reduced the total delay from 7 gate-delays down to 6 gate-delays. The cost
is one extra 4-bit adder. If a 4-bit carry-lookahead adder requires 26 gates, then the design
with two 4-bit adders requires 2*26=52 gates, while the carry-select adder requires
3*26=78 gates, plus the gates for the 5-bit 2x1 mux.
We could also build a 16-bit carry-select adder using 4-bit carry-lookahead adders,
by using multiple levels of multiplexing. Each nibble (four bits) would have two 4-bit
adders, one assuming a carry-in of 1, the other 0. Nibble0's carry-out would
select, using a multiplexer, the appropriate adder for Nibble1. Nibble1's selected carry-out
would then select the appropriate adder for Nibble2. Nibble2's selected carry-out
would finally select the appropriate adder for Nibble3. The delay of such an adder would
be 6 gate-delays for Nibble1, plus 2 gate-delays for Nibble2's selection, plus 2
gate-delays for Nibble3's selection, for a total of only 10 gate-delays. Cascading four 4-bit
adders would have required 4+4+4+4 = 16 gate-delays. The speedup of the carry-select
version over the cascaded version would be 16 / 10 = 1.6. Total size would be
7*26 = 182 gates, plus the gates for the three 5-bit 2x1 muxes. That's pretty efficient
size for pretty good speed.
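
A behavioral sketch of carry-select addition follows, together with the delay and size arithmetic used above. The 4-bit adder is modeled with integer addition rather than gates, and the function names and default parameters (4 gate-delays and 26 gates per 4-bit adder, 2 gate-delays per mux) are our assumptions taken from the figures quoted in the text.

```python
def add4(a, b, cin):
    """Behavioral 4-bit adder: returns (4-bit sum, carry-out)."""
    total = a + b + cin
    return total & 0xF, total >> 4


def carry_select_add(a, b, nibbles=2):
    """Carry-select addition of two (4 * nibbles)-bit integers."""
    result, carry = add4(a & 0xF, b & 0xF, 0)          # lowest nibble (LO4)
    for k in range(1, nibbles):
        an = (a >> (4 * k)) & 0xF
        bn = (b >> (4 * k)) & 0xF
        sum0, co0 = add4(an, bn, 0)                    # adder assuming carry-in 0
        sum1, co1 = add4(an, bn, 1)                    # adder assuming carry-in 1
        # 5-bit-wide 2x1 mux: the incoming carry selects sum and carry-out.
        s, carry = (sum1, co1) if carry else (sum0, co0)
        result |= s << (4 * k)
    return result, carry


def carry_select_cost(nibbles, adder_delay=4, mux_delay=2, adder_gates=26):
    """Return (gate-delays, gates excluding the muxes)."""
    delay = adder_delay + mux_delay * (nibbles - 1)
    gates = (2 * nibbles - 1) * adder_gates
    return delay, gates


if __name__ == "__main__":
    print(carry_select_add(0x9C, 0x77))
    print(carry_select_cost(2), carry_select_cost(4))
```

All adders compute in parallel; only the mux selections happen in sequence, which is where the 2 gate-delays per additional nibble come from.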
Figure 6.65 illustrates the tradeoffs among adder
designs. Carry-ripple is the smallest but has the
longest delay. Carry-lookahead is the fastest but has
the largest size. Carry-select is a compromise
between the two, involving some lookahead and
some rippling. The choice of the most appropriate
adder for a design depends on the speed and size
constraints of the design.

Figure 6.65 Adder tradeoffs. (Designs plotted against delay: carry-lookahead, multilevel
carry-lookahead, carry-select, carry-ripple.)

Smaller Multiplier-Sequential (Shift-and-Add) Style
An array-style multiplier can be fast but may require a lot of gates for wide-bitwidth
multipliers, like 32-bit multipliers. In this section, we create a sequential multiplier
instead of a combinational one, in order to reduce the size of the multiplier. The idea of a
sequential multiplier is to keep a running sum of the partial products and compute each
partial product one at a time, rather than computing all the partial products at once and
summing them.
Figure 6.66 provides an example of 4-bit multiplication. Assume we start with a
running sum of 0000. Each step corresponds to a bit in the multiplier (the second
number). In step 1, we compute the partial product as 0110, which we add to the running
sum of 0000 to obtain 00110. In step 2, we compute the partial product as 0110, which
we add to the proper columns of the running sum of 00110 to obtain 010010. In step 3,
we compute the partial product as 0000, which we add to the proper columns of the
running sum. Likewise for step 4. The final running sum is 00010010, which is the
correct product of 0110 and 0011.
                     Step 1     Step 2     Step 3     Step 4
                      0110       0110       0110       0110
                    x 0011     x 0011     x 0011     x 0011
running sum           0000      00110     010010    0010010
partial product       0110       0110       0000       0000
new running sum      00110     010010    0010010   00010010

Figure 6.66 Multiplication done by generating a partial product for each bit in the multiplier (the
number on the bottom), accumulating the partial products in a running sum. In each step the
partial product is aligned one column further to the left relative to the running sum.
Computing each partial product is easy: we just AND the current multiplier bit
with every bit in the multiplicand to obtain the partial product. So if the current multiplier
bit is 1, the AND creates a copy of the multiplicand as the partial product. If the current
multiplier bit is 0, the AND creates 0 as the partial product.
We need to determine how to add each partial product to the proper columns of the
running sum. Notice that the partial product should be moved to the left by one bit
relative to the running sum after each step. We can look at this another way: the running
sum should be moved to the right by one bit after each step. Look at the multiplication
illustration in Figure 6.66, until you "see" how the running sum moves one bit to the right
relative to each partial product.
Therefore, we can compute the running sum by initializing an 8-bit register to 0. In
each step we add the partial product for the current multiplier bit to the leftmost four
bits of the running sum, and we shift the running sum one bit to the right, shifting in a 0
into the leftmost bit. So the running sum register should have a clear function, a parallel
load function, and a shift right function. A circuit showing the running sum register and
an adder to add each partial product to that register is shown in Figure 6.67.
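
The shift-and-add procedure can be sketched behaviorally as below. One detail is our assumption rather than something the text states: adding a 4-bit partial product to the upper four bits of the running sum can produce a carry-out (for example with 1111 x 1111), so this model shifts that carry-out, rather than a constant 0, into the leftmost bit. The function name is ours.

```python
def shift_add_multiply(multiplicand, multiplier, bits=4):
    """Multiply two `bits`-wide unsigned integers by shift-and-add."""
    mask = (1 << bits) - 1
    running_sum = 0                                  # 2*bits-wide register, cleared
    for step in range(bits):
        mr_bit = (multiplier >> step) & 1            # current multiplier bit, LSB first
        partial = (multiplicand & mask) if mr_bit else 0   # AND with every multiplicand bit
        upper = (running_sum >> bits) + partial      # add to the leftmost `bits` bits
        lower = running_sum & mask
        # Assumption: the adder's carry-out (bit `bits` of `upper`) is
        # shifted into the leftmost bit during the right shift.
        running_sum = ((upper << bits) | lower) >> 1
    return running_sum


if __name__ == "__main__":
    print(format(shift_add_multiply(0b0110, 0b0011), "08b"))
```

For operands whose additions never overflow four bits, such as 0110 and 0011, shifting in a 0 gives the same result.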
Figure 6.67 Internal design of a 4-bit by 4-bit sequential multiplier. (Labels shown: multiplier
register with load mrld and outputs mr3, mr2, mr1, mr0; running sum register (8) with controls
rsclear and rsshr; start input; product output.)

The last thing we need to figure out is how to control the circuit so that the circuit
does the right thing during each step; that's exactly what controllers are for. Figure
6.68 shows an FSM describing the desired controller behavior of our sequential
multiplier.

Figure 6.68 FSM describing the controller for the 4-bit multiplier. (Signals shown: mdld, mrld,
mr3, mr2, mr1, mr0, rsload, rsclear, rsshr, start.)
In terms of performance, the sequential multiplier requires two cycles per bit, plus 1
cycle for initialization. So a 4-bit multiplier would require 9 cycles, while a 32-bit
multiplier would require 65 cycles. The longest register-to-register delay is from a register
through the adder to a register. If we built the adder as a carry-lookahead adder having
only 4 gate-delays, then the total delay for a 4-bit multiplication would be 9 cycles * 4
gate-delays/cycle = 36 gate-delays. The total delay for a 32-bit multiplication would be
65 cycles * 4 gate-delays/cycle = 260 gate-delays. While slow, notice that this multiplier's
size is quite good, requiring only an adder, a few registers, and a state register and some
control logic for the controller. For a 32-bit multiplier, the size would be far smaller than
an array-style multiplier requiring 31 adders.
The multiplier's design can be further improved by using a shifter in the datapath, but
we omit details of that improved design.
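
The cycle and delay figures above reduce to a two-line calculation, sketched here with a function name of our choosing and the text's assumption of a 4-gate-delay adder on the critical path.

```python
def sequential_multiplier_timing(bits, gate_delays_per_cycle=4):
    """Return (cycles, total gate-delays) for a `bits`-wide multiplication."""
    cycles = 2 * bits + 1          # two cycles per bit plus one for initialization
    return cycles, cycles * gate_delays_per_cycle


if __name__ == "__main__":
    print(sequential_multiplier_timing(4))
    print(sequential_multiplier_timing(32))
```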
6.5 RTL DESIGN OPTIMIZATIONS AND TRADEOFFS
In Chapter 5, we described the process of RTL design. While creating the datapath during
RTL design, there are several optimizations and tradeoffs that we might make to create
smaller or faster designs.
Pipelining
Microprocessors continue to become smaller, faster, and less expensive, and thus designers
use microprocessors whenever possible to implement desired digital system behavior. But
designers continue to choose to build their own digital circuits to implement desired
behavior of many digital systems, with the main reason for that choice being speed. One
method of obtaining speed from digital circuits is through the use of pipelining. Pipelining
means to break a large task down into a sequence of stages
such that data moves through the stages like parts move through a factory assembly line.
Each stage produces output used by the next stage, and all stages operate concurrently,
resulting in better performance than if data had to be fully processed by the task before
new data could begin being processed. An example of pipelining is washing dishes with a
friend, with you washing and your friend drying (Figure 6.69). You (the first stage) pick
up a dish (dish 1) and wash it, then hand it to your friend (the second stage). You pick up
the next dish (dish 2) and wash it concurrently with your friend drying dish 1. You then
wash dish 3 while your friend dries dish 2. Dishwashing this way is nearly twice as fast as
when washing and drying aren't done concurrently.

Figure 6.69 Applying pipelining to dishwashing: washing and drying dishes can be done
concurrently.
Consider a system with data inputs W, X, Y, and Z that should repeatedly output the
sum S = W + X + Y + Z. We could implement the system using an adder tree as
shown in Figure 6.70(a). The clock period for this design must not be shorter than the
delay of the longest path between any pair of registers, known as the critical path. There
are four possible paths from any register output to any register input, and each path goes
through two adders. If each adder has a delay of 2 ns, then each path is 2 + 2 = 4 ns long.
Thus, the critical path is 4 ns, and so the fastest clock has a period of at least 4 ns,
meaning a frequency of no more than 1 / 4 ns = 250 MHz.
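The critical-path arithmetic above can be checked with a few lines of Python. This is only a sketch of the calculation: the helper name `max_clock_mhz` and the list of path delays are illustrative assumptions, while the 2 ns adder delay comes from the text.

```python
# Hypothetical helper names; the 2 ns adder delay is the value used in the text.
ADDER_DELAY_NS = 2.0


def max_clock_mhz(critical_path_ns: float) -> float:
    """Highest clock frequency (MHz) whose period still covers the critical path."""
    return 1000.0 / critical_path_ns


# Each of the four register-to-register paths passes through two adders in series.
paths_ns = [2 * ADDER_DELAY_NS for _ in range(4)]
critical_path_ns = max(paths_ns)

print(critical_path_ns)                 # 4.0
print(max_clock_mhz(critical_path_ns))  # 250.0
```

The same helper gives 500 MHz for a 2 ns critical path.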
Figure 6.70 Nonpipelined versus pipelined datapaths: (a) four register-to-register paths of
4 ns each, so longest path is 4 ns, meaning minimum clock period is 4 ns, or 1 / 4 ns =
250 MHz, (b) six register-to-register paths of 2 ns each, so longest path is 2 ns, meaning
minimum clock period of 2 ns, or 1 / 2 ns = 500 MHz.
Figure 6.70(b) shows a pipelined version of this design. We merely add registers
between the first and second row of adders. Since the purpose of these registers is
solely related to pipelining, they are known as pipeline registers, though their internal
design is the same as any other register. The computations between pipeline registers
are known as stages. By inserting those registers and thus creating a two-stage pipeline,
we've reduced the critical path from 4 ns down to only 2 ns, and so the fastest clock has
a period of at least 2 ns, meaning a frequency of no more than 1 / 2 ns = 500 MHz. In
other words, just by inserting those pipeline registers, we've doubled the performance
of our design!
Latency versus Throughput
The term "performance" needs to be refined due to the pipelining concept. Notice in
Figure 6.70(b) that the first result S(0) doesn't appear until after two cycles, whereas
the design in Figure 6.70(a) outputs the first result after only one cycle. That's because
data must now pass through an extra row of registers. The term latency refers to the delay
for new input data to result in new output data. Latency is one kind of performance.
Both designs in the figure have a latency of 4 ns. Figure 6.70(b) also shows that a new
value for S appears every 2 ns, versus every 4 ns for the design in Figure 6.70(a). The
term throughput refers to the rate at which new data can be input to the system, and
similarly, the rate at which new outputs appear from the system. The throughput of the
design in Figure 6.70(a) is 1 sample every 4 ns, while the throughput of the design in
Figure 6.70(b) is 1 sample every 2 ns. Thus, we can more precisely describe the
performance improvement of our pipelined design as having doubled the throughput of the
design.
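A small Python model makes the distinction between latency and throughput concrete. It is a sketch under simplifying assumptions: every stage takes exactly one clock cycle, the clock period equals the slowest stage delay, and register setup time is ignored. The function name `pipeline_metrics` is hypothetical.

```python
def pipeline_metrics(stage_delays_ns):
    """Return (clock period, latency, results per ns) for register-bounded stages."""
    period = max(stage_delays_ns)            # clock is limited by the slowest stage
    latency = period * len(stage_delays_ns)  # data spends one cycle in each stage
    throughput = 1.0 / period                # one new result per clock cycle
    return period, latency, throughput


nonpipelined = pipeline_metrics([4.0])     # two 2 ns adders in series, one stage
pipelined = pipeline_metrics([2.0, 2.0])   # one 2 ns adder per stage

print(nonpipelined)  # (4.0, 4.0, 0.25)
print(pipelined)     # (2.0, 4.0, 0.5)
```

Both designs show the same 4 ns latency, while the pipelined version delivers results twice as often.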
EXAMPLE 6.20 Pipelined FIR filter
Recall the 100-tap FIR filter from Example 5.8. We estimated that implementation on a
microprocessor would require 4000 ns, while a custom digital circuit implementation would
require only 34 ns. That custom digital circuit utilized an adder tree, with seven levels
of adders: 50 additions, then 25, then 13 (roughly), then 7, then 4, then 2, then 1. The
total delay was 20 ns (for the multiplier) plus seven adder delays (7 * 2 ns = 14 ns), for
a total delay of 34 ns. We can further improve the throughput of that filter using
pipelining. Noticing that the multipliers' delay of 20 ns is roughly equal to the adder
tree delay of 14 ns,
we might decide to insert pipeline registers (100 of them, since there are 100 multipliers
feeding into 50 adders at the top of the adder tree) between the multipliers and adder
tree, resulting in dividing the
computation task into two stages, as shown in Figure 6.71. Those pipeline registers shorten
the critical path from 34 ns down to only 20 ns, meaning we can clock the circuit faster
and hence improve the throughput. The throughput speedup of the unpipelined design over the
microprocessor implementation was 4000/34, or about 117, while the throughput speedup of
the pipelined design is 4000/20 = 200. Quite a nice additional speedup for just inserting
some registers!
Figure 6.71 Pipelined FIR filter.
Although we could pipeline the adder tree also, that would not gain us higher throughput,
since the multiplier stage would still represent the critical path. We can't clock a
pipelined system any faster than the longest stage, since otherwise that stage would fail
to load correct values into the stage's output pipeline registers.
The latency of the nonpipelined design is one cycle of 34 ns, or 34 ns total. The latency
of the pipelined design is two cycles of 20 ns, or 40 ns total. Thus, we see that
pipelining improves throughput at the expense of latency.
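The numbers in this example can be reproduced with a short Python sketch. The constant and variable names are illustrative assumptions; the delays (20 ns multiplier, 2 ns adder, 4000 ns microprocessor estimate) are the ones quoted in the example, and the tree depth is taken as the ceiling of log2 of the number of products.

```python
import math

# Delay figures quoted in the example; the names are hypothetical.
MULT_NS, ADDER_NS, MICRO_NS, TAPS = 20, 2, 4000, 100

tree_levels = math.ceil(math.log2(TAPS))   # depth of a pairwise adder tree
tree_ns = tree_levels * ADDER_NS

unpipelined_ns = MULT_NS + tree_ns          # one long combinational stage
pipelined_period_ns = max(MULT_NS, tree_ns) # slowest of the two stages
pipelined_latency_ns = 2 * pipelined_period_ns

print(tree_levels, tree_ns)                  # 7 14
print(unpipelined_ns, pipelined_period_ns)   # 34 20
print(pipelined_latency_ns)                  # 40
print(MICRO_NS / pipelined_period_ns)        # 200.0
```

Dividing 4000 by 34 in the same way gives a value between 117 and 118, in line with the unpipelined speedup quoted above.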
Concurrency
A key reason for designing a custom digital circuit, rather than writing software that
executes on a microprocessor, is to achieve improved performance. A common method of
achieving performance in a custom digital circuit is through concurrency. Concurrency in
digital design means to divide a task into several independent subparts, and then to
execute those subparts simultaneously. As an analogy, if we have a stack of 200 dishes to
wash, we might divide the stack into 10 substacks of 20 dishes each, and then give 10 of
our neighbors each a substack. Those neighbors simultaneously go home, wash and dry
their respective substacks, and return to us their completed dishes. We would get a ten
times speedup in dishwashing (ignoring the time to divide the stack and move substacks
from home to home).
We have used concurrency in several examples already. For example, the FIR filter
datapath of Figure 5.38 had three multipliers executing concurrently.
Let's use concurrency to create a faster version of an earlier example.
EXAMPLE 6.21 Sum-of-absolute-difference component with concurrency
In Example 5.7, we designed a custom circuit for a sum-of-absolute-difference (SAD)
component, and we estimated that component to be three times faster than a
software-on-microprocessor solution. We can do even better. Notice that comparing one pair
of corresponding pixels of two frames is independent of comparing another pair. Thus, such
comparisons are an ideal candidate for concurrency.
We first need to be able to read the pixels concurrently. We can do this by redesigning the
block memories A and B, which earlier were designed as 256-byte memories. Instead, let's
design them as 16-word memories, where each word is 16 bytes (the total is still 256
bytes). Thus, each memory read corresponds to reading an entire pixel row of a 16x16 block.
We can then concurrently determine the differences among all 16 pairs of pixels from A and
B. Figure 6.72 shows a new datapath and controller FSM for a more concurrent SAD component.
Figure 6.72 SAD datapath using concurrency for speed, along with the controller FSM.
The datapath consists of 16 subtractors operating concurrently on the 16 pixels of a row,
followed by 16 absolute value components. The 16 resulting differences feed into an adder
tree, whose result gets added with the present sum for writing back into the sum register.
The datapath compares its counter i with 16, since there are 16 rows in a block, and so we
must compute the difference between rows 16 times. The controlling FSM loops 16 times to
accumulate the differences of each row, and then loads the final result into the register
sad_reg, which connects to the SAD component's output sad.
In Example 5.7, we estimated that a software solution would require about six cycles per
pixel pair comparison. Since there are 256 pixels in a 16x16 block, the software would
require 256 * 6 = 1536 cycles to compare a pair of blocks. Our SAD circuit with concurrency
instead requires only 1 cycle to compare each row of 16 pixels, which the circuit must do
16 times for a block, resulting in only 16 * 1 = 16 cycles. Thus, the SAD circuit's speedup
over software is 1536 / 16 = 96. In other words, the relatively simple SAD circuit using
concurrency runs nearly 100 times faster than a software solution. That sort of speedup
eventually translates to better quality digitized video from whatever video appliance we
are designing.
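The row-at-a-time computation can be modeled in Python to see where the 16-cycle count comes from. This is a behavioral sketch, not a description of the actual datapath: the function name `sad_rowwise`, the test blocks, and the idea of counting one modeled cycle per row are all assumptions made for illustration.

```python
def sad_rowwise(block_a, block_b):
    """Sum of absolute differences, handling one 16-pixel row per modeled cycle."""
    total = 0
    cycles = 0
    for row_a, row_b in zip(block_a, block_b):
        # In hardware, 16 subtract/abs units and the adder tree work in parallel;
        # this row sum stands in for that single-cycle step.
        total += sum(abs(a - b) for a, b in zip(row_a, row_b))
        cycles += 1
    return total, cycles


# Two hypothetical 16x16 blocks of 8-bit pixels.
block_a = [[(r * 16 + c) % 256 for c in range(16)] for r in range(16)]
block_b = [[255 - p for p in row] for row in block_a]

sad, cycles = sad_rowwise(block_a, block_b)
software_cycles = 256 * 6                  # six cycles per pixel pair, from the text

print(cycles, software_cycles // cycles)   # 16 96
```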
Pipelining and concurrency can be combined to achieve even greater performance
improvements.
Component Allocation
When the same operation is used in two different states of a high-level state machine, we
can choose to either instantiate two functional units, one for each state, or one
functional unit, which will be shared among the two states. For example, Figure 6.73 shows
a portion of a state machine with two states A and B that each have a multiplication
operation. We can choose to use two distinct multipliers as shown in Figure 6.73(a) (we
assume the t variables represent registers). The figure also shows the control signals that
would be set in each state of the FSM controlling that datapath, with the t1 register being
loaded in the first state (t1ld=1), and the t4 register being loaded in the second state
(t4ld=1).
Figure 6.73 Two different component allocations: (a) two multipliers, (b) one multiplier,
(c) the one-multiplier allocation represents a tradeoff of smaller size for slightly more
delay. State A computes t1 = t2 * t3 (FSM sets t1ld=1); state B computes t4 = t5 * t6
(FSM sets t4ld=1).
However, because a state machine can't be in two states at the same time, we know that
the FSM will perform only one multiplication at a time, so we can share one multiplier
among the two states. Because fast multipliers are big, such sharing could save a lot of
gates. A datapath with only one multiplier appears in Figure 6.73(b). In each state
of the state machine, the controller FSM would configure the multiplexer select lines to
pass the appropriate operands through to the multiplier, as well as loading the appropriate
destination register as before. So in the first state A, the FSM would set the select line
for the left multiplexer to 0 to let t2 pass through (sl=0) and would set the select line
for the right multiplexer to 0 to let t3 pass through (sr=0), in addition to setting t1ld=1
to load the result of the multiplication into the t1 register. Likewise, the FSM in state B
sets the muxes to pass t5 and t6, and loads t4.
Note: The terms "operator" and "operation" refer to behavior, like addition or
multiplication. The term "component" (aka "functional unit") refers to hardware, like an
adder or a multiplier.
Figure 6.73(c) illustrates that the one-multiplier design would have smaller size, at
the expense perhaps of slightly more delay due to the multiplexers.
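The shared-multiplier datapath can be sketched behaviorally in Python. The function name `shared_multiplier_step` and the register values are hypothetical, and the sketch assumes that a select value of 1 passes t5 and t6 in state B (the text only states the select values for state A).

```python
def shared_multiplier_step(state, regs):
    """One clock cycle of a datapath whose single multiplier serves states A and B."""
    # Select lines set by the controller: 0 passes t2/t3, 1 (assumed) passes t5/t6.
    sl = sr = 0 if state == "A" else 1
    left = regs["t5"] if sl else regs["t2"]    # left 2x1 mux
    right = regs["t6"] if sr else regs["t3"]   # right 2x1 mux
    product = left * right                     # the one shared multiplier
    dest = "t1" if state == "A" else "t4"      # t1ld=1 in state A, t4ld=1 in state B
    return {**regs, dest: product}


regs = {"t1": 0, "t2": 3, "t3": 4, "t4": 0, "t5": 5, "t6": 6}
regs = shared_multiplier_step("A", regs)
regs = shared_multiplier_step("B", regs)

print(regs["t1"], regs["t4"])  # 12 30
```

Only one multiplication is performed per call, which mirrors the observation that the FSM is never in both states at once.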
A component library might consist of numerous different functional units that could
potentially implement a desired operation. For a multiplication, there may be several
multiplier components: MUL1 might be very fast but large, while MUL2 might be very
small but slow, and MUL3 may be somewhere in between. There may also be fast but
large adders, small but slow adders, and several choices in between. Furthermore, some
components might support multiple operations, like an adder/subtractor component, or an
ALU. Choosing a particular set of functional units to implement a set of operations is
known as component allocation. Automated RTL design tools consider dozens or hundreds
of possible component allocations to find the best ones that represent a good
tradeoff among size and performance.
Operator Binding
Given an allocation of components, we still have to choose which operations to map to
which components. For example, Figure 6.74 shows three multiplication operations, one
in state A, one in state B, and one in state C. Figure 6.74(a) shows one possible
binding to two multipliers, resulting in two multiplexers. Figure 6.74(b) shows an
alternative binding to two multipliers, which results in only one multiplexer, since the
same operand (t3) is fed to the same multiplier MULA in two different states and thus
that multiplier's input doesn't require a mux. Thus, the second binding results in fewer
gates, with no performance loss: an optimization, as shown in Figure 6.74(c). Note that
binding not only maps operators to components, but also chooses which operand to map to
which component input. If we had mapped t3 to the left operand of MULA in Figure 6.74(b),
then MULA would have required two muxes rather than just one.
Figure 6.74 Two different operator bindings: (a) binding 1 uses two muxes, (b) binding 2
uses only one mux, (c) binding 2 represents an optimization compared to binding 1.
Mapping a given set of operations to a particular component allocation is known as
operator binding. Automated tools typically explore hundreds of different bindings for a
given component allocation.
Of course, the tasks of component allocation and operator binding are interdependent.
If we allocate only one component, then all operators must be bound to that component. If
we allocate two components, then we have some choices in binding. If we allocate many
components, then we have many more binding choices. Thus, some tools will perform
allocation and binding simultaneously, or the tools will iterate among the two tasks.
Together, component allocation and operator binding are sometimes referred to as
resource sharing.
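The effect of a binding on mux count can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the three operations (t2*t3, t5*t6, t8*t3), the unit names, and the rule that an input needs a mux whenever more than one source register feeds it. The figure's actual operations are not fully recoverable from the text.

```python
from collections import defaultdict


def count_muxes(binding):
    """Count multiplier inputs that are fed by more than one source register."""
    sources = defaultdict(set)
    for unit, left, right in binding:
        sources[(unit, "left")].add(left)
        sources[(unit, "right")].add(right)
    return sum(1 for regs in sources.values() if len(regs) > 1)


# Hypothetical operations: A: t2*t3, B: t5*t6, C: t8*t3.
binding_1 = [("MULA", "t2", "t3"), ("MULA", "t5", "t6"), ("MULB", "t8", "t3")]
binding_2 = [("MULA", "t2", "t3"), ("MULA", "t8", "t3"), ("MULB", "t5", "t6")]
# Same units as binding 2, but t3 is placed on the left input for one operation.
binding_2_swapped = [("MULA", "t2", "t3"), ("MULA", "t3", "t8"), ("MULB", "t5", "t6")]

print(count_muxes(binding_1))          # 2
print(count_muxes(binding_2))          # 1
print(count_muxes(binding_2_swapped))  # 2
```

The third case shows why the choice of operand position is part of binding: the same operations on the same units need an extra mux once t3 no longer stays on one input.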
Operator Scheduling
Given a high-level state machine, we may introduce additional states to enable us to
create a smaller datapath. For example, consider the high-level state machine in Figure
6.75(a). The state machine has three states, with state B having two multiplications. Since
those two multiplications occur in the same state, and we know that each state will be a
single clock cycle, then we will need two multipliers (at least) in the datapath to support
the two simultaneous multiplications in state B. But what if we only have enough gates
for one multiplier? In that case, we can reschedule the operations so that there is at most
only one multiplication needed in any one state, as in Figure 6.75(b). Thus, when we
allocate components, we need only allocate one multiplier as shown, and as was also done in
Figure 6.73(b). The result is a smaller but slower design, as illustrated in Figure 6.75(c).
Figure 6.75 Scheduling: (a) initial 3-state schedule requires two multipliers (state B
computes both t1 = t2 * t3 and t4 = t5 * t6), (b) new 4-state schedule requires only one
multiplier, (c) new schedule trades off size for delay (extra state).
That scheduling example assumed that the computation of t4 could not be moved to state
A or state C, perhaps because those states already used a multiplier, or perhaps because
t5 and t6 were not ready yet in state A, and the new result in t4 was needed in state C.
Converting a computation from occurring concurrently in one state to occurring
across several states is known as serializing a computation.
Of course, the inverse rescheduling is also possible. Suppose we started with the
high-level state machine of Figure 6.75(b). If we have plenty of gates available and want
to improve our design's performance, we might reschedule the operations such that we
merge the operations of state B2 and B into the one state B, as in Figure 6.75(a). The
result is a faster but larger design, requiring two multipliers instead of one.
Generally, introducing or merging states, and assigning operations to those states, is
a task known as operator scheduling.
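The link between a schedule and the number of multipliers it needs can be sketched in Python. The dictionary representation of a schedule and the function name `multipliers_needed` are assumptions for illustration; the operations and the state name B2 follow the scheduling example above.

```python
def multipliers_needed(schedule):
    """Multipliers required = the most multiplications found in any single state."""
    return max(sum(op.count("*") for op in ops) for ops in schedule.values())


# Operations from the scheduling example; other state contents are left empty here.
three_state = {"A": [], "B": ["t1 = t2 * t3", "t4 = t5 * t6"], "C": []}
four_state = {"A": [], "B": ["t1 = t2 * t3"], "B2": ["t4 = t5 * t6"], "C": []}

print(len(three_state), multipliers_needed(three_state))  # 3 2
print(len(four_state), multipliers_needed(four_state))    # 4 1
```

Serializing the two multiplications costs one extra state (one extra clock cycle) and saves one multiplier.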
You may have noticed that operator scheduling is interdependent with component
allocation, which you may recall was interdependent with operator binding. Thus, the
tasks of scheduling, allocation, and binding are all interdependent. Modern tools may
combine the tasks somewhat, and/or may iterate among the tasks several times, in search
of good designs.
EXAMPLE 6.22 Smaller FIR filter using operator scheduling
Consider the 3-tap FIR filter of Example 5.8. That design had no controller, meaning the
high-level state machine actually had just one state containing all the datapath actions,
as shown in Figure 6.76(a). We could reduce the size of the datapath by scheduling the
operations across several states, such that at most one multiplication and one addition
occurs per state, as shown in Figure 6.76(b). The first state loads the x registers with
samples; note that the ordering of those actions next to the state doesn't matter since all
the actions occur simultaneously. That state also clears a new register named sum, which we
had to introduce to keep track of the intermediate tap sums to be computed in the later
states. The second state computes the first tap of the filter result, the next state
computes the second tap, and the next state computes the third tap. The last state outputs
the result, and then the state machine returns to the first state again.
(a) Inputs: X (N bits). Outputs: Y (N bits).
    Local registers: xt0, xt1, xt2 (N bits).
    S1: xt0 = X
        xt1 = xt0
        xt2 = xt1
        Y = xt0*c0 + xt1*c1 + xt2*c2
(b) Inputs: X (N bits). Outputs: Y (N bits).
    Local registers: xt0, xt1, xt2, sum (N bits).
    First state:  sum = 0; xt0 = X; xt1 = xt0; xt2 = xt1
    Second state: sum = sum + xt0*c0
    Third state:  sum = sum + xt1*c1
    Fourth state: sum = sum + xt2*c2
    Fifth state:  Y = sum
Figure 6.76 High-level state machine for 3-tap FIR filter: (a) original one-state machine,
(b) five-state machine with at most one add and one multiply per state. We ignore the
writing of the constant registers (c0, c1, c2) for simplicity in the example.
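The five-state schedule of Figure 6.76(b) can be simulated in Python and compared against the direct FIR formula. This is a behavioral sketch: the tap constants in `C` are hypothetical, registers are assumed to start at zero, and the function names are made up for illustration.

```python
C = (3, 2, 1)  # hypothetical tap constants c0, c1, c2


def serial_fir(samples):
    """Five-state schedule: at most one multiply and one add per state."""
    xt0 = xt1 = xt2 = 0
    outputs, states_used = [], 0
    for x in samples:
        # First state: clear sum and shift in the new sample (simultaneous updates).
        total, xt0, xt1, xt2 = 0, x, xt0, xt1
        # Second to fourth states: one tap per state through the single MAC unit.
        for xt, c in zip((xt0, xt1, xt2), C):
            total = total + xt * c
        # Fifth state: drive the output.
        outputs.append(total)
        states_used += 5
    return outputs, states_used


def reference_fir(samples):
    """Direct y[n] = c0*x[n] + c1*x[n-1] + c2*x[n-2], with zero initial history."""
    padded = [0, 0] + list(samples)
    return [C[0] * padded[i + 2] + C[1] * padded[i + 1] + C[2] * padded[i]
            for i in range(len(samples))]


outputs, states_used = serial_fir([1, 2, 3, 4])
print(outputs == reference_fir([1, 2, 3, 4]), states_used)  # True 20
```

The serial version produces the same results as the formula while using the multiplier and adder only once per state, at a cost of five states per sample.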
A datapath for this state machine is shown in Figure 6.77. The datapath requires only one
multiplier and one adder because there is at most one multiplication and one addition in
any given state in Figure 6.76. The particular configuration of the multiplier, adder, and
register in Figure 6.77 is extremely common in signal processing circuits, and is generally
known as a multiply-accumulate (MAC) unit. The datapath multiplexes the inputs to the MAC
unit.
Figure 6.77 Serial FIR filter datapath. The components in the dashed box comprise what is
known as a multiply-accumulate (MAC) component.
One further difference between this datapath and the concurrent datapath of Example 5.8 is
that this datapath has load lines on the x registers and on yreg. The concurrent design
loaded those registers every clock cycle, but the serial design only loads those registers
during particular states; other states compute intermediate results.
We estimated the performance of the concurrent design of Example 5.8 assuming 1 ns per
gate, 2 ns per adder, and 20 ns per multiplier. The design had a critical path of 20 ns for
the multiplier and then 4 ns for two adders in series, for a total of 24 ns. That was also
the time between a new sample being taken in at the inputs and the result being generated
at the output: 24 ns. Using the more precise performance measures of latency and throughput
defined in Section 6.5, the concurrent design has a latency of 24 ns (delay from input to
output), and a throughput of 1 sample every 24 ns. The serial design has a critical path
equal to the delay through a mux, multiplier, and adder. Assuming two gate-delays for the
mux, we obtain a delay of 2 ns + 20 ns + 2 ns, or 24 ns. The latency from input to output
is five states, meaning 5 * 24 ns = 120 ns. The throughput is 1 sample every 120 ns. Thus,
the concurrent 3-tap FIR filter has 120/24 = 5 times faster latency, as well as 5 times
faster throughput, compared to the serial FIR filter. Recall from Example 6.20 that a
pipelined concurrent FIR filter has even faster throughput.
The performance difference between serial and concurrent becomes even more pronounced if
we look at an FIR filter with more taps. We estimated the latency of a concurrent 100-tap
FIR filter in Section 5.3, after Example 5.8, to be 34 ns (the delay is greater than the
concurrent 3-tap filter because the 100-tap filter needs an adder tree). The serial design
would still have a 24 ns critical path, but would require 102 states (1 to initialize, 100
to compute the taps, and 1 to output), for a latency of 102 * 24 ns = 2448 ns. Thus, the
latency speedup of the concurrent design would be 2448/34 = 72.
We should also consider the size difference between the serial and concurrent designs.
Let's assume for illustrative purposes that an adder requires approximately 500 gates and a
multiplier requires 5000 gates. The serial design's one adder and one multiplier would thus
require only 5500 gates. For a 3-tap FIR filter, the concurrent design's 3 multipliers and
2 adders would require 5000*3 + 500*2 = 16,000 gates. For a 100-tap FIR filter, the
concurrent design's 100 multipliers alone would require 100*5000 = 500,000 gates, about
100 times more gates than the serial design.
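The latency and size estimates above can be reproduced with a short Python sketch. The function names are hypothetical, and the model uses only the figures given in the example: 2 ns mux, 20 ns multiplier, 2 ns adder, 500 gates per adder, 5000 gates per multiplier. Muxes and registers are ignored in the gate counts, as in the text.

```python
MUX_NS, MULT_NS, ADD_NS = 2, 20, 2
ADDER_GATES, MULT_GATES = 500, 5000


def serial_latency_ns(taps):
    """Serial design: 1 init state + one state per tap + 1 output state."""
    states = taps + 2
    return states * (MUX_NS + MULT_NS + ADD_NS)


def concurrent_gates(taps):
    """Concurrent design: one multiplier per tap, taps - 1 adders."""
    return taps * MULT_GATES + (taps - 1) * ADDER_GATES


serial_gates = MULT_GATES + ADDER_GATES

print(serial_latency_ns(3))         # 120
print(serial_latency_ns(100))       # 2448
print(serial_latency_ns(100) / 34)  # 72.0
print(serial_gates)                 # 5500
print(concurrent_gates(3))          # 16000
```

Changing the per-component figures shows how sensitive the serial versus concurrent comparison is to the assumed multiplier delay and size.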
Intuitively, these numbers make sense. A concurrent design for 100 taps uses about 100
times more gates (due to 100 multipliers instead of just 1) compared to a serial design,
yet achieves about 100 times better performance (due to computing 100 multiplications
concurrently rather than computing one multiplication at a time).
Depending on our performance needs and size constraints, we might consider designs in
between the two extremes of serial and concurrent, such as a design with two multipliers,
which would be roughly twice as big and twice as fast as the serial design, or ten
multipliers, which would be roughly ten times as big and ten times as fast as the serial
design. Figure 6.78 illustrates tradeoffs among serial and concurrent designs for an FIR
filter.
Figure 6.78 FIR design tradeoffs (plotted against delay: serial FIR, concurrent FIR, and
compromises between them).
The above sections should have made it quite clear that RTL design presents an enormous
range of possible solutions to the designer. A single high-level state machine can be
implemented as any of a huge variety of possible implementations that differ tremendously
in their sizes and performance.
Moore versus Mealy High-Level State Machines
In the same way that we can create either a Moore or a Mealy FSM (see Section 6.3), we
can create Moore or Mealy high-level state machines. In the case of a high-level state
machine, a Moore type can only have actions associated with the states, while a Mealy
type can have actions associated with the transitions. As was the case with FSMs, a
Mealy type may result in fewer states. Mixing Moore and Mealy types is commonly done
in high-level state machines.
6.6 MORE ON OPTIMIZATIONS AND TRADEOFFS
Serial versus Concurrent Computation
Having seen in this chapter numerous examples of tradeoff techniques at various levels of
design, we can detect a common theme underlying some of those tradeoffs. The common
theme is that of serial versus concurrent computation. Serial means to perform tasks one
at a time. Concurrent means to perform tasks at the same time.
For example, in combinational logic design, we can reduce logic size by factoring
out terms. By factoring out terms, we are essentially serializing the computation, by
computing the factored-out terms first, and then combining the results with other terms. In
datapath component design, we can improve an adder's speed by computing carries
concurrently, rather than waiting for the carry to ripple serially. In RTL design, we can
schedule operations across several states, serializing the operations to reduce size
compared to operations in a single state. Example 6.21 and Example 6.22 both illustrated
serial versus concurrent computation tradeoffs, for an SAD circuit and an FIR circuit,
respectively.
Trading off between serial and concurrent computation is a fundamental concept
spanning all levels of digital design. As a general rule, a concurrent design is faster but
larger, while a serial design is smaller but slower.
Typically, numerous design options exist that span the range in between fully serial
and fully concurrent designs.
Optimizations and Tradeoffs at Higher versus Lower Levels of Design
As a general rule, the optimizations and tradeoffs made at the higher levels of design may
have a much greater impact on design criteria than the optimizations and tradeoffs made
at lower levels of design. For example, imagine wanting to drive to a city on the other
side of the country in as little time as possible. We could reduce time by reducing the
number of stops we make to eat, meaning we carry our own food in the car. We could
also reduce time by reducing stops for fuel, meaning we use a car with the longest driving
capacity per gas tank. Some people (not you, of course) might even consider driving
faster than the legal speed limit. But those are not the first things you typically think
of when trying to reduce driving time for a cross-country trip. The most important decision
is which route to take. One route might be 4000 miles long, while another route may be
only 2000 miles. The high-level decision of which route to take has far more impact than
all the lower-level decisions mentioned previously. Those lower-level decisions are only
really useful to us if we made the right high-level decision, and then if we still want to
reduce the time further.
In digital design, optimization/tradeoff decisions at the higher levels (e.g., RTL
decisions) may have a much larger impact than decisions at the lower levels (e.g., datapath
component decisions or multilevel logic decisions). For example, the RTL decision to build
a serial or concurrent FIR filter (Example 6.22) will have a far greater impact on circuit
size and performance than the datapath-component-level decision to use a carry-ripple or
carry-lookahead adder, or the combinational-logic-level decision to use two-level or
multilevel logic. Those lower-level decisions merely tune the size and performance of the
higher-level decision. Figure 6.79(a) illustrates this concept. An analogy might be a
spotlight shining down on land, illustrated in Figure 6.79(b): moving the spotlight left or
right at high altitude (higher-level decisions) has a larger impact on which land region
(possible solutions) is illuminated than do lower-altitude movements (lower-level
decisions).
Figure 6.79 Higher- versus lower-level decisions: (a) higher-level decisions (denoted by
the larger two circles) focus the design into a region, while lower-level decisions tune
within the region, (b) spotlight analogy.
Algorithm Selection
When attempting to implement a system as a digital circuit, perhaps the highest-level
design decision, having therefore the most significant impact on design criteria like size,
performance, power, etc., is the selection of an algorithm. An algorithm is a set of steps
that solve a problem. The same problem can be solved by different algorithms. Algorithms
for the same problem, when implemented as a digital circuit, may result in
tremendously different performance and/or size. Some algorithms may simply be better
than others (optimization without much tradeoff), while other algorithms may represent
tradeoffs between performance, size, and other criteria. Selecting an algorithm for a
digital design problem is perhaps the highest level of design, and can have the biggest
impact on design criteria. For example, earlier examples showed various implementations
of an FIR filter. But there are many other algorithms for filtering very different from the
algorithm used in FIR. Some algorithms may provide higher-quality filtering at the
expense of more required computation; others may provide lower quality but need less
computation.
We illustrate algorithm selection using an example.
EXAMPLE 6.23 Data compression using different table lookup algorithms
We wish to compress data being sent over a long-distance computer network in order to
achieve faster communication by sending fewer bits. One method for such compression is to
use short codes for frequently appearing data values. For example, suppose each data item
is 32 bits long. We might analyze the data we expect to send and find the 256 most
frequently appearing data values. We could then assign a unique 8-bit code to each of those
256 values. When sending data over the network, we first send a bit indicating whether we
are about to send an encoded 8-bit data item or a raw 32-bit data item: if the first bit is
1, that might mean encoded, and a 0 might mean raw. If all the data items being sent happen
to be among the top 256 most frequent ones, then we'd be sending 9 bits per data item
(1 bit indicating whether encoded, plus 8 bits of encoded data) rather than 32 bits per
data item, a compression of nearly 4x, which could translate to about 4 times faster
communication.
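The encoding scheme just described can be sketched in Python. The function name `encode`, the dictionary-based table, and the bit-string output are assumptions made for illustration; only four sample table entries are filled in, whereas the scheme in the text uses 256.

```python
def encode(item, table):
    """Return '1' + 8-bit code if item is in the table, else '0' + 32 raw bits."""
    code = table.get(item)
    if code is not None:
        return "1" + format(code, "08b")
    return "0" + format(item, "032b")


# A few hypothetical frequent values; the code is the value's position in the list.
frequent = [0x00000000, 0x00000001, 0x0000000F, 0x000000FF]
table = {value: addr for addr, value in enumerate(frequent)}

hit = encode(0x000000FF, table)
miss = encode(0x12345678, table)

print(hit, len(hit))       # 100000011 9
print(len(miss))           # 33
print(round(32 / 9, 2))    # 3.56
```

A frequent item costs 9 bits instead of 32, while an item outside the table costs 33 bits, so the scheme only pays off when most items are in the table.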
We might design the encoder using a 256-word memory that stores the 256 most frequent
values in sorted order, from smallest to largest in binary. The code would then be the
address of that word in the memory. Figure 6.80 shows sample contents of such a memory, in
hexadecimal. The contents vary depending on the communicating applications we are
considering.
One algorithm for searching a list of values in a memory is known as linear search.
Starting at address 0, we compare each memory word's contents with the data item we are
looking for (known as the key), incrementing the address and repeating until we find a
match, at which point we treat the address at which there was a match as the encoded value.
If we get to address 255 and don't find a match, we will transmit the raw data. The linear
search algorithm is a slow way to search a sorted list in memory. The algorithm requires
256 reads and compares for data items that aren't in the memory, which may translate to 256
cycles. For data items that are in the memory, we would require on average 128 reads.
256x32 memory contents:
  0:   0x00000000
  1:   0x00000001
  2:   0x0000000F
  3:   0x000000FF
  ...
  96:  0x00000F0A
  ...
  128: 0x0000FFAA
  ...
  255: 0xFFFF0000
Figure 6.80 Searching a sorted memory for the key 0x00000F0A: linear search requires 97
reads/compares, binary search only 3.
A faster algorithm for searching for data items in a memory is known as binary search. We
first sort the list and then store the list in the memory (we need only sort once). To look up
an item, we start in the middle of the memory, meaning address 128, and compare the contents
there with the key. If the key is less than the contents, then we know that the key, if it
exists in the memory, must be somewhere between 0 and 127. So we go to the middle of that
range, meaning address 64, and again compare. If the key is less than the value there, we
search 0 to 63; if greater, we search 65 to 127. So after each comparison, we decrease the
remaining possible range of addresses in which the key lies by one half. Halving 256
repeatedly can only be done 8 times: 256, 128, 64, 32, 16, 8, 4, 2, 1. In other words, after
at most about 8 comparisons, we've either seen the key, or shrunk the range to 1, meaning the
key can't be found in the memory. Binary search is 256/8 = 32 times faster than linear search
when the key does not exist in the memory, and roughly that much faster when the key
exists in the memory too. Yet binary search only requires a slightly smarter controller.
We see that the choice of the right algorithm makes a big difference in performance for
this example, a much bigger difference than that determined by, say, the speed of the
comparator being used.
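The difference between the two searches can be sketched in software by counting memory reads. The following is only an illustrative model, not the controller design itself: the memory contents (multiples of 3) and the function names are assumptions made for the example. In this particular sketch the worst case for binary search comes out to 9 reads rather than 8, because the final one-entry range still costs one read.

```python
def linear_search(mem, key):
    """Scan addresses 0, 1, 2, ... and count the memory reads performed."""
    reads = 0
    for addr, value in enumerate(mem):
        reads += 1
        if value == key:
            return addr, reads
    return None, reads


def binary_search(mem, key):
    """Search a sorted memory, halving the address range after each read."""
    lo, hi = 0, len(mem) - 1
    reads = 0
    while lo <= hi:
        mid = (lo + hi + 1) // 2  # address 128 for the full range 0..255
        reads += 1
        if mem[mid] == key:
            return mid, reads
        if key < mem[mid]:
            hi = mid - 1  # key can only be in the lower half
        else:
            lo = mid + 1  # key can only be in the upper half
    return None, reads


# Hypothetical sorted 256-entry memory (contents chosen only for illustration).
memory = [3 * i for i in range(256)]

if __name__ == "__main__":
    print(linear_search(memory, 1))   # key absent: every entry is read
    print(binary_search(memory, 1))   # key absent: only a handful of reads
```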
Power Optimization
Power is becoming an important design criterion, both in high-end computing and in
embedded computing. The unit of power is the watt, which represents energy per second
(i.e., joules per second). In high-end computing, like desktop PCs, servers, or video-game
consoles, the chips inside a computer consume a lot of power, causing the chips to
become very hot. For example, a typical chip inside a PC may consume 60 watts. Think
about touching a 60-watt light bulb (but don't actually touch one) to understand how hot
that is. Designing low-power chips reduces the need for expensive chip cooling methods
beyond simple fans in high-end computing, and also reduces electricity costs, which
can be quite significant for companies operating large numbers of computers.
In embedded computing, even simple cooling methods like fans are often not available.
For example, your cell phone does not have a fan (if it did, people might find their
tie or scarf getting stuck in that fan). Portable embedded devices might have chips that
run at only 1 watt or less.
Furthermore, portable devices typically get their energy from batteries, and thus low-power
chips are necessary to extend battery life, especially considering the fact that batteries
are not improving fast enough to keep pace with increasing power consumption. By some
measures, energy demand per chip is doubling about every three years (going along with
Moore's Law). Figure 6.81 plots such energy demands compared to battery energy densities
improving at their present rate of only about 8% per year. The increasing gap shown
translates to shorter battery lifetimes for a device like a cell phone, or translates to
bigger batteries.
[Figure 6.81 plot: energy demand and battery energy density versus year, 2001 to 2007.]
Figure 6.81 Battery energy density is improving slower than the increasing energy demands
of digital chips.
The most popular IC technology today uses CMOS transistors, and the biggest contributor to
power consumption in CMOS is the switching of values from 0 to 1. The reason for this is
that wires aren't perfect, having capacitance (we don't put a capacitor there on purpose; it
is simply a result of the fact that wires aren't perfect conductors of electricity). Switching
the wire from 0 to 1 requires charging that capacitor. Switching from 1 back to 0 causes
that charge to be discharged to ground. That switching results in power being consumed.
This power is known as dynamic power, since this power comes from the changing of
signals (dynamic means changing). Dynamic power consumption of a CMOS wire is
proportional to the size of the capacitance (C) of the wire, multiplied by the voltage (V)
squared, multiplied by the frequency at which the wire switches (f), namely:
$$P = k \cdot C \cdot V^2 \cdot f$$ (equation for CMOS dynamic power consumption)
where k is some constant. To compute the dynamic power of a circuit, we would add up
the power computed by the above equation for every wire.
Looking at the above equation, one can clearly see that lowering the voltage will
cause the greatest reduction in dynamic power, because the voltage has a quadratic
(squared) contribution to dynamic power. Low-level circuit designers seek to reduce
power by creating transistors that operate at the lowest voltage possible, to reduce the V
term, and that have the smallest wire capacitance possible, to reduce the C term. Digital
designers can therefore choose to utilize gates that operate with a lower voltage.
Unfortunately, lower voltage gates have a longer delay than higher voltage gates,
resulting in a tradeoff between power and performance.
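As a rough numeric illustration of the quadratic voltage term, the sketch below evaluates the relation that dynamic power is proportional to C times V squared times f. The constant k and the capacitance, voltage, and frequency values are made-up placeholders, not figures from any real process.

```python
def dynamic_power(k, capacitance, voltage, frequency):
    """Dynamic power of one wire: k * C * V^2 * f."""
    return k * capacitance * voltage ** 2 * frequency


def circuit_dynamic_power(k, wires):
    """Sum the per-wire dynamic power over (C, V, f) tuples."""
    return sum(dynamic_power(k, c, v, f) for (c, v, f) in wires)


# Placeholder values chosen only to show the scaling behavior.
base = dynamic_power(1.0, 1e-12, 1.2, 100e6)
half_voltage = dynamic_power(1.0, 1e-12, 0.6, 100e6)
half_frequency = dynamic_power(1.0, 1e-12, 1.2, 50e6)

if __name__ == "__main__":
    print(half_voltage / base)    # halving V cuts power to about one quarter
    print(half_frequency / base)  # halving f cuts power to about one half
```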
Another way to reduce the dynamic power consumed by a circuit is to reduce the
circuit's clock frequency, which obviously reduces the f term for all the clock wires in the
circuit, as well as for the many other wires that change on each clock edge (like register
wires and the logic connected to those registers' outputs). But again, reducing the clock
frequency slows performance, resulting in a tradeoff between power and performance.
The chief technical officer at a major chip design company told me in 2004 that, for
their company, "Power is enemy number one." The reason is that they had scaled their
voltage down nearly as low as possible, yet were putting more transistors on each IC every
year due to the shrinking of transistor sizes, meaning more wires switching. And
capacitance isn't decreasing at the same rate as transistor sizes. The result is that an IC
consumes more power as we put more transistors on the IC, which can result in problems
due to too much heat and due to fast battery energy consumption.
Clock Gating (Advanced Technique)
Assuming the C and V terms have been reduced to the extent possible using transistor-level
design techniques, power can be reduced further by reducing f, the frequency at which wires
switch. One method for reducing such power is known as clock gating. Clock gating is the
disabling of the clock signal in regions of the chip that we know are not computing anything
at a given time. Clock gating saves power because a significant percentage of the wires
switching in a chip are the wires that distribute the clock to all the registers and
flip-flops; perhaps 20%-30% of the power consumption is due to the clock signal switching
throughout the chip. Clock gating reduces f without slowing the clock frequency itself.
In clock gating, the clock is disabled by ANDing the clock signal with an
enable signal that is set by the state machine. Recall that a register with parallel load
internally reloads the same value from the register's flip-flops back into the flip-flops on a
rising clock edge. Preventing the clock edge from appearing keeps the same values in the
flip-flops, yielding the same net result: the register's contents don't change.
Clock gating is not something that digital designers typically do themselves. Rather,
modern synthesis tools may allow us to specify clock enable and disable using special
commands in each state. These tools must use extreme caution, because adding a gate on
a clock signal delays the clock signal, resulting in clock signals in different parts of the
circuit being slightly different from one another, an effect known as clock skew. The tools
must perform careful timing analysis to ensure that the clock skew does not change
overall circuit behavior. Furthermore, putting gates on a clock signal can reduce the
sharpness of the clock edges, and so must be done carefully, sometimes using special
gates. Nevertheless, the technique is widely used by low-power tools in practice.
We demonstrate clock gating with an example.
EXAMPLE 6.24 Serial FIR filter with clock gating to reduce power
We designed a serial FIR filter in Example 6.22. A five-state state machine controlled
the datapath. The state machine loaded the three xt registers only in the first state, state
S1, and loaded the yreg register only in the last state, state S5. Yet the design routed
the clock signal to all four registers, utilizing four wires, labeled n1-n4 in Figure
6.82(a). Notice from the timing diagram at the top of the figure that n1-n4 change
identically as the clock signal changes, and remember that every such change consumes
dynamic power.
Figure 6.82 Clock gating: (a) the clock signal switches every cycle on all the
heavily bolded wires, but the xt registers are only loaded in state S1 and the yreg in
state S5, so most of the clock switching is wasted; (b) gating the clock reduces
switching on the clock wires.
Figure 6.82(b) shows a design using clock gating. The controller gates the clock to the xt
registers by setting s1 to 0 in all states but S1. Likewise, the controller gates the clock to
the yreg register by setting s5 to 0 in all states but S5. Notice the significant decrease in
signal switching on the clock's wires n1-n4, shown at the bottom of Figure 6.82.
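The savings in this example can be estimated with a small behavioral model that counts how many rising clock edges reach each register's clock wire. This is only a sketch under assumptions: the mapping of n1-n3 to the xt registers and n4 to yreg is assumed for illustration, and the model ignores gate delay and clock skew entirely.

```python
STATES = ["S1", "S2", "S3", "S4", "S5"]

# Assumed mapping: each clock wire and the only state in which its register loads.
CLOCK_WIRES = {"n1": "S1", "n2": "S1", "n3": "S1", "n4": "S5"}


def count_rising_edges(cycles, gated):
    """Count rising clock edges seen on each clock wire over the given cycles."""
    edges = {wire: 0 for wire in CLOCK_WIRES}
    for cycle in range(cycles):
        state = STATES[cycle % len(STATES)]
        for wire, load_state in CLOCK_WIRES.items():
            # Gated: the edge passes the AND gate only when the enable is 1.
            enable = (state == load_state) if gated else True
            if enable:
                edges[wire] += 1
    return edges


if __name__ == "__main__":
    print(count_rising_edges(10, gated=False))
    print(count_rising_edges(10, gated=True))
```

Over two passes through the five states, the ungated design delivers 40 edges in total while the gated design delivers 8, which mirrors the reduced switching described for Figure 6.82(b).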
Low-power gates on noncritical paths
Not all gates are equally fast. Engineers that build gates from transistors can make a gate
faster by increasing the size of the gate's transistors, or by operating the gate at a higher
voltage, or by any of several other means. Thus, one two-input AND gate might have a 1 ns
delay, while another two-input AND gate might have a 2 ns delay. The latter AND may consume
less power, due to its smaller size or lower voltage.
If we want to reduce the power consumed by a circuit, we can build the entire circuit using
low-power gates to achieve low power at the expense of slower performance, as illustrated in
Figure 6.83.
Alternatively, we can put low-power gates only on the noncritical paths, such that we
lengthen those paths to have delays no longer than the critical path, as shown in the
following example.
[Figure 6.83 plot labels: high-power gates; low-power gates on noncritical path; low-power gates.]
Figure 6.83 Using low-power gates
EXAMPLE 6.25 Reducing noncritical path power with multilevel logic
In Example 6.12, we reduced the size of a noncritical path by using multilevel logic. In this
example, we instead reduce the power consumed by the noncritical path by using low-power gates.
Assume that normal gates have a delay of 1 ns and consume 1 nanowatt of power, and that
low-power gates have a delay of 2 ns and consume 0.5 nanowatts of power.
The left side of Figure 6.84 shows the same circuit from Example 6.12, having a critical path
of 3 gate-delays. Assume that all the gates are normal gates, meaning the critical path delay
is 3 ns, and the total power consumption is 5 nanowatts.
Figure 6.84 Using low-power gates on noncritical paths. Numbers inside a gate represent the
gate's delay in nanoseconds, and the gate's power consumption in nanowatts.
The bottom two AND gates lie on two noncritical paths having delays of only 2 ns. We can
thus replace those AND gates by low-power AND gates. The result is that the two paths' delays
lengthen to 3 ns, so become equal to the critical path delay, but not longer. The result is also
that the total power becomes only 4 nanowatts instead of 5 nanowatts (a 20% reduction).
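The arithmetic of this example can be checked with a short sketch. The circuit topology below is hypothetical (the figure is not reproduced here): it simply assumes five gates, one three-gate critical path, and two two-gate noncritical paths that each pass through one of the bottom AND gates. The gate names g1 through g5 are invented for illustration.

```python
NORMAL = {"delay_ns": 1.0, "power_nw": 1.0}
LOW_POWER = {"delay_ns": 2.0, "power_nw": 0.5}

# Hypothetical paths through the circuit; g4 and g5 stand for the bottom AND gates.
PATHS = [["g1", "g2", "g3"], ["g4", "g3"], ["g5", "g3"]]


def critical_delay(gate_types):
    """Longest path delay in ns, given a mapping of gate name to gate type."""
    return max(sum(gate_types[g]["delay_ns"] for g in path) for path in PATHS)


def total_power(gate_types):
    """Total power in nanowatts, summed over every gate in the circuit."""
    return sum(gate["power_nw"] for gate in gate_types.values())


all_normal = {g: NORMAL for g in ["g1", "g2", "g3", "g4", "g5"]}
mixed = dict(all_normal, g4=LOW_POWER, g5=LOW_POWER)

if __name__ == "__main__":
    print(critical_delay(all_normal), total_power(all_normal))
    print(critical_delay(mixed), total_power(mixed))
```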
6.7 PRODUCT PROFILE: DIGITAL VIDEO PLAYER/RECORDER
Digital Video Overview
In the 1990s, the digitization of video became practical due to faster, smaller, and
lower-power digital circuits. Previously, video was largely captured, stored, and played using
analog methods. Digitized video works by sampling an analog video signal and
transforming the samples to digital values. Such digitization is similar to the audio
digitization example from Figure 1.1, but with some additional work.
A video is actually a series of quickly displayed still pictures, known as frames, as
shown in Figure 6.85(a). One second of video might consist of about 30 frames; the human
eyes and brain see such a rapid sequence of frames as a smooth, continuous video.
A digital display may be divided into several hundred thousand tiny "picture elements,"
or pixels. A typical size might be about 720 across and 480 down. For each frame, a
digitized sample captures several values for each pixel, like the intensity of the red, blue,
and green components of the light at that pixel, converting analog measurements of those
intensities into digital numbers. The result is the representation of a digitized frame as a
(large) series of 0s and 1s, and the representation of a digitized video as a large series of
digitized frames. Digitized video can be transmitted, stored, replayed, and copied with much
higher quality than analog video. Furthermore, digitized video can be compressed, resulting
possibly in higher quality video than analog video transmitted or stored using the same
medium.
Figure 6.85 Video: (a) is a series of pictures, or frames, with much interframe redundancy,
(b) can be constructed from I (intra) frames and P (predicted) frames, shown with relative
bit encoding sizes.
DVD: One Form of Digital Video Storage
Digital video discs (also known as digital versatile discs), or DVDs, store video in a
digital format. First sold in 1997, DVDs replaced the analog video technology known as
VHS tape. DVD players appear in home entertainment centers, personal computers,
automobiles (especially family-oriented vehicles), and even as stand-alone portable units. In
2001, consumer electronics companies introduced the first DVD recorder to market,
allowing individuals to record television shows to special recordable DVDs. The
popularity of DVDs compared to the previously popular analog-based VHS technology stems
from several advantages, including better quality video, no deterioration in video quality
over time, and the ability to jump directly to particular parts in a video without having to
sequentially forward or rewind.
DVDs store large amounts of data on a thin reflective layer of metal. Although the
metal layer within a DVD looks flat from our perspective, there are actually billions of
tiny pits on the metal layer that store the data. These pits, or lack of pits (called lands),
store the binary data on the DVD. Figure 6.86 shows how a DVD player reads the
information off a DVD. The player focuses the light of a very precise laser onto the metal
layer within the DVD. The metal layer reflects the light onto an optical sensor that can
detect if the light is reflected off of a pit or a land. By detecting the different regions, the
optical sensor creates a stream of binary values as it reads the DVD.
Figure 6.86 How a DVD player reads a DVD. The DVD player's optical pickup element shines a
laser on the surface of the DVD. The DVD reflects the laser back to an optical sensor, and the
optical sensor uses the intensity of the reflected laser to output the sequence of 0s and 1s
stored on the DVD (for example, ...010100101100). A video decoder circuit converts the binary
data to a sequence of frames that humans interpret as a moving picture.
The DVD's binary data is organized into a series of tracks that spiral outward from the
center of the DVD. As the DVD player is reading the data, the laser and optical sensor must
slowly move outward from the center of the DVD to the outer edge. If a DVD is
dual-layered, the data on the disk's second layer is stored in a spiral that moves from the
disk's outer to inner edge. The motivation for the second layer's reverse spiral is to prevent
the laser and optical sensor from needing to reposition itself to the center of the disk after
focusing on the second layer during a layer change. (You may have noticed a DVD pause
momentarily at a certain point in a movie during a layer change.)
A single-layer single-sided DVD can store 4.7 gigabytes of data (meaning 37.6 gigabits),
but that amount is not enough for a movie unless the data is compressed. Consider
a video with a resolution of 720 pixels by 480 pixels, using 24 bits of information per
pixel, and displayed at 30 frames per second. One frame would require 720*480*24 =
8,294,400 bits, or about 8 Mbits. One second of video, or 30 frames, would require
30*8,294,400 = 248,832,000 bits, or about 250 Mbits. A 100-minute movie would thus
require about 250 Mbits/sec * 100 min * 60 sec/min = 1500 Gbits. But a DVD can only
hold 37.6 Gbits. To store a movie, a DVD must store the video in a compressed format.
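The storage arithmetic above can be reproduced in a few lines. The variable names are chosen for this sketch only; the exact movie size comes out slightly under 1500 Gbits because the text rounds one second of video up to 250 Mbits.

```python
bits_per_frame = 720 * 480 * 24          # pixels across * pixels down * bits per pixel
bits_per_second = bits_per_frame * 30    # 30 frames per second
movie_bits = bits_per_second * 100 * 60  # 100 minutes * 60 seconds per minute
dvd_bits = 4.7e9 * 8                     # 4.7 gigabytes expressed in bits

required_ratio = movie_bits / dvd_bits   # compression ratio needed to fit one DVD

if __name__ == "__main__":
    print(bits_per_frame, bits_per_second, movie_bits)
    print(round(required_ratio, 1))
```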
A DVD is only one of many different digital video storage media. Digitized video may
be stored on any storage media capable of storing 0s and 1s in some form, such as on tape
(used in many digital video cameras), on a flash memory (used in digital cameras and cell
phones with video recording capability), on a CD, or on a computer hard drive. All such
media are typically still quite limited and thus require compression methods.
MPEG-2 Video Encoding: Sending Frame Differences Using I-, P-, and B-Frames
MPEG-2 video compression was defined and standardized by the Moving Picture Experts
Group in 1994 (as an improvement over the 1992 MPEG-1 standard), and is used in DVDs,
digital television, and numerous other digital video devices. MPEG-2 compression ratios
range from 30:1 to 100:1, or more. The compression ratio is determined by dividing the
number of bits of the digitized video before compression by the number of bits after
compression. So if a digitized video requires 400 gigabytes uncompressed but only 4 gigabytes
compressed, the compression ratio would be 400/4 = 100:1. Note that packing 1500 Gbits of
a movie into 37.6 Gbits would require a compression ratio of 1500 Gbits/37.6 Gbits = 40:1.
The key observation leading to MPEG-2's compression method is that typically very
little difference exists between two successive frames in a video; in other words, video
typically has much interframe redundancy. For example, a frame may consist of a person
standing in front of a mountain, as in Figure 6.85(a). The next frame (which represents
perhaps 1/30th of a second later) may be almost identical to the previous frame, except
that the person's mouth has opened slightly. The next frame may still be almost identical,
with the person's mouth opened slightly more. And so on.
Therefore, MPEG-2 does not merely encode each frame as a distinct picture. Instead,
to take advantage of the interframe redundancy, MPEG-2 may choose to encode each
frame as one of the following:
  An I-frame, or Intracoded frame, is a complete picture.
  A P-frame, or Predicted frame, is a frame that merely describes the difference
  between the current frame and the previous frame. Thus, to derive the picture for
  this frame, one must combine the P-frame with the previous frame.
For example, Figure 6.85(b) shows P-frames that contain only the differences from
the previous frame. A P-frame will obviously require fewer bits than an I-frame. Example
frame sizes might be about 8 Mbits for an I-frame, but only 2 Mbits for a P-frame. Thus,
instead of representing 30 frames as 30 complete pictures (30 I-frames), a compression
method might represent those frames using the following sequence of frames: I P P P P P
P P P P P P P P P I P P P P P P P P P P P P P P. The compression ratio in this example
would thus be 8 Mbits * 30 / (2 * 8 Mbits + 28 * 2 Mbits) = 240 / 72 = 3.3:1. Obviously,
a picture created by combining predicted frames with a previous frame won't be an exact
representation of the original picture, especially if there is a lot of motion in the video.
MPEG-2 thus trades off some quality for compression.
To achieve even further reduction, MPEG-2 uses a third frame type:
  A B-frame, or Bidirectional predicted frame, is a frame that can store differences
  from previous and future frames.
B-frames can thus be even smaller than P-frames. An example B-frame size might be
just 1 Mbit.
EXAMPLE 6.26 Computing compression ratios involving I-, P- and B-frames
Assume a 30-frame MPEG-2 sequence has the following frame sequence: I B B P B B P B B P B
B P B B I B B P B B P B B P B B P B B. Assume average frame sizes of 8 Mbits for I-frames,
2 Mbits for P-frames, and 1 Mbit for B-frames. Compute the compression ratio.
The compression ratio in this example would be 8 Mbits * 30 / (2 * 8 Mbits + 8 * 2 Mbits +
20 * 1 Mbits) = 240 / 52 = 4.6:1.
The example sequence of frames is in fact fairly typical for MPEG-2 video, with I-frames
occurring about every 12-15 frames.
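The frame-based ratios can be recomputed for any frame sequence with a short helper. This is a sketch using the average frame sizes assumed in the text (8, 2, and 1 Mbits); the function name and the way the sequences are built are choices made for this illustration.

```python
# Average frame sizes in Mbits, as assumed in the text.
FRAME_MBITS = {"I": 8, "P": 2, "B": 1}


def compression_ratio(sequence):
    """Ratio of all-I-frame size to the size of the given I/P/B sequence."""
    frames = sequence.split()
    uncompressed = len(frames) * FRAME_MBITS["I"]
    compressed = sum(FRAME_MBITS[frame] for frame in frames)
    return uncompressed / compressed


# 30 frames: an I-frame followed by 14 P-frames, twice.
ip_sequence = " ".join((["I"] + ["P"] * 14) * 2)
# 30 frames: I B B followed by four P B B groups, twice.
ibp_sequence = " ".join((["I", "B", "B"] + ["P", "B", "B"] * 4) * 2)

if __name__ == "__main__":
    print(round(compression_ratio(ip_sequence), 1))
    print(round(compression_ratio(ibp_sequence), 1))
```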
MPEG-2 video encoders may seek to create about 30 frames per second. With hundreds
of thousands of pixels per frame that must be compared with another frame,
MPEG-2 encoding requires a large amount of computation to determine which frames
should be I, P, and B, and what should be the values for the P- and B-frames.
Furthermore, much of that computation will consist of the same computation performed
between corresponding regions of two frames. Thus, many MPEG-2 encoders utilize custom
digital circuits to parallelize those computations at the expense of more hardware size.
For example, Example 6.21 built a sum-of-absolute-differences circuit using more
parallelism than in Example 5.9, at the expense of a larger circuit size. Such a circuit would be
useful in a video encoder needing to quickly determine whether a frame should be
encoded as a P- or B-frame, or instead should be encoded as an I-frame. Additional
circuits might compute the actual values of P- and B-frames.
Likewise, an MPEG-2 video decoder might use circuits to quickly recompose I-, P-
and B-frames back into full picture frames, although decoding MPEG-2 video is easier
than encoding because the actual determination of P- and B-frame contents is only done
during encoding; decoding merely needs to combine P- and B-frames with their
surrounding frames.
Transforming to the Frequency Domain for Further Compression
DCT: Discrete Cosine Transform
We saw in the previous section that sending a frame (P or B) that is just the difference
from a previous or future frame can result in some compression. However, the
compression ratios achieved were only about 4:1. Recall from earlier that a DVD needs perhaps
a 40:1 compression ratio to store a full-length movie. Thus, further compression is needed.
MPEG-2 therefore further compresses each I-, P- and B-frame individually. The
compression method involves applying what is known as a discrete cosine transform to 8x8
blocks of pixel values within each frame. The discrete cosine transform is also used in the
well-known JPEG standard for compressing still images, like those in a digital camera. The
discrete cosine transform, or DCT, transforms information from the spatial domain to the
frequency domain. (The DCT is similar to another popular technique known as the Fast
Fourier Transform, or FFT, also used for translating to the frequency domain.)
Translating to the frequency domain is a powerful concept, which is widely used in
digital signal processing. To understand this concept, consider wanting to digitally store the
analog signal shown in Figure 6.87, using the fewest bits possible. The signal is a 1 Hz
cosine wave with an amplitude of 10. To store the signal digitally, we could sample the
signal at frequent intervals, perhaps every millisecond, and record the measured signal value
as a binary number, perhaps 8 bits wide. One second would thus result in 1000 * 8 = 8000
bits. On the other hand, we could just store the fact that the signal is a cosine wave with a
frequency of 1 Hz and an amplitude of 10. If we store each of those numbers as an 8-bit
value, then we only need to store 8 + 8 = 16 bits. Sixteen bits is far less than 8000 bits.
Figure 6.87 Digitizing signals by translating to the frequency domain.
Of course, not all signals that we want to digitize are simple cosine waves. But, and this
is the key idea underlying frequency domain representation, we can approximate any original
signal as a sum of cosine waves of different frequencies and amplitudes. If we break the
original signal into small regions, we obtain an even better approximation. For example, we
might approximate one region as the sum of a 1 Hz cosine wave of amplitude 5 plus a 2 Hz
cosine wave of amplitude 3. We might approximate another region as the sum of 50 different
cosine waves of different frequencies and amplitudes. The smaller the region we consider,
and the more different cosine wave frequencies we consider, the more accurate will be our
approximation to the real signal.
Rather than storing the actual frequencies along with the amplitudes of the cosine
waves, we could instead decide only to consider using particular frequencies, such as:
1 Hz, 2 Hz, 4 Hz, 8 Hz, 16 Hz, and so on. Then, we can simply send the amplitudes of
those particular cosine waves: (5, 3, 0, 0, 0, ...). Let's refer to these amplitudes as
coefficients.
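To make the coefficient idea concrete, the sketch below rebuilds a signal value from a short list of amplitudes for the predetermined frequencies 1, 2, 4, 8, and 16 Hz, and compares the storage cost of raw samples against the two-number description of the simple cosine wave. The function and variable names are illustrative assumptions.

```python
import math

# Predetermined frequencies shared by the encoder and decoder (from the text).
FREQUENCIES_HZ = [1, 2, 4, 8, 16]


def reconstruct(coefficients, t):
    """Signal value at time t (seconds) as a sum of cosine waves."""
    return sum(
        amplitude * math.cos(2 * math.pi * freq * t)
        for amplitude, freq in zip(coefficients, FREQUENCIES_HZ)
    )


coefficients = [5, 3, 0, 0, 0]  # 1 Hz wave of amplitude 5 plus 2 Hz wave of amplitude 3

sampled_bits = 1000 * 8  # one 8-bit sample per millisecond for one second
parameter_bits = 8 + 8   # one 8-bit frequency plus one 8-bit amplitude

if __name__ == "__main__":
    print(reconstruct(coefficients, 0.0), reconstruct(coefficients, 0.5))
    print(sampled_bits, parameter_bits)
```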
The DCT in MPEG-2 converts an input 8x8 block, whose values represent pixel
intensities, to an 8x8 block representing the coefficients of predetermined "frequencies."
In the video domain, each frequency represents a different block pattern, with low
frequency being an almost constant pattern and high frequency being a changing pattern
(like a checkerboard). The DCT determines a set of coefficients such that adding the
predetermined patterns together, with each pattern multiplied by its coefficient, yields one
resulting pattern very similar to the original input block.
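The equation for a two-dimensional DCT applied to an 8x8 block of numbers is:
$$F(u, v) = \frac{1}{4}\, C(u)\, C(v) \sum_{x=0}^{7} \sum_{y=0}^{7} D[x, y] \cos\left(\frac{\pi (2x + 1) u}{16}\right) \cos\left(\frac{\pi (2y + 1) v}{16}\right)$$
$$C(h) = \begin{cases} \frac{1}{\sqrt{2}} & \text{if } h = 0 \\ 1 & \text{otherwise} \end{cases}$$
The input is an 8x8 block, D[x, y]. The output is another 8x8 block, with F(u, v)
computing the coefficient at row u, column v for the output block.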
An MPEG-2 encoder may utilize custom digital circuits for fast DCT computation.
Notice that computing each coefficient requires evaluating the rightmost term (let's call
that term the inner term) 64 times, and that must be done for each of the 64 coefficients,
meaning 64*64 = 4096 evaluations of the term. And that inner term itself requires several
computations. Furthermore, the DCT operates on 8x8 blocks, but in a 720x480 I-frame
there will be 5400 such blocks. Thus, the DCT for one I-frame could require 5400*4096
= 22 million computations of the inner term. And that encoding may have to occur at 30
frames per second. You can begin to see why an MPEG-2 encoder may need to use
custom digital circuits to compute the DCT quickly, using extensive parallelism and
pipelining to obtain the necessary performance.
The DCT computation can be sped up further by precomputing the cosine terms of
the inner term. Notice that the DCT computes two cosines based on the input values of u
and x and the input values of v and y. However, because the DCT operates on 8x8 blocks,
the variables u, v, x, and y only range in value from 0 to 7. Therefore, we can precompute
the 64 possible cosine values needed for the DCT computation and store those values in
an 8x8 table, which may be programmed into a ROM. We can then rewrite the DCT
transform as follows:
$$F(u, v) = \frac{1}{4}\, C(u)\, C(v) \sum_{x=0}^{7} \sum_{y=0}^{7} D[x, y] \cdot \cos[x, u] \cdot \cos[y, v]$$
where cos[x, u] denotes the table entry holding the precomputed cosine value for x and u.
Using a ROM to store the precomputed cosine values speeds up the computation of
the inner term of the DCT.
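A software analogue of the table-lookup idea might look like the following. It is a sketch under assumptions, not the encoder's circuit: it uses the standard 8x8 DCT form, with a 1/4 scale factor, the C(h) terms, and cosine arguments of (2x+1)u*pi/16, and the names COS_TABLE and dct_8x8 are invented for this example.

```python
import math

N = 8

# Precomputed 8x8 cosine table, playing the role of the ROM: COS_TABLE[x][u].
COS_TABLE = [
    [math.cos((2 * x + 1) * u * math.pi / 16) for u in range(N)]
    for x in range(N)
]


def c(h):
    """Scale term C(h): 1/sqrt(2) when h is 0, otherwise 1."""
    return 1 / math.sqrt(2) if h == 0 else 1.0


def dct_coefficient(block, u, v):
    """One output coefficient: 64 evaluations of the inner term via table lookups."""
    total = 0.0
    for x in range(N):
        for y in range(N):
            total += block[x][y] * COS_TABLE[x][u] * COS_TABLE[y][v]
    return 0.25 * c(u) * c(v) * total


def dct_8x8(block):
    """All 64 coefficients, so 64 * 64 = 4096 inner-term evaluations per block."""
    return [[dct_coefficient(block, u, v) for v in range(N)] for u in range(N)]


# A constant block has only a lowest-frequency component.
flat_block = [[8] * N for _ in range(N)]
flat_coeffs = dct_8x8(flat_block)

if __name__ == "__main__":
    print(round(flat_coeffs[0][0], 6))
```

For the constant block, only the coefficient at row 0, column 0 is nonzero, which matches the description of low frequency as an almost constant pattern.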
Quantization
Translating to the frequency domain using the DCT does not directly perform
compression; we merely converted an input 8x8 block to an output 8x8 block. That output 8x8
block represents amplitudes of particular cosine wave frequencies. We can achieve
compression by rounding those amplitudes, such that we use fewer bits to represent the
amplitudes. For example, suppose we use 8 bits to represent the amplitude, meaning we
can represent amplitudes ranging from 0 to 255. Suppose we only represent even
amplitudes, meaning 0, 2, 4, ..., 254. In that case, we can drop the lowest order bit in the
representation of the amplitude, resulting in only 7 bits. The decoder would merely
append a 0 to the 7-bit number to obtain an 8-bit number again. For example, the 8-bit
number 00001111 would be compressed to the 7-bit number 0000111 with an implicit 0
in the eighth bit. The decoder would expand that 7-bit number back to the 8-bit number
00001110. Notice that the decoded number is slightly different than the original, being
14 rather than the original 15 (an example of why MPEG-2 compression loses some
image quality). We could take this rounding concept further, only representing amplitudes
that are multiples of 4 (thus dropping the two lowest order bits, yielding a 6-bit
representation), or are multiples of 8 (dropping the three lowest order bits, yielding a 5-bit
representation). 00001111 might be represented as 00001 with three implicit 0s, thus
decoded back to 00001000. The decoded number of 8 is different from the original
number 15 due to the rounding.
The rounding described above, achieved by dropping low order bits to achieve
compression, is known as quantization. Notice the tradeoff: more rounding yields more
compression, at the expense of accuracy. Fortunately, humans don't notice such rounding
in the high-frequency components of the picture; our vision just isn't that precise. We
also don't notice minor differences in the high-frequency components of sound; our
hearing isn't that precise. Think of a very high-pitched sound, so high it could perhaps
break glass. You probably couldn't tell the difference between two such high-pitched
sounds of slightly different frequencies; they are both just high. Likewise, our eyes can't
detect slight rounding of color values in a highly complex scene. So MPEG-2 applies
quantization more aggressively on the DCT output block's high-frequency coefficients
than on the low-frequency coefficients.
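Dropping low-order bits and restoring them as zeros can be sketched with shift operations. The function names below are assumptions for illustration; real MPEG-2 quantization uses per-coefficient scaling rather than this simple bit dropping.

```python
def quantize(value, dropped_bits):
    """Drop the given number of low-order bits from an amplitude."""
    return value >> dropped_bits


def dequantize(quantized, dropped_bits):
    """Restore the bit width by appending zeros for the dropped bits."""
    return quantized << dropped_bits


if __name__ == "__main__":
    # 00001111 (15) with one bit dropped becomes 0000111, decoded as 00001110.
    print(format(quantize(15, 1), "07b"), dequantize(quantize(15, 1), 1))
    # 00001111 (15) with three bits dropped becomes 00001, decoded as 00001000.
    print(format(quantize(15, 3), "05b"), dequantize(quantize(15, 3), 3))
```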
After quantization, the 64 values in the 8x8 block are treated as a list of 64 numbers.
Those 64 numbers are then run-length encoded. Run-length encoding is a compression
method that replaces consecutive occurrences of zeros by a number indicating the number
of consecutive zeros rather than representing those zeros themselves. For example,
consider wanting to represent the following 5 numbers: 0, 0, 0, 0, 24. If each value is 6 bits,
the 5 numbers require 5*6 = 30 bits. On the other hand, we could just send a pair of
numbers, the first indicating the number of leading zeros, the second indicating the
nonzero number. So 0, 0, 0, 0, 24 would be encoded as (4, 24), meaning 4 leading zeros
followed by the number 24. If each value is 6 bits, the run-length encoded version requires only
2*6 = 12 bits. Any sequence of numbers could similarly be replaced by a sequence of
number pairs, each pair replacing a sequence of zeros and a number. The sequence 0, 0,
0, 0, 24, 0, 0, 8, 0, 0, 0, 0, 0, 0, 16 could thus be replaced by three pairs: (4, 24), (2, 8),
(6, 16), reducing the number of bits from 15*6 = 90 down to 6*6 = 36 bits. Note that the
number of zeros at the beginning of the sequence or in between nonzero numbers may be
zero, and the last number may be zero. For example, the sequence 2, 0, 0, 63, 2, 0, 0, 0,
0, 0 could be encoded as (0, 2), (2, 63), (0, 2), (4, 0).
Run-length encoding achieves good compression only if there are many 0s in the
sequence of numbers. Fortunately, the nature of the DCT leads to many 0 numbers (not
all cosine frequencies are needed to approximate a signal region, so those frequencies
will have 0 coefficients), especially after quantization (many coefficients are just small
numbers, which become 0 during quantization). Thus, applying run-length encoding
after quantization leads to further compression.
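The pair-based scheme described above can be sketched as follows. This is a minimal illustration of the (zero count, number) encoding from the text, with function names chosen for this example; actual MPEG-2 run-length coding has additional details not covered here.

```python
def run_length_encode(values):
    """Encode a list as (number of preceding zeros, number) pairs."""
    pairs = []
    zeros = 0
    for value in values:
        if value == 0:
            zeros += 1
        else:
            pairs.append((zeros, value))
            zeros = 0
    if zeros > 0:
        # Trailing zeros: the last zero serves as the pair's number.
        pairs.append((zeros - 1, 0))
    return pairs


def run_length_decode(pairs):
    """Expand (zero count, number) pairs back into the original list."""
    values = []
    for zeros, value in pairs:
        values.extend([0] * zeros)
        values.append(value)
    return values


if __name__ == "__main__":
    print(run_length_encode([0, 0, 0, 0, 24, 0, 0, 8, 0, 0, 0, 0, 0, 0, 16]))
    print(run_length_encode([2, 0, 0, 63, 2, 0, 0, 0, 0, 0]))
```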
EXAMPLE 6.27 Computing compression ratios involving quantization and run-length encoding
Continuing Example 6.26, assume that the 30-frame MPEG-2 sequence has the same frame
sequence and average sizes as that example, but that each frame is further compressed by DCT
conversion to the frequency domain followed by quantization and run-length encoding. Assume
the DCT output block consists of 64 8-bit numbers, that quantization reduces the average
number size to 5-bit numbers, and that run-length encoding reduces the resulting number
sequence size to 30% of its size.
The compression ratio would be 8 Mbits * 30 / (5/8 * 0.30 * (2 * 8 Mbits + 8 * 2 Mbits +
20 * 1 Mbits)) = 240 / 9.75 = about 25:1.
Huffman Coding
After run-length encoding, each block consists of a sequence of numbers. Some numbers
will occur in that sequence more frequently than others. Huffman coding is a method of
reducing the number of bits required to represent a set of values, by creating shorter
encodings for the frequently occurring values and longer encodings for the less frequently
occurring values.
Huffman coding, a form of encoding known as entropy encoding, is another powerful
concept in digital data compression. Suppose you wish to represent an original sequence
of 16 numbers 0, 3, 3, 31, 0, 3, 5, 8, 9, 7, 15, 14, 3, 0, 3, 0. Assuming 5 bits per number,
a straightforward binary encoding would be: 00000 00011 00011 11111 00000
00011 00101, and so on, for a total of 16*5 = 80 bits. We can reduce this total by first
observing that there are only 9 unique symbols: 0, 3, 5, 7, 8, 9, 14, 15, and 31. We really
only need 4 bits to uniquely identify each symbol. We could thus assign the nine unique
symbols to 4-bit encodings using the following definitions: 0=0000, 3=0001, 5=0010,
7=0011, ..., 31=1001 (note that the encodings are no longer the binary number
representations of the original numbers). Thus, the original sequence of numbers (0, 3, 3, 31, 0, 3,
5, ...) would be encoded as 0000 0001 0001 1001 0000 0001 0010 etc., for a
total of 16*4 = 64 bits. The key observation here is that we can encode numbers using
any arbitrary unique bit patterns we desire, as long as the encoder and decoder are both
aware of the encoding definitions.
We can take this definitions concept a step further, by using encodings of different
lengths. Observing that 3 and 0 occur more frequently than the other numbers, we might
give 3 and 0 shorter encodings. So we might create the following encoding definitions:
0=00, 3=10, 5=010, 7=0110, 8=0111, 9=1100, 14=1101, 15=1110, 31=1111. How
these definitions were created is just beyond the scope of this discussion, though it's really
not hard to learn. Notice that the encodings are such that the shorter encodings do not
appear at the left of any of the longer encodings. For example, 00 does not appear at the left
of any of the longer encodings, like 010, 0110, 0111, etc. This feature allows the decoder
to know when it has reached the end of the code word: when the decoder has seen 00, it
knows it has found an encoded 0 (because no other encoding starts with 00); when it sees
10, it knows it has found a 3 (because no other encoding starts with 10). But when the
decoder sees 01, it must look at the next bit, and if it sees 010, it knows it has found a 5
(because no other encoding starts with 010). Using this variable-length encoding scheme,
the original sequence (0, 3, 3, 31, 0, 3, 5, ...) would be encoded as 00 10 10 1111 00
10 010 etc. We have inserted the spaces just for readability; the actual encoding would just
be 00101011110010010 etc. The total number of bits would be 4*2 (for the four 0s,
encoded with the two bits 00) + 5*2 (for the five 3s, encoded with the two bits 10) + 1*3
(for the one 5, encoded with the three bits 010) + 6*4 (for the six remaining numbers 31,
8, 9, 7, 15, and 14, each encoded as 4 bits), totaling 45 bits, much reduced from the
original 80 bits required by the straightforward binary encoding.
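The decoder behavior described above (read bits until they match a code word, emit the number, start over) can be sketched in Python. The encoding table is the one given in the text; the function names `is_prefix_free`, `encode`, and `decode` are made up for this illustration, and the sketch does not show how a Huffman table is constructed in the first place.

```python
# Variable-length encoding definitions taken from the text.
CODES = {0: "00", 3: "10", 5: "010", 7: "0110", 8: "0111",
         9: "1100", 14: "1101", 15: "1110", 31: "1111"}

SEQUENCE = [0, 3, 3, 31, 0, 3, 5, 8, 9, 7, 15, 14, 3, 0, 3, 0]


def is_prefix_free(codes):
    """True if no code word appears at the left of another code word."""
    words = list(codes.values())
    return not any(a != b and b.startswith(a) for a in words for b in words)


def encode(numbers, codes):
    """Concatenate the code word of each number into one bit string."""
    return "".join(codes[n] for n in numbers)


def decode(bitstring, codes):
    """Read bits left to right, emitting a number whenever a code word completes."""
    lookup = {word: number for number, word in codes.items()}
    numbers, current = [], ""
    for bit in bitstring:
        current += bit
        if current in lookup:       # prefix-free, so the first match is the right one
            numbers.append(lookup[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a code word")
    return numbers


encoded = encode(SEQUENCE, CODES)
print(len(encoded))                 # total bits for the 16 numbers
print(decode(encoded, CODES) == SEQUENCE)
```

Because the table is prefix-free, the decoder never needs separators between code words, which is why the spaces in the text are for readability only.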
Huffman coding achieves good compression when some numbers occur much more
frequently than other numbers in the sequence of numbers to be encoded. Fortunately,
this is indeed the case after DCT, quantization, and run-length tasks are performed on a
block of a frame. For example, there may be plenty of 0s, 1s, 2s, etc., and fewer
occurrences of higher numbers.
EXAMPLE 6.28 Computing compression ratios involving Huffman coding
Continuing Example 6.27, assume that pairs of numbers after quantization and run-length encoding
are Huffman coded, and that such encoding reduces the number of bits by 50%.
The compression ratio would thus be 240 / (0.50 * 9.7) = 50:1.
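The arithmetic in Examples 6.27 and 6.28 can be checked with a short Python sketch. It uses only figures stated in those examples (30 frames of 8 Mbits, 9.7 Mbits after quantization and run-length encoding, a 50% Huffman reduction); the names are made up for this illustration.

```python
UNCOMPRESSED_MBITS = 8 * 30     # 30 frames of 8 Mbits each
AFTER_RLE_MBITS = 9.7           # size after quantization and run-length encoding
HUFFMAN_FACTOR = 0.50           # Huffman coding halves the remaining bits


def compression_ratio(original_mbits, compressed_mbits):
    """Ratio of original size to compressed size."""
    return original_mbits / compressed_mbits


ratio_rle = compression_ratio(UNCOMPRESSED_MBITS, AFTER_RLE_MBITS)
ratio_huffman = compression_ratio(UNCOMPRESSED_MBITS,
                                  HUFFMAN_FACTOR * AFTER_RLE_MBITS)

print(f"after run-length encoding: about {ratio_rle:.1f}:1")
print(f"after Huffman coding:      about {ratio_huffman:.1f}:1")
```

The results come out slightly under the rounded 25:1 and 50:1 quoted in the examples, since 9.7 is itself a rounded intermediate value.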
Summary
Summarizing MPEG-2 video encoding:
• The use of I-, P-, and B-frames achieves compression by not resending redundant
  information of successive frames, but rather just sending the differences.
• DCT transforms 8x8 blocks of a frame to the frequency domain, which doesn't
  achieve compression itself, but rather enables compression in the next steps.
• Quantization achieves further compression by reducing the number of bits needed
  to represent the DCT coefficients, through rounding.
• Run-length encoding achieves further compression by replacing sequences of zero
  coefficients by a number indicating the number of such zeros.
• Huffman coding achieves further compression by encoding frequently occurring
  coefficient numbers with shorter encodings than less frequently occurring
  coefficient numbers.
The sequence of steps is shown graphically in Figure 6.88.
Figure 6.88 MPEG-2 video compression encoding overview.
Our example compression ratio calculations yielded a ratio of about 50:1. In fact, the
compression ratio can be varied by varying each of the above steps. We can use fewer
I-frames to achieve even more compression at the cost of degraded video quality, or
more I-frames for improved video quality at the cost of more bits. Likewise, we can vary
the amount of quantization to trade off quality and compression ratio. Because a typical
movie will have some slow-changing scenes and other rapidly changing scenes, and some
complex colored frames and other simpler frames, the compression ratio for different
parts of a video may actually vary. Notice the permeating presence of tradeoffs (primarily
between quality and compression ratio) throughout MPEG-2 encoding.
Figure 6.89 MPEG-2 video decoding overview.
An MPEG-2 decoder merely needs to apply the above steps in reverse, as illustrated
in Figure 6.89, to convert an MPEG-2 stream of bits back into a series of
pictures, or video.
Clearly, MPEG-2 encoding and decoding require a lot of computations performed at
speeds fast enough to create smooth-looking, good-quality video. Custom digital circuits
can help achieve those required speeds.
6.8 CHAPTER SUMMARY
In this chapter, we introduced (Section 6.1) the idea that sometimes we can improve a
particular design criterion without hurting other criteria (optimization), but usually we
improve one criterion at the expense of another criterion (tradeoff). We described (Section
6.2) the problem of two-level size minimization, introducing K-maps as a visual method,
and then describing automated heuristics for two-level as well as multilevel logic size
minimization. We discussed (Section 6.3) methods for optimization and tradeoffs in
designing sequential logic, including state minimization, state encoding, and Moore
versus Mealy type FSMs. We highlighted (Section 6.4) several alternative methods for
implementing some datapath components, including a faster adder using carry-lookahead,
and a smaller multiplier using sequential multiplication. We described (Section 6.5)
methods for RTL optimizations and tradeoffs, including the powerful concepts of
pipelining and concurrency as means of achieving parallel execution, a key purpose of
custom digital design. We also described the RTL methods of component allocation,
operator binding, and operator scheduling. We briefly mentioned (Section 6.6) some
higher-level methods, including the general idea of serial versus concurrent computation,
and the selection of efficient algorithms. We also introduced some basic concepts of
power reduction, including clock gating, and using low-power gates.
As you can see from this chapter, there are many methods for improving our designs.
Yet, this chapter just scratched the surface of such methods. An entire
multibillion-dollar-per-year industry exists that specializes in making automated tools for
converting behavioral descriptions of desired system functionality into highly optimized
circuit implementations. That industry is known as Electronic Design Automation (EDA) or as
Computer-Aided Design (CAD). This chapter hopefully gave you enough exposure at
least to understand the basic idea behind circuit optimization at various levels of design
abstraction, ranging from the gate level up to the RTL level and beyond.
6.9 EXERCISES
SECTION 6.1: INTRODUCTION
6.1 Define "optimization" and "tradeoff," and provide everyday examples of each.
SECTION 6.2: COMBINATIONAL LOGIC OPTIMIZATIONS AND TRADEOFFS
6.2 Perform two-level logic size optimization for the equation F(a,b,c) = ab'c + abc +
a'bc + abc' using (a) algebraic methods, (b) a K-map. Express the answers as
sum-of-products.
6.3 Perform two-level logic size optimization for the equation F(a,b,c) = a + a'b'c + a'c using a
K-map. Express the answer as sum-of-products.
6.4 Perform two-level logic size optimization for the equation F(a,b,c,d) = a'bc' + ... + abd
using a K-map. Express the answer as sum-of-products.
6.5 Perform two-level logic size optimization for the equation F(a,b,c,d) = ab + ...
using a K-map. Express the answer as sum-of-products.
6.6 Perform two-level logic size optimization for the equation F(a,b,c) = a'b'c + abc,
assuming that input combinations a'bc and ab'c can never occur (those two minterms
represent don't cares). Express the answer as sum-of-products.
6.7 Perform two-level logic size optimization for the equation F(a,b,c,d) = a'bc'd +
ab'cd', assuming that a and b can never both be 1 at the same time, and that c and d can
never both be 1 at the same time (i.e., there are don't cares).
6.8 Consider the equation F(a,b,c) = a'c + ac + a'b. Using a K-map, determine which
of the following terms are implicants (but not necessarily prime implicants) of the equation:
abc, ab, a'bc, a'c, c, bc, a'bc', a'b.
6.9 Repeat the previous problem, but this time determine which of the terms are prime implicants
of the function.
6.10 For the equation F(a,b,c) = a'c + ac + a'b, determine all prime implicants and all
essential prime implicants of the function.
6.11 For the equation F(a,b,c,d) = ab'c' + abc'd + abcd + a'bcd + a'bcd',
determine all prime implicants and all essential prime implicants.
6.12 For the previous problem, use the heuristic method of Table 6.1 to obtain a two-level size
optimized equation expressed in sum-of-products form.
6.13 Use repeated application of the expand operation to heuristically minimize the equation
F(a,b,c) = a'b'c + a'bc + abc. Try expanding each term for each variable. Give
the minimized equation in sum-of-products form.
6.14 Use repeated application of the expand operation to heuristically minimize the equation
F(a,b,c,d,e) = abcde + abcde' + abcd'e'. Try expanding each term for each
variable.
6.15 Using algebraic methods, reduce the number of gate inputs for the following equation by
creating a multilevel circuit: F(a,b,c,d,e,f,g) = abcde + abcd'e'fg +
abcd'e'f'g'. Assume only AND, OR, and NOT gates will be used. Draw the circuit for
the original equation and for the multilevel circuit, and clearly list the delay and number of
gate inputs for each circuit.
SECTION 6.3: SEQUENTIAL LOGIC OPTIMIZATIONS AND TRADEOFFS
6.16 Reduce the number of states for the FSM in Figure 6.90 by eliminating redundant
states by using an implication table.
Figure 6.90 FSM example. (State diagram not reproduced; legible transition labels: xy=00, xy=10, xy=01.)
6.17 Reduce the number of states for the FSM in Figure 6.91 by using an implication table.
6.18 Reduce the number of states for the FSM in Figure 6.92 by using an implication table.
6.19 Compare the logic size (as number of gate inputs) and the delay (as number of
gate-delays) of a straightforward binary encoding of the FSM in Figure 6.93 with a
3-bit output encoding and with a one-hot encoding of the same FSM.
6.20 Compare the logic size (as number of gate inputs) and the delay (as number of
gate-delays) of a minimal bit width state encoding and an output encoding for the
laser-based distance measurer FSM shown in Figure 5.20.
6.21 Compare the logic size (as number of gate inputs) and the delay (as number of
gate-delays) of a minimum binary encoding (if not possible, indicate why), output encoding,
and one-hot encoding of the FSM in Figure 3.39.
Figure 6.91 Sequence detector for bit patterns "01" and "10". Inputs: x; Outputs: y.
Figure 6.92 FSM example.
Figure 6.93 FSM example. Outputs: w, x, y. (State outputs: wxy=100, wxy=010, wxy=001, wxy=000.)
6.22 Convert the Moore FSM for the code detector circuit shown in Figure 3.46 to the nearest
Mealy FSM equivalent.
6.23 Convert the following Moore FSM to the nearest Mealy FSM equivalent.
(State diagram not reproduced. Inputs: s, r; Outputs: a, en; legible state actions: a=0, en=0.)
6.24 Convert the following Mealy FSM to the nearest Moore equivalent.
(State diagram not reproduced. Inputs: s, r; Outputs: u, y.)
6.25 Convert the following Mealy FSM to the nearest Moore equivalent.
(State diagram not reproduced. Inputs: g, r; Outputs: x, y, z; legible transition labels: g'r'/xyz=010, g/xyz=111.)
SECTION 6.4: DATAPATH COMPONENT TRADEOFFS
6.26 Trace the execution of the 4-bit carry-lookahead adder shown in Figure 6.59 when a = 11 and
b = 7.
6.27 Trace the execution of the 4-bit carry-lookahead adder shown in Figure 6.59 when a = 5 and
b = 4.
6.28 Trace the execution of the 16-bit carry-lookahead adder shown in Figure 6.59 when a = 43690
and b = 21845. Do not trace internal behavior of the individual 4-bit carry-lookahead adders.
6.29 Design a 64-bit hierarchical carry-lookahead adder using 4-bit carry-lookahead adders. What
is the total delay through the 64-bit adder? How much faster is the carry-lookahead adder
compared to a 64-bit carry-ripple adder (compute as slower time/faster time)?
6.30 Design a 24-bit hierarchical carry-lookahead adder using 4-bit carry-lookahead adders.
6.31 Design a 16-bit carry-select adder using 4-bit ripple-carry adders.
SECTION 6.5: RTL DESIGN OPTIMIZATIONS AND TRADEOFFS
6.32 The adder tree shown in Figure 6.94 is used to compute the sum of eight inputs on every clock
cycle, where the sum is S = R + T + U + V + W + X + Y + Z.
(a) Design a pipelined version of the adder tree to maximize the speed at which we
can operate our clock input clk.
(b) Create a timing diagram.
6.33 Assume the delay of an adder is 3 ns. How fast can we execute the adder tree shown in
Figure 6.94 and how fast can we execute the pipelined adder tree designed in Exercise 6.32?
6.34 What are the latency and throughput of the pipelined adder tree you designed in
Exercise 6.32?
Figure 6.94 Adder tree used to compute the sum of eight inputs every clock cycle.
6.35 (a) Convert the following C-like code to a high-level state machine.
(b) Use the RTL design process shown in Table 5.1 to convert the high-level state machine for
the C code to a controller and a datapath. Design the datapath to structure, but design the
controller to the point of an FSM only.
(c) Redesign your datapath to allow for concurrency in which four multiplications and two
additions can be performed concurrently.
   Inputs: byte a[256], b[256]
   Outputs: byte sum, byte c[256]
   MULT:
      int i = 0;
      int sum = 0;
      while (i < 256) {
         c[i] = a[i] * b[i];
         sum = sum + c[i];
         i++;
      }
6.36 Redesign the datapath and controller designed in Exercise 6.35 by allowing up to four
concurrent additions and inserting pipeline registers to your datapath and updating the controller if
necessary. Assuming an adder has a delay of 3 ns and a multiplier has a delay of 20 ns, how
long will the circuit take to finish its computation?
6.37 (a) Convert the following C-like code to a high-level state machine.
(b) Use the RTL design process shown in Table 5.1 to convert the high-level state machine for
the C code to a controller and a datapath. Design the datapath to structure, but design the
controller to the point of an FSM only.
(c) Redesign your datapath to allow for concurrency in which three comparisons, three
additions, and three multiplications can be performed concurrently.
   Inputs: byte a[256], byte b[256], byte cy
   Outputs: byte sumx, byte sumy, byte c[256]
   MULT_OR_ADD:
      int i = 0;
      int sumx = 0;
      int sumy = 0;
      while (i < 256) {
         if (a[i] > 128) {
            c[i] = a[i] * b[i];
            sumx = sumx + c[i];
         }
         else {
            c[i] = a[i] * (b[i] + cy);
            sumy = sumy + c[i];
         }
         i++;
      }
6.38 Redesign the datapath and controller designed in Exercise 6.37 by allowing up to nine
concurrent additions and inserting pipeline registers to your datapath and updating the controller if
necessary. Assuming a comparator has a delay of 4 ns, an adder has a delay of 3 ns, and a
multiplier has a delay of 20 ns, how long will the circuit take to finish its computation?
6.39 Given the high-level state machine in Figure 6.95, create two different designs: one design
optimized for minimum circuit speed and one design optimized for minimum circuit size. Be
sure to clearly indicate the component allocation, operator binding, and operator scheduling
used to design the two circuits.
Figure 6.95 High-level state machine for Exercise 6.39.
SECTION 6.6: MORE ON OPTIMIZATIONS AND TRADEOFFS
6.40 Trace through the execution of the binary search algorithm when searching for the number 86
in the following sorted list of 15 numbers: 1, 10, 25, 62, 74, 75, 80, 84, 85, 86, 87, 100, 106,
111, 121. How many comparisons were required to find the number using the binary search
and how many comparisons would have been required using a linear search?
6.41 Trace through the execution of the binary search algorithm when searching for the number 99
in the following list of 15 numbers: 1, 10, 25, 62, 74, 75, 80, 4. -. 8 7 ~ 99, 100, 106, 111,
121. How many comparisons were required to look for the number using the binary search
and how many comparisons are required using a linear search?
6.42 Trace through the execution of the binary search algorithm when searching for the number L I
in the list of numbers from the previous example. How many comparisons were required to find
the number using the binary search and how many comparisons are required using a linear
search?
6.43 Using the list of 15 numbers from Exercise 6.41, how many numbers could we find faster
using a linear search algorithm compared with the binary search algorithm?
SECTION 6.7: POWER OPTIMIZATION
6.44 Given the logic gates shown in Figure 6.96, optimize the following circuit by reducing power
consumption without increasing the circuit's delay.
(Circuit diagram not reproduced.)
Figure 6.96 Logic gate library. 2/0.5 format means 2 ns delay/0.5 nW power.
6.45 Given the logic gates shown in Figure 6.96, optimize the following circuit by reducing power
consumption without increasing the circuit's delay.
(Circuit diagram not reproduced.)
6.46 Given the logic gates shown in Figure 6.96, optimize the following circuit by reducing power
consumption without increasing the circuit's delay.
(Circuit diagram not reproduced.)
6.47 Given the logic gates shown in Figure 6.96, optimize the following circuit by reducing power
consumption without increasing the circuit's delay.
(Circuit diagram not reproduced.)
DESIGNER PROFILE
Smita has degrees in Electronics Engineering and in Computer Science, and has worked in
the digital design field for nearly a decade. She spent a lot of time thinking about the
choice of a college major. "What major should I invest my focus, energy, heart, and soul
for what will be some of the most productive years of my life?" She chose engineering, for
several reasons. "First, engineering is a career in itself; unlike some other majors, jobs
specifically for engineering majors are out there. With engineering, I would learn the most
valuable and universal of skills: problem solving. Second, engineers have many options,
because engineers are highly valued for their problem solving skills by other professions,
such as management consulting, marketing, and investment banking. And electrical and
computer engineers can choose from a range of industries in which to work:
telecommunications, image processing, medical devices, IC fabrication, and even banking.
This was a phenomenal discovery for me!"
Smita continued her education by doing graduate studies in Computer Science, researching
methods for automatically designing integrated circuits (ICs) or chips: "a fascinating field
because it involves a mix of hardware and software skills and knowledge. I continued in this
profession after school and worked for a company that develops Computer-Aided Design (CAD)
software used by hardware designers who work with a type of chip called an FPGA (Field
Programmable Gate Array). FPGAs can be used for an amazing variety of applications all the
way from high-speed telecommunication chips to low-speed and low-cost chips that go into
electronic toys and games. Our software saves designers many months or even years of time.
In fact, without our software, it would be absolutely impossible for people to design most
chips even if they had a decade or more to do it."
Smita (shown mountain climbing above) loves her work. "My work is intellectually
stimulating and I have an opportunity to innovate, create, and actually build something
really useful." She also enjoys the people-aspect of her work. "I work in teams of dynamic
people because most projects, hardware or software, are done in teams of 3-8 people these
days. The people on my team are also my friends and it's a lot of fun to work with them."
In her decade of work so far, Smita has taken on some management responsibilities. "As
manager of one of the four products that my company develops, I play a variety of different
roles. I work with my team of 7 software developers to determine what features to build in
the product and how best to build those features. I work with the marketing and sales team
to understand what the customers need and how best to message and position our product.
Finally, I work with other groups that are involved in releasing a product: technical
publications, application engineering, and product engineering. The diversity of my job
makes it very interesting."
Smita enjoys the respect that engineers receive. "As an engineer, I am highly respected by
customers, partner companies, and by our marketing and sales organizations because I have
a deep understanding of our products. I really know my stuff since I built it and I get
recognized for it." And regarding the pay: "I get compensated very well for my skills." She
also likes the lifestyle: "I get in to work around 10 a.m. and leave around 7 p.m. I don't
have early morning meetings unlike the folks in marketing and sales, and I can work from
home once a week or more often if I wish. This is also a great career for women: I can take
time off and return to my job without much penalty when I have children. I can tailor my
work hours as I need as my children are growing up. Lastly, I realize that I can move from
engineering to other functions such as marketing and sales, but not the other way around!
That's a great benefit of being an engineer: more options."
Smita recommends engineering and computer science students focus on certain things while
in college. "First, get a good understanding of both hardware and software. Systems are
highly integrated today and there are very few companies that develop one without paying
very close attention to the other. For instance, though I write software, I need to
completely understand the hardware for which it will be used. My husband, on the other hand,
designs telecommunication chips but works very closely with his software team, especially
during the initial design stages when they decide what to implement in hardware versus
software and how to design the hardware interface so that the software algorithms work
efficiently."
"So, what do I mean by a good understanding of hardware and software? In software, I think
it is most important to develop good software habits. Treat your program like a
well-landscaped garden: you want it beautiful and weed-free. Understand data structures well
and know when one is more appropriate than the other. Organize your code, be disciplined,
cross the Ts and dot the Is, document diligently, have your code reviewed by friends, and
finally, don't be afraid to throw away code and rewrite it if you discover a better way."
"In hardware, understand the basics of logic design and then make sure you also understand
the capacitive, inductive, and resistive properties of circuits since these play a big role
in designing the high-speed circuits of today."
"Other than these hardware and software skills, become adept at math and analysis. Learn to
frame problems and break them down until you can solve them. Be experimental and try
different tools and methods. Have a hypothesis and then go about proving or disproving it.
If you haven't already, you will soon discover that engineering is not only fun, but also
provides you with many fulfilling career opportunities, so stick with it and make the most
of it!"
7
Physical Implementation
7.1 INTRODUCTION
A digital circuit design that we've created but just drawn out, perhaps with pencil on
paper or as a figure in this book, is just a drawing. Somehow, we must eventually
implement that digital circuit design on a real physical device, so that the device can then be
placed in some electronic product to carry out the desired function. Nowadays, such a
device is usually some form of integrated circuit, or IC, also known as a computer chip,
or just chip. In other words, looking at Figure 7.1, how do we get from (a), the seat belt
warning light circuit we designed in Chapter 2, to (b), a physical implementation using
an IC?
In this chapter, we will describe several popular physical implementation
technologies for digital circuits.
Figure 7.1 How do we get from (a) to (b)? (a) Digital circuit design (BeltWarn); (b) physical implementation.
7.2 MANUFACTURED IC TECHNOLOGIES
If we are willing to wait weeks or months for a physical implementation of our digital circuit
design, and if we are willing to spend tens of thousands of dollars to millions of dollars for
that physical implementation, then we might consider implementing our circuit using one of
several technologies that involve the manufacture of a custom or semicustom IC.
Full-Custom Integrated Circuits
One physical implementation technology is known as a full-custom IC. A full-custom IC is a
chip created specifically to implement the gates (actually, the transistors) of the desired
digital circuit design (Figure 7.2). We digital designers wouldn't usually build full-custom
ICs ourselves, but rather we would send our desired digital circuit design out to a group
or company that specializes in transforming digital designs into custom ICs. Engineers,
assisted by computer-aided design (CAD) tools, convert our desired digital circuit design
into a circuit of transistors, and then decide where to place each transistor on the surface
of the chip, how to orient each transistor (e.g., left to right, right to left, top to bottom,
etc.), how big to make each transistor, etc. All that information about how the transistors
should be laid out on a chip's surface is known as a layout. Then, the full-custom IC
engineers send that layout information to a special factory that specializes in fabricating
ICs, known as a fabrication plant, or fab for short. Fabricating an IC is often referred to
as a silicon spin.
(Margin note: According to one survey, only about 10% of 2002 digital circuits were
implemented as custom ICs.)
Figure 7.2 Full-custom IC design. (The BeltWarn circuit becomes a custom layout, which the fab turns into an IC over a period of months.)
Fabricating an IC is an extremely costly, delicate, error-prone process, utilizing
state-of-the-art photographic, laser, and chemical equipment that costs hundreds of
millions of dollars. The fabrication process may take many weeks or even months,
because transistors and wires are formed as layers on the surface of a chip, and each layer
may take hours or even days to form through chemical processes.
Implementing a digital circuit on a full-custom IC is a complex and expensive task.
Costs for setting up the fabrication of an IC, known as nonrecurring engineering (NRE)
costs, can easily exceed many millions of dollars for a full-custom IC. Furthermore, that
setup takes time, perhaps months, and that time may be costly to us too: the product for
which we are fabricating the chip may be losing market share to a competing product
already completed and being sold while we wait for our chip to be fabricated. Once we've
set up the details needed for fabrication, the fabrication process itself is less expensive.
But because we custom designed everything, the probability is high that we made a
mistake somewhere in the transistors or wiring. Therefore, after fabricating a full-custom
IC, we may find errors that necessitate refabricating the IC, known as a respin.
Respinning may happen two or three times, each time requiring weeks or months, thus costing
us even more. We ought to either be making millions of chips, or charging large amounts
of money per chip, to earn back the large NRE costs.
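The need to spread NRE costs over many chips can be illustrated with a small Python sketch. All dollar figures below are hypothetical values chosen for illustration, not data from this chapter, and the function names are made up; the model simply divides the one-time NRE cost by the production volume and adds a per-chip manufacturing cost.

```python
import math


def cost_per_chip(nre_cost, unit_cost, volume):
    """Effective cost of each chip once NRE is spread over the whole volume."""
    if volume <= 0:
        raise ValueError("volume must be positive")
    return nre_cost / volume + unit_cost


def break_even_volume(nre_cost, unit_cost, price):
    """Smallest number of chips whose sales margin covers the NRE cost."""
    margin = price - unit_cost
    if margin <= 0:
        raise ValueError("price must exceed the per-chip manufacturing cost")
    return math.ceil(nre_cost / margin)


# Hypothetical figures: $2 million NRE, $5 to manufacture each chip.
NRE = 2_000_000
UNIT = 5

for volume in (10_000, 1_000_000, 10_000_000):
    print(volume, cost_per_chip(NRE, UNIT, volume))

# Selling each chip at a hypothetical $7 leaves a $2 margin per chip.
print(break_even_volume(NRE, UNIT, 7))
```

Under these assumed numbers, the NRE share dominates the per-chip cost at low volume and nearly vanishes at high volume, which is the tradeoff the paragraph above describes. A respin would add to the NRE term.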
Needless to say, full-custom IC fabrication is not extremely common. Designers
choose to implement a digital circuit on a full-custom IC when they know they will
produce the chip in extremely high volumes, such as a mass-produced chip found inside
calculators or wristwatches, or a mass-produced microprocessor chip like a Pentium.
High volumes in the tens of millions or more are needed to offset the cost and time
needed to produce a custom IC. Alternatively, designers may choose to implement a
digital circuit on a custom IC if cost is not tightly constrained but maximum performance
is a must, as might be the case in military or space applications.
Semicustom (Application-Specific) Integrated Circuits-ASICs
Because physical implementation on full-custom ICs is so costly and time-consuming,
semicustom technologies evolved during the 1980s and 1990s that reduce the costs and
the time of fabricating a chip, known as Application-Specific Integrated Circuits, or
ASICs. Two popular ASIC technologies are gate array and standard cell.
Gate Arrays
The hard part of custom IC design is designing and fabricating the transistors that will
go onto the surface of the chip. Designing and fabricating the wires that connect those
transistors is somewhat simpler. Gate array ASIC technology utilizes a chip whose
transistors are predesigned to form rows (arrays) of logic gates on the chip, as shown in
Figure 7.3. Gate arrays are sometimes referred to as sea-of-gates. To implement a desired
digital circuit on a gate array chip, we merely need to create the wires that connect those
gates. Creating the wires represents just the last steps of fabrication, and thus gate array
technology eliminates much of the time and cost of fabricating a chip for a particular
design. A gate array company predesigns and mass-produces the gate array chip, and then
customizes some of those chips for each client's circuit. The chip is somewhat custom,
hence the term semicustom, and the customization is for a particular circuit
application, hence the term application-specific. Figure 7.3 illustrates how we might
implement our seat belt warning light circuit (Figure 7.3(a)) using a gate array chip
(Figure 7.3(b)). Figure 7.3(c) shows how we might map the desired 3-input AND gate to
two 2-input gate array AND gates, and the inverter to one of the gate array inverters. The
figure also shows how we might implement the desired wiring among the gate array's
pins, the gate array AND gates, and the gate array inverter. The remaining gates and pins
on the gate array chip would be unutilized. Fabricating these wires would result in the IC
being customized to our seat belt application (Figure 7.3(d)).
Figure 7.3 Gate array technology: (a) desired circuit, (b) gate array before wires are
added, (c) gate array after wires are added, thus implementing the desired circuit,
(d) fabricating the wires completes the IC. Note: real gate arrays have many thousands or
millions of gates, not just a few.
We point out that the actual mapping of our desired digital circuit to a gate array
would typically be carried out by an automated tool. Designers rarely, if ever, carry out
that mapping manually, and in fact usually don't even see that mapping in any form. The
mapping is all done by tools, resulting in huge data files that can be processed by other
tools at a fab to control the fabrication process. We also point out that a typical gate array
chip may hold many thousands or millions of gates: the gate array shown in Figure 7.3,
having fewer than ten gates, is trivially small and is for illustration purposes only. Gate
arrays with only 10 gates do not exist. Furthermore, we would typically not use gate
arrays unless our design contained thousands of gates or more. For designs with only a
few gates, we would instead use logic ICs: see Section 7.4.
384 Physical Implementation
Notice that our standard cell implementation places the cells such that wiring is
minimized, whereas the gate array implementation of Figure 7.4 required us to run the
wires to the pre-existing gates, resulting in longer wires. Thus, the standard cell
implementation may be faster than the gate array implementation, since shorter wires
typically have shorter delay.
Implementing Circuits Using Only NAND Gates
You may recall from Chapter 2 that CMOS transistors lend themselves more readily to
creating NAND and NOR gates rather than AND and OR. The stated underlying reason
was that pMOS transistors conduct 1s well but not 0s, while nMOS transistors conduct
0s well but not 1s. In any case, gate arrays typically contain plenty of NAND and/or
NOR gates, rather than AND and OR gates. And standard cell designs will also be more
efficient if implemented using NAND or NOR gates rather than AND and OR. Furthermore,
creating a gate array is much easier using just one type of gate, like just NANDs,
or just NORs, rather than having to decide how many AND gates, OR gates, and NOT
gates to pre-instantiate in the arrays. Given the ready availability of NAND or NOR gates
in CMOS ASIC technologies, we therefore want a method for converting AND/OR circuits
to NAND circuits or to NOR circuits.
Figure 7.6 Half-adder using standard cells (co = ab, s = a'b + ab').
Fortunately, converting any AND/OR circuit to a NAND-only circuit is possible
because NAND is a universal gate, as was mentioned in Section 2.8. A universal gate is
a logic gate type that can implement any Boolean function using gates of that one type
only. One way to understand NAND's universality is to recognize that we can implement
a NOT gate, an AND gate, and an OR gate by substituting each by an equivalent circuit
of NAND gates. Therefore any circuit of NOT, AND, and OR gates can be implemented
using NAND gates only.
To implement a NOT gate using NAND gates, we can substitute the NOT gate by a
two-input NAND gate with its two inputs tied together, as shown in Figure 7.7. The
truth table in the figure shows that the NAND gate with its inputs tied together acts
the same as an inverter. When the input x is 0, both inputs of the NAND gate are 0,
causing the NAND gate to output 1. When the input x is 1, both inputs of the NAND
gate are 1, causing the NAND gate to output 0.
  Input   NAND inputs   Output
    x       a     b        F
    0       0     0        1
    1       1     1        0
Figure 7.7 Implementing a NOT gate using a NAND gate.
Alternatively, we could simply connect x to one NAND input, and a 1 to the other
NAND input. Then if x is 0, the NAND outputs 1, and if x is 1, the NAND outputs 0,
achieving the desired NOT gate behavior.
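Both ways of building a NOT gate from a NAND gate can be checked with a few lines of Python. This is only a sketch: the function names `nand`, `not_tied`, and `not_with_one` are our own and do not come from the text, and signals are modeled as the integers 0 and 1.

```python
def nand(a: int, b: int) -> int:
    """2-input NAND on 0/1 values."""
    return 1 - (a & b)

def not_tied(x: int) -> int:
    """NOT built by tying both NAND inputs to x."""
    return nand(x, x)

def not_with_one(x: int) -> int:
    """NOT built by feeding x to one NAND input and a constant 1 to the other."""
    return nand(x, 1)

# Print x, then the output of each NAND-based inverter.
for x in (0, 1):
    print(x, not_tied(x), not_with_one(x))
```

Both functions return the complement of x for each of the two possible input values, which is all an inverter has to do.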
To implement an AND gate using NAND gates, we can substitute the AND gate by a
NAND gate followed by a NOT gate (which we know to be a two-input NAND gate
with its inputs tied together), as shown in Figure 7.8. This works because, given
inputs a, b, the first NAND computes (ab)', and then the NOT gate computes
(ab)'' = ab, which is AND.
Figure 7.8 Implementing an AND gate using NAND gates.
To implement an OR gate using NAND gates, we can substitute the OR gate by a
NAND gate with each input inverted, as shown in Figure 7.9. This works because, given
inputs a, b, the circuit of NAND gates in Figure 7.9 computes (a'b')', which by
DeMorgan's Law is a'' + b'', which simplifies to a + b, which is OR.
Figure 7.9 Implementing an OR gate using NAND gates: F = (a'b')' = a'' + b'' = a + b.
When we replace a circuit originally consisting of AND/OR/NOT gates by a circuit
with NAND gates only using the above substitutions, we may find that certain signals get
double-inverted: the signal feeds into an inverter and then immediately feeds into
another inverter. Double-inverting a signal yields the original signal, so
double inversions can be replaced by just a wire, as shown in Figure 7.10.
Such elimination reduces the transistors needed without changing the circuit's function.
Figure 7.10 Double inversions can be eliminated.
EXAMPLE 7.3 Implementing a half-adder's sum circuit using NAND gates
Figure 7.11(a) shows the sum circuit for a half-adder (see Section 4.3), using AND, OR, and NOT
gates. We can implement that circuit using NAND gates only by substituting each gate with an
equivalent NAND circuit, as shown in Figure 7.11(b). After the substitutions, we note that there are
two signals that are double inverted. Eliminating the double inversions results in the circuit shown
in Figure 7.11(c).
Figure 7.11 Implementing a half-adder's sum circuit using NAND gates only: (a) original AND/OR/
NOT circuit, (b) circuit obtained after substituting an equivalent NAND circuit for each gate,
(c) circuit after eliminating double inversions.
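The NAND substitutions for AND and OR, and the half-adder sum s = a'b + ab' built from NAND gates only, can be checked exhaustively in Python. The sketch below assumes 0/1 integer signals; the helper names (`nand`, `not_n`, `and_n`, `or_n`, `sum_nand`) are ours, and `sum_nand` is one NAND-only arrangement that results after cancelling the double inversions, not necessarily gate-for-gate the drawing in the figure.

```python
from itertools import product

def nand(a, b):
    return 1 - (a & b)

def not_n(x):
    return nand(x, x)                 # inputs tied together

def and_n(a, b):
    return not_n(nand(a, b))          # (ab)'' = ab

def or_n(a, b):
    return nand(not_n(a), not_n(b))   # (a'b')' = a + b

def sum_nand(a, b):
    """s = a'b + ab': the NOT after each AND cancels the NOT before the OR."""
    return nand(nand(not_n(a), b), nand(a, not_n(b)))

for a, b in product((0, 1), repeat=2):
    assert and_n(a, b) == (a & b)
    assert or_n(a, b) == (a | b)
    assert sum_nand(a, b) == (a ^ b)  # half-adder sum is XOR
print("all NAND substitutions verified")
```

In this arrangement `sum_nand` uses five NAND gates: two acting as inverters and three forming the two-level NAND-NAND structure.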
When converting AND/OR/NOT circuits by hand to NAND circuits, some people find it easier
to simply draw inversion bubbles rather than the NAND-based inverters, as shown in Figure 7.12.
Then, double inversion bubbles on a signal cancel. Any remaining isolated inversion bubbles become
a NAND-based NOT gate. Thus, the circuit in Figure 7.12 would end up identical to the circuit in
Figure 7.11(c).
If NAND gates with a fixed number of inputs are available, such as 2-input NAND gates only,
we can first modify the AND/OR circuit to use only 2-input AND/OR gates (by composing larger
gates from smaller ones; see Section 5.8), before converting to NAND gates.
Figure 7.12 Drawing inverters as inversion bubbles during conversion to NAND.
Implementing Circuits Using NOR Gates
Converting AND/OR/NOT circuits to NOR gate circuits is similar to converting to NAND
circuits, as a NOR gate is also a universal gate. The process of transforming a circuit into
NOR gates replaces each AND, OR, and NOT gate with equivalent NOR-based circuits,
as shown in Figure 7.13. We can replace a NOT gate with a two-input NOR gate with the
inputs tied together (or alternatively, by a two-input NOR gate with one input tied to 0).
We can replace an OR gate with a NOR gate followed by an inverter, yielding
(a+b)'' = a+b. We can substitute an AND gate with a NOR gate having inverted inputs,
yielding (a'+b')' = a''*b'' = ab (notice the use of DeMorgan's Law).
Figure 7.13 NOR gate equivalencies.
EXAMPLE 7.4 Implementing a half-adder's sum circuit using NOR gates
Earlier, we demonstrated how to implement the half-adder's sum output with NAND gates; we can
just as easily implement the sum output using NOR gates. The half-adder's sum circuit is shown
again in Figure 7.14(a). We replace each NOT, AND, and OR gate by its equivalent NOR circuit in
Figure 7.14(b), using inversion bubbles instead of NOR-based NOT gates for convenience. We
eliminate double inversions, and replace stand-alone inversion bubbles by NOR-based NOT gates,
as shown in Figure 7.14(c).
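The NOR equivalencies and the NOR-only half-adder sum can be checked the same way as the NAND versions. Again this is a sketch with our own helper names (`nor`, `not_r`, `or_r`, `and_r`, `sum_nor`); `sum_nor` is one NOR-only arrangement obtained after cancelling double inversions, and may not match the figure gate for gate.

```python
from itertools import product

def nor(a, b):
    return 1 - (a | b)

def not_r(x):
    return nor(x, x)                  # inputs tied together

def or_r(a, b):
    return not_r(nor(a, b))           # (a+b)'' = a + b

def and_r(a, b):
    return nor(not_r(a), not_r(b))    # (a'+b')' = ab

def sum_nor(a, b):
    """s = a'b + ab', with a'b = NOR(a, b') and ab' = NOR(a', b)."""
    return not_r(nor(nor(a, not_r(b)), nor(not_r(a), b)))

for a, b in product((0, 1), repeat=2):
    assert or_r(a, b) == (a | b)
    assert and_r(a, b) == (a & b)
    assert sum_nor(a, b) == (a ^ b)
print("all NOR substitutions verified")
```

Here `sum_nor` uses six NOR gates (three of them acting as inverters), one more than a five-gate NAND-only arrangement of the same sum circuit. That agrees with the observation that this sum-of-products circuit needs fewer NAND gates than NOR gates.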
Figure 7.14 Implementing an AND/OR/NOT circuit using NOR gates only: (a) original circuit, (b) circuit
obtained by substituting AND/OR/NOT gates by equivalent NOR circuits, using inversion bubbles
for ease of drawing, (c) final circuit after eliminating double inversions and replacing standalone
inversion bubbles by NOR-based NOT gates.
The half-adder's sum circuit was implemented with fewer NAND gates than NOR
gates. Depending on the original circuit, the reverse could be true. We saw that NAND
gates were well-suited for circuits in the sum-of-products form. NOR gates are best
used when a circuit is in product-of-sums form (a level of OR gates feeding into a
single AND gate).
Gate array and standard cell libraries typically include additional components,
beyond just NAND or NOR gates, that have efficient CMOS implementations. For
example, a popular such component is known as AND-OR-INVERT, or AOI for short.
Such a component has two 2-input AND gates (thus four inputs total), feeding into a
2-input NOR gate. That circuit can be efficiently designed using CMOS transistors. Thus,
we would want to utilize AOI components, and other similarly compact available
components in a library, as much as possible.
The task of converting a general logic circuit to a circuit using only components from
a particular technology's library (e.g., a particular gate array library or standard cell
library) is known as technology mapping. The task of determining where to place those
components on a chip is known as placement, and the task of connecting those
components by wires is known as routing. All three tasks, collectively known as physical
design, are typically done by automated tools today.
EXAMPLE 7.5 Implementing the seat belt warning light on a NOR-based gate array
Implement the BeltWarn circuit of Figure 7.15 using the NOR-based gate array of Figure
7.15(a). Noticing that the gate array has only 2-input NOR gates, we first convert the BeltWarn
circuit to use AND/OR gates with 2 inputs only, as shown in Figure 7.15(b). We then convert the
AND/OR circuit to the NOR-only circuit in Figure 7.15(c), using the equivalencies in Figure
7.13, and using inversion bubbles rather than NOR-based inverters. We then see a double
inversion on the wire from input s, so we eliminate those two inversions. Note that we do not
eliminate the double inversion between points 3 and 4 in Figure 7.15(c), because the first
inversion is part of a NOR gate. Eliminating that first inversion would convert the NOR gate to an
OR gate, defeating our goal of having NOR gates only. After converting remaining inversions
to NOR-based inverters, we map the circuit to the gate array's 2-input NOR gates as in
Figure 7.15(d). We numbered the NOR gates of Figure 7.15(c) and (d) to show the
correspondence between the two circuits.
Figure 7.15 Implementing the BeltWarn circuit on a NOR-based gate array IC: (a) original gate
array, (b)-(c) converting the desired circuit to two-input NOR gates only, (d) final gate array with
wires.
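A NOR-only version of the seat belt warning light, w = kps', can be sketched in Python as a chain of 2-input NOR gates. The gate numbering n1 to n5 below is our own and is not claimed to match the numbering in Figure 7.15; the structure simply follows the steps described in the example (2-input AND gates first, then NOR equivalents, with the double inversion on s removed and the inversion after the first NOR kept).

```python
from itertools import product

def nor(a, b):
    return 1 - (a | b)

def belt_warn_nor(k, p, s):
    n1 = nor(k, k)    # k'  (NOR-based inverter)
    n2 = nor(p, p)    # p'  (NOR-based inverter)
    n3 = nor(n1, n2)  # (k' + p')' = kp
    n4 = nor(n3, n3)  # (kp)', the inversion that must stay
    n5 = nor(n4, s)   # ((kp)' + s)' = kp s'
    return n5

for k, p, s in product((0, 1), repeat=3):
    assert belt_warn_nor(k, p, s) == (k & p & (1 - s))
print("NOR-only BeltWarn matches w = kps'")
```

Input s feeds the last NOR gate directly: the inverter in the original circuit and the inverted input required by the NOR-based AND cancel each other.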
7.3 PROGRAMMABLE IC TECHNOLOGY- FPGA
Manufactured IC technologies require at least a few weeks, and usually more like several
months, to convert a desired digital circuit design into a physical IC. What if we are
developing a circuit that we want to implement today? In that case, we can utilize one of
several programmable IC technologies. In a programmable IC technology, we implement
a desired circuit simply by writing a particular sequence of bits into a memory (or
number of memories) contained in the IC. Using a programmable IC technology has the
drawback of worse performance, size, and power compared to custom or semicustom IC
technologies. But we get our implementation today, and the benefits of that fact may
outweigh the drawbacks.
The most popular form of programmable IC technology is known as a Field-Programmable
Gate Array, or FPGA. An FPGA company prefabricates an FPGA chip, meaning that the chip
contains all the transistors and all wires that the chip will ever have. We buy those FPGA
chips, and then program the chip to implement our desired circuit. To program in this
context means simply to download a series of bits into the chip's memories, not to be
confused with writing high-level software programs like C or C++ code. Such programming
occurs in the field, meaning in our lab, or office, or home, as opposed to in a fabrication
plant. Hence the words "field-programmable" in the FPGA name. Furthermore, programming
typically takes only seconds, or perhaps minutes at most. Figure 7.16 shows some
FPGA chips. The chip at the top, with its front and back shown, measures about 3/4 inch on
each side. The chip on the bottom measures just over 1 inch on each side.
Figure 7.16 FPGA chips.
Field programmable gate arrays (FPGAs) have no "gate arrays" inside them; the name is there
due to historical reasons.
The words "gate array" are there in the name because, when FPGAs first became
popular in the mid-1980s, they were marketed as an alternative to gate array technology,
which was very popular at that time. Thus, an FPGA was a semicustom IC (nearly
synonymous with gate array at that time) that could be programmed in the field instead of at a
fabrication plant. However, be forewarned that the internal design of an FPGA chip looks
nothing like a gate array; the naming is somewhat unfortunate.
The two basic types of components inside an FPGA are lookup tables and switch
matrices. Those components are replicated hundreds of times in regular patterns inside an
FPGA. We now describe each type of component.
Lookup Tables
The key idea underlying FPGAs is that a memory with N address lines can implement
any combinational function with N inputs.
A basic idea underlying FPGAs is that a memory can implement combinational logic.
More specifically, a 1-bit wide memory with N address lines, and hence 2^N words,
configured to read the word corresponding to the present address, can implement any Boolean
combinational function of N variables.
Recall that a memory configured to be read will output the contents of the word
corresponding to the present address at the memory's address lines. So if a 4x1 memory's
address lines a1 a0 are 00, the memory will output the contents of word 0. If the address
lines are 01, the memory outputs the contents of word 1. Likewise, 10 reads word 2, and
11 reads word 3.
Implementing a Boolean function with a memory can therefore be done simply by
connecting the function's inputs to the memory address lines, and storing a 0 or 1 in each
memory word to match the desired function output for each combination of input values.
For example, consider the function F(x, y) = x'y' + xy. The truth table for the
function is shown in Figure 7.17(a). To implement the example function, we can connect x and
y to a 4x1 memory's address lines a1 and a0, respectively, and based on the truth table, we
store a 1 in word 0, a 0 in word 1, a 0 in word 2, and a 1 in word 3. In other words, we
store the truth table outputs in the memory. The memory then implements the desired
function, as shown in Figure 7.17(b). For example, when xy=00, we want the output to be 1.
Figure 7.17(c) shows that when xy=00, the memory's address lines will be 00, and thus the
memory will output the contents of word 0, which is the value 1, as desired.
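The idea of storing a truth table in a memory can be mimicked with a Python list standing in for the 4x1 memory. This is only a software sketch of the behavior; the names `F_MEM` and `lut_read` are ours.

```python
# Words 0..3 of a 4x1 memory, holding the truth table of F(x, y) = x'y' + xy.
F_MEM = [1, 0, 0, 1]

def lut_read(mem, *addr_bits):
    """Read the word selected by the address bits (most significant bit first)."""
    address = 0
    for bit in addr_bits:
        address = (address << 1) | bit
    return mem[address]

# x drives address line a1 and y drives a0.
for x in (0, 1):
    for y in (0, 1):
        print(f"x={x} y={y} F={lut_read(F_MEM, x, y)}")
```

Changing the four stored bits changes the function that the same "hardware" implements, which is the property FPGAs exploit.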
Figure 7.17 Implementing logic functions using a memory: (a) 2-input function truth table, (b)
corresponding memory contents and connections, (c) the proper output appears for the given input
values, (d) two functions of the same two inputs, (e) memory contents for the two functions.
A memory with M bits per word, rather than just 1 bit per word, can implement M
functions, as long as all those M functions have the same inputs. For example, consider
the two functions F(x, y) = x'y' + xy and G(x, y) = xy'. The truth table for
these two functions is shown in Figure 7.17(d). A 4x2 memory, which has 2 bits per
word, can implement those two functions, as shown in Figure 7.17(e).
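A 4x2 memory implementing F and G together can be sketched as a list of two-bit words. The layout below, with each word stored as an (F, G) pair, is our own choice for illustration; the names `FG_MEM` and `read_fg` are not from the text.

```python
# Each word of the 4x2 memory holds (F, G) for one input combination,
# where F(x, y) = x'y' + xy and G(x, y) = xy'.
FG_MEM = [
    (1, 0),  # word 0: x=0, y=0
    (0, 0),  # word 1: x=0, y=1
    (0, 1),  # word 2: x=1, y=0
    (1, 0),  # word 3: x=1, y=1
]

def read_fg(x, y):
    """x drives address line a1, y drives a0; returns the pair (F, G)."""
    return FG_MEM[(x << 1) | y]

for x in (0, 1):
    for y in (0, 1):
        f, g = read_fg(x, y)
        print(f"x={x} y={y} F={f} G={g}")
```

One address lookup produces both outputs at once, which is why all M functions must share the same inputs.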
A memory used to implement a combinational circuit is known (in FPGA
terminology) as a lookup table. When used as a lookup table, we typically refer to the memory
by the number of inputs (address lines) and the number of outputs (bits per word), rather
than by the number of words and the number of outputs. For example, we would refer to
an 8x2 memory being used as a lookup table as a "3-input 2-output lookup table," rather
than as an 8x2 lookup table.
From this point forward, we'll assume the memory is configured for read, and thus
we won't show the read line set to 1.
EXAMPLE 7.6 Implementing the seat belt warning light with a lookup table
Use a lookup table to implement the seat belt warning light circuit from Figure 7.1, whose
circuit appears in Figure 7.18(a) and whose equation is:
w = kps'
We generate the truth table for the function, as shown in Figure 7.18(b). Because the
circuit has three inputs, we know we'll need a 3-input 1-output lookup table (memory).
We connect the inputs to the memory's address lines, and store the truth table in the
memory, as shown in Figure 7.18(c), thus implementing the desired function. If the
3-input 1-output memory is an IC, then we are done implementing our design, and can
insert the IC into the electronic system with which the IC should interact.
Figure 7.18 Lookup table implementation.
You've just seen an example of a very simple programmable IC technology: a
memory. We can use a memory chip with N address lines and hence 2^N words, and with
M bits per word, to implement M different Boolean functions of the same N inputs. We can
purchase a memory chip before we need it for our design, and then we can "program" the
memory chip in our lab to implement a desired Boolean function.
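"Programming" the memory for the seat belt warning light amounts to computing the truth table of w = kps' and storing it. A minimal Python sketch follows; `build_lut` is a hypothetical helper of ours, and the address order assumes k is the most significant address bit and s the least significant.

```python
from itertools import product

def build_lut(func, n_inputs):
    """Return the 2^n memory words for func, word 0 first."""
    return [func(*bits) for bits in product((0, 1), repeat=n_inputs)]

# w = k p s' : warn when the key is in, a person is seated, and the belt is off.
BELT_WARN = build_lut(lambda k, p, s: k & p & (1 - s), 3)

for address, word in enumerate(BELT_WARN):
    print(address, word)
```

Only word 6 (k=1, p=1, s=0) holds a 1; the other seven words hold 0.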
Partitioning a Circuit among Lookup Tables
Unfortunately, using a memory to implement a Boolean function does not work well for
functions with numerous inputs. For example, while a 4-input function would need only a
16-word memory, a 16-input function would require a 64K-word memory; a 32-input
function would require a 4-billion-word memory. The needed memory size grows the
same as the size of the function's truth table, which we know grows as 2^N, where N is the
number of function inputs. In short, a truth table is not an efficient Boolean function
representation for functions with numerous inputs, and thus a lookup table is not an efficient
implementation for functions with numerous inputs.
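The word counts quoted above follow directly from the 2^N growth, as a short Python calculation shows. The function name `words_needed` is ours.

```python
def words_needed(n_inputs: int) -> int:
    """Number of memory words for a lookup table with n_inputs address lines."""
    return 2 ** n_inputs

for n in (4, 16, 32):
    print(f"{n:2d} inputs -> {words_needed(n):,} words")
```

The 16-input case gives 65,536 words (64K), and the 32-input case gives 4,294,967,296 words, a little over 4 billion.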
Partitioning a function's circuit among multiple lookup tables can yield more
efficient implementations for larger functions. Consider the extended seat belt warning
circuit from Example 2.8. Let's extend the circuit even more by adding another "diagnostic"
input called d that forces the warning light to turn on when d=1. Perhaps a
mechanic investigating a faulty warning light might want to force the warning light on to
isolate whether the light has blown out or to help determine if a seat sensor has
failed. The extended circuit is shown in Figure 7.19(a). That circuit can't be mapped to a
3-input 1-output lookup table because the circuit has 5 inputs, but the circuit could be
mapped onto a 5-input 1-output lookup table. Alternatively, we could implement the
circuit by using a 3-input 1-output lookup table connected to another 3-input 1-output
lookup table, as shown in Figure 7.19(c). We do so by partitioning the original circuit into
two groups, such that the first group has 3 inputs and 1 output, and the second group has
3 inputs and 1 output, as circled in Figure 7.19(b). The first group's output, which we've
labeled as x, has the equation x = kps'. The second group's output has the equation
w = x + t + d. We would program the lookup tables to implement these functions,
as shown in Figure 7.19(c), thus implementing the desired circuit using two lookup tables.
Figure 7.19 Partitioning a circuit onto two lookup tables: (a) desired circuit, (b) circuit partitioned into groups with at
most 3 inputs and 1 output, (c) groups mapped to two 3-input 1-output lookup tables.
Notice that the implementation with two lookup tables has a total of 8 + 8 = 16
words, compared to 32 words that would have been present with a 5-input lookup table.
Thus, partitioning a circuit among small lookup tables can result in better efficiency than
using one larger lookup table.
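The equivalence of the two-table implementation and a single 5-input table, along with the word counts, can be checked in Python. This sketch uses hypothetical helpers `build_lut` and `read` of our own, and assumes the group equations x = kps' and w = x + t + d.

```python
from itertools import product

def build_lut(func, n_inputs):
    return [func(*bits) for bits in product((0, 1), repeat=n_inputs)]

def read(lut, *bits):
    address = 0
    for bit in bits:
        address = (address << 1) | bit
    return lut[address]

X_LUT = build_lut(lambda k, p, s: k & p & (1 - s), 3)   # x = kps'
W_LUT = build_lut(lambda x, t, d: x | t | d, 3)         # w = x + t + d
BIG_LUT = build_lut(lambda k, p, s, t, d: (k & p & (1 - s)) | t | d, 5)

def w_partitioned(k, p, s, t, d):
    """First table computes x, which feeds an address line of the second table."""
    return read(W_LUT, read(X_LUT, k, p, s), t, d)

assert all(w_partitioned(*v) == read(BIG_LUT, *v)
           for v in product((0, 1), repeat=5))
print(len(X_LUT) + len(W_LUT), "words versus", len(BIG_LUT))
```

The price of the smaller total size is that the signal now passes through two tables in series rather than one.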
This efficiency can be seen even more dramatically for examples with more
inputs. For example, the function F = abc + def + ghi, shown in Figure
7.20(a), has 9 inputs. Implementing the function on a single lookup table would
require a table with 2^9 = 512 words. However, we can partition the circuit into groups
such that each group has 3 inputs and 1 output: the first group would compute abc,
the second def, the third ghi, and the fourth would OR the outputs of the first three
groups to generate the output F. Each group could be implemented using a 3-input
1-output lookup table, meaning 8x1 memories. The resulting implementation would
have four such lookup tables, as shown in Figure 7.20(b). The total words for that
four-table implementation would be a mere 8 + 8 + 8 + 8 = 32 words, far less than
the 512 words required for a single 9-input lookup table. Figure 7.20(c) compares the
relative sizes of a 512-word and four 8-word memories. Notice the tremendous
reduction in size.
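The 9-input case can be checked exhaustively as well, since there are only 512 input combinations. The helper names below (`build_lut`, `read`, `f_partitioned`) are our own, and the three AND groups share one table definition purely to keep the sketch short; in hardware they would be three separate 8x1 memories.

```python
from itertools import product

def build_lut(func, n_inputs):
    return [func(*bits) for bits in product((0, 1), repeat=n_inputs)]

def read(lut, *bits):
    address = 0
    for bit in bits:
        address = (address << 1) | bit
    return lut[address]

AND3 = build_lut(lambda a, b, c: a & b & c, 3)   # contents of each AND group
OR3 = build_lut(lambda a, b, c: a | b | c, 3)    # contents of the final OR group
SINGLE = build_lut(
    lambda a, b, c, d, e, f, g, h, i: (a & b & c) | (d & e & f) | (g & h & i), 9)

def f_partitioned(a, b, c, d, e, f, g, h, i):
    return read(OR3, read(AND3, a, b, c), read(AND3, d, e, f), read(AND3, g, h, i))

total_small_words = 3 * len(AND3) + len(OR3)
assert all(f_partitioned(*v) == read(SINGLE, *v)
           for v in product((0, 1), repeat=9))
print(total_small_words, "words versus", len(SINGLE))
```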
Figure 7.20 Dividing a many-input circuit among smaller lookup tables reduces total lookup table
size: (a) 9-input circuit, (b) circuit mapped to 3-input 1-output lookup tables, (c) size savings
compared to 9-input 1-output lookup table.
Partitioning a function among small lookup tables is more efficient than
implementing a function on one large lookup table. But what is a "small" lookup table: a
table with 2 inputs, 3 inputs, 4 inputs, 7 inputs, or maybe even 10 inputs?
Researchers have conducted numerous studies on large numbers of typical circuits,
and found that 3-input or 4-input lookup tables seem to work best for most circuits.
Furthermore, researchers found that 2-output lookup tables also seem to work well
for most examples. Thus, we'll use 3-input 2-output lookup tables from this point
forward.
EXAMPLE 7.7 Partitioning a circuit among 3-input 2-output lookup tables
Implement the circuit shown in Figure 7.21(a) using 3-input 2-output lookup tables. We begin by
trying to partition the circuit into groups such that each group has at most 3 inputs and 2 outputs.
However, the 4-input AND gate prevents us from successfully performing such partitioning,
because whatever group that gate is in will have at least four inputs. To remedy this problem, we
decompose that gate into two smaller gates, while maintaining the same functionality, as shown in
Figure 7.21(b). We can then partition the circuit into two groups, each with 3 inputs and 1 output,
as shown in the figure. We've numbered the inputs to each group to make clear that each group has
three inputs. We then map those groups onto two 3-input 2-output lookup tables as shown in Figure
7.21(c). Notice that the first lookup table's D1 output is unused, and the second table's D0 output is
unused. The first table's D0 column implements t = abc. The second table's D1 column implements
F = td + e.
Figure 7.21 Partitioning a circuit onto two lookup tables: (a) original circuit, (b) transformed circuit
that breaks the 4-input AND gate into two smaller gates and then shows the 3-input 1-output
groupings, (c) mapping of each group to a lookup table, with the group's function converted to
programmed bits in the lookup table. Italicized bits are unused.
In the previous example, notice that we did not use one of the columns in the first
lookup table, and did not use one of the columns in the second lookup table either. Using
lookup tables sometimes results in unused memory cells. Using lookup tables also
sometimes results in unused lookup table words, as illustrated in the following example.
EXAMPLE 7.8 Mapping a 2x4 decoder to 3-input 2-output lookup tables
Let's implement a 2x4 decoder, without enable, using 3-input 2-output lookup tables. A 2x4
decoder has two inputs, i1 and i0, and four outputs, d0, d1, d2, and d3. A mapping is shown in
Figure 7.22. The equations for each output are d0 = i1'i0', d1 = i1'i0, d2 = i1i0', and
d3 = i1i0. The lookup tables implement those equations using the top halves of the tables' words;
the bottom halves are unused.
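The decoder mapping can be sketched in Python with two 8-word tables of 2-bit words. The contents below follow from the four output equations; the column assignment (first table holds d0 and d1, second table holds d2 and d3) and the names `LEFT`, `RIGHT`, and `decode` are our own assumptions for illustration. The third address line a2 is held at 0, so only words 0 to 3 are ever read.

```python
# Each word is a pair of output bits; address bits are (a2, a1, a0) = (0, i1, i0).
LEFT = [(1, 0), (0, 1), (0, 0), (0, 0)] + [(0, 0)] * 4    # (d0, d1) per word
RIGHT = [(0, 0), (0, 0), (1, 0), (0, 1)] + [(0, 0)] * 4   # (d2, d3) per word

def decode(i1, i0):
    address = (0 << 2) | (i1 << 1) | i0   # a2 tied to 0
    d0, d1 = LEFT[address]
    d2, d3 = RIGHT[address]
    return d0, d1, d2, d3

for i1 in (0, 1):
    for i0 in (0, 1):
        print(i1, i0, decode(i1, i0))
```

Words 4 to 7 of each table are never addressed, which is the "unused bottom half" the example points out.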
Figure 7.22 Mapping a 2x4 decoder to 3-input 2-output lookup tables: (a) desired circuit, (b)
mapping to two lookup tables. Italicized bits are unused.
An FPGA may come with tens, hundreds, or even thousands of lookup tables, and thus
can implement large amounts of combinational logic.
Programmable Interconnects (Switch Matrices)
In the previous examples, we have been creating customized connections between lookup
tables. However, the point of FPGAs is that the entire chip is prefabricated, including
the wires. FPGAs therefore come with programmable interconnects, sometimes called
switch matrices, which allow us to program the connections among lookup tables. Figure
7.23 shows a simple FPGA chip having six inputs (P0-P5), two 3-input 2-output lookup
tables, one 4-input 2-output switch matrix, and four outputs (P6-P9). All three of the left
lookup table's inputs come from the external inputs P1, P2, and P3; that lookup table's
inputs can't be changed. However, two of the right lookup table's inputs may come from
either the left lookup table's outputs, or from the external inputs P4 and P5. The switch
matrix determines which of those connections will be made.
Figure 7.23 A simple FPGA architecture: (a) an FPGA that includes a switch matrix, and (b) the
switch matrix's internal design, showing two 4x1 muxes controlled by two 2-bit registers. Note: real
FPGAs have hundreds of lookup tables and switch matrices, not just a few.
The switch matrix's internal design appears on the right of Figure 7.23. It consists of
two 4x1 multiplexers. The top mux connects the switch matrix output o0 to one of the
matrix's four inputs. The bottom mux connects the output o1 to one of the matrix's four
inputs. A two-bit memory (which is actually a 2-bit register, but called a memory for
consistency with the memory inside a lookup table) holds the two bits that set each mux's
two select lines. Thus, we can program the desired connections simply by writing the
appropriate bits into those two 2-bit memories. Notice that each switch matrix output can
be configured independently of the other. In fact, we could even make the same input
appear at both outputs, though that's probably not useful in this FPGA design.
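The switch matrix can be modeled in Python as two 4x1 muxes, each steered by its own 2-bit configuration value. The class and method names (`SwitchMatrix`, `program`, `route`) are ours; only the structure, two muxes sharing four inputs m0 to m3 and driving outputs o0 and o1, comes from the description above.

```python
def mux4(inputs, s1, s0):
    """4x1 multiplexer: select lines s1 s0 pick one of four inputs."""
    return inputs[(s1 << 1) | s0]

class SwitchMatrix:
    def __init__(self):
        self.top = (0, 0)      # 2-bit memory for the mux driving o0
        self.bottom = (0, 0)   # 2-bit memory for the mux driving o1

    def program(self, top, bottom):
        """Write the select bits (s1, s0) for each mux."""
        self.top, self.bottom = top, bottom

    def route(self, m):
        """m is the tuple (m0, m1, m2, m3); returns (o0, o1)."""
        return mux4(m, *self.top), mux4(m, *self.bottom)

sm = SwitchMatrix()
sm.program(top=(1, 0), bottom=(1, 1))   # o0 <- m2, o1 <- m3
print(sm.route((0, 0, 1, 0)))           # (1, 0)
```

Programming both memories with the same value would route one input to both outputs, the independent-configuration case mentioned in the paragraph.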
We'll illustrate the use of the switch matrix with an example.
EXAMPLE 7.9 A 2x4 decoder on an FPGA with a switch matrix
We repeat Example 7.8 here using the FPGA shown in Figure 7.23(a). We can easily get the proper
inputs to the first lookup table as in Example 7.8, by connecting 0, external input i1, and
external input i0 to the FPGA inputs, as shown in Figure 7.24(a). To get the proper
inputs to the second lookup table, we first connect external input i1 and external input i0 to the
FPGA inputs that feed into the switch matrix. We then configure the switch matrix such that switch
matrix input m2 passes through to switch matrix output o0, which means that external
input i1 passes through to switch matrix output o0. We achieve that configuration by programming
10 into the top 2-bit register in the switch matrix, as shown in Figure 7.24(b). Likewise, we
configure the switch matrix such that switch matrix input m3 passes through to switch matrix
output o1, meaning external input i0 passes through to switch matrix output o1. We achieve that
configuration by programming 11 into the bottom 2-bit register in the switch matrix. Because the
switch matrix outputs connect to the right lookup table's inputs, we've successfully connected
external inputs i1 and i0 to the second lookup table's inputs, as desired. We program the two
lookup tables as we did in Example 7.8. Thus, external outputs d0-d3 can be found at the FPGA
external pins as shown in Figure 7.24(a).
c;,J
8x2Mem.
8x2 Mem.
0 10
0 00
01
00
mO
iOS
1 sO
m1
i1 4x1
1m3
mux
i3
2 00
10
3 00
a2 d3
a1 4 00
aO 5 00
d2
6 00
00 --I- 11
"
iOS1 sO
:E-! il 4Xl
L-.
I:;rtx
'-------'-
n
iO
(a)
(b)
Figure H4 Implementing a 2,4 decoder on the FPGA fabric having a witch malrL': (a1 e.'temal
and bits In the lookup table- and switch matrix. and (b) a look inside the
matnx. showmg the programmed connections between the Outputs and input . Italicized bilS
In the lookup tables are unused.
EXAMPLE 7.10 Extended seat belt warning light on an FPGA
We are to implement the extended seat belt warning light system of Example 2.8 on the FPGA
of Figure 7.23. (Figure 7.19 showed how to partition a similar circuit into two groups, with
equations x = kps' and w = x + t + d. For this example, w = x + t.) We connect k, p,
and s to the FPGA pins going to the left lookup table, and we program that lookup table to
implement the function x = kps', as shown in Figure 7.25. We connect an output of the left lookup
table, representing x, to the right lookup table, by programming the switch matrix to connect m0
to o0. We connect t to the right lookup table also, by connecting t to an FPGA pin that connects
to switch matrix input m2, and then by configuring the switch matrix to connect m2 to o1. We then
program the right lookup table to implement the function x + t, as shown in Figure 7.25.
Figure 7.25 Implementing the extended seat belt warning light circuit on the FPGA fabric having a switch matrix: (a) external connections and programmed bits, (b) a look inside the switch matrix, showing the programmed connections. Italicized bits in the lookup tables are unused.
Notice that, in the previous two examples, we implemented two different circuits using the same FPGA chip. To implement the two different circuits, we merely had to program different bits into the lookup tables and switch matrices. That's the appeal of FPGAs: they implement our circuit just by programming.
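To make the idea of "implementing a circuit just by programming" concrete, the following is a minimal sketch of the partial fabric used in Example 7.10: two 8x2 lookup table memories joined by a switch matrix of 4x1 muxes. The function names, the assignment of signals to lookup table address pins, and the choice of m0 for the left lookup table's output are assumptions made for illustration; only the stored bits (a 1 at address 110 on the left, 1s at addresses 001, 010, and 011 on the right) follow the example.

```python
def lut_read(mem, a2, a1, a0):
    """Return the 2-bit word (d1, d0) stored at address a2a1a0."""
    return mem[(a2 << 2) | (a1 << 1) | a0]


def switch_matrix(sel, m):
    """Each output o_j copies the input m[sel[j]] chosen by its 2-bit register."""
    return [m[s] for s in sel]


def seat_belt_fabric(k, p, s, t):
    # Left lookup table: D0 = x = k*p*s'. Only address 110 stores a 1.
    left = [(0, 0)] * 8
    left[0b110] = (0, 1)

    # Right lookup table: D0 = w = x + t, assuming a2 = 0, a1 = o1, a0 = o0.
    right = [(0, 0)] * 8
    for addr in (0b001, 0b010, 0b011):
        right[addr] = (0, 1)

    x = lut_read(left, k, p, s)[1]

    # Assumed wiring: m0 carries x, m2 carries t; m1 and m3 are unused.
    # Register values 0 and 2 select m0 for o0 and m2 for o1.
    o0, o1 = switch_matrix(sel=[0, 2], m=[x, 0, t, 0])
    return lut_read(right, 0, o1, o0)[1]


if __name__ == "__main__":
    for k in (0, 1):
        for p in (0, 1):
            for s in (0, 1):
                for t in (0, 1):
                    print(k, p, s, t, "->", seat_belt_fabric(k, p, s, t))
```

Changing only the stored bits and the two switch matrix register values would turn the same sketch into the decoder of the previous example, which is the point the text is making.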
Configurable Logic Block
In the previous section, the illustrated FPGAs were missing a critical element needed to implement general circuits, namely, flip-flops. Without flip-flops, FPGAs could not implement sequential circuits.
FPGAs may include a flip-flop with each output of a lookup table: two flip-flops in the case of a 2-output lookup table. The lookup table and its flip-flops together are known as a configurable logic block, or CLB. A simple CLB is shown in Figure 7.26. Each configurable logic block has a 3-input 2-output lookup table, two outputs, and two flip-flops. Each flip-flop is loaded every clock cycle with the corresponding lookup table output. Each output of the CLB can be configured to come either from the output's flip-flop or directly from the corresponding lookup table output. That configuration is done by programming a 1-bit memory (which itself is a flip-flop, but we'll call it a memory to avoid confusion), shown in Figure 7.26, that controls a 2x1 mux for each CLB output. The output flip-flops enable us to implement sequential circuits, that is, circuits having registers, on the FPGA.
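The CLB behavior described above can be sketched as a small class. This is an illustrative model only: the class name `CLB`, its method names, and the convention that a configuration bit of 1 selects the flip-flop are assumptions, not details taken from a real device.

```python
class CLB:
    """Hypothetical CLB: 3-input 2-output lookup table, two flip-flops,
    and one 2x1 output mux per output."""

    def __init__(self, lut_bits, out_cfg):
        self.lut = list(lut_bits)     # 8 entries, each a (d1, d0) pair
        self.out_cfg = list(out_cfg)  # per output: 1 = flip-flop, 0 = lookup table
        self.ff = [0, 0]              # flip-flop contents for (d1, d0)

    def lut_out(self, a2, a1, a0):
        return self.lut[(a2 << 2) | (a1 << 1) | a0]

    def outputs(self, a2, a1, a0):
        """Value on each CLB output pin for the current inputs."""
        comb = self.lut_out(a2, a1, a0)
        return tuple(self.ff[i] if self.out_cfg[i] else comb[i] for i in range(2))

    def clock(self, a2, a1, a0):
        """Rising clock edge: each flip-flop loads its lookup table output."""
        self.ff = list(self.lut_out(a2, a1, a0))


# Lookup table computing D1 = a1' and D0 = a0' (a2 assumed tied to 0).
inv = [(1, 1), (1, 0), (0, 1), (0, 0)] + [(0, 0)] * 4
clb = CLB(inv, out_cfg=[1, 1])  # both outputs taken from the flip-flops
clb.clock(0, 1, 0)              # inputs a1=1, a0=0 at the clock edge
print(clb.outputs(0, 0, 1))     # registered outputs ignore the new inputs
```

With `out_cfg=[0, 0]` the same object behaves as a purely combinational lookup table, which mirrors the role of the 1-bit CLB output configuration memory.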
Figure 7.26 An FPGA with configurable logic blocks, which contain flip-flops along with a lookup table. We've put 0s in all the configuration memory bit cells in the figure.
EXAMPLE 7.11 Implementing a sequential circuit on an FPGA
We wish to implement the circuit shown in Figure 7.27(a) on the FPGA of Figure 7.26. We first connect a and b to the left lookup table, and c and d to the right lookup table through the switch matrix, as shown in Figure 7.27(c). We program the left lookup table to output the functions a' and b', as shown in Figure 7.27(b). Likewise, we program the right lookup table to output the analogous functions of c and d. We program all the configurable logic block outputs to connect to their flip-flops by programming 1s into the CLB output configuration memories, as shown in Figure 7.27(c).
Figure 7.27 Implementing a sequential circuit on an FPGA: (a) desired sequential circuit, (b) left CLB's lookup table program bits, (c) programmed FPGA. Unused bits are italicized.
Care should be taken to avoid confusing the output flip-flops themselves with the CLB output configuration memories. The configuration memories store the bits that program the FPGA to implement the desired circuit, while the output flip-flops store the bits that the circuit loads during circuit operation.
The storage elements for the lookup tables, the CLB output configuration, and the switch matrices are collectively known as an FPGA's configuration memory, although that "memory" is composed of numerous smaller memories and even registers or flip-flops.
Overall FPGA Architecture
A commercial FPGA contains hundreds or even thousands of CLBs and switch matrices, arranged in a regular pattern on the chip. A sample arrangement is shown in Figure 7.28. CLBs connect with horizontal and vertical routing channels, which connect to switch matrices. A sample connection of a CLB to the routing channels is shown for the top center CLB. The routing channels consist of tens of wires, represented in the figure just as single bolded wires.
Figure 7.28 FPGA architecture: a grid of CLBs and switch matrices.
CLBs and switch matrices in commercial FPGAs are more complex than described in this chapter. For example, CLBs may contain two lookup tables, or direct connections to adjacent CLBs to support carry chains. Switch matrices may contain more inputs and outputs and more flexible switching options. Furthermore, commercial FPGAs may also include large embedded RAM memories for data storage, and embedded multipliers or multiply-accumulate units for fast multiplications.
Programming an FPGA
We haven't yet said anything about how we actually program the lookup table memories, switch matrix configuration memories, and CLB-output configuration memories; in particular, how do we get the program bits into the configuration memories? The configuration memories are all the lookup table memories, the switch matrix memories, and the CLB-output configuration memories. Conceptually, programming is enabled by the FPGA having all the configuration memory bit storage cells connected as one big shift register. That shift register's bit storage cells are spread out across the chip, so they don't represent a traditional register whose bits are usually in one place, but thinking of them as a shift register helps in understanding their connectivity. Storage cells connected as a shift register in this way are typically referred to as a scan chain. The FPGA will have an extra input pin for programming that serves as the shift input for the shift register. Another extra input pin indicates that programming is taking place. During programming, we shift in the bits necessary to implement our desired circuit. Remember that the configuration memory cells only get written during programming of the FPGA; during normal FPGA operation, those configuration memory cells are read-only. Thus, one can conceive of FPGAs whose configuration memories are made from programmable read-only memory technology (PROM, EPROM, or EEPROM), although today most FPGAs use RAM and flip-flop components for configuration memories. RAM and flip-flops are probably used because those components need to be programmed quickly using the scan chain method, which is easily achieved using RAM/flip-flop components, but not so easily using EPROM or EEPROM components.
Automated tools that program FPGAs usually start with a file containing the bits to be shifted into the FPGA scan chain; that file is known as a bit file. The tool that creates the bit file obviously must know the number and purpose of every bit cell in the FPGA scan chain, so such tools will generate a different bit file for different FPGA devices.
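The scan chain programming described above amounts to clocking a shift register once per bit in the bit file. The sketch below assumes a hypothetical `program_fpga` helper and models the chain as a Python list whose index 0 is the cell nearest the programming input pin; a real device's chain order and length are device-specific.

```python
def program_fpga(bit_file, chain_length):
    """Shift bit_file into a scan chain, one bit per programming clock.

    Index 0 of the returned list is the cell nearest the programming pin.
    On each clock, every cell takes its upstream neighbor's old value and
    cell 0 takes the next bit from the file.
    """
    if len(bit_file) != chain_length:
        raise ValueError("bit file must supply one bit per configuration cell")
    chain = [0] * chain_length
    for bit in bit_file:
        chain = [int(bit)] + chain[:-1]
    return chain


if __name__ == "__main__":
    # The first bit in the file ends up in the cell farthest from the pin.
    print(program_fpga("1101000", 7))
```

Because the first bit shifted in travels the farthest, a tool that writes the bit file must order the bits to match the chain's physical order, which is why bit files differ between devices.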
EXAMPLE 7.12 Programming an FPGA
This example demonstrates programming an FPGA for the FPGA and desired circuit shown in Example 7.11. Figure 7.27 from Example 7.11 showed the required contents of the configuration memory on the FPGA to implement the desired circuit. We replicate the contents in Figure 7.29(a), this time illustrating the manner in which the FPGA has the configuration memory bits connected as a scan chain. Figure 7.29(b) shows how that scan chain conceptually forms a shift register. Figure 7.29(c) shows the contents of a bit file that could be used to program the FPGA to implement the circuit. We created that bit file by following the dashed line that represents the scan chain, placing 1s and 0s into the bit file as we see them in the figure.
Figure 7.29 Programming an FPGA: (a) all configuration bit cells exist in a scan chain, (b) a scan chain conceptually is a big shift register, (c) a bit file's contents would be shifted in during programming; some relationships between the file's bits and configuration bit cells are shown.
How Many Gates Does an FPGA Implement?
We usually think of a digital circuit's size using the notion of "gates" to represent design size. A design with 3000 gates is likely bigger than a design with 2000 gates. Of course, whether that statement is true depends on the type of gates used in each design (e.g., because XOR gates are bigger than NAND gates, 2000 XOR gates may actually be bigger than 3000 NAND gates), as well as the number of inputs to each gate (a 20-input gate is bigger than a 2-input gate). Thus, a common method of indicating design size for a circuit approximates the number of 2-input NAND gates that would be required to implement the circuit. So when we say that a circuit consists of 3000 gates or 2000 gates, we typically mean that if those circuits were implemented using 2-input NAND gates, they would require 3000 2-input NAND gates and 2000 2-input NAND gates, respectively.
FPGAs have lookup tables and switch matrices inside, not gates. FPGA sizes are therefore typically reported by considering how large a circuit made up of 2-input NAND gates could be implemented using the FPGA architecture. FPGA vendors may report FPGA size by saying a particular FPGA has a density of "100,000 system gates" or "100,000 typical gates." These numbers are approximations, and many people view such reported numbers very skeptically (because companies sometimes like to exaggerate). FPGA vendors might also describe FPGA size as the number of "logic blocks" or "lookup tables," which is useful when comparing sizes of FPGAs having the same type of logic blocks or lookup tables.
FPGAs versus ASICs and Microprocessors
FPGAs are less efficient than ASICs in terms of delay, size, and power. For example, the circuit of Figure 7.22(a) could be implemented with a delay of just one gate-delay in a custom or semicustom IC technology. However, when mapped to the FPGA of Figure 7.26, that circuit will have a longer delay: the inputs must pass through the left CLB's lookup table (which may have a delay of two gate-delays), through the left CLB's output muxes (another two gate-delays), through the switch matrix (another two gate-delays), through the right CLB's lookup table (another two gate-delays), and finally through the right CLB's output muxes, resulting in a total of ten gate-delays. In terms of size, an ASIC implementation of the circuit of Figure 7.22(a) would require on the order of tens of transistors, whereas the FPGA implementation using two CLBs and a switch matrix would require several hundred transistors.
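The ten gate-delay figure is simply the sum of the stages along the path. As a small sketch of that arithmetic (the stage names and the `total_gate_delays` helper are illustrative, and the final output-mux delay of two gate-delays is an assumption inferred from the stated total):

```python
# Delay of each stage on the FPGA path, in gate-delays.
FPGA_PATH = [
    ("left CLB lookup table", 2),
    ("left CLB output mux", 2),
    ("switch matrix", 2),
    ("right CLB lookup table", 2),
    ("right CLB output mux", 2),  # assumed; implied by the stated total of ten
]


def total_gate_delays(path):
    """Sum the per-stage delays along a signal path."""
    return sum(delay for _, delay in path)


if __name__ == "__main__":
    print(total_gate_delays(FPGA_PATH))
```

Adding or removing a stage in `FPGA_PATH` shows how each extra programmable element on the path contributes directly to the overall delay.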
An FPGA implementation of a circuit will therefore be slower and bigger than an ASIC implementation of the same circuit. Some studies have shown that FPGA implementations are approximately 10 times slower, and 10-30 times bigger, than ASIC implementations of the same circuit. Similarly, a circuit implemented on an FPGA may consume about 10 times more power than when implemented on an ASIC. But the advantage of being able to program FPGAs immediately and for almost no cost, rather than having to wait weeks or months while spending tens of thousands of dollars, often outweighs those disadvantages.
Despite the performance, size, and power overhead compared to ASICs, FPGAs are still much faster than software on a microprocessor for many tasks, in part because FPGAs can effectively implement concurrency, pipelining, and bit-level operations. Thus, FPGAs possess the programming flexibility of software on a microprocessor, yet approach the performance of an ASIC, representing an excellent implementation option for many designs.
7.4 OTHER TECHNOLOGIES
In this section, we describe other technologies for physically implementing digital circuits. Some of those technologies are older technologies that are still useful in some situations. Others are newer technologies that are beginning to gain popularity.
Off-the-Shelf Logic (SSI) IC
Sometimes we need only implement a circuit having just a few gates. In these cases, using an FPGA may be overkill, as FPGAs typically support thousands or millions of gates. Likewise, using an ASIC would also be overkill. For cases where we only need a few gates, we might instead use one or more off-the-shelf logic ICs. A logic IC typically contains a few, perhaps ten or fewer, gates connected directly to the IC's pins, as shown in Figure 7.30. The IC shown has four AND gates and 14 pins. One pin is for power to the IC (known as VCC), another for ground (GND). The remaining pins connect to the four AND gates in the IC, as shown in the figure. Different logic ICs have gate types other than AND, such as OR, NAND, NOR, or NOT. To build a small circuit from these off-the-shelf logic ICs, we would simply place the ICs on a board and connect the appropriate pins. ICs with a few gates are known as Small-Scale Integration chips, or SSI chips.
Figure 7.30 Example logic IC.
7400 ICs
The most popular off-the-shelf SSI ICs are known generally as 7400-series ICs. A 7400 IC typically contains four to six logic gates, and about 14 pins. A particular 7400 IC is shown in Figure 7.31. The IC measures about 1/2 inch across. The IC package shown has two rows, or lines, of pins, and is thus known as a dual-inline package, or DIP.
Figure 7.31 7400-series IC.
7400 ICs first became available in the early 1960s. The original 7400 chip had four NAND gates, and cost about $1000 each, in 1962. That's right, $1000. And that's in the 1960s, when a U.S. engineer earned only about $10,000/year. The price dropped significantly during that decade, thanks in large part to the use of huge numbers of the devices by the U.S. Minuteman missile and the Apollo rocket programs, and has continued to drop since then due to cheaper transistors and huge volumes. Today, you can buy 7400-series ICs for mere cents each.
Parts with different gates have different part numbers. Table 7.1 shows some commonly used 7400 parts from Fairchild's 74LS00 subfamily of the 7400 series. In addition to basic gates, the table shows ICs with D flip-flops, full-adders, or a magnitude comparator. Parts also exist for XOR, XNOR, buffers, decoders, multiplexers, up-counters, up-down-counters, and more.

TABLE 7.1: Commonly used 7400-series ICs.
Part     Description                                                        Pins
74LS00   Four 2-input NAND                                                  14
74LS02   Four 2-input NOR                                                   14
74LS04   Six inverters                                                      14
74LS08   Four 2-input AND                                                   14
74LS10   Three 3-input NAND                                                 14
74LS11   Three 3-input AND                                                  14
74LS14   Six inverters (Schmitt trigger)                                    14
74LS20   Two 4-input NAND                                                   14
74LS27   Three 3-input NOR                                                  14
74LS30   One 8-input NAND                                                   14
74LS32   Four 2-input OR                                                    14
74LS74   Two D flip-flops, positive edge triggered, with preset and reset   14
74LS83   4-bit binary full-adder                                            16
74LS85   4-bit magnitude comparator                                         16
There are several different subfamilies of 7400-series parts. Parts from a subfamily can be used with other parts from the same subfamily, but generally not with parts from other subfamilies. The reason is that the voltage and current settings of a subfamily are designed such that the ICs can be connected without worrying about adjusting the voltage and current between ICs. The 74 series (e.g., 7400, 7402, etc.) is the basic subfamily, based on a type of transistor known as TTL; designers using logic ICs today only use 74-series ICs if they must integrate with old designs, and typically don't use the series for new designs. The 74LS subfamily (e.g., 74LS00, 74LS02) uses a special type of TTL technology known as Schottky that results in lower power and slightly higher speed than the 74 series; the "L" in the name means low-power, and the "S" means Schottky. The 74HC subfamily uses high-speed (denoted by the "H") CMOS (denoted by the "C") transistors. The 74F subfamily was introduced by Fairchild, consisting of fast (hence the "F") advanced Schottky TTL logic. Numerous other 7400 subfamilies exist, with new subfamilies still being introduced.
Furthermore, additional series of off-the-shelf SSI ICs also exist in addition to the 7400 series. Another popular series is the 4000 series of ICs, a CMOS series that evolved in the 1970s as a low-power alternative to the TTL-based 7400 series. More series exist too.
EXAMPLE 7.13 Seat belt warning implementation using off-the-shelf 7400 ICs
Using the 74LS-series ICs shown in Table 7.1, physically implement the seat belt warning light circuit of Figure 7.1, shown again in Figure 7.32(a). We could implement the inverter using a 74LS04. The 74LS08 has 2-input AND gates, and we need a 3-input AND gate. A simple solution is to decompose the 3-input AND into two 2-input ANDs, as shown in Figure 7.32(b). The final implementation is shown in Figure 7.32(c).
Figure 7.32 Implementing the seat belt warning circuit with 74LS-series ICs: (a) desired circuit, (b) circuit transformed to use 2-input AND gates, (c) circuit mapped to two 74LS ICs. Additional connections not shown would be power to the I14 pin, and ground to the I7 pin, on each IC.
Preferably, we would implement the circuit using one IC, to reduce board size, cost, and power. Converting the circuit to use only one type of gate, like NAND gates only, or NOR gates only, could result in just one IC. For example, if we could convert to 3-input NOR gates, we could use the 74LS27 chip. We start by converting the circuit to NOR gates only, as in Figure 7.33(a). We remove the double inversions, and replace the single inversions by 3-input NOR gates, as in Figure 7.33(b). The implementation using a 74LS27 IC is shown in Figure 7.33(c).
Figure 7.33 Implementing the seat belt warning circuit with one 74LS IC, namely, the 74LS27 consisting of three 3-input NOR gates: (a) desired circuit transformed to OR gates with inversion bubbles, (b) circuit with double inversions eliminated and single inversions replaced by 3-input NOR gates, (c) circuit mapped to a 74LS27 chip. Additional connections not shown would be power to the I14 pin, and ground to the I7 pin.
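A quick way to convince ourselves that both mappings in Example 7.13 compute the same function is to compare them exhaustively against w = kps'. The sketch below is illustrative only: the function names are hypothetical, and building each inverter by tying all three inputs of a NOR gate to the same signal is an assumption about how the single inversions are realized, not a detail read from the figure.

```python
def inv(a):
    """One inverter, as found on a 74LS04."""
    return 1 - a


def and2(a, b):
    """One 2-input AND gate, as found on a 74LS08."""
    return a & b


def nor3(a, b, c):
    """One 3-input NOR gate, as found on a 74LS27."""
    return int(not (a or b or c))


def two_chip(k, p, s):
    # 3-input AND decomposed into two 2-input ANDs, plus one inverter.
    return and2(and2(k, p), inv(s))


def one_chip(k, p, s):
    # kps' = (k' + p' + s)', using three 3-input NOR gates.
    k_n = nor3(k, k, k)  # NOR with all inputs tied together acts as NOT
    p_n = nor3(p, p, p)
    return nor3(k_n, p_n, s)


if __name__ == "__main__":
    for k in (0, 1):
        for p in (0, 1):
            for s in (0, 1):
                print(k, p, s, two_chip(k, p, s), one_chip(k, p, s))
```

The NOR-only version uses exactly three gates, which matches the three gates available on a single 74LS27.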
Simple Programmable Logic Device (SPLD)
A programmable logic device, or PLD, is an IC that can be configured to implement a variety of logic functions, ranging from tens to thousands of gates. PLDs became popular in the 1970s (thus predating FPGAs), as they could implement far more functionality in a single IC than possible using SSI ICs.
A PLD contains a prefabricated circuit with a set of external inputs feeding into a large AND-OR circuit structure, with the special feature that the user can configure (via "programming") which external inputs connect to the AND gates. For example, Figure 7.34 shows a basic PLD with three inputs feeding into three AND gates followed by an OR gate. The inputs feed into the AND gates in both true and complemented forms. Each wire feeding into each AND gate passes through a programmable node, which can either pass the node's input to the node's output, or disconnect the node's input from the node's output. Thus, by programming the programmable nodes, we can program the PLD to implement any 3-term function of three inputs.
The programmable node design varies among types of PLDs. Figure 7.35 shows two types. The type shown in Figure 7.35(a) is based on a fuse. A fuse conducts like a wire, unless we "blow" the fuse, meaning we pass a higher than normal current through the fuse, causing the fuse to literally burn up and break. A blown fuse obviously does not conduct electricity. The type shown in Figure 7.35(b) is based on memory and a transistor: we program a 1 into the memory to cause the transistor to conduct, and a 0 to cause the transistor to not conduct. We omit the details of how to program the fuses or program the memories. Memory-based PLDs can usually be reprogrammed, in contrast to fuse-based PLDs, which can only be programmed once and are known as one-time programmable (OTP) devices. Fuse-based PLDs are popular in electrically noisy applications, like space applications, since memories can have their contents changed from radiation in space. They are also popular in applications demanding high security, since malicious enemies can't reprogram the device. Memory-based devices are more common, however, since they can be reprogrammed and thus reduce costs when we make design changes. The memories used are almost always nonvolatile, meaning the memories don't need power to retain their stored bits. (See Section 5.6 for more information on nonvolatile memories.)
Figure 7.34 A basic example of a programmable logic device. (AND gates are wired-AND.)
Figure 7.35 Two types of programmable nodes: (a) fuse based, (b) memory based.
You might be wondering how those AND gates work when the programmable node is programmed to disconnect an input: how does the AND gate treat an input with no connection? As a 0, as a 1, or as something else? Actually, PLDs don't use normal AND gates. Instead, they typically use what is known as "wired-AND." Explaining how wired-AND works is beyond the scope of this book, and is instead the subject of a course on transistor-level circuits. For our purposes, we can think of a wired-AND gate as an AND gate that simply ignores unconnected inputs.
Real PLDs have more than just three inputs, three AND gates, and one output. PLD structure drawings thus need a more concise way of drawing the circuits. A concise method of drawing PLDs is shown in Figure 7.36. Such a drawing doesn't show the programmable nodes, and simply utilizes an "x" to indicate a connection. In the drawing, wires that cross each other are not connected unless an "x" exists at the crossing. Furthermore, such a drawing uses a single wire to represent all the AND gate inputs. The figure shows how we would use such a drawing to indicate the connections needed to generate the term I3*I2'. The "x" on the left represents I2' feeding into the top AND gate. The "x" on the right indicates I3 feeding into the top AND gate.
Figure 7.36 Simplified PLD drawing.
EXAMPLE 7.14 Seat belt warning light using a simple PLD
We are to implement the seat belt warning light system of Figure 7.1 using the PLD of Figure 7.36. We can do so by programming the PLD as shown in Figure 7.37. We generate the desired term kps' by programming connections for the top AND gate as shown. We want the bottom two AND gates to output 0s, so that the OR gate's output equals the top AND gate's output. We can achieve 0s by ANDing an input with its complement; the result of a*a' is always 0. The figure shows two ways of generating 0s, with the middle gate using just one of the inputs and the bottom gate using all three inputs; the result is the same.
Figure 7.37 Seat belt warning system on a simple PLD.
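The PLD programming in this example can be sketched as data plus a tiny evaluator. In the sketch below, each AND gate is described by the set of literals whose programmable nodes are connected; the names `wired_and` and `pld`, the literal spelling (`"I3'"` for a complemented input), and the assignment k = I1, p = I2, s = I3 are assumptions made for illustration.

```python
def wired_and(literals):
    """AND of the connected literals only; unconnected inputs are ignored.

    Note: with no connections at all this model yields 1, a case the
    text does not discuss.
    """
    return int(all(literals))


def pld(program, inputs):
    """Evaluate a one-output PLD.

    program: one set of connected literal names per AND gate.
    inputs:  mapping from input name (e.g. "I1") to 0 or 1.
    """
    values = {}
    for name, v in inputs.items():
        values[name] = v            # true form
        values[name + "'"] = 1 - v  # complemented form
    terms = [wired_and(values[lit] for lit in gate) for gate in program]
    return int(any(terms))          # OR gate over the AND terms


# Top gate: k*p*s'. Middle and bottom gates are forced to 0 via a*a'.
SEAT_BELT = [
    {"I1", "I2", "I3'"},
    {"I1", "I1'"},
    {"I1", "I1'", "I2", "I2'", "I3", "I3'"},
]

if __name__ == "__main__":
    for k in (0, 1):
        for p in (0, 1):
            for s in (0, 1):
                print(k, p, s, pld(SEAT_BELT, {"I1": k, "I2": p, "I3": s}))
```

Both ways of forcing a term to 0 behave identically here, which matches the example's remark that the middle and bottom gates give the same result.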
PLDs typically have more than just one output. Figure 7.38(a) shows a PLD with two outputs instead of just one. Each output is an OR of up to three terms.
Many PLDs have a D flip-flop that stores each output's bit, and the PLD's output pin can be programmed to connect from the OR gate output or the flip-flop output, known as combinational or registered output, respectively. A PLD supporting combinational/registered output is shown in Figure 7.38(b).
Figure 7.38 (a) PLD with two outputs, (b) PLD with programmable registered outputs.
Another extension is to allow the PLD output to be either the true or complemented value of the OR gate or flip-flop output, using a 2x1 mux controlled by a programmable bit. Yet another extension is for the output to feed back to the input array. One use of feedback is to implement functions with more terms, achieved by feeding back the combinational output value. Another very common use of feedback, achieved by feeding back the registered output, is to implement a state register and control logic (i.e., a controller): the AND array gets its inputs from the registered outputs and external inputs, and the OR gates then generate the external outputs and the next values for the state register.
Some PLDs not only have a programmable AND array, but also have a programmable OR array, meaning the OR gate can get its inputs from any of the AND gates.
SPLD versus PAL versus GAL versus PLA
Like so many names in the rapidly evolving field of high technology, names for PLDs are a bit blurred and confusing. Originally (1970s), PLDs consisted of programmable AND arrays and programmable OR arrays, and were known as programmable logic arrays, or PLAs. In the mid-1970s, a company named AMD (Advanced Micro Devices, Inc.) developed PLDs that instead had OR gates with fixed rather than programmable inputs, as in Figure 7.38 and the other PLD figures we've shown, and referred to such devices as Programmable Array Logic, or PALs ("PAL" is a registered trademark of AMD). PALs were originally fuse-based and hence one-time-programmable. A company named Lattice Semiconductor Corporation developed a PLD using a memory-based programming approach rather than fuses, resulting in reprogrammability, and referred to such devices as Generic Array Logic, or GALs ("GAL" is a registered trademark of Lattice Semiconductor Corporation). As PLDs became more complex (as we'll discuss in the next section), PLDs based on PAL or GAL architectures (PLA architectures seem to be pretty rare) became known as Simple PLDs, or SPLDs, to contrast them with the more complex PLD varieties. Today, numerous companies manufacture SPLDs, and often state that their SPLD architecture is based on "PAL" or "PAL/GAL" architectures, with the distinction between PAL and GAL not seemingly relevant in that context.
SPLDs typically support tens to hundreds of logic gates.
Complex Programmable Logic Device (CPLD)
As IC transistor densities grew in the 1980s, companies began to build PLDs to support thousands of gates. However, the PLD architecture described in the previous section does not scale well to thousands of gates: who needs one big huge circuit of two-level logic? Instead, architectures evolved that consisted of numerous SPLDs on a single device, connected using switch matrices (also known as programmable interconnect; see Section 7.3 for details on switch matrices). These devices today are known as Complex PLDs, or CPLDs. CPLDs can typically implement designs with thousands of gates.
SPLDs versus CPLDs versus FPGAs
What's the difference among SPLDs, CPLDs, and FPGAs? In general, the term SPLD is used for devices that support tens to hundreds of gates, CPLD for devices that support thousands of gates, and FPGA for devices that support tens of thousands of gates to millions of gates.
Furthermore, today, SPLDs and CPLDs are almost always nonvolatile, meaning they can store their program even after power is removed, whereas FPGAs are almost always volatile, meaning they lose their program when power is removed, and so a design must include external circuitry that stores the program in nonvolatile memory and that programs the FPGA from that memory on power up of the FPGA. FPGAs are likely volatile because of the way they are programmed using a scan chain, which is easy using flip-flops and RAM cells, but would be difficult using nonvolatile memory bits. However, conceptually, any of SPLDs, CPLDs, and FPGAs could be made to be volatile or nonvolatile, and one might anticipate that future FPGAs will include FPGAs that are nonvolatile.
Flows
An interesting new technology that has evolved in the early 2000s is that of creating an ASIC from an FPGA-based design. Many designers use FPGAs for ASIC prototyping. They use automated tools to implement their circuit on FPGAs, and they then extensively test the circuit in the circuit's environment, for example, in a prototype DVD player. The FPGA-based prototype implementation may be larger, costlier, and more power-hungry than an ASIC-based implementation, but can be very useful for detecting and correcting problems in the circuit, as well as for demonstrating the eventual product. Once satisfied with the circuit, designers might then use tools to reimplement the circuit on an ASIC. The ASIC implementation traditionally did not utilize any information from the FPGA implementation.
Implementing large circuits on ASICs is a difficult task, even with automated tools. Nonrecurring engineering (NRE) costs may exceed hundreds of thousands or even millions of dollars, and fabricating the IC may take many weeks or months. Furthermore, any problem with the fabricated ASIC may require a second fabrication cycle, requiring additional weeks or months. Problems may arise in the ASIC that didn't appear in the FPGA due to the completely new implementation of the circuit as an ASIC; timing problems might arise, for example, due to the circuit being placed and routed in a completely different fashion than was the case in the FPGA.
To ease the migration of a circuit from FPGA to ASIC, some FPGA vendors offer a structured ASIC approach. In a structured ASIC approach, an automated tool converts the FPGA implementation to an ASIC implementation, in contrast to converting the original circuit to an ASIC implementation. In other words, a structured ASIC will reflect the lookup table and switch matrix structure of the original FPGA. However, the structured ASIC will not be programmable, and thus will have faster lookup tables and faster switch matrices, since their contents will have been "hardwired" into the ASIC. The structured ASIC's components can be preplaced, with only wires left to be completed to implement a particular circuit. The result is less NRE cost (tens of thousands of dollars rather than hundreds of thousands or millions) and less time-to-silicon (weeks rather than months), as well as less chance of problems. The drawback is that the structured ASIC will be larger, slower, and more power-hungry than a traditional ASIC, but still better than an FPGA.
The advent of ICs containing a billion transistors has led to ICs that contain what used to exist on multiple ICs. Thus, a single IC may contain dozens or hundreds of microprocessors, custom digital circuits, memories, buses, etc. An IC with numerous processors, custom circuits, and memories is known as a System-on-a-Chip, or SOC.
While many SOCs are created by designers for a particular application (e.g., for a particular DVD player), other SOCs are created to be used in a variety of different applications. Such platform SOCs might contain processors and custom circuits specifically for an application domain. For example, a platform SOC for video processing might contain custom digital circuits having hardware optimized for high-speed low-power video compression and decompression (known as codecs). Such platforms often contain codecs for a wide variety of protocols (e.g., MPEG 2, MPEG 4, H.264, etc.), since the platform could be used in different products supporting different standards. An example is the Nexperia platform from Philips. Furthermore, some platform SOCs contain FPGAs in addition to one or more microprocessors and custom digital circuits on the IC. Examples include the Virtex II Pro platform from Xilinx and the Excalibur platform from Altera. Designers might utilize a platform SOC to prototype an ASIC, or to physically implement a system in a final product.
7.5 IC TECHNOLOGY COMPARISONS
Relative Popularity of IC Technologies
In 2002 alone, nearly 80 billion ICs (of all types) were produced.
(Source: IC Insights McClean Report, 2003.)
We've described numerous technologies in this chapter. In this section, we'll give you
some idea of the relative popularity of some of those technologies. Table 7.2 provides the
relative percentage of designs that were physically implemented in various technologies
in 2001, based on a particular study. The table considers each new unique design only
once, meaning that it doesn't matter how many copies of the same design were
manufactured. That table's data does not include off-the-shelf SSI ICs or SPLDs (both
represent only a tiny fraction of the IC market from a total dollars perspective, and are
thus often excluded from such surveys). A different study describes 2002 IC revenues (as
opposed to unique designs) totaling $11 billion as follows: standard cell [...],
full-custom 20%, gate array 10%, PLD/FPGA 17%, and other 5% (source: WSTS, IC Insights).
Yet another study lists 2002 ASIC revenues at $10.9 billion, PLD/FPGA revenue at
[...] billion, and SOC revenues at $7.6 billion (source: Business Communications Company,
2003). Numbers from different studies vary; we provide these numbers just to give you a
general feel for the popularity of the various technologies.

TABLE 7.2: Sample % of new implementations in various technologies. Total is more
than 100% due to overlap among categories.

Technology            %
[...]                 [...]5%
Gate array            5%
System-on-a-Chip      30%
Full-custom           [...]
CPLD/FPGA             5[...]%

Source: Synopsys, DAC 2002 panel.
Some general trends seem to include the increasing popularity of FPGAs, the
increasing use of structured ASIC approaches, and the increasing appearance of
system-on-a-chip.
The tools used to map digital designs to physical implementations, collectively
known as Electronic Design Automation tools, or EDA, themselves form a market with
revenues of $3 billion in 2002, $3.6 billion in 2003, and predicted revenues of [...] billion
in 2006 (source: Gartner Dataquest, 2004).
Tradeoffs among IC Technologies
Figure 7.39 illustrates the general tradeoffs among the key IC technologies described in
this chapter. Technologies toward the right can be more customized to a particular desired
circuit, and thus may have faster performance, higher density (smaller chip for a given
circuit), lower power, and larger chip capacity (more circuits on a single chip). But such
customized technologies will be more costly to design and will take more time to design.
Technologies toward the left are less customized to a particular desired circuit, and thus
may be more quickly available and have lower design cost, but at the expense of slower
performance, less density, higher power, and less chip capacity (fewer of our circuits on a
single chip). More generally, technologies toward the right allow for more optimization.
Technologies to the left yield less optimization, but yield easier design.
Figure 7.39 Tradeoffs among several IC technologies. [Figure: the technologies shown are
PLD, FPGA, gate array (semicustom), standard cell (semicustom), and full-custom. Arrows
toward the left are labeled quicker availability, lower design cost, and easier design;
arrows toward the right are labeled faster performance, higher density, lower power,
larger chip capacity, and more optimized.]
Furthermore, FPGAs and PLDs not only enable easier design, but may be reprogrammable,
a feature that enables changes to the circuit late in the design cycle, or even after
the circuit's IC has been deployed in a final product.
Choosing an IC technology for a particular design will therefore depend on the
constraints imposed on that design. If a design needs to get to market quickly, that constraint
favors PLD and FPGA technologies. If a design must be extremely fast, that constraint
favors semicustom or full-custom technology. If a design must consume very little power
or take up very little space, those constraints favor semicustom or full-custom technology.
If changes to the circuit are likely, that constraint favors PLD and FPGA technologies.
Choosing the best technology is a hard problem, requiring careful consideration of
numerous competing constraints.
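The rules of thumb above can be written down as a small lookup. The following Python sketch is only an illustration: the constraint names and the function `favored_technologies` are hypothetical labels chosen here, and the mapping encodes nothing beyond the four rules stated in the text.

```python
# Illustrative sketch: constraint names are hypothetical labels for the
# rules of thumb stated in the text, not an established taxonomy.
FAVORED = {
    "time_to_market": {"PLD", "FPGA"},
    "speed": {"semicustom", "full-custom"},
    "low_power": {"semicustom", "full-custom"},
    "small_size": {"semicustom", "full-custom"},
    "late_changes": {"PLD", "FPGA"},
}


def favored_technologies(constraints):
    """Return the technologies favored by every listed constraint.

    An empty result means the constraints compete with one another,
    so no single technology is favored by all of them.
    """
    sets = [FAVORED[name] for name in constraints]
    return set.intersection(*sets) if sets else set()


compatible = favored_technologies(["speed", "low_power"])
competing = favored_technologies(["speed", "late_changes"])
```

Here `compatible` contains semicustom and full-custom, while `competing` is empty. An empty intersection is exactly the situation the text calls a hard problem: the designer must then weigh the constraints rather than look up an answer.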
IC Technologies versus Processor Varieties
IC technologies and processor varieties are orthogonal implementation features. Two
implementation features are orthogonal if we can select each independently (in mathematics,
orthogonal means forming a right angle). We know that there are several processor varieties
that can each implement a desired system function, including a custom processor, or a
programmable processor. Figure 7.40 illustrates that the choice of processor variety is
independent of the choice of IC technology. Point 1 illustrates the choice of implementing
desired system functionality using a custom processor circuit with a full-custom IC
technology. That choice results in a highly optimized design. Point 2 illustrates the choice of
implementing a custom processor circuit on an FPGA. While the circuit may be optimized,
the FPGA IC technology results in a less-optimized implementation (compared to
full-custom) but an easier design. Point 3 illustrates the choice of implementing desired system
functionality as software executing on a programmable processor, where the programmable
processor is implemented in standard cells. Point 4 illustrates the choice of implementing
software on a programmable processor, where the programmable processor is actually
implemented on an FPGA. While that concept may seem strange, a programmable processor
is just another circuit, so that circuit can be mapped to an FPGA just like any other
circuit. Programmable processors mapped to FPGAs are in fact becoming increasingly
popular, because a designer can choose how many processors to put on a single IC (perhaps
the designer wants 9 programmable processors on one IC), and because a designer can put
single-purpose processors alongside programmable processors, all without having to
fabricate a new IC.

Figure 7.40 IC technologies and processor varieties are orthogonal implementation
features. Four of the ten possible choices are shown. [Figure: the vertical axis lists
custom processor and programmable processor; the horizontal axis lists IC technologies,
including gate array, standard cell, and full-custom; points (1) through (4) mark the
choices described in the text, with more optimized designs toward one corner and easier
design toward the other.]
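Because the two features are orthogonal, the full set of choices is simply the cross product of the two lists. The short Python sketch below enumerates it. It assumes that the "ten possible choices" mentioned in the Figure 7.40 caption are the two processor varieties combined with the five IC technologies of Figure 7.39; that reading is an inference, and the variable names are hypothetical.

```python
from itertools import product

# Assumption: the ten choices are 2 processor varieties x 5 IC technologies.
PROCESSOR_VARIETIES = ["custom processor", "programmable processor"]
IC_TECHNOLOGIES = ["PLD", "FPGA", "gate array", "standard cell", "full-custom"]

# Every (processor variety, IC technology) pair is a valid implementation choice.
choices = list(product(PROCESSOR_VARIETIES, IC_TECHNOLOGIES))

# The four points discussed in the text.
figure_points = {
    1: ("custom processor", "full-custom"),
    2: ("custom processor", "FPGA"),
    3: ("programmable processor", "standard cell"),
    4: ("programmable processor", "FPGA"),
}

for variety, technology in choices:
    print(f"{variety} implemented in {technology}")
```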
Of course, programmable processors can often be purchased as off-the-shelf ICs, so
a designer using a programmable processor may not have to worry about the processor's
IC technology.
But sometimes designers must place a programmable processor within their own
IC, coexisting with other processors. When a programmable processor coexists on an IC
along with other processors (programmable or custom), that programmable processor is
often referred to as a core.
Our discussion of IC technologies and processor varieties has thus far assumed just
one type of each item (e.g., one type of FPGA). In reality, each type itself has many
varieties. For example, dozens of different types of FPGAs are available, varying in their size,
speed, power, cost, etc. Likewise, dozens of different types of programmable processors
are available, also varying in those features. And we know that we can create different
types of custom processors, varying also in their size, speed, power, etc. Thus, each point
in Figure 7.39 and Figure 7.40 is actually a large collection of points that spread out in
different directions on the plots, and may even overlap with other types. Furthermore,
other IC technologies as well as processor varieties exist and continue to evolve.
We also point out that a single IC may actually incorporate several different IC
technologies. So a single IC may have some circuits created using full-custom technology,
and other circuits created using ASIC or even FPGA technology. Likewise, a single
processor may have different parts implemented in different IC technologies. For example, a
common situation is for a programmable processor to have its datapath implemented in
full-custom technology, but its controller implemented in ASIC technology, the reason
being that the datapath is very regular, while the controller is mostly unstructured
combinational logic.
In summary, designers have a huge number of choices in choosing processor
varieties and IC technologies to implement their systems.
IC Technology Trend: Moore's Law
In a 2004 speech, an Intel vice-president suggested that he now [...] transistors as
essentially free.
Understanding the trends of IC technologies requires knowledge of Moore's Law.
Moore's Law roughly states that IC capacity doubles every 18 months. Figure 7.41 plots
such doubling, beginning with about 10 million transistors per IC in 1997. The plot uses a
logarithmic scale for the y-axis: each tick mark represents 10 times more than the
previous tick mark. The growth rate is astounding: ICs increase from 10 million transistors in
1997 to over 10 billion transistors in 2015. That means that the 2015 IC can hold 1000
times more transistors than the 1997 IC. In other words, the 2015 IC is as powerful as
about 1000 1997 ICs. This increasing capacity trend has also resulted in the cost per
transistor dropping at nearly the same astounding rate.

Figure 7.41 The trend of increasing transistors per IC. [Figure: plot with a logarithmic
vertical axis whose tick marks read 10, 100, 1,000, 10,000, and 100,000.]
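The doubling rule is easy to check numerically. The Python sketch below is a minimal illustration that takes the figures in the text at face value (10 million transistors in 1997, one doubling every 18 months); the function name `transistors_per_ic` and its parameters are hypothetical.

```python
def transistors_per_ic(year, base_year=1997, base_count=10_000_000,
                       months_per_doubling=18):
    """Project transistors per IC under a strict doubling rule.

    The defaults come from the text: about 10 million transistors per IC
    in 1997 and one doubling every 18 months.
    """
    doublings = (year - base_year) * 12 / months_per_doubling
    return base_count * 2 ** doublings


projected_2015 = transistors_per_ic(2015)   # 12 doublings after 1997
growth_factor = projected_2015 / transistors_per_ic(1997)
```

Eighteen years at one doubling per 18 months is 12 doublings, a factor of 4096, so a strict reading projects roughly 41 billion transistors in 2015. That is consistent with the text's more conservative "over 10 billion" and "1000 times" figures.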
The IC capacity trend has many implications. One implication is that digital
designers can create massively parallel designs that use huge numbers of functional units
and registers, to create high-performance systems not previously practical. The number of
required transistors for such designs might have been considered absurd just a decade
earlier. Another implication is that the size overhead of FPGAs compared to ASICs (about
10x) becomes less relevant, making FPGAs an increasingly popular choice in more
systems. Yet another implication is that designers increasingly need automated tools to help
build multimillion transistor circuits, and may increasingly wish to use RTL and
even higher levels of design (e.g., C-based design) as the method for describing circuits,
leaving the remaining design to tools.
At some point, Moore's Law must come to an end, because transistors cannot shrink
to an infinitely small size. When that end will occur has been a subject of debate for many
years. Some people have predicted Moore's Law will continue a couple decades into the
2000s.
7.6 PRODUCT PROFILE: GIANT VIDEO DISPLAY
In the late 1990s and 2000s, giant color video displays became popular at sports stadiums,
car dealerships, casinos, freeway billboards, and various other locations. Most such video
displays utilize a huge grid of light-emitting diodes (LEDs) driven by digital circuits.
A light-emitting diode (LED) is a semiconductor device that emits light when current
passes through the device. In contrast, a traditional "incandescent" light bulb emits light
when current passes through the bulb's internal filament, which is a high-resistance wire
that heats up and glows when current flows through the wire; the wire, however, doesn't
burn because it is enclosed in a vacuum or inert gas within the bulb. Because LED light
comes from a semiconductor material and not from a hot glowing filament in a bulb, LEDs
use less power, last longer, and can handle vibrations that would break a regular light bulb.
LEDs have long been used to display simple device status (e.g., on or off), text
messages, or even simple graphics. However, until recently, LEDs were only available in
red, yellow, and green colors, and were not very bright. Thus, earlier LED video displays
were typically small, used only a single color, and were designed for indoor use. However,
with the development of the blue LED in 1993, and the development of brighter LEDs,
full-color LED displays evolved that can display video in much the same way as a computer
monitor or television, even in sunny outdoor settings. In fact, LEDs, being a
semiconductor technology, have been improving at a rate similar to transistors (which also
use semiconductor technology). The improvement has followed what is known as Haitz's Law
(the LED equivalent of Moore's Law), stating that the LED "flux per package" doubles
every 18-24 months, which has been the case for several decades. Due to this improvement,
many people predict that LEDs will replace incandescent light bulbs for home and office
lighting. LEDs have already begun to replace incandescent bulbs in traffic lights, as
illustrated in Figure 7.42.

Figure 7.42 LEDs are replacing incandescent bulbs in traffic lights, as well as other
areas. [Photos: a traffic light made from an incandescent light and red plastic cover,
and a traffic light made from several hundred red LEDs.]
Figure 7.43(a) shows a large LED video display capable of displaying full-color
video on a 15 x 8 screen. Because each LED is relatively large (1/8th of an inch
wide, for example) in comparison to the pixels of a computer monitor, one has to stand
several feet away from the LED display to view the image without noticing the individual
LEDs. If we look closer at the LED display, as seen in Figure 7.43(b), we can see the
individual lines of the display. If we look even closer at the display, we can finally
see the individual LEDs within the display, as shown in Figure 7.43(c). That figure shows
that the LEDs are clustered into groups of red, green, and blue LEDs; each cluster represents
one pixel. For the LED video display shown in Figure 7.43, each cluster of LEDs consists
of five LEDs: two red, two green, and one blue LED. Giant video displays are indeed
intended to be viewed from a distance, so most viewers don't see the individual LEDs.
Figure 7.43 LED video display: (a) a large LED display (about 10 yards wide and [...]
yards tall), (b) a closer view showing about 1 yard, (c) a very close view showing about
1 square inch; [...] "pixels" can be seen, each pixel having 2 red ([...] and lower-right
of pixel), 2 green ([...]-right and lower-left of pixel), and 1 blue LED (center of pixel).
Assume we want to create an LED video display capable of displaying a 720x480
pixel video, where each pixel simply consists of one red, one green, and one blue LED. If
each LED cluster has a width of just over 3/8 inch (10 millimeters) and a height of 3/8
inch, our display will be roughly 24 feet wide and 16 feet high. Furthermore, our display
will contain over one million individual LEDs, because 720 * 480 = 345,600 pixels, and
the three LEDs per pixel results in 1,036,800 LEDs.
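The arithmetic in this paragraph can be reproduced with a few lines of Python. This is a rough sketch: it assumes a 10 mm cluster pitch in both directions (the text gives "just over 3/8 inch" for the width and 3/8 inch for the height), and `display_stats` is a hypothetical helper name.

```python
MM_PER_FOOT = 304.8


def display_stats(cols=720, rows=480, leds_per_pixel=3, pitch_mm=10.0):
    """Estimate pixel count, LED count, and physical size of an LED display.

    Assumes every LED cluster (pixel) occupies a pitch_mm x pitch_mm square.
    """
    pixels = cols * rows
    return {
        "pixels": pixels,
        "leds": pixels * leds_per_pixel,
        "width_ft": cols * pitch_mm / MM_PER_FOOT,
        "height_ft": rows * pitch_mm / MM_PER_FOOT,
    }


stats = display_stats()
print(stats)
```

Under that assumption the display works out to about 23.6 feet by 15.7 feet, which matches the "roughly 24 feet wide and 16 feet high" estimate.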
Controlling every LED using a single digital circuit would require millions of output
pins and miles of wire to connect all of the LEDs. Instead, as depicted in Figure 7.44, an
LED video display is constructed of smaller and smaller components. The LED display
consists of an array of smaller components called panels, shown in Figure 7.44(a). The
panels are large display components typically designed in a modular fashion such that
display manufacturers can easily create custom-size video displays and repair broken
components within a display simply by replacing individual panels. The LED display
panels are further divided into LED modules that control the physical LEDs, shown in
Figure 7.44(b). An LED module is the basic display component and, depending on the
design of the module, can control anywhere from a few hundred to a couple thousand
LEDs. For example, in designing a 720x480 pixel display, we may want to use an array
of 6x6 panels, where each panel consists of an array of 5x5 LED modules. Each LED
module would then need to control an array of 24x16 pixels, where each pixel is
composed of three LEDs.
The LED video display functions by dividing the incoming video stream into
separate streams for each panel. The panels further process the video stream by dividing the
incoming video stream into even smaller streams for the LED modules. Finally, the LED
modules display the video frames by controlling the LEDs to output the correct colors for
each pixel, or LED cluster.
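One way to picture the division of the video stream is as an address calculation: every pixel of the frame belongs to exactly one panel, one module within that panel, and one position within that module. The Python sketch below illustrates this for the example hierarchy (6x6 panels, 5x5 modules per panel, 24x16 pixels per module). It assumes that a module is 24 pixels wide and 16 pixels tall and that coordinates start at the top-left corner; the `locate` function is hypothetical and is not a description of how any real display routes its data.

```python
# (columns, rows) at each level of the example hierarchy.
PANELS = (6, 6)
MODULES_PER_PANEL = (5, 5)
PIXELS_PER_MODULE = (24, 16)   # assumption: 24 wide, 16 tall


def locate(x, y):
    """Map frame pixel (x, y) to (panel, module, local) coordinate pairs."""
    mod_w, mod_h = PIXELS_PER_MODULE
    panel_w = MODULES_PER_PANEL[0] * mod_w   # 120 pixels per panel, horizontally
    panel_h = MODULES_PER_PANEL[1] * mod_h   # 80 pixels per panel, vertically
    panel = (x // panel_w, y // panel_h)
    module = ((x % panel_w) // mod_w, (y % panel_h) // mod_h)
    local = (x % mod_w, y % mod_h)
    return panel, module, local


# Total frame size implied by the hierarchy.
frame_width = PANELS[0] * MODULES_PER_PANEL[0] * PIXELS_PER_MODULE[0]
frame_height = PANELS[1] * MODULES_PER_PANEL[1] * PIXELS_PER_MODULE[1]
```

The hierarchy multiplies out to exactly 720 by 480 pixels, and the bottom-right pixel (719, 479) lands in panel (5, 5), module (4, 4), local position (23, 15).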
LED Module
The LED module controls the individual LEDs within the video display by turning the
LEDs on and off at the proper times to create the final color images. Because each LED
module can consist of thousands of LEDs, directly controlling each LED would require
too many wires. Instead, as shown in Figure 7.45, the LEDs within the LED module are
connected in a matrix with a single control wire for each row and three control wires for
each column (one wire for each colored LED within the LED clusters). In the figure, the
LED module controller controls an array of 2x3 pixels, where each pixel consists of three
individual LEDs, for a total of 18 LEDs. But as shown, the controller uses only 9 wires to
control those 18 LEDs. The wire savings using this row and column approach becomes
even more significant for more pixels. An LED module with 24x16 pixels and three LEDs
per pixel would have 24*16*3 = 1152 LEDs, but the controller would require only 16
(one per row) plus 24*3 wires (three per column), for a total of only 88 wires.
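The wire counts quoted above follow from a simple formula: one wire per row plus three wires per column. The Python sketch below checks both examples. The function names are hypothetical, and `direct_wires` assumes, for comparison only, one dedicated control wire per LED.

```python
def matrix_wires(rows, cols, leds_per_pixel=3):
    """Control wires for a row/column matrix: one per row, three per column."""
    return rows + cols * leds_per_pixel


def direct_wires(rows, cols, leds_per_pixel=3):
    """Control wires if every LED had its own wire (comparison assumption)."""
    return rows * cols * leds_per_pixel


small = (matrix_wires(3, 2), direct_wires(3, 2))        # Figure 7.45 example
module = (matrix_wires(16, 24), direct_wires(16, 24))   # 24x16-pixel module
```

For the 3-row by 2-column matrix of Figure 7.45 this gives 9 wires for 18 LEDs, and for the 24x16-pixel module it gives 88 wires for 1152 LEDs.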
The largest LED display in 2004 was 135 feet wide by 26 feet tall, built using 10
large FPGAs, 323 moderate-size FPGAs, 333 flash memories, and 3800 PLDs.
(Source: Xcell Journal, Winter 2004.)
Figure 7.44 LED video displays are designed hierarchically: (a) the LED display consists
of several larger panels, which can be composed to create different sized displays and
which can be individually replaced to repair broken panels, (b) each panel consists of
several smaller LED modules, responsible for controlling the individual pixels, and
(c) each pixel consists of a cluster of red, green, and blue LEDs.
The LED module controller displays a video image by sequentially scanning, or
enabling, each row and displaying the pixel values for each column within the video image.
Using this technique, only one row of LEDs is illuminated at any given time. However, the
LED module scans the rows fast enough such that the human eye perceives all rows as
being illuminated.

Figure 7.45 LED module circuit consisting of a matrix of red (R), green (G), and blue (B)
LEDs controlled by the LED module controller. R1/R2/R3 are rows 1, 2, and 3, and C1/C2
are columns 1 and 2; thus the matrix is 2x3 pixels, or 6 pixels total, with 18 LEDs total
(3 LEDs per pixel).

The LED module must control the LEDs to create the desired color for each pixel.
Each pixel within a video frame is typically represented using an RGB color space. An
RGB (red/green/blue) color space is a method to create any color of light by adding
specific intensities, or brightnesses, of red, green, and blue colors. Each pixel within a video
frame may be represented as three 8-bit binary numbers, where each 8-bit number
specifies the intensity of the red, green, or blue colors. Thus, for each color, the LED module
must be able to provide 256 distinct brightness levels. However, an LED by itself only
supports two values: on and off, or full intensity and no intensity.
To support 256 brightness levels, the LED module controller uses pulse width
modulation. In pulse width modulation (also known as PWM), a controller drives a wire with
a 1 value for a specific percentage of a time period; the signal being 1 is known as a
pulse, the duration of the 1 is known as the pulse's width, and the percentage of the
period spent at 1 is known as the duty cycle. When that pulse drives an LED, a wider
pulse causes the LED to appear brighter to the human eye. Figure 7.46 illustrates how the
LED module controller uses pulse width modulation to support various brightness levels
for the LEDs. To illuminate an LED at full brightness, the controller drives the
LED with 1 for the entire period, as shown in Figure 7.46(a). To illuminate the LED at
half brightness, the controller uses a pulse with a 50% duty cycle, as shown in Figure
7.46(b). For 25% brightness, the controller sets the pulse to 1 for 25% of the period,
meaning a 25% duty cycle, as shown in Figure 7.46(c). For an LED video display, the
LED module controller divides the length of time each row is scanned into 255 time
segments, and controls the brightness of the LEDs by turning each LED on for 0 to 255 time
segments, thereby supporting 256 levels of intensity.
Figure 7.46 Pulse width modulation can be used to create various LED brightness levels:
(a) for full brightness, the LED is always on, (b) for half brightness, the LED is turned
on 50% of the time, and (c) for quarter brightness, the LED is turned on 25% of the time.
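The 255-segment scheme can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, and the sketch assumes the "on" segments are placed contiguously at the start of the row's scan interval, which the text does not specify.

```python
SEGMENTS_PER_SCAN = 255   # time segments per row scan, as stated in the text


def pwm_waveform(intensity):
    """Return the on/off value of one LED for each time segment of a row scan.

    intensity is an 8-bit value (0 to 255); the LED is on for that many
    segments. Assumption: the on segments come first, followed by off segments.
    """
    if not 0 <= intensity <= SEGMENTS_PER_SCAN:
        raise ValueError("intensity must be between 0 and 255")
    return [1] * intensity + [0] * (SEGMENTS_PER_SCAN - intensity)


def duty_cycle(intensity):
    """Fraction of the scan interval during which the LED is on."""
    return sum(pwm_waveform(intensity)) / SEGMENTS_PER_SCAN


full = duty_cycle(255)   # always on
off = duty_cycle(0)      # never on
dim = duty_cycle(64)     # on for 64 of the 255 segments
```

Since an LED can be on for any whole number of segments from 0 through 255, the scheme yields the 256 distinct intensity levels that an 8-bit color value requires.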
Because an LED module controller must provide precisely timed signals at a fast
rate, custom processors are commonly used rather than just microprocessors. FPGAs are
a common choice for implementing those custom processor circuits in LED video
displays, due to several reasons. First, FPGAs are fast enough to support the required scan
rates. Second, the circuit on the FPGAs can be easily changed, making it possible for the
display manufacturer to fix bugs in the circuit, and even upgrade the circuit, without
requiring the high cost of creating a new ASIC. Third, the displays themselves are fairly
large, expensive, and consume much power, and therefore the larger size, higher cost, and
more power consumption of FPGAs compared to ASICs do not impact the overall
display's size, cost, and power too significantly.
7.7 CHAPTER SUMMARY
In this chapter, we discussed (Section 7.1) the idea that we must map our circuits to a
physical implementation so that those circuits can be inserted into a real system. We
introduced (Section 7.2) some technologies that require that a new chip be fabricated to
implement our circuit. Full-custom technology gives the most optimized implementation,
but is expensive and time-consuming to design. Semicustom technologies give very good
implementations while costing less and taking less time to design, through the
predesigning of the gates or cells that will be used on the IC. We described (Section 7.3) the
increasingly popular technology of FPGAs, and showed how a circuit could be mapped
onto a set of programmable lookup tables and switch matrices. We highlighted (Section
7.4) several other technologies, including off-the-shelf SSI/MSI ICs, and programmable
logic devices. We gave some data (Section 7.5) showing the relative popularity of the
technologies described in the chapter.
An interesting trend in physical implementation is the trend toward programmable
ICs (FPGAs in particular). Implementing functionality on an FPGA involves the task of
downloading a bitstream into the FPGA IC device. One might notice the similarity of that
task with the task of implementing functionality on a microprocessor, which also involves
downloading bits into an IC device. Thus, the difference between software on a
microprocessor and custom digital circuits continues to be blurred, especially when one considers
that modern FPGAs can also include one or several microprocessors within the same IC.
For more information on the blurring, see "The Softening of Hardware," F. Vahid, IEEE
Computer, April 200[...].
7.8 EXERCISES
SECTION 7.2: IC TECHNOLOGIES
7.1 Explain why gate array IC technology has a shorter production time than full-custom IC
technology.
7.2 Explain why the use of NAND or NOR gates in a CMOS gate array circuit implementation is
typically preferred over an AND/OR/NOT implementation of a circuit.
7.3 Draw a gate array IC having three rows, the first row having four 2-input AND gates, the
second row having four 2-input OR gates, and the third row having four NOT gates. Show
how to instantiate wires to the gate array to implement the function F(a,b,c) = ab +
a'b'c'.
7.4 Assume a standard cell library has a 2-input AND gate, a 2-input OR gate, and a NOT gate.
Use a drawing to show how to instantiate and place standard cells on an IC and wire them
together to implement the function in Exercise 7.3. Draw your cells the same size as the
gates in Exercise 7.3, and be sure your rows are of equal size.
7.5 Draw a gate array IC having three rows, the first row having four 2-input AND gates, the
second row having four 2-input OR gates, and the third row having four NOT gates. Show
how to instantiate wires to the gate array to implement the equation F(a,b,c,d) =
a'b + cd + c'.
7.6 Assume a standard cell library has a 2-input AND gate, a 2-input OR gate, and a NOT gate.
Use a drawing to show how to instantiate and place standard cells on an IC and wire them
together to implement the function in Exercise 7.5. Be sure to draw your cells the same size as
the gates in Exercise 7.5, and be sure your rows are of equal size.
7.7 Consider the implementations of a half-adder with a gate array in Figure 7.4 and with standard
cells in Figure 7.6. Assume each gate or cell (including inverters) has a delay of 1 ns. Also
assume that every inch of wire (for each inch in your drawing, not on an actual IC) corresponds
to a delay of 3 ns (wires are relatively slow in the era of tiny fast transistors). Estimate the
delay of the gate array and the standard cell circuits.
7.8 For your solutions to Exercises 7.3 and 7.4, assume that each gate and cell has a delay of 1 ns,
and that every inch of wire (for each inch in your drawing, not on an actual IC) corresponds to
a delay of 3 ns. Estimate the delay of the gate array and standard cell circuits.
7.9 Draw a circuit using AND, OR, and NOT gates for the following equation: F(a,b,c) =
a'bc + abc'. Place inversion bubbles on that circuit to convert the circuit to:
(a) NAND gates only.
(b) NOR gates only.
7.10 Draw a circuit using AND, OR, and NOT gates for the following equation: F(a,b,c) =
abc + a + b + c. Place inversion bubbles on that circuit to convert the circuit to:
(a) NAND gates only.
(b) NOR gates only.
7.11 Draw a circuit using AND, OR, and NOT gates for the following equation: F(a,b,c,d) =
(ab + c)(a' + d) + c'. Convert the circuit to a circuit using:
(a) NAND gates only.
(b) NOR gates only.
7.12 Draw a circuit using AND, OR, and NOT gates for the following equation: F(w,x,y,z) =
(w + x)(y + z) + wy + xz. Convert the circuit to a circuit using:
(a) NAND gates only.
(b) NOR gates only.
7.13 Draw a circuit using AND, OR, and NOT gates for the following equation: F(a,b,c,d) =
(ab)(b' + c)(a'd + c'). Convert the circuit to a circuit using:
(a) NAND gates only.
(b) NOR gates only.
7.14 Create a template for converting a 3-input AND gate to a circuit using only 3-input NAND
gates.
7.15 Create a template for converting a 3-input OR gate to a circuit using only 3-input NAND gates.
7.16 Create a template for converting a NOT gate to a circuit using only 3-input NAND gates.
7.17 Assume a standard cell library consisting of 2-input and 3-input NAND gates with a delay of
1 ns each, 2-input and 3-input AND and OR gates with a delay of 1.8 ns each, and a NOT gate
with a delay of 1 ns. Compare the number of transistors and the delay of an implementation
using only AND/OR/NOT gates with an implementation using only NAND gates for the
function: F(a,b,c) = ab'c + a'b. For calculating the size of an implementation, assume each
gate input requires two transistors.
7.18 Assume a standard cell library consisting of 2-input AND and OR gates with a delay of 1 ns
each, 3-input AND and OR gates with a delay of 1.5 ns each, and a NOT gate with a delay of
1 ns. Compare the number of transistors and the delay of an implementation using only
2-input AND/OR gates and NOT gates with an implementation using only 3-input AND/OR
gates and NOT gates for the function: F(a,b,c) = abc + a'b'c + a'b'c'. For
calculating the size of an implementation, assume each gate input requires two transistors.
7.19 Assume a standard cell library consisting of 2-input NAND and NOR gates with a delay of
1 ns each, and 3-input NAND and NOR gates with a delay of 1.5 ns each. Compare the
number of transistors and the delay of an implementation using only 2-input NAND/NOR
gates with an implementation using only 3-input NAND/NOR gates for the function:
F(a,b,c) = a'bc + ab'c + abc'. For calculating the size of an implementation, assume
each gate input requires two transistors.
SECTION 7.3: PROGRAMMABLE IC TECHNOLOGY-FPGA
7.20 Show how to implement on a 3-input 2-output lookup table the function F(a,b,c) = a +
bc.
7.21 Show how to implement on two 3-input 2-output lookup tables the function F(a,b,c,d) = ab
+ cd. Assume you can connect the lookup tables in a custom manner (i.e., do not use a switch
matrix, directly connect your wires).
7.22 Show how to implement on two 3-input 2-output lookup tables the following function:
F(a,b,c,d) = a'bd + b'cd'. Assume the two lookup tables are connected in the manner
shown in Figure 7.47. You may not need to use every lookup table output.
7.23 Show how to implement on two 3-input 2-output lookup tables the following functions:
F(x,y,z) = x'y + xyz' and G(w,x,y,z) = w'x'y + w'xyz'. Assume the two lookup tables
are connected in the manner shown in Figure 7.47.

Figure 7.47 Two 3-input 2-output lookup tables implemented using 8x2 memory. [Figure:
two 8x2 memories, each with address inputs a2, a1, a0 and data outputs d1, d0.]

7.24 Show how to implement on two 3-input 2-output lookup tables the following functions:
F(a,b,c,d) = abc + d and G = a'. Implement both F and G with only two lookup tables
connected in the manner shown in Figure 7.47.
7.25 Implement a 2-bit comparator that compares two 2-bit numbers and has three outputs
indicating greater-than, less-than, and equal-to, using any number of 3-input 2-output lookup
tables and custom connections among the lookup tables.
7.26 Show how to implement a 4-bit carry-ripple adder using any number of 3-input 2-output
lookup tables and custom connections among the lookup tables. Hint: map one full-adder to each
lookup table.
7.27 Show how to implement a 4-bit carry-ripple adder using any number of 4-input 1-output
lookup tables and custom connections among the lookup tables.
7.28 Show how to implement a comparator that compares two 8-bit numbers and has a single
equal-to output, using any number of 4-input 1-output lookup tables and custom connections
among the lookup tables.
7.29 Show the bit file necessary to program the FPGA fabric in Figure 7.29 to implement the
function F(a, b, c, d) = ab + cd, where a, b, c, and d are external inputs.
7.30 Show the bit file necessary to program the FPGA fabric in Figure 7.29 to implement the
function F(a, b, c, d) = abcd, where a, b, c, and d are external inputs.
7.31 Show the bit file necessary to program the FPGA fabric in Figure 7.29 to implement the
function F(a, b, c, d) = a'b' + c'd, where a, b, c, and d are external inputs.
SECTION 7.4: OTHER TECHNOLOGIES
7.32 Use any combination of 7400 ICs listed in Table 7.1 to implement the function
F(a, b, c, d) = ab + cd.
7.33 Use any combination of 7400 ICs listed in Table 7.1 to implement the function
F(a, b, c, d) = abc + ab'c' + a'bd + a'b'd'.
7.34 By drawing Xs on the circuit, program the PLD of Figure 7.38(a) to implement a full-adder.
7.35 By drawing Xs on the circuit. program the PLD of Figure 7.38(a) to implement a
equality comparator. Assume the PLD has an addi tional 14 input.
7.36 *(a) Design a PLD device capable of supporting a 1-bit carry-ripple adder. By drawing Xs on
your PLD circuit, program the PLD to implement the 1-bit carry-ripple adder.
(b) Using a CPLD device consisting of several PLDs from Figure 7.38 and assuming you can
connect the PLDs in a custom manner, implement the 2-bit carry-ripple adder by drawing
Xs on the PLDs.
(c) Compare the size of your PLD and the CPLD by determining the gates required for both
designs (make sure you compare the number of gates within the PLD and CPLD and not
the number of gates used for your implementation).
SECTION 7.5: IC TECHNOLOGY COMPARISONS
7.37 For each of the system constraints below, choose the most appropriate technology from among
FPGA, standard cell, and full-custom IC technologies for implementing a given circuit. Justify
your answers.
(a) The system must exist as a physical prototype by next week.
(b) The system should be as small and low-power as possible. Short design time and low cost
are not priorities.
(c) The system should be reprogrammable even after the final product has been produced.
(d) The system should be as fast as possible and should consume as little power as possible,
subject to being completely implemented in just a few months.
(e) Only five copies of the system will be produced and we have no more than $1000 to spend
on all the ICs.
7.38 Which of the following implementations are not possible? (1) A custom processor on an
FPGA. (2) A custom processor on an ASIC. (3) A custom processor on a full-custom IC. (4)
A programmable processor on an FPGA. (5) A programmable processor on an ASIC. (6) A
programmable processor on a full-custom IC. Explain your answer.
Programmable
Processors
8.1 INTRODUCTION
Digital circuits designed to perform a single processing task, such as a seat belt warning
light, a pacemaker, or an FIR filter, are indeed a very common class of digital circuits. We
might refer to a circuit performing a single processing task as a single-purpose processor.
Single-purpose processors represent a class of digital circuits enabling tremendously fast
or power-efficient computation. However, another class of digital circuits, known as
programmable processors, is also extremely popular, as well as being more widely known.
The programmable processor is largely responsible for the computing revolution that has
taken place in the past several decades, leading to what many call the information age. A
programmable processor, also known as a general-purpose processor, is a digital circuit
whose particular processing task is stored in a memory, rather than being built into the
circuit itself. The representation of that processing task in the memory is known as a
program. Figure 8.1 illustrates single-purpose versus general-purpose processors. We could
create a custom digital circuit for a seat belt warning light system (Chapter 2) or an FIR
filter system (Chapter 5), or instead we could program a general-purpose processor circuit
to implement those systems.
Figure 8.1 Single-purpose versus general-purpose processors.
Some programmable processors, like the well-known Intel Pentium processor or
Sun's SPARC processor, are intended for use in desktop computers. Other programmable
processors, like ARM, MIPS, 8051, and PIC processors (which are widely known in the
design community but less known by the general public), are intended for embedded
systems, like cellular telephones, automobiles, video games, or even tennis shoes with
blinking lights. Some programmable processors, like the PowerPC, are intended for both
desktop and embedded domains.
A benefit of a programmable processor is that its circuit can be mass-produced and
then programmed to do almost anything. Thus, the same programmable desktop
processor can run Windows 98, Windows XP, Linux, or whatever new operating system
program comes about. Likewise, that same processor can run application programs like
word processors, spreadsheets, video games, web browsers, etc. Furthermore, the same
programmable embedded processor can be used in a cell phone, automobile, video game,
or tennis shoe by programming the processor for the desired processing task.
Mass-production results in low costs due to amortization of design costs (see "Why such cheap
calculators" in Chapter 4 for a discussion of amortization).
Of course, because programmable processors are mass-produced and then used for a
wide variety of applications, there aren't as many unique programmable processor
designs as there are single-purpose processor designs. It follows then that there are far
fewer programmable processor designers than there are single-purpose processor
designers. Nevertheless, even though you may never design a programmable processor as
part of your job, it is interesting and enlightening to understand how such a
programmable processor works. Some people argue that people who understand how a processor
works are even better software programmers. And technology trends have led to the
situation of designers being able to create semicustom processors ("application-specific"
processors) that have just the right architecture for one or a small number of applications,
making knowledge of programmable processor designs important. Finally, there are
indeed people who do design programmable processor architectures, and you never know
if you might end up being one of them.
In this chapter, we show how to design a simple programmable processor using our
previously described digital design methods. Our purpose is mainly to demystify such
processors and to provide an intuition of how programmable processors work. We point out
that real mass-produced processors are designed using different methods, and their designs
can be much more complex than the design described in this chapter; learning about those
processors' designs is the subject of many textbooks on computer architecture.
8.2 BASIC ARCHITECTURE
Basic Datapath
A programmable processor consists of two main parts: a datapath and a control unit.
We'll provide a general introduction to those two parts in this section, then we'll provide
a more detailed look at those parts in a subsequent section.
We can view processing generally as:
Loading data, meaning reading the data on which we wish to work from some
input locations,
Transforming that data, meaning performing some computations with that data
that result in new data, and
Storing the new data, meaning writing the new data to some output locations.
For example, a seat belt warning system reads bit data from sensors representing
whether a seat belt is fastened and whether a person is sitting in a seat, transforms that
data by computing a new bit indicating whether to turn on a warning light, and writes that
new data to a warning light. An FIR filter reads data representing the most recent set of
input signal samples, transforms that data by performing multiplies and adds, and writes
new data to an output representing the filtered signal.
A data memory holds all the data that a programmable processor can access, as input
data or output data. For now, assume the words in that data memory are somehow
connected to the outside world (e.g., to the seat belt sensors or to the FIR input and output
signals). To process that data, a programmable processor needs to be able to load data
from data memory into one of several registers (typically a register file) within the
processor, needs to be able to feed data from some subset of registers through functional
units that can perform all possible transformation operations we might consider (typically
an ALU), with results stored back into a register, and needs to be able to store data from a
register back into data memory. Therefore, we see the need for a programmable processor
to include the basic circuit shown in Figure 8.2, showing a data memory, register file, and
ALU. That circuit is known as the programmable processor's datapath.
Figure 8.2 Basic datapath of a programmable processor. The data memory is somehow
connected to the outside world.
The basic datapath shown in Figure 8.2 can perform the following possible datapath
operations in a given clock cycle:
Load operation: This operation loads (reads) data from any location in the data
memory into any register in the register file. A load operation is illustrated in
Figure 8.3(a).
ALU operation: This operation transforms register data by passing any two registers
through the ALU, configured for any of the ALU's supported operations, and storing the
result back into any register of the register file. An ALU operation is illustrated in Figure
8.3(b). Typical ALU operations include addition, subtraction, logical AND,
logical OR, etc.
Store operation: This operation stores (writes) data from any register in the register
file to any data memory location. A store operation is illustrated in Figure 8.3(c).
These possible datapath operations are shown in Figure 8.3. Each such operation
requires the appropriate setting of the control inputs of the data memory, mux, register
file, and ALU; those control inputs will be shown later. For now, just familiarize
yourself with the basic datapath's abilities. Notice that the datapath in Figure 8.2 cannot
directly operate on data memory locations with the ALU in one clock cycle, because the
data must first be read into the register file, which itself requires a clock cycle, before the
data can be operated on by the ALU. A datapath that requires all data to first pass through
the register file before that data can be transformed by the ALU is known as a load-store
architecture.
Figure 8.3 Basic datapath operations: (a) load (read), (b) ALU operation (transform), and
(c) store (write).
EXAMPLE 8.1 Understanding datapath operations
Which of the following are valid single-clock-cycle datapath operations for the datapath of
Figure 8.2?
1. Copy data from a data memory location into a register file location.
2. Read data from two data memory locations into two register file locations.
3. Add data from two data memory locations and store the result in a register file location.
4. Copy data from one register file location to another register file location.
5. Subtract data in a register file location from a data memory location, storing the result in a
register file location.
(1) is a valid operation, known as a load operation. (2) is not a valid operation. We cannot read more
than one data memory location during a datapath operation (for this datapath), and we cannot write
to more than one register file location during an operation. (3) is not a valid operation. Not only can
we not read from two data memory locations during one operation, but we cannot feed the read
values directly into the ALU to perform the add; we must first perform operations that read the
data items into register file locations. (4) is a valid operation. We can configure the ALU operation
to simply pass one of its inputs through to the output (perhaps by adding 0) and store the result in
the register file. (5) is not a valid operation. We cannot feed a read data memory location directly to
the ALU; there is no such connection in the datapath. Values read from data memory must be
loaded into the register file first.
Basic Control Unit
Suppose we want to use the basic datapath of Figure 8.2 to perform the simple processing
task of adding data memory locations 0 and 1 together, and writing the result in data memory
location 9. In other words, we want to compute D[9] = D[0] + D[1]. We can achieve this
processing task by "instructing" the datapath to perform the following operations:
load data memory location 0 to register file register R0 (i.e., RF[0] = D[0]),
load data memory location 1 to register file register R1 (i.e., RF[1] = D[1]),
perform an ALU operation that adds R0 and R1 and writes the result back into R2
(i.e., RF[2] = RF[0] + RF[1]), and
store R2 into data memory location 9 (i.e., D[9] = RF[2]).
Note that we could have used any registers in the register file, rather than R0, R1, and R2.
If D[0] contained the value 99 (in binary, of course), and D[1] contained the value
102, then after carrying out the above operations, D[9] would contain 201.
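The four operations above can be sketched in software. The following is only an illustrative model, not the circuit itself: the class name BasicDatapath, its method names, and the sizes (256 data memory words, 16 registers) are assumptions made for this sketch. Each method corresponds to one single-cycle datapath operation, and the ops counter makes the overhead of the load-store approach visible.

```python
class BasicDatapath:
    """Toy model of the basic datapath: data memory D, register file RF, and an ALU.

    Sizes are assumptions for illustration only.
    """

    def __init__(self, mem_size=256, num_regs=16):
        self.D = [0] * mem_size    # data memory
        self.RF = [0] * num_regs   # register file
        self.ops = 0               # number of datapath operations performed

    def load(self, r, d):
        """Load operation: RF[r] = D[d]."""
        self.RF[r] = self.D[d]
        self.ops += 1

    def alu_add(self, ra, rb, rc):
        """ALU operation (add): RF[ra] = RF[rb] + RF[rc]."""
        self.RF[ra] = self.RF[rb] + self.RF[rc]
        self.ops += 1

    def store(self, d, r):
        """Store operation: D[d] = RF[r]."""
        self.D[d] = self.RF[r]
        self.ops += 1


dp = BasicDatapath()
dp.D[0], dp.D[1] = 99, 102

dp.load(0, 0)        # RF[0] = D[0]
dp.load(1, 1)        # RF[1] = D[1]
dp.alu_add(2, 0, 1)  # RF[2] = RF[0] + RF[1]
dp.store(9, 2)       # D[9] = RF[2]

print(dp.D[9])  # 201
print(dp.ops)   # 4
```

Note that the model offers no way to add two data memory words directly; every value has to pass through the register file first, which is exactly the load-store restriction.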
You might think that having to instruct the datapath to perform four distinct operations
is a rather cumbersome way of adding two data items. If you could build your own
custom digital circuit to implement D[9] = D[0] + D[1], you would likely just feed D[0]
and D[1] through an adder whose output you would connect to D[9], thus avoiding the
four operations involving the register file and ALU. We see the basic tradeoff of
single-purpose versus programmable processors: programmable processors have the drawback
of computation overhead because they have to be general, but they provide the benefits of
a mass-produced processor that can be programmed to do almost anything.
Somehow we need to describe the sequence of operations (RF[0] = D[0], then
RF[1] = D[1], then RF[2] = RF[0] + RF[1], then D[9] = RF[2]) that we desire to execute
on the datapath. Such descriptions of desired processor operations are known as
instructions, and a collection of instructions is known as a program. We will store the desired
program as words in another memory, called the instruction memory. We'll describe how to
represent those instructions later. For now, assume that the four instructions are somehow
stored in locations 0, 1, 2, and 3 of the instruction memory I, as shown in Figure 8.4.
Instruction memory I
0: RF[0]=D[0]
1: RF[1]=D[1]
2: RF[2]=RF[0]+RF[1]
3: D[9]=RF[2]
Figure 8.4 The control unit in a programmable processor.
Now is where the control unit plays a role. The control unit reads each instruction from
instruction memory, and then executes that instruction on the datapath. To execute our simple
program, the control unit would begin by performing the following tasks, known as stages, to
carry out the first instruction:
1. Fetch: The control unit would start by reading I[0] into a local register, a task
known as fetching. This stage requires one clock cycle.
2. Decode: The control unit would then determine what operation this instruction is
requesting, a task known as decoding. This stage also requires one clock cycle.
3. Execute: Seeing that this instruction requests the datapath operation RF[0] =
D[0], the control unit would set the control lines of the datapath to read D[0],
pass the read data through the 2x1 mux in front of the register file, and write that
data into RF[0]. The task of carrying out the operation is known as executing. Most
operations are datapath operations (such as a load operation, ALU operation, or
store operation), but not all operations require the datapath (an example is the
jump instruction to be discussed later). This stage requires one clock cycle.
Thus, the basic stages the control unit carries out for that first instruction are: fetch,
decode, and execute, requiring three clock cycles to complete just that first instruction.
The local register in which the control unit stores the fetched instruction is known as
the instruction register, or IR, as shown in Figure 8.4. Notice that the control unit needs
to keep track of the location in instruction memory from which to fetch the next
instruction. Since the instruction locations are usually in sequence, we can use a simple
up-counter to keep track of the current program instruction; such a counter is known as a
program counter, or PC for short. The processor starts with PC=0, so the instruction in
I[0] represents the first instruction of the program.
Figure 8.5 illustrates the three stages of executing the instruction RF[0] = D[0]
stored in I[0]. Assuming PC was previously initialized to 0, Figure 8.5(a) shows the first
stage fetching I[0]'s contents, the instruction RF[0]=D[0], into IR. Figure 8.5(b) shows
the second stage decoding the instruction and thus determining that the instruction is a
load instruction. Figure 8.5(c) shows the controller executing the instruction by
configuring the datapath to read the value of D[0] and storing that value into RF[0]. If D[0]
contained 99, then RF[0] will contain 99 after completion of the execute stage.
Figure 8.5 Three stages of processing one instruction: (a) fetch, (b) decode, (c) execute.
After processing the instruction in I[0], the control unit would fetch the instruction
that is in I[1], decode that instruction, and execute that instruction (thus executing
RF[1] = D[1]), requiring another three cycles. Next, the control unit would fetch the
instruction that is in I[2], decode that instruction, and execute that instruction (thus
executing RF[2] = RF[0] + RF[1]), requiring another three cycles. Finally, the control
unit would fetch the instruction that is in I[3], decode that instruction, and execute that
instruction (thus executing D[9] = RF[2]), requiring another three cycles. The four
instructions would require 4*3 = 12 cycles to run to completion on the programmable
processor.
The control unit will require a controller, like those described in Chapter 3, that in this
case repeatedly performs the fetch, decode, and execute steps (after having initialized PC
to 0); note that a controller appears inside the control unit in Figure 8.4. An FSM for that
controller appears in Figure 8.6. The controller increments the program counter after
fetching each instruction in state Fetch, so that the next fetch state will fetch the next
instruction (notice that PC gets incremented at the end of the fetch stage in Figure 8.5(a)).
We'll describe the actions of the Decode and Execute states later.
Figure 8.6 Basic controller states. State Fetch performs IR=I[PC] and PC=PC+1.
Thus, the basic parts of the control unit include the program counter PC, the
instruction register IR, and a controller, as illustrated in Figure 8.4. In previous chapters, our
nonprogrammable processors consisted only of a controller and a datapath. Notice that
the programmable processor instead contains a control unit, which itself consists of some
registers and a controller.
To summarize, the control unit processes each instruction in three stages:
1. first fetching the instruction by loading the current instruction into IR and
incrementing the PC for the next fetch,
2. next decoding the instruction to determine its operation, and
3. finally executing the operation by setting the appropriate control lines for the
datapath, if applicable. If the operation is a datapath operation, the operation may be
one of three possible types:
(a) loading a data memory location into a register file location,
(b) transforming data using an ALU operation on register file locations and
writing results back to a register file location, or
(c) storing a register file location into a data memory location.
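The three stages summarized above can be mimicked by a small state machine in software. This is a minimal sketch under stated assumptions: the function name run_control_unit is hypothetical, instructions are kept as plain strings rather than bits, and the loop stops once PC moves past the last instruction (the controller described in the text simply keeps cycling). Each pass through the loop stands for one clock cycle.

```python
def run_control_unit(program):
    """Step a Fetch/Decode/Execute controller over `program`.

    Returns (trace, cycles): the instructions in the order they were executed,
    and the number of clock cycles spent.
    """
    pc = 0           # program counter, initialized to 0
    ir = None        # instruction register
    state = "Fetch"
    cycles = 0
    trace = []

    while True:
        if state == "Fetch":
            if pc >= len(program):   # assumption: halt when the program runs out
                break
            ir = program[pc]         # IR = I[PC]
            pc += 1                  # PC = PC + 1
            state = "Decode"
        elif state == "Decode":
            # A real controller would examine IR's opcode here.
            state = "Execute"
        else:  # Execute
            trace.append(ir)         # stand-in for driving the datapath control lines
            state = "Fetch"
        cycles += 1                  # every state takes one clock cycle

    return trace, cycles


program = [
    "RF[0]=D[0]",
    "RF[1]=D[1]",
    "RF[2]=RF[0]+RF[1]",
    "D[9]=RF[2]",
]
trace, cycles = run_control_unit(program)
print(cycles)  # 12
```

With three states per instruction, the four-instruction program takes 4*3 = 12 cycles, and a six-instruction program would take 18.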
EXAMPLE 8.2 Creating a simple sequence of instructions
Create a set of instructions for the processor in Figure 8.4 to compute D[3] = D[0] + D[1] + D[2].
Each instruction must represent a valid operation.
We might start with three operations that read the data memory locations into register file
locations:
0. R[3] = D[0]
1. R[4] = D[1]
2. R[2] = D[2]
Note that we intentionally chose arbitrary register locations, to make clear that we can use any
registers.
Next, we need to add the three values and store the result in a register file location, say R[1].
In other words, we want to perform the following operation: R[1] = R[2] + R[3] + R[4]. However,
the datapath of Figure 8.4 cannot add three register file locations in a single operation, but rather
can only add two locations. Instead, we can describe the desired addition computation by dividing
the computation into two datapath operations:
3. R[1] = R[2] + R[3]
4. R[1] = R[1] + R[4]
Finally, we write the result into D[3]:
5. D[3] = R[1]
Thus, our program consists of the six instructions appearing above, which we might store in
instruction memory locations 0 through 5.
EXAMPLE 8.3 Evaluating the time to carry out a program
Determine the number of clock cycles required for the processor of Figure 8.4 to execute the
six-instruction program of Example 8.2.
The processor requires 3 cycles to process each instruction: 1 cycle to fetch the instruction, 1
to decode the fetched instruction, and 1 to execute the instruction. At 3 cycles per instruction, the
total cycles for 6 instructions is: 6 instr * 3 cycles/instr = 18 cycles.
8.3 A THREE-INSTRUCTION PROGRAMMABLE PROCESSOR
A First Instruction Set with Three Instructions
The way we represent instructions in the instruction memory, and the list of allowable
instructions, are known as a programmable processor's instruction set. Let's assume that a
processor uses 16-bit instructions, and that the instruction memory I is 16 bits wide.
Instruction sets typically use a certain number of bits in the instruction to denote what
operation to perform. The remaining bits specify any additional information needed to
perform the operation, such as the source or destination registers. We define a simple
three-instruction instruction set, with the most significant (meaning leftmost) 4 bits identifying
the appropriate operation and the least significant 12 bits containing register file and data
memory addresses, as follows:
Load instruction (0000 r3r2r1r0 d7d6d5d4d3d2d1d0): This instruction specifies a
move of data from the data memory location whose address is specified by the
bits d7d6d5d4d3d2d1d0 into the register file register whose location is specified by
the bits r3r2r1r0. For example, the instruction "0000 0000 00000000" specifies
a move of data memory location 0, or D[0], into register file location 0, or RF[0]. In
other words, that instruction represents the operation RF[0]=D[0].
Likewise, "0000 0001 00101010" specifies RF[1]=D[42]. We've inserted
spaces between some bits for ease of reading; those spaces
have no other significance and would not exist in the instruction memory.
Store instruction (0001 r3r2r1r0 d7d6d5d4d3d2d1d0): This instruction specifies a
move of data in the opposite direction as the instruction above, meaning a move
from the register file to the data memory. So "0001 0000 00001001" specifies
D[9]=RF[0].
Add instruction (0010 ra3ra2ra1ra0 rb3rb2rb1rb0 rc3rc2rc1rc0): This instruction
specifies an addition of two register file registers specified by rb3rb2rb1rb0 and
rc3rc2rc1rc0, with the result stored in the register file register specified by
ra3ra2ra1ra0. For example, "0010 0010 0000 0001" specifies the instruction
RF[2]=RF[0]+RF[1]. Note that add is an ALU operation.
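The bit layout of the three instructions can be checked with a few lines of code. The sketch below packs the fields with shifts and ORs; the function names (encode_load, encode_store, encode_add, bits) are hypothetical helpers introduced only for illustration, and the range checks are an added convenience rather than part of the instruction set definition.

```python
def _check(value, limit, name):
    """Reject field values that do not fit in their bit field."""
    if not 0 <= value < limit:
        raise ValueError(f"{name} out of range: {value}")


def encode_load(r, d):
    """Load: 0000 r3r2r1r0 d7..d0, meaning RF[r] = D[d]."""
    _check(r, 16, "register")
    _check(d, 256, "data address")
    return (0b0000 << 12) | (r << 8) | d


def encode_store(r, d):
    """Store: 0001 r3r2r1r0 d7..d0, meaning D[d] = RF[r]."""
    _check(r, 16, "register")
    _check(d, 256, "data address")
    return (0b0001 << 12) | (r << 8) | d


def encode_add(ra, rb, rc):
    """Add: 0010 ra rb rc, meaning RF[ra] = RF[rb] + RF[rc]."""
    for reg in (ra, rb, rc):
        _check(reg, 16, "register")
    return (0b0010 << 12) | (ra << 8) | (rb << 4) | rc


def bits(word):
    """Render a 16-bit instruction word as a string of 0s and 1s."""
    return format(word, "016b")


print(bits(encode_load(1, 42)))    # 0000000100101010  (RF[1] = D[42])
print(bits(encode_store(0, 9)))    # 0001000000001001  (D[9] = RF[0])
print(bits(encode_add(2, 0, 1)))   # 0010001000000001  (RF[2] = RF[0] + RF[1])
```

The printed strings match the three examples given in the definitions, without the readability spaces.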
None of these instructions modifies the contents of the instructions' source operands.
In other words, the load instruction copies the contents of the data memory location to the
specified register, but leaves the data memory location itself unchanged. Likewise, the store
instruction copies the specified register to data memory, but leaves the register's contents
unchanged. The add instruction reads its rb and rc registers without changing them.
Using this instruction set, we would describe our earlier program that computes
D[9]=D[0]+D[1] as shown in Figure 8.7.
Notice that the first four bits of each instruction are a binary code that indicates the
instruction's operation. Those bits are known as the instruction's operation code, or opcode
for short. "0000" means a move from data memory to register file, "0001" means a move
from register file to data memory, and "0010" means an add of two registers, based on the
instruction set defined in the list above. The remaining bits of the instruction represent
operands, which indicate what data to operate on.
Desired program
0: RF[0]=D[0]
1: RF[1]=D[1]
2: RF[2]=RF[0]+RF[1]
3: D[9]=RF[2]
Instruction memory I
0: 0000 0000 00000000
1: 0000 0001 00000001
2: 0010 0010 0000 0001
3: 0001 0010 00001001
Figure 8.7 A program that computes D[9]=D[0]+D[1], using a given instruction set. Spaces
inserted between the instruction memory's bits for readability do not exist in the memory.
We could write a different program using the three-instruction instruction set. For
example, we could write a program that computes D[5] = D[5] + D[6] + D[7]. We must
perform that computation using instructions chosen from the three-instruction instruction
set. We might write the program as shown in Figure 8.8. The number before the colon
represents the instruction's address in the instruction memory I. The text following the two
forward slashes (//) represents comments, and is not part of the instructions.
0: 0000 0000 00000101 // RF[0] = D[5]
1: 0000 0001 00000110 // RF[1] = D[6]
2: 0000 0010 00000111 // RF[2] = D[7]
3: 0010 0000 0000 0001 // RF[0] = RF[0] + RF[1]
                       // which is D[5]+D[6]
4: 0010 0000 0000 0010 // RF[0] = RF[0] + RF[2]
                       // now D[5]+D[6]+D[7]
5: 0001 0000 00000101 // D[5] = RF[0]
Figure 8.8 A program to compute D[5]=D[5]+D[6]+D[7] using the three-instruction
instruction set.
Note how that program ultimately computes the desired sum. This might be the first
time that you have had to think of computations in terms of low-level programmable
processor instructions. Thinking in terms of such register-level operations can be difficult at
first, but becomes easier as you see and develop more programs at that level.
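One way to convince yourself that the machine code of Figure 8.8 computes the desired sum is to run it through a small software model of the three-instruction processor. The sketch below is an illustration only: the function name run_machine_code and the sample data values are assumptions, and word width and overflow are not modeled. Each 16-bit word is split into the opcode and operand fields defined by the instruction set.

```python
def run_machine_code(program, D):
    """Execute three-instruction machine code on data memory D (modified in place)."""
    RF = [0] * 16                      # 16 registers, addressed by 4 bits
    for word in program:
        op = (word >> 12) & 0xF        # leftmost four bits: opcode
        ra = (word >> 8) & 0xF         # register field (r or ra)
        rb = (word >> 4) & 0xF         # add only
        rc = word & 0xF                # add only
        d = word & 0xFF                # load/store only: data memory address
        if op == 0b0000:               # load:  RF[ra] = D[d]
            RF[ra] = D[d]
        elif op == 0b0001:             # store: D[d] = RF[ra]
            D[d] = RF[ra]
        elif op == 0b0010:             # add:   RF[ra] = RF[rb] + RF[rc]
            RF[ra] = RF[rb] + RF[rc]
        else:
            raise ValueError(f"unknown opcode {op:04b}")
    return D


# The program of Figure 8.8 (underscores play the role of the readability spaces).
figure_8_8 = [
    0b0000_0000_00000101,    # RF[0] = D[5]
    0b0000_0001_00000110,    # RF[1] = D[6]
    0b0000_0010_00000111,    # RF[2] = D[7]
    0b0010_0000_0000_0001,   # RF[0] = RF[0] + RF[1]
    0b0010_0000_0000_0010,   # RF[0] = RF[0] + RF[2]
    0b0001_0000_00000101,    # D[5] = RF[0]
]

D = [0] * 256
D[5], D[6], D[7] = 10, 20, 30    # arbitrary sample data
run_machine_code(figure_8_8, D)
print(D[5])  # 60
```

D[6] and D[7] are left unchanged, consistent with the observation that none of the instructions modifies its source operands.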
Machine Code versus Assembly Code
As you have seen, the instructions of a program exist in instruction memory as 0s and 1s. A
program represented as 0s and 1s is known as machine code. Writing and reading programs
represented as 0s and 1s are tasks that humans are not particularly good at. We humans can't
understand those 0s and 1s easily, and thus will likely make plenty of mistakes when writing
such programs. Thus, early computer programmers developed a tool, known as an
assembler (which itself is just another program), to help humans write other programs. An
assembler allows us to write instructions using mnemonics, or symbols, that the assembler
automatically translates to machine code. Thus, an assembler may tell us that we can write
instructions from our three-instruction instruction set using the following mnemonics:
Load instruction: MOV Ra, d specifies the operation RF[a]=D[d]. a must be 0,
1, ..., or 15, so R0 means RF[0], R1 means RF[1], etc. d must be 0, 1, ..., 255.
Store instruction: MOV d, Ra specifies the operation D[d]=RF[a].
Add instruction: ADD Ra, Rb, Rc specifies the operation RF[a]=RF[b]
+RF[c].
COMPUTERS WITH BLINKING LIGHTS.
Big computers shown in the movies often have many
rows of small blinking lights. In the early days of
computing, computer programmers programmed using
machine code, and they entered that code into the
instruction memory by flipping switches up and down
to represent 0s and 1s. To enable debugging of the
program, as well as to show the computed data, those
early computers used rows of lights: on lights meant
1s, off lights meant 0s. Today, nobody in their right
mind would try writing or debugging a program by
using machine code. So computers today look like big
boxes, with no rows of lights. But big plain boxes
don't look interesting in movies, so
movie makers continue to use movie props with lots of
blinking lights to represent computers; lights that are
useless, but entertaining.
Turning on a personal computer causes the operating
system to load, a process known as "booting" the
computer. The computer executes instructions
beginning at address 0, which usually has an
instruction that jumps to a built-in small program that
loads the operating system (the small program is often
called the basic input/output system, or BIOS). Most
computing dictionaries state that the term "boot"
originates from the popular expression "to pull oneself
up by one's bootstraps," which means to pick yourself
up without any help, though obviously you can't do
this by grabbing onto your own bootstraps and
pulling, hence the cleverness of the expression. Since
the computer loads its own operating system, the
computer is in a sense picking itself up without any
help. The term bootstrap eventually got shortened to
boot. A colleague of mine who has been around
computing a long time claims a different origin. One
way of loading a program into the instruction memory
of early computers was to create a ribbon with rows of
holes. Each row might have enough room for say 16
holes, thus each row would represent a 16-bit machine
instruction; a hole meant a 0, no hole a 1 (or vice
versa). A programmer would punch holes in the ribbon
to store the program on the ribbon (using a special
hole-punching machine), and then feed the ribbon into
a computer's ribbon reader, which would read the rows
of 0s and 1s and load those 0s and 1s into the
computer's instruction memory. Those ribbons might
have been several feet long, and looked a lot like the
straps of a boot, hence the term bootstrap, shortened to
boot. Whichever is the actual origin, we can be fairly
sure the term "boot" comes from the bootstraps on the
boots we wear on our feet.
Using those mnemoni cs. we could rewrite the program D{9}=D[O}+DII} as follows:
0: MOY RO, 0
I: MOY RI. I
2: ADD R2, RO. R I
3: MOY 9. R2
That program is much easier to understand than the 0s and 1s in Figure 8.7. A program
written using mnemonics that will be translated to machine code by an assembler is known
as assembly code. Hardly anybody writes machine code directly these days. An assembler
would automatically translate the above assembly program to the machine code shown in
Figure 8.7.
You might be wondering how the assembler can distinguish between the load and store
instructions above, when the mnemonic for both instructions is the same, "MOV." The
assembler distinguishes those two types of instructions by looking at the first character
after the mnemonic "MOV": if the first character is an "R," then that operand is a
register, and thus that instruction must be a load instruction. Otherwise, the first
operand is a data memory address, and the instruction must be a store instruction.
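The assembler's rule for telling the two MOV forms apart can be sketched in a few lines of Python. This is only an illustration; the function name classify_mov and the way the operands are split are assumptions made for the sketch, not part of the processor's definition.

```python
def classify_mov(line):
    """Return 'load' or 'store' for a three-instruction-set MOV line."""
    mnemonic, operands = line.strip().split(None, 1)
    if mnemonic != "MOV":
        raise ValueError("not a MOV instruction: " + line)
    first_operand = operands.split(",")[0].strip()
    # A leading "R" means the first operand is a register, so the
    # instruction copies a data memory word into a register: a load.
    return "load" if first_operand.startswith("R") else "store"


print(classify_mov("MOV R0, 0"))   # load
print(classify_mov("MOV 9, R2"))   # store
```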
Control Unit and Datapath for the Three-Instruction Processor
From the definition of the three-instruction instruction set and an understanding of the
basic control unit and datapath architecture of a programmable processor as shown in
Figure 8.4, we can design a complete digital circuit for a three-instruction programmable
processor. The design process is actually very similar to the RTL design process in
Chapter 5.
We begin with a high-level state machine description of the system, shown in Figure 8.9.
Assume that op is shorthand for IR[15..12], meaning the leftmost four bits of the
instruction register. Likewise, assume that ra is shorthand for IR[11..8], rb is shorthand
for IR[7..4], rc is shorthand for IR[3..0], and d is shorthand for IR[7..0].
Figure 8.9 High-level state machine description of a three-instruction programmable processor.
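The shorthand fields are just bit slices of the 16-bit instruction register. A minimal sketch in Python, assuming the instruction word is held as an integer (the function name decode_fields is hypothetical):

```python
def decode_fields(ir):
    """Split a 16-bit instruction word into the shorthand fields."""
    return {
        "op": (ir >> 12) & 0xF,   # IR[15..12]
        "ra": (ir >> 8) & 0xF,    # IR[11..8]
        "rb": (ir >> 4) & 0xF,    # IR[7..4]
        "rc": ir & 0xF,           # IR[3..0]
        "d": ir & 0xFF,           # IR[7..0], overlaps rb and rc
    }


# Machine word for ADD R2, R0, R1 (opcode 0010).
fields = decode_fields(0b0010_0010_0000_0001)
print(fields)   # {'op': 2, 'ra': 2, 'rb': 0, 'rc': 1, 'd': 1}
```

Note that d occupies the same bits as rb and rc; which interpretation applies depends on the opcode.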
Recall that the next step in the RTL design process was to create the datapath. We already
created the datapath in Figure 8.4, which we refine to show every control signal from the
controller, as shown in Figure 8.10. The refined datapath has control signals for each read
and write port of the register file (see Chapter 4 for information on register files). The
register file has 16 registers because the instructions have only 4 bits with which to
address registers. The datapath has a control signal to the ALU called alu_s0; we'll assume
the simple ALU adds its inputs when alu_s0=1, and just passes input A when alu_s0=0. The
datapath has a select line for the 2x1 mux in front of the register file's write data port.
Finally, we've also included the control signals for the data memory, which we assume has a
single address port and thus can support only a read or a write, but not both
simultaneously. The data memory has 256 words, since the instructions have only 8 bits with
which to address the data memory.
Figure 8.10 Refined datapath and control unit for the three-instruction processor.
The datapath is now able to carry out all of the load/store operations and arithmetic
operations that we need for the high-level state machine of Figure 8.9. Thus, we can
proceed to the third step of the RTL design process of connecting the datapath with a
controller. Figure 8.10 shows those connections, as well as the connections of the
controller to the PC and IR registers in the control unit, and to the instruction memory I.
The last step of the RTL design process is to derive the controller's FSM. We can do this
straightforwardly by replacing the high-level actions of the state machine in Figure 8.9 by
Boolean operations on the controller's input and output lines, as shown in Figure 8.11.
(Remember that op, d, ra, rb, and rc are shorthand notations for IR[15..12], IR[7..0],
IR[11..8], IR[7..4], and IR[3..0], respectively.) We could then finish the controller's
design by converting the FSM to a state register and combinational logic, using the methods
from Chapter 3.
Figure 8.11 FSM for the processor's controller.
We would have thus designed a programmable processor.
Let's trace through the controller's FSM behavior to see how a program would execute on the
three-instruction processor. As a reminder, remember that we follow the FSM conventions
that all transitions are implicitly ANDed with a rising clock edge, and that any control
signal not explicitly assigned a value in a state is implicitly assigned a 0.
The FSM initially starts in state Init, which sets PC_clr=1, causing the PC register to be
cleared to 0.
The FSM on the next clock cycle enters the Fetch state, in which the FSM reads the
instruction memory at address 0 (because PC is 0) and loads the read value into IR; that
read value will be the instruction that was stored in I[0]. At the same time, the FSM
increments the PC's value.
The FSM on the next clock cycle enters the Decode state, which has no actions but which
branches on the next clock cycle to one of three states, Load, Store, or Add, depending on
the value of the highest four bits of the IR register (the current instruction's opcode).
In the Load state, the FSM sets the data memory address lines to the low eight bits of the
IR and sets the data memory read enable to 1, sets the 2x1 mux's select line to pass the
data memory output to the register file, and sets the register file write address to
IR[11..8] and the write enable to 1, causing whatever gets read from the data memory to be
loaded into the appropriate register in the register file.
Likewise, the Store and Add states set the control lines as needed for the store and add
operations.
Finally, the FSM returns to the Fetch state, and begins fetching the next instruction.
Notice that because the Store state does not write to the register file, the value of the
register file mux select line doesn't matter, so we've assigned signal RF_s=X in that
state, meaning the signal's value does not matter. Using such don't care values (see
Section 6.2) can help us to minimize logic in the controller.
You may wonder why the Decode state is necessary when that state contains no actions. Could
we not have just had Decode's transitions originate instead from state Fetch? Recall from
Section 5.3 that register updates listed in a state do not actually occur until the next
clock edge, meaning that transitions originating from a state use the previous register
values. Thus, we could not have originated Decode's transitions from the Fetch state,
because those transitions would have been using the old opcode in the instruction register
IR, not the new value read during the Fetch state.
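The state-by-state behavior traced in this section can be mimicked in software. The following Python sketch steps through the Init, Fetch, Decode, Load, Store, and Add states one state per loop pass. It models register transfers rather than individual control signals, uses the opcodes 0000 (load), 0001 (store), and 0010 (add), and assumes the machine simply stops when the PC runs past the end of the program; that stopping rule and all names in the code are assumptions of the sketch.

```python
def run_three_instruction(I, D, max_cycles=100):
    """Step the controller FSM, one state per simulated clock cycle."""
    RF = [0] * 16                       # 16-register register file
    PC, IR = 0, 0
    state = "Init"
    for _ in range(max_cycles):
        if state == "Init":
            PC = 0                      # PC_clr=1
            state = "Fetch"
        elif state == "Fetch":
            if PC >= len(I):            # assumption: stop at end of program
                break
            IR = I[PC]                  # IR loaded with I[PC]
            PC += 1                     # PC incremented in the same cycle
            state = "Decode"
        elif state == "Decode":
            op = (IR >> 12) & 0xF       # branch on the opcode now held in IR
            state = {0: "Load", 1: "Store", 2: "Add"}[op]
        elif state == "Load":
            RF[(IR >> 8) & 0xF] = D[IR & 0xFF]
            state = "Fetch"
        elif state == "Store":
            D[IR & 0xFF] = RF[(IR >> 8) & 0xF]
            state = "Fetch"
        elif state == "Add":
            ra, rb, rc = (IR >> 8) & 0xF, (IR >> 4) & 0xF, IR & 0xF
            RF[ra] = (RF[rb] + RF[rc]) & 0xFFFF
            state = "Fetch"
    return RF, D


# MOV R0, 0; MOV R1, 1; ADD R2, R0, R1; MOV 9, R2
program = [0x0000, 0x0101, 0x2201, 0x1209]
D = [0] * 256
D[0], D[1] = 3, 4
RF, D = run_three_instruction(program, D)
print(D[9])   # 7
```

The separate Decode branch in the loop mirrors the point made above: the opcode is examined only after the Fetch pass has finished updating IR.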
8.4 A SIX-INSTRUCTION PROGRAMMABLE PROCESSOR
Extending the Instruction Set
Clearly, having only a three-instruction instruction set limits the behavior of the
programs that we can write. All we can do with those instructions is add numbers. A real
programmable processor will support many more instructions, perhaps 100 or more, so that a
wider variety of programs can be written.
Let's extend our programmable processor's instruction set with a few more instructions, in
order to give you a slightly better idea of how a programmable processor with a full
instruction set would look.
We'll begin by introducing an instruction able to load a constant value into a register
file register. For example, suppose we wanted to compute RF[0] = RF[1] + 5. The 5 is a
constant. A constant is a value that is part of our program, not something to be found in
data memory. We need an instruction that allows us to load a constant into a register,
after which we could add that register to RF[1] using the ADD instruction. Thus, we
introduce a new instruction with the following machine and assembly code representations:
Load-constant instruction (0011 r3r2r1r0 c7c6c5c4c3c2c1c0): specifies that the binary
number represented by the bits c7c6c5c4c3c2c1c0 should be loaded into the register
specified by r3r2r1r0. The binary number being loaded is known as a constant. The mnemonic
for this instruction is:
MOV Ra, #c: specifies the operation RF[a]=c
a can be 0, 1, ..., or 15. Assuming two's complement representation (see Section 4.8),
c can be -128, -127, ..., 0, ..., 126, 127. The "#" enables the assembler to distinguish
this instruction from a regular load instruction.
We continue by introducing an instruction for performing subtraction of two registers,
similar to addition of two registers, having the following machine and assembly code
representations:
Subtract instruction (0100 ra3ra2ra1ra0 rb3rb2rb1rb0 rc3rc2rc1rc0): specifies subtraction
of two register file registers specified by rb3rb2rb1rb0 and rc3rc2rc1rc0, with the result
stored in the register file register specified by ra3ra2ra1ra0. For example,
"0100 0010 0000 0001" specifies the instruction RF[2]=RF[0]-RF[1]. The mnemonic for this
instruction is:
SUB Ra, Rb, Rc: specifies the operation RF[a]=RF[b] - RF[c]
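As a quick check of the encoding, the following Python sketch packs an opcode and three register numbers into a 16-bit word and reproduces the example bit pattern. The helper name encode_rrr is made up for this illustration.

```python
def encode_rrr(opcode, ra, rb, rc):
    """Pack a 4-bit opcode and three 4-bit register numbers into 16 bits."""
    for field in (opcode, ra, rb, rc):
        if not 0 <= field <= 15:
            raise ValueError("each field must fit in 4 bits")
    return (opcode << 12) | (ra << 8) | (rb << 4) | rc


word = encode_rrr(0b0100, 2, 0, 1)      # SUB R2, R0, R1
print(format(word, "016b"))             # 0100001000000001
```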
Let's also introduce an instruction that allows us to jump to other parts of a program:
Jump-if-zero instruction (0101 ra3ra2ra1ra0 o7o6o5o4o3o2o1o0): specifies that if the
contents of the register specified by ra3ra2ra1ra0 is 0, we should load the PC with the
current value of PC plus o7o6o5o4o3o2o1o0, which is an 8-bit number in two's complement
form representing a positive or negative offset amount. The mnemonic is:
JMPZ Ra, offset: specifies the operation PC = PC + offset if RF[a] is 0.
By using two's complement for the jump offset, which allows representation of positive or
negative numbers, the program can jump backwards in the program, thus implementing a loop.
With an 8-bit offset, the instruction can specify a jump forward by 127 addresses, or
backward by 128 addresses (-128 to +127).
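The 8-bit two's complement offset field can be illustrated with a short Python sketch. The two helper functions are hypothetical names used only here; they show how a signed offset maps to the eight offset bits and back.

```python
def encode_offset(offset):
    """Encode a jump offset as an 8-bit two's complement field."""
    if not -128 <= offset <= 127:
        raise ValueError("offset does not fit in 8 bits")
    return offset & 0xFF


def decode_offset(bits):
    """Sign-extend an 8-bit two's complement field to a Python int."""
    return bits - 256 if bits & 0x80 else bits


print(format(encode_offset(-3), "08b"))   # 11111101
print(decode_offset(0b11111101))          # -3
print(decode_offset(0b01111111))          # 127
```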
Table 8.1 summarizes the six-instruction instruction set. A programmable processor
typically comes with a databook that lists the processor's instructions, and the meaning of
each instruction, using a format similar to the format of Table 8.1. Typical programmable
processors have dozens, even hundreds, of instructions.
TABLE 8.1 Six-instruction instruction set.
Instruction        Meaning
MOV Ra, d          RF[a] = D[d]
MOV d, Ra          D[d] = RF[a]
ADD Ra, Rb, Rc     RF[a] = RF[b]+RF[c]
MOV Ra, #C         RF[a] = C
SUB Ra, Rb, Rc     RF[a] = RF[b]-RF[c]
JMPZ Ra, offset    PC=PC+offset if RF[a]=0
Extending the Control Unit and Datapath
The three new instructions require some extensions to our control unit and datapath of
Figure 8.10, with those extensions shown in Figure 8.12. First, the load-constant
instruction requires that the register file be able to load data from IR[7..0], in addition
to data from data memory or the ALU output. Thus, we widen the register file's multiplexer
from 2x1 to 3x1, add another mux control signal, and also create a new signal coming from
the controller labeled RF_W_data, which will connect with IR[7..0]; these changes are
highlighted by the dashed circle labeled "1" in Figure 8.12. Second, the subtract
instruction requires that we use an ALU capable of subtraction, so we add another ALU
control signal, highlighted by the dashed circle labeled "2" in the figure. Third, the
jump-if-zero instruction requires that we be able to detect if a register is zero, and that
we be able to add IR[7..0] to the PC. Thus, we insert a datapath component to detect if the
register file's Rp read port is all zeros (that component would just be a NOR gate),
labeled as dashed circle "3a" in the figure. We also upgrade the PC register so it can be
loaded with PC plus IR[7..0], labeled as "3b" in the figure. The adder used for this also
subtracts 1 from the sum, to compensate for the fact that the Fetch state already added 1
to the PC.
s1  s0  ALU operation
0   0   pass A through
0   1   A+B
1   0   A-B
Figure 8.12 Control unit and datapath for the six-instruction processor.
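The zero detector and the compensating PC adder described above can be sketched in Python as follows. This is a behavioral illustration only; the function names are hypothetical, and the example addresses (a JMPZ stored at address 3 with offset 2) are chosen just to show the arithmetic.

```python
def rp_zero(rp_data):
    """Model the zero detector: 1 only when all 16 bits of Rp are 0."""
    return 1 if (rp_data & 0xFFFF) == 0 else 0


def jump_target(pc_after_fetch, offset):
    """Value loaded into the PC: PC + offset - 1.

    The -1 undoes the increment that the Fetch state already applied,
    so the offset is effectively relative to the JMPZ's own address.
    """
    return pc_after_fetch + offset - 1


# A JMPZ stored at address 3 with offset 2: after Fetch the PC holds 4.
print(rp_zero(0), rp_zero(5))   # 1 0
print(jump_target(4, 2))        # 5
```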
We also need to extend the FSM for the controller within the control unit to handle the
three additional instructions. Figure 8.13 shows the extended FSM. The Init and Fetch
states stay the same. We added three new transitions from the Decode state for the three
new instruction opcodes. We made a minor revision to the Load, Store, and Add states'
actions (the new actions are italicized) since the register file now has a mux with two
select lines instead of just one. Likewise, we revised the Add state's actions to configure
the ALU with two control lines instead of one. We added four new states, Load-constant,
Subtract, Jump-if-zero, and Jump-if-zero-jmp, for the three new instructions.
Figure 8.13 Control unit and datapath for the six-instruction processor. State actions
recoverable from the figure:
Store: D_addr=d, D_wr=1, RF_s1=X, RF_s0=X, RF_Rp_addr=ra, RF_Rp_rd=1
Add: RF_Rp_addr=rb, RF_Rp_rd=1, RF_s1=0, RF_s0=0, RF_Rq_addr=rc, RF_Rq_rd=1,
RF_W_addr=ra, RF_W_wr=1, alu_s1=0, alu_s0=1
Load-constant: RF_s1=1, RF_s0=0, RF_W_addr=ra, RF_W_wr=1
Subtract: RF_Rp_addr=rb, RF_Rp_rd=1, RF_s1=0, RF_s0=0, RF_Rq_addr=rc, RF_Rq_rd=1,
RF_W_addr=ra, RF_W_wr=1, alu_s1=1, alu_s0=0
Jump-if-zero: RF_Rp_addr=ra, RF_Rp_rd=1
The new instruction states perform the following functions on the datapath:
In the Load-constant state, we configure the register file mux to pass the RF_W_data
signal, and we configure the register file to write to the address specified by ra (which
is IR[11..8]).
In the Subtract state, we perform the same actions as in the Add state, except that we
configure the ALU for subtraction instead of addition.
In the Jump-if-zero state, we configure the register file to read the register specified by
ra onto read port Rp. If the value of the read register Rp is all 0s, RF_Rp_zero will
become 1 (and 0 otherwise). Thus, we include two transitions from the Jump-if-zero state.
One transition will be taken if RF_Rp_zero is 0, meaning the read register was not all 0s;
that transition takes the FSM back to the Fetch state, meaning no actual jump occurs. The
other transition will be taken if RF_Rp_zero is 1, meaning the read register was all 0s.
That transition goes to another state, Jump-if-zero-jmp, which should actually carry out
the jump. That state carries out the jump simply by setting the load line of the PC.
Notice that with the addition of a jump-if-zero instruction, the processor may take up to
four cycles to complete an instruction. Namely, when the ra register of a jump-if-zero
instruction is all 0s, then an extra state is needed to load the PC with the address of the
instruction to which to jump.
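The cycle counts mentioned above follow directly from the states each instruction visits. A small Python sketch, counting from the Fetch state and using state names taken from the text (the function itself is a hypothetical helper):

```python
def cycles_for(mnemonic, ra_is_zero=False):
    """Count the FSM states visited by one instruction, starting at Fetch."""
    states = ["Fetch", "Decode"]
    if mnemonic == "JMPZ":
        states.append("Jump-if-zero")
        if ra_is_zero:
            states.append("Jump-if-zero-jmp")   # extra state loads the PC
    else:
        states.append(mnemonic)                 # a single execute state
    return len(states)


print(cycles_for("ADD"))                        # 3
print(cycles_for("JMPZ", ra_is_zero=False))     # 3
print(cycles_for("JMPZ", ra_is_zero=True))      # 4
```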
8.5 EXAMPLE ASSEMBLY AND MACHINE PROGRAMS
Using the six-instruction instruction set of the previous section, we now provide an
example of assembly language programming the six-instruction processor to perform a
particular task, and we show how the assembly code would be converted to machine code by an
assembler. An assembler would make use of the table shown in Table 8.2, which maps
instructions to opcodes.
TABLE 8.2 Instruction opcodes.
Instruction        Opcode
MOV Ra, d          0000
MOV d, Ra          0001
ADD Ra, Rb, Rc     0010
MOV Ra, #C         0011
SUB Ra, Rb, Rc     0100
JMPZ Ra, offset    0101
EXAMPLE 8.4 Assembly and machine programs for a simple program
Write a program that counts the number of words that are not equal to 0 in data memory
locations 4 and 5, and that stores the result in data memory location 9. Thus, the possible
results that would be in location 9 are zero, one, or two.
Using the instruction set of Table 8.2, we can write an assembly program as shown in Figure
8.14(a). The program maintains the count in register R0, which the program initializes to
0. The program may need to add 1 to this register later, so the program loads the value 1
into register R1. The program next loads data memory location 4 into register R2. The
program then jumps to the instruction labeled "lab1" if the value of R2 is zero. If R2 is
not zero, the program will execute an add instruction that adds one to register R0, and
will then proceed to the instruction labeled "lab1" since that instruction is the next
instruction. The instruction labeled "lab1" loads data memory location 5 into register R2.
The program jumps to the instruction labeled "lab2" if R2 is zero. If R2 is not zero, the
program executes an add instruction that adds one to register R0, and then proceeds to the
next instruction, which is the instruction labeled "lab2." That instruction stores the
contents of register R0 to data memory location 9.
      MOV R0, #0;       // initialize result to 0                  0011 0000 00000000
      MOV R1, #1;       // constant 1 for incrementing result      0011 0001 00000001
      MOV R2, 4;        // get data memory location 4              0000 0010 00000100
      JMPZ R2, lab1;    // if zero, skip next instruction          0101 0010 00000010
      ADD R0, R0, R1;   // not zero, so increment result           0010 0000 0000 0001
lab1: MOV R2, 5;        // get data memory location 5              0000 0010 00000101
      JMPZ R2, lab2;    // if zero, skip next instruction          0101 0010 00000010
      ADD R0, R0, R1;   // not zero, so increment result           0010 0000 0000 0001
lab2: MOV 9, R0;        // store result in data memory location 9  0001 0000 00001001
      (a)                                                          (b)
Figure 8.14 A program to count the number of nonzero numbers in D[4] and D[5], storing the
result in D[9]: (a) assembly code, and (b) corresponding machine code generated by an
assembler. The spaces in the machine code's 16-bit instructions are there for your
convenience as you read this book; actual machine code has no such spaces.
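To see the program of Figure 8.14 in action, the following Python sketch interprets the machine code of Figure 8.14(b) at the instruction level. It is an illustration under stated assumptions, not a model of the actual hardware: execution is assumed to stop after the last instruction word, the 8-bit constant is assumed to be sign-extended to 16 bits, and all function and variable names are made up for the sketch.

```python
def sign8(bits):
    """Interpret an 8-bit field as a two's complement number."""
    return bits - 256 if bits & 0x80 else bits


def run(program, D):
    """Run six-instruction machine code, one instruction per loop pass."""
    RF, PC = [0] * 16, 0
    while PC < len(program):          # assumption: halt after the last word
        IR = int(program[PC].replace(" ", ""), 2)
        PC += 1                       # Fetch has already incremented the PC
        op, ra = (IR >> 12) & 0xF, (IR >> 8) & 0xF
        rb, rc, low8 = (IR >> 4) & 0xF, IR & 0xF, IR & 0xFF
        if op == 0b0000:              # MOV Ra, d
            RF[ra] = D[low8]
        elif op == 0b0001:            # MOV d, Ra
            D[low8] = RF[ra]
        elif op == 0b0010:            # ADD Ra, Rb, Rc
            RF[ra] = (RF[rb] + RF[rc]) & 0xFFFF
        elif op == 0b0011:            # MOV Ra, #C (sign extension assumed)
            RF[ra] = sign8(low8) & 0xFFFF
        elif op == 0b0100:            # SUB Ra, Rb, Rc
            RF[ra] = (RF[rb] - RF[rc]) & 0xFFFF
        elif op == 0b0101:            # JMPZ Ra, offset
            if RF[ra] == 0:
                PC = PC + sign8(low8) - 1
        else:
            raise ValueError("unknown opcode")
    return D


FIGURE_8_14B = [
    "0011 0000 00000000",
    "0011 0001 00000001",
    "0000 0010 00000100",
    "0101 0010 00000010",
    "0010 0000 0000 0001",
    "0000 0010 00000101",
    "0101 0010 00000010",
    "0010 0000 0000 0001",
    "0001 0000 00001001",
]


def count_nonzero(d4, d5):
    """Load D[4] and D[5], run the program, and return D[9]."""
    D = [0] * 256
    D[4], D[5] = d4, d5
    return run(FIGURE_8_14B, D)[9]


print(count_nonzero(0, 0), count_nonzero(7, 0), count_nonzero(7, 3))  # 0 1 2
```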
In writing the assembly program, we chose the registers that we used to store the result,
the constant 1, and the data memory location copy. We could have used any registers for
those purposes. For example, we could have used register R7 to hold the result, meaning all
occurrences of R0 in the code would instead have been R7. Furthermore, in writing the
assembly program we arbitrarily chose the labels "lab1" and "lab2." We could have picked
other names for those labels, such as "skip1" and "done," or "Fred" and "George." It's
best, though, to use descriptive labels that help people reading the assembly code to
understand the program.
An assembler would automatically convert the assembly code to the machine code shown in
Figure 8.14(b). For each instruction, the assembler determines the specific instruction
type by looking at the mnemonic as well as at the operands if necessary, and then outputs
the appropriate opcode bits (four bits) for that instruction type, as defined in Table 8.2.
For example, the assembler would look at the first instruction "MOV R0, #0" and thus know
from the first three letters "MOV" that this is one of the data movement instructions; the
assembler would look at the operands, and seeing "R0" would know this is either a regular
load or a load-constant instruction; finally, the assembler would see the "#" and conclude
this is a load-constant instruction, thus outputting the opcode "0011" for a load-constant
instruction, as shown in the first machine instruction of the figure.
The assembler converts the operands to bits also, converting "R0" of the first instruction
to "0000," and "#0" to "00000000," as shown in the first machine instruction of the figure.
The JMPZ instruction requires some extra handling. The assembler recognizes this as a
jump-if-zero instruction and thus outputs the opcode "0101." The assembler converts the
first operand, "R2," to "0010." The assembler then reaches the second operand, "lab1," and
does not know what to output, since the assembler doesn't yet know the address of the
instruction labeled "lab1," as the assembler hasn't reached that instruction yet in the
program. To solve this problem, many assemblers actually make two passes over the assembly
code: during the first pass, the assembler creates a table of all labels and their
addresses, and then on the second pass the assembler outputs machine code. Such an
assembler would therefore know during the second pass that the instruction labeled "lab1"
is at an address two addresses beyond the first JMPZ instruction: the lab1 instruction is
at address 5, while the JMPZ instruction is at address 3 (assuming that the first
instruction is at address 0, not 1). Thus, the assembler would output an offset of 2 to
jump forward 2 addresses. Notice that the labels "lab1" and "lab2" do not appear in the
machine code; they are merely a convenience construct that the assembler provides for the
programmer.
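The two-pass approach can be sketched as a tiny assembler in Python. This is a minimal illustration, assuming the opcodes of Table 8.2, "//" comments, optional trailing semicolons, and labels that always share a line with an instruction; the function names and the internal opcode keys are hypothetical.

```python
OPCODES = {"load": 0b0000, "store": 0b0001, "ADD": 0b0010,
           "loadc": 0b0011, "SUB": 0b0100, "JMPZ": 0b0101}


def reg(token):
    """Convert a register name such as 'R2' to its number."""
    return int(token[1:])


def assemble(lines):
    # Pass 1: strip comments and record the address of every label.
    labels, instrs = {}, []
    for line in lines:
        line = line.split("//")[0].strip().rstrip(";")
        if not line:
            continue
        if ":" in line:
            label, line = line.split(":", 1)
            labels[label.strip()] = len(instrs)
        instrs.append(line.strip())
    # Pass 2: emit one 16-bit word per instruction.
    words = []
    for addr, text in enumerate(instrs):
        mnemonic, rest = text.split(None, 1)
        ops = [o.strip() for o in rest.split(",")]
        if mnemonic == "MOV" and ops[0].startswith("R"):
            if ops[1].startswith("#"):          # load-constant
                op, low = OPCODES["loadc"], int(ops[1][1:]) & 0xFF
            else:                               # regular load
                op, low = OPCODES["load"], int(ops[1]) & 0xFF
            word = (op << 12) | (reg(ops[0]) << 8) | low
        elif mnemonic == "MOV":                 # store: MOV d, Ra
            word = (OPCODES["store"] << 12) | (reg(ops[1]) << 8) | (int(ops[0]) & 0xFF)
        elif mnemonic in ("ADD", "SUB"):
            word = ((OPCODES[mnemonic] << 12) | (reg(ops[0]) << 8)
                    | (reg(ops[1]) << 4) | reg(ops[2]))
        elif mnemonic == "JMPZ":
            offset = labels[ops[1]] - addr      # e.g. lab1 at 5, JMPZ at 3
            word = (OPCODES["JMPZ"] << 12) | (reg(ops[0]) << 8) | (offset & 0xFF)
        else:
            raise ValueError("unknown mnemonic: " + mnemonic)
        words.append(format(word, "016b"))
    return words


source = [
    "MOV R0, #0;       // initialize result to 0",
    "MOV R1, #1;       // constant 1 for incrementing result",
    "MOV R2, 4;        // get data memory location 4",
    "JMPZ R2, lab1;    // if zero, skip next instruction",
    "ADD R0, R0, R1;   // not zero, so increment result",
    "lab1: MOV R2, 5;  // get data memory location 5",
    "JMPZ R2, lab2;    // if zero, skip next instruction",
    "ADD R0, R0, R1;   // not zero, so increment result",
    "lab2: MOV 9, R0;  // store result in data memory location 9",
]
machine_code = assemble(source)
for word in machine_code:
    print(word)
```

Because the label table is complete before any word is emitted, forward references such as "lab1" resolve without special handling in the second pass.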
8.6 FURTHER EXTENSIONS TO THE PROGRAMMABLE PROCESSOR
Instruction Set Extensions
Extending the instruction set with further instructions would require similar types of
extensions and modifications to the control unit, datapath, and FSM. A programmable
processor might contain dozens more data movement instructions, which move data between
data memory and the register file, or between registers. For example, a processor might
have instructions for copying the contents of one register to another (e.g., MOV R0, R1,
which would copy R1's contents into R0), and would carry out that instruction using a state
that reads the source register, passes the read value through the ALU unchanged, and writes
the ALU output to the destination register. As another example, a processor might have
instructions that would use the contents of a register as the address from which to read
data memory, known as indirect addressing.
A programmable processor would also contain dozens of arithmetic/logic instructions, which
perform arithmetic and logic operations on registers in the register file. For example, a
processor might include not just add and subtract instructions, but also increment,
complement, decrement, AND, OR, XOR, shift left, shift right, and other such instructions
that could be carried out by an ALU.
A programmable processor would furthermore contain several flow-of-control instructions,
which determine the next value of the PC. For example, a processor might include not just a
jump-if-zero instruction, but also an unconditional jump, an indirect jump, and perhaps
even jump-if-negative and similar such instructions. Furthermore, a processor may include
instructions that can jump farther than just a small offset from the current PC, and
perhaps even to an absolute address rather than an offset address.
Input/Output Extensions
Section 1.3 introduced a basic microprocessor having eight inputs I0, I1, ..., I7, and
eight outputs P0, P1, ..., P7. We can extend the basic programmable processor of Figure
8.12 to implement such external inputs and outputs. One method for such an extension would
utilize a specially designed data memory. In that data memory, we might replace the last 16
words of the memory by direct connections to the input and output pins, as illustrated in
Figure 8.15. The data memory stores locations 0 through 239 in a normal RAM. Location 240,
however, is actually a special word whose high 15 bits are all 0s, and whose lowest bit
comes from a flip-flop loaded every cycle with the value on external input pin I0. Thus,
reading location 240 will result in either 00...01 (integer 1), or 00...00 (integer 0),
depending on the value appearing at I0. Likewise, location 241 is connected to pin I1,
location 242 to I2, and so on, with location 247 connected to I7. Locations 248 through 255
are connected to pins P0 through P7, except the pins are connected to those locations'
flip-flop outputs rather than inputs. For example, writing to location 255 writes the
flip-flop with either 0 or 1 (only the low-order bit matters during the write), and that
flip-flop drives external output pin P7.
Figure 8.15 Connecting to external pins.
Thus, an assembly-language programmer can read or write a microprocessor's external data
pins simply by reading or writing particular data memory locations.
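The memory-mapped arrangement can be modeled in Python as a data memory object that routes addresses 240 through 255 to pin flip-flops. The class below is a sketch only: its name, the decision to ignore writes to the input locations, and the decision to let output locations be read back are assumptions that the text does not specify.

```python
class MappedDataMemory:
    """256-word data memory whose last 16 words are external pins."""

    def __init__(self):
        self.ram = [0] * 240          # locations 0..239: ordinary RAM
        self.inputs = [0] * 8         # flip-flops sampling pins I0..I7
        self.outputs = [0] * 8        # flip-flops driving pins P0..P7

    def read(self, addr):
        if addr < 240:
            return self.ram[addr]
        if addr < 248:
            return self.inputs[addr - 240] & 1    # high 15 bits read as 0
        return self.outputs[addr - 248]           # assumption: readable

    def write(self, addr, value):
        if addr < 240:
            self.ram[addr] = value & 0xFFFF
        elif addr >= 248:
            self.outputs[addr - 248] = value & 1  # only the low bit matters
        # assumption: writes to input locations 240..247 are ignored


mem = MappedDataMemory()
mem.inputs[0] = 1                 # external pin I0 is high
print(mem.read(240))              # 1
mem.write(255, 0xFFFF)            # drive pin P7 from the low-order bit
print(mem.outputs[7])             # 1
```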
EXAMPLE 8.5 Motion-in-the-dark detector in assembly language
Section 1.3 included an example, illustrated in Figure 1.13, that utilized a microprocessor
to implement a motion-in-the-dark detector. That section utilized C code to compute the
expression P0 = I0 && !I1. In this example, we show the underlying assembly code that would
implement that C expression. Assuming that the microprocessor's external pins I0..I7 and
P0..P7 are mapped to data memory locations as in Figure 8.15, we can program the expression
in assembly as follows:
MOV R0, 240      // move D[240], which is the value at pin I0, into R0
MOV R1, 241      // move D[241], which is the value at pin I1, into R1
NOT R1, R1       // compute !I1, assuming existence of a complement instruction
AND R0, R0, R1   // compute I0 && !I1, assuming an AND instruction
MOV 248, R0      // move result to D[248], which is pin P0
Performance Extensions
One difference between real processors and the basic processor architecture in this chapter
is that many real processors are pipelined (see Section 6.5 for an introduction to
pipelining). The basic three-instruction architecture utilized a controller with three
stages: fetch, decode, and execute. By inserting appropriate pipeline registers throughout
the design and modifying the controller appropriately, we could pipeline the fetch, decode,
and execute stages. In other words, as the control unit decodes instruction 1, the control
unit could be simultaneously fetching instruction 2. Next, as the control unit executes
instruction 1, the control unit could be decoding instruction 2, and fetching instruction
3. Thus, rather than processing one instruction every 3 cycles, the control unit could be
processing one instruction every cycle. Each instruction still takes 3 cycles to process
(3 cycle latency), but the pipelining results in single cycle throughput. The net result
would be that programs would execute approximately three times faster.
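The speedup can be checked with simple cycle arithmetic. The Python sketch below assumes an idealized pipeline with no stalls and no jumps; the function names are hypothetical.

```python
def cycles_unpipelined(n, stages=3):
    """Each instruction occupies the processor for all of its stages."""
    return n * stages


def cycles_pipelined(n, stages=3):
    """The first instruction takes `stages` cycles; each later instruction
    completes one cycle after its predecessor (no stalls assumed)."""
    return stages + (n - 1) if n > 0 else 0


n = 1000
print(cycles_unpipelined(n))                                   # 3000
print(cycles_pipelined(n))                                     # 1002
print(round(cycles_unpipelined(n) / cycles_pipelined(n), 2))   # 2.99
```

For long programs the ratio approaches the number of stages, which is why the three-stage pipeline gives close to a threefold speedup.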
Another extension involves creating deeper pipelines. Thus, rather than just three stages
(fetch, decode, execute), we might break the stages down to stages of even finer
granularity (e.g., fetch, decode, read operands, execute, store results). Creating
finer-grained stages may shorten the longest register-to-register delay, which enables a
faster clock frequency. The net result would again be faster program execution.
Another extension involves having multiple ALUs in the datapath. The control unit may then
perform multiple ALU operations simultaneously in the datapath. One form of this extension
involves a processor whose instruction set uses instructions with multiple opcodes and
associated operands in a single instruction, known as a Very Large Instruction Word (VLIW)
processor. Another form uses a processor with a control unit that reads in multiple
instructions simultaneously and then assigns those instructions to execute simultaneously
on available ALUs, known as a superscalar processor. A high-end desktop processor may
support perhaps 5 simultaneous instructions, with 10 stages of pipelining. Thus, at any
moment, such a processor may be in the middle of processing 5*10 = 50 different
instructions. Needless to say, modern processor architectures can become quite complex.
This chapter described the basic idea of how a programmable processor's design works and
how the design could be extended to support a fuller instruction set. We leave the role of
describing a complete processor, as well as modern processor design techniques for improved
performance (such as pipelining, caching, etc.), to textbooks on computer architecture.
8.7 CHAPTER SUMMARY
In this chapter, we stated (Section 8.1) that programmable processors are widely used for
implementing a system's desired functionality, due in part to their easy availability and
short design time (namely, writing software). We provided (Section 8.2) the basic
architecture of a programmable processor, consisting of a general-purpose datapath having a
register file and ALU; a control unit having a controller, PC, and IR; and memories for
storing the program and the data. The control unit would fetch the instruction from program
memory, decode the instruction, and then execute the instruction by configuring the
datapath to carry out the instruction's specified operation. We then designed (Section 8.3)
a simple three-instruction programmable processor, and showed how a program would be
represented as 0s and 1s (machine code) in the processor's program memory. We went further
to design (Section 8.4) a six-instruction processor, and discussed how further extensions
could be made to add more instructions and hence achieve a more reasonable processor
architecture. We provided (Section 8.5) an example of assembly and machine code for the
six-instruction processor. We discussed a few extensions to the programmable processor
architecture (Section 8.6).
Programmable processors are typically produced in huge quantities (numbering in the tens of
millions, or even billions), and so tremendous attention is given to their design. Readers
should realize that the programmable processor designs in this chapter are extremely
simplistic and used for illustration purposes only. Yet, seeing even the simplistic
designs, you hopefully now have an understanding of the principle of how a programmable
processor works. Modern commercial processors are based on the same principles:
instructions are stored as machine code in program memory, control units fetch, decode, and
execute the instructions, and datapaths support the operations of the instructions using
register files and ALUs. Modern processors just do a much better job, using concurrency,
pipelining, and many other techniques to obtain high clock frequencies and fast program
execution.
8.8 EXERCISES
SECTION 8.2: BASIC ARCHITECTURE
8.1 If a processor's program counter is 20-bits wide, up to how many words can the
processor's instruction memory hold (ignoring any special tricks to expand the instruction
memory size)?
8.2 Which of the following are legal single-cycle datapath operations for the datapath in
Figure 8.2? Explain your answer.
(a) Copy data from a memory location into another memory location.
(b) Copy two register locations into two memory locations.
(c) Add data from a register file location and a memory location, storing the result in a
memory location.
8.3 Which of the following are legal single-cycle datapath operations for the datapath in
Figure 8.2? Explain your answer.
(a) Copy data from a register file location into a memory location.
(b) Subtract data from two memory locations and store the result in another memory location.
(c) Add data from a register file location and a memory location, storing the result in the
same memory location.
8.4 Assume we are using a dual-port memory from which we can read two locations
simultaneously. Modify the datapath of the programmable processor of Figure 8.2 to support
an instruction that performs an ALU operation on any two memory locations and stores the
result in a register file location. Trace through the execution of this operation, as
illustrated in Figure 8.3.
8.5 Determine the operations required to instruct the datapath of Figure 8.2 to perform the
operation: D[8] = (D[4] + D[5]) - D[7], where D represents the data memory.
SECTION 8.3: A THREE-INSTRUCTION PROGRAMMABLE PROCESSOR
8.6 If a processor's instruction has 4 bits for the opcode, how many instructions can the
processor support?
8.7 What does the following assembly program, which uses the three-instruction instruction
set of this chapter, compute? MOV R5, 19; ADD R5, R5, R5; MOV 20, R5.
8.8 What does the following assembly program, which uses the three-instruction instruction
set of this chapter, compute? MOV R4, 20; MOV R9, 18; ADD R4, R4, R9; MOV R5, 30;
ADD R9, R4, R5; MOV 20, R9.
8.9 Using the three-instruction instruction set of this chapter, write an assembly program
that updates the data memory D as follows: D[0] = D[0] + D[1].
8.10 Using the three-instruction instruction set of this chapter, write an assembly program
that updates the data memory D as follows: D[4] = D[1]*2 + D[2].
8.11 Convert the following assembly program to machine code based on the three-instruction
instruction set of this chapter: MOV R5, 19; ADD R5, R5, R5; MOV 20, R5.
8.12 List the basic register/memory transfers and operations that occur during each clock
cycle for the following program, based on the three-instruction instruction set of this
chapter: MOV R0, 1; MOV R1, 9; ADD R0, R0, R1.
SECTION 8.4: A SIX-INSTRUCTION PROGRAMMABLE PROCESSOR
8.13 List the basic register/memory transfers and operations that occur during each clock cycle for
the following program, based on the six-instruction instruction set of this chapter, assuming
that the content of D[9] is 0: MOV R6, #1; MOV R5, 9; JMPZ R5, label1; ADD R5, R5, R6;
label1: ADD R5, R5, R6. What is the value in R5 after the program completes?
8.14 Add a new instruction to the six-instruction instruction set of this chapter that performs a
bitwise AND of two registers and stores the result in a third register. Extend the datapath,
control unit, and the controller's FSM as needed.
8.15 Add a new instruction to the six-instruction instruction set of this chapter that performs an
unconditional jump (jumps always) to a location specified by a 12-bit offset. Extend the data-
path, control unit, and the controller's FSM as needed.
8.16 Add a new instruction to the six-instruction instruction set of this chapter that performs a jump,
if two registers are equal, to a location specified by a 12-bit offset. Extend the datapath,
control unit, and the controller's FSM as needed.
8.17 Using the six-instruction instruction set of this chapter, write an assembly program for the fol-
lowing C code, which computes the sum of the first N numbers, where N is another name for
D[9]. Hint: Use a register to first store N.
   i = 1;
   sum = 0;
   while (i != N) {
      sum = sum + i;
      i = i + 1;
   }
8.18 Using the extended instruction set you designed in Exercise 8.16, write an assembly program
for the C code in Exercise 8.17.
SECTION 8.5: EXAMPLE ASSEMBLY AND MACHINE PROGRAMS
8.19 Define two new data movement instructions for the six-instruction instruction set of this
chapter. Extend the datapath, control unit, and the controller's FSM as needed.
8.20 Define two new arithmetic/logic instructions for the six-instruction instruction set of this
chapter. Extend the datapath, control unit, and the controller's FSM as needed.
8.21 Define two new flow-of-control instructions for the six-instruction instruction set of this
chapter. Extend the datapath, control unit, and the controller's FSM as needed.
8.22 Assuming that the microprocessor's external pins I0..I7 and P0..P7 are mapped to data
memory locations as in Figure 8.15 and an AND instruction has been added to the six-instruc-
tion instruction set of this chapter, create an assembly program that will output 0 on P4 if all
eight inputs I0..I7 are 1s.
Carole grew up in a country where the best students went to engineering school, as
engineering was highly respected. "I was good in school, so engineering seemed like a natural
option. I was also very interested in building things, and very curious about how one builds new
things, so I was attracted to engineering at an early age, around 10 years of age."

Carole has worked at Intel for 15 years. She was one of the original architects of the popular
MMX (Multimedia Extension of the Intel Architecture) part of Pentium processors. "It was
fascinating to learn the algorithms used to compress video and audio, and to invent new
instructions for the Intel Architecture to run these applications efficiently. It is not always
easy for processor architects to quantify the benefits of new features, and to motivate the
expense in silicon area (or chip die size) for new instructions. In the case of multimedia
applications, the benefits are well understood: running a video clip at a few frames per second,
or running it in real time (about 30 frames per second) makes a huge, visible difference to
everyone." As is the case with so many engineers, she is very proud of what she accomplished:
"When the first Pentium processor with MMX came up, it was really rewarding to think that a
small piece of my mind was in all of these machines running video real time popping up
everywhere."

Carole was also one of the architects on the Intel / Hewlett Packard team that defined the
Itanium computer architecture. "This was a unique opportunity to define a processor 'from
scratch.' Technically this was a very challenging project, and working with so many top notch
architects was very enriching. But I also learned what it takes to build something big, involving
a very large team, and two large companies. The two companies had different cultures, different
methodologies, and reconciling the differences was sometimes more challenging than solving the
technical problems. But this is all part of 'building things,' and this was a great lesson in
leadership."

What Carole likes most about her career is "the constant change. After 22 years as a computer
architect, I am still doing new things every day. Computer science is a work in progress, and it
offers new opportunities that one has to grab, and run with. This is where the fun is."

Asked to give some advice to students, Carole suggests two things:

"Stay at school as long as possible. Get a PhD if you can. To be able to adapt to constant
change, you will need a very robust and theoretical foundation. Only learning how to do things
is not enough; it will get you a job for 2 years, but then your skills will be obsolete."

"Be open for change. It is important to build an in-depth expertise in one area; in my case, it
is computer architecture. But one has to be ready to use this expertise in many different
projects, with different people, and more and more in different parts of the world. Fifteen years
ago multimedia applications were the focus of many computer architects. Today it is
bioinformatics and data mining. Change requires a lot of work to learn new domains, but not
adapting to change is not an option."
9
Hardware Description
Languages
9.1 INTRODUCTION¹
In this book, we have been drawing the circuits that we design. For example, in Chapter 2, we
designed an automatic door opener circuit and drew the circuit shown in Figure 9.1. A drawing
has more information than is really necessary to describe the circuit. In particular, the drawing
gives information about the location of the inputs and outputs: in the drawing of Figure 9.1, the
inputs are on the left, the output on the right, and the c input is on the top, the h input in the
middle, and the p input on the bottom. The drawing also gives information about the size and
location of the components in the circuit: the inverter is at the top, the OR gate below the
inverter, the AND gate on the right, and each component is about a half inch by a half inch. The
drawing gives information about the wires too: the wire from the inverter goes to the right, then
down, then to the right again, for example. However, all that information in the drawing is really
irrelevant, and has nothing to do with how the design will be physically implemented. We had to
draw the circuit somehow, so we chose to draw the circuit in the manner shown in the figure. But
we could have drawn the circuit many other ways too. A drawing of a circuit is commonly referred
to as a circuit schematic.

Figure 9.1 Drawn circuit (DoorOpener).

A problem with drawing all our circuits arises when we deal with large circuits. Does the
schematic in Figure 9.2 mean anything to you? That schematic has just a couple dozen
components; what if there were a couple thousand components, as is quite common? Drawing a
large circuit would require tremendous effort on our part to figure out where to place each
component in the drawing, and how to route wires among the components. And if a tool generated
the circuit, the tool would have to spend compute time to figure out a visually-appealing way to
draw the circuit (rather than a spaghetti-like mess), and such computation is time-consuming and
still may not result in a good drawing. Furthermore, the files used to store such schematics would
be very large, as those files would contain all that extra information about the precise location
and size of every component. All that extra effort, file size, and time would be needed for
something that is really not very useful: humans can't comprehend circuit drawings of more than
perhaps a hundred or so gates, so what's the point of drawing such circuits? What we really want
is a way to just describe the circuit itself: what are the inputs and outputs, what components
exist, and what are the connections? Ideally, we would do this description in a textual language,
so that we humans could type such descriptions with a computer keyboard, just like we type email
messages and C programs.

Figure 9.2 Schematics become hard to read beyond a dozen or so components: the
graphical information becomes a nuisance rather than an aid.

¹ Substantial content of this chapter was contributed by Roman Lysecky.
We could therefore describe the circuit in Figure 9.3(a) using the textual language of
English as shown in Figure 9.3(b). We've given names to each gate in the circuit and to
the internal wires in Figure 9.3(a).

(b) We'll now describe a circuit whose name is DoorOpener.
    The external inputs are c, h and p, which are bits.
    The external output is f, which is a bit.

    We assume you know the behavior of these components:
       An inverter, which has a bit input x, and bit output F.
       A 2-input OR gate, which has bit inputs x and y, and bit output F.
       A 2-input AND gate, which has bit inputs x and y, and bit output F.

    The circuit has internal wires n1 and n2, both bits.

    The DoorOpener circuit internally consists of:
       An inverter named Inv_1, whose input x connects to
       external input c, and whose output connects to n1.
       A 2-input OR gate named OR2_1, whose inputs connect to external
       inputs h and p, and whose output connects to n2.
       A 2-input AND gate named AND2_1, whose inputs connect to n1
       and n2, and whose output connects to external output f.
    That's all.

Figure 9.3 Describing a circuit using a textual language rather than a graphical
drawing: (a) schematic, (b) textual description in the English language.
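
To make the idea of a textual circuit description concrete before turning to real HDLs, here is a minimal sketch in Python that encodes the same English description as plain data and then evaluates it. The dictionary layout and the names GATE_BEHAVIOR, DOOR_OPENER, and evaluate are our own illustrative assumptions, not part of any HDL or tool.

```python
# Hypothetical encoding of the English description of DoorOpener as data.
# The layout and all names here are illustrative assumptions.

# Behavior of the component types that "we assume you know".
GATE_BEHAVIOR = {
    "Inv": lambda x: 1 - x,
    "OR2": lambda x, y: x | y,
    "AND2": lambda x, y: x & y,
}

DOOR_OPENER = {
    "name": "DoorOpener",
    "inputs": ["c", "h", "p"],
    "outputs": ["f"],
    "wires": ["n1", "n2"],
    # (instance name, component type, input connections, output connection)
    "instances": [
        ("Inv_1", "Inv", ["c"], "n1"),
        ("OR2_1", "OR2", ["h", "p"], "n2"),
        ("AND2_1", "AND2", ["n1", "n2"], "f"),
    ],
}


def evaluate(netlist, input_values):
    """Compute the outputs of a combinational netlist.

    Instances are evaluated in listed order, so each instance must be
    listed after the instances that drive its inputs.
    """
    values = dict(input_values)
    for _name, kind, in_nets, out_net in netlist["instances"]:
        args = [values[net] for net in in_nets]
        values[out_net] = GATE_BEHAVIOR[kind](*args)
    return {net: values[net] for net in netlist["outputs"]}


if __name__ == "__main__":
    # Print the truth table of the described circuit.
    for c in (0, 1):
        for h in (0, 1):
            for p in (0, 1):
                f = evaluate(DOOR_OPENER, {"c": c, "h": h, "p": p})["f"]
                print(c, h, p, "->", f)
```

Note that the data says nothing about where a gate sits on a page or how a wire bends; it lists only names, component types, and connections, which is exactly the information the English description carries.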
Of course, English is not a good language if you want to use a computer tool to read in
the description; a computer tool requires a language with a precise syntax and precise
meaning for every language construct. Computer-readable languages thus evolved in the
1970s and 1980s for describing hardware circuits. Such languages became known as hard-
ware description languages, or HDLs. Hardware description languages not only enable us
to describe the structural interconnections of components, but also include methods for us to
describe the behavior of the components themselves. Modern digital design relies heavily on
the use of HDLs at all stages of design.

We'll provide a brief introduction to the most popular hardware description languages
(VHDL, Verilog, and SystemC) in this chapter, but to really learn each language,
one may want to consult a textbook specifically dedicated to each language. Each section of
this chapter can be covered immediately after the corresponding earlier chapters (Section 9.2
after Chapter 2, Section 9.3 after Chapter 3, Section 9.4 after Chapter 4, and Section 9.5
after Chapter 5), or these sections may be covered all at once after completing those earlier
chapters. Furthermore, each section has three parts, one for VHDL, one for Verilog, and
one for SystemC. Each of these parts is independent of the other parts of the section, so a
reader interested only in one of the HDLs, say Verilog, can read only the Verilog parts of
each section, skipping the VHDL or SystemC parts.

Readers interested in comparing the three HDLs may read the sections of all three
HDLs. In doing so, you may notice that the HDLs have similar capabilities, differing primar-
ily in their syntax. Thus, after learning one HDL thoroughly, a designer can likely learn
others quickly.
9.2 COMBINATIONAL LOGIC DESCRIPTION USING
HARDWARE DESCRIPTION LANGUAGES
Structure
This chapter's introduction sought to describe a circuit using a textual language. We now
show how HDLs describe a circuit. The term structure is sometimes used
to refer to a circuit, with structure meaning an interconnection of components.
VHDL
Figure 9.4(c) shows a VHDL description of the DoorOpener circuit of Figure 9.4(a). For
convenience, we've also shown the English description in Figure 9.4(b), and the correspon-
dence between the English description and the VHDL description.
The description begins with an entity declaration, which defines the design's name and
the design's inputs and outputs, known as ports. An entity declaration says nothing about
the internals of the design, just the design's name and interface. The description lists the
port names and defines their type, which in this case is type std_logic. That type essen-
tially means a bit, but isn't built into VHDL (the predefined bit type in VHDL is too limited,
for reasons beyond our scope here). To use std_logic, we actually must include the state-
ments: "library ieee; use ieee.std_logic_1164.all;" at the top of the file.
The description continues with an architecture definition, which describes the internals
of the design. We named the architecture Circuit, but we could have named it anything we
wanted: DoorOpenerCircuit, DoorOpenerStructure, Structure, or even Fred, although we
want a name that is helpful in understanding the architecture. The architecture starts by
declaring what components the design will be using; those components must be defined else-
where, perhaps earlier in the description's file, or perhaps in another file. We'll discuss those
components' definitions later; for now, assume they are somehow already defined. Each
component declaration must define the inputs and outputs of each component, and those
inputs and outputs must match the component's entity declaration (found elsewhere) exactly.
(a) [Schematic of DoorOpener: inverter Inv_1, OR gate OR2_1, AND gate AND2_1]

(b) [English description, identical to Figure 9.3(b)]

(c) library ieee;
    use ieee.std_logic_1164.all;

    entity DoorOpener is
       port ( c, h, p: in std_logic;
              f: out std_logic
       );
    end DoorOpener;

    architecture Circuit of DoorOpener is
       component Inv
          port (x: in std_logic;
                F: out std_logic);
       end component;
       component OR2
          port (x, y: in std_logic;
                F: out std_logic);
       end component;
       component AND2
          port (x, y: in std_logic;
                F: out std_logic);
       end component;
       signal n1, n2: std_logic; -- internal wires
    begin
       Inv_1: Inv port map (x=>c, F=>n1);
       OR2_1: OR2 port map (x=>h, y=>p, F=>n2);
       AND2_1: AND2 port map (x=>n1, y=>n2, F=>f);
    end Circuit;

Figure 9.4 Describing a circuit using a textual language rather than a graphical drawing: (a) schematic, (b) textual
description in the English language, (c) textual description in the VHDL language. Bolded words are reserved words
in VHDL.
The description then includes a declaration of the design's internal signals, which are
essentially internal wires. Next to that declaration, the description includes an example of
a VHDL comment: "-- internal wires". Comments start with "--" followed by
any text we want on the rest of the line. That text is ignored by VHDL tools, but is useful
to us humans who must read the descriptions.
Finally, the description instantiates the circuit's components and defines those com-
ponents' connections. For example, the description instantiates a component named
Inv_1, which is a component of type Inv (which we declared earlier in the VHDL descrip-
tion), and indicates that Inv_1's input x connects to c, which is an external input. An
alternate, more concise port map notation omits the port names. Using this notation, we
could instantiate our inverter by writing "Inv_1: Inv port map (c, n1);". The
order of the signals in the port map of Inv corresponds to the order of the ports in the
component definition of Inv. We will use this alternate notation in subsequent examples.
The bold words in the description represent reserved words, also known as keywords,
in VHDL. We cannot use reserved words for names of entities, architectures, signals,
instantiated components, etc., as those words have special meaning that guides VHDL
tools to understand our descriptions.
Summarizing, the VHDL structural description has an entity that describes the design's
name, inputs, and outputs; a declaration of what components will be used; a declaration of
internal signals; and finally, an instantiation of all components, along with their
interconnections.
The entity that we've just defined could then be used as a component in another entity.
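
The difference between the two port map notations can be sketched in a few lines of Python. This is only an illustration of the binding rule, not of how a VHDL tool works: the names INV_PORTS, bind_named, and bind_positional are hypothetical, and the strict error checks are a simplification we chose for the sketch.

```python
# Port order as given in the component declaration of Inv: x first, then F.
INV_PORTS = ["x", "F"]


def bind_named(ports, **connections):
    """Named association, in the spirit of 'port map (x=>c, F=>n1)'.

    Each port is matched by name, so the order of the arguments is irrelevant.
    """
    missing = [p for p in ports if p not in connections]
    unknown = [p for p in connections if p not in ports]
    if missing or unknown:
        raise ValueError(f"missing ports {missing}, unknown ports {unknown}")
    return {port: connections[port] for port in ports}


def bind_positional(ports, *signals):
    """Positional association, in the spirit of 'port map (c, n1)'.

    The i-th signal connects to the i-th declared port, so order matters.
    """
    if len(signals) != len(ports):
        raise ValueError(f"expected {len(ports)} signals, got {len(signals)}")
    return dict(zip(ports, signals))


if __name__ == "__main__":
    print(bind_named(INV_PORTS, F="n1", x="c"))   # names decide the binding
    print(bind_positional(INV_PORTS, "c", "n1"))  # position decides the binding
```

The positional form is shorter, but swapping two signals silently produces a different circuit, whereas the named form is insensitive to argument order.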
Verilog
Figure 9.5(c) shows a Verilog description of the DoorOpener circuit of Figure 9.5(a). For
convenience, we've also shown the English description in Figure 9.5(b), and the corre-
spondence between the English description and the Verilog description.
(a) [Schematic of DoorOpener: inverter Inv_1, OR gate OR2_1, AND gate AND2_1]

(b) [English description, identical to Figure 9.3(b)]

(c) module Inv(x, F);
       input x;
       output F;
       // details not shown
    endmodule

    module OR2(x, y, F);
       input x, y;
       output F;
       // details not shown
    endmodule

    module AND2(x, y, F);
       input x, y;
       output F;
       // details not shown
    endmodule

    module DoorOpener(c, h, p, f);
       input c, h, p;
       output f;
       wire n1, n2;
       Inv Inv_1(c, n1);
       OR2 OR2_1(h, p, n2);
       AND2 AND2_1(n1, n2, f);
    endmodule

Figure 9.5 Describing a circuit using a textual language rather than a graphical drawing:
(a) schematic, (b) textual description in the English language, (c) textual description in the
Verilog language. Bold words are reserved words in Verilog.
The description begins by defining modules for an inverter Inv, a 2-input OR gate
OR2, and a 2-input AND gate AND2. We'll skip discussion of those modules, and begin
our discussion with the definition of the fourth module DoorOpener.
The description declares a module named DoorOpener. The module declaration
defines a design's name and the names of that design's inputs and outputs, known as
ports. The module declaration says nothing about the internals of the design or the
ports, just the design's name and interface.
The description then defines the type of each port, assigning the types input and
output in this example.
The description then includes a declaration of the design's internal wires, named n1
and n2.
Finally, the description instantiates the circuit's components and defines those com-
ponents' connections. In the DoorOpener module, the description instantiates a
component named Inv_1, which is a component of type Inv. The connections to the inputs
and outputs of the instantiated components are specified in the order in which the compo-
nent's modules declare the inputs and outputs. In the instantiation of Inv_1, the input c is
connected to the input x of the Inv component. In Verilog, the module does not need to
specify the interface of a component within the module instantiating the component. For
example, the DoorOpener module does not include a declaration of which components it
will instantiate or any information regarding those components. The components, of
course, must be defined elsewhere, perhaps earlier in the same file as shown in Figure
9.5(c), or perhaps in another file. For reference purposes, the example shown here pro-
vides incomplete specifications for the Inv, AND2, and OR2 components in order to
clearly show the ports and interface for these components. In place of specifying the
internal behavior of these components, we simply included an example of a Verilog com-
ment. Comments start with "//" and then any text we want on the rest of the line.
The bold words in the description represent reserved words, also known as keywords,
in Verilog. We cannot use reserved words for names of modules, ports, wires, instantiated
components, etc., as those words have special meaning that guides Verilog tools to under-
stand our descriptions.
Summarizing, the Verilog structural description has a module that describes the
design name, lists the module's inputs and outputs, and specifies the type for each input
and output; a declaration of internal wires; and finally, an instantiation of all components,
along with their interconnections.
SystemC
Figure 9.6(c) shows a SystemC description of the DoorOpener circuit of Figure 9.6(a).
For convenience, we've also shown the English description in Figure 9.6(b), and the cor-
respondence between the English description and the SystemC description. The SystemC
language is built on top of the C++ programming language, but it is not necessary to be
an expert C++ programmer to use SystemC. However, it is important to keep in mind that
certain restrictions exist as a result, such as not using C++ keywords to name modules,
ports, signals, etc.

(a) [Schematic of DoorOpener: inverter Inv_1, OR gate OR2_1, AND gate AND2_1]

(b) [English description, identical to Figure 9.3(b)]

(c) #include "systemc.h"
    #include "inv.h"
    #include "or2.h"
    #include "and2.h"

    SC_MODULE(DoorOpener)
    {
       sc_in<sc_logic> c, h, p;
       sc_out<sc_logic> f;

       sc_signal<sc_logic> n1, n2; // internal wires

       // component declarations
       Inv Inv_1;
       OR2 OR2_1;
       AND2 AND2_1;

       // component instantiations
       SC_CTOR(DoorOpener): Inv_1("Inv_1"), OR2_1("OR2_1"), AND2_1("AND2_1")
       {
          Inv_1.x(c);
          Inv_1.F(n1);
          OR2_1.x(h);
          OR2_1.y(p);
          OR2_1.F(n2);
          AND2_1.x(n1);
          AND2_1.y(n2);
          AND2_1.F(f);
       }
    };

Figure 9.6 Describing a circuit using a textual language rather than a graphical drawing: (a) schematic, (b) textual
description in the English language, (c) textual description in the SystemC language. Bold words are reserved words
in SystemC.

Before defining the circuit behavior, we must include the statement "#include
"systemc.h"" at the top of each SystemC file. The description begins with an
SC_MODULE declaration, which defines the design's name, in this case DoorOpener. The
module declaration says nothing about the internals of the design, just the design's name.
Within the module description, the input and output ports of the design are specified, using
the sc_in<> and sc_out<> statements respectively. The description lists the port names and
defines their types, which in this case is type sc_logic, which specifies a single bit.
The description then includes a declaration of the design's internal signals, specified
as sc_signal, which are essentially internal wires. Next to that declaration, the description
includes an example of a SystemC comment: "// internal wires". Comments start
with "//" and then consist of any text we want on the rest of the line.
The module then declares what components the design will be using. The SystemC
module does not need to specify the interface of the component, but rather just the type
of component as well as a unique name for each component within the design.
The module defines a constructor function SC_CTOR that is responsible for instanti-
ating and connecting the components within our SystemC design. The constructor function
takes as an argument the name of the current SystemC module, which is in this case Door-
Opener. Following the SC_CTOR statement after the colon is a list of component
instantiations. The SystemC module's instantiations are used to call the constructor func-
tions of each component being instantiated. However, we point out that the connections
between the individual components are not specified at this point. Instead, the statements
within the constructor finally define the connections between the components. For example,
the inverter Inv_1's input x is connected to c, which is an external input. In SystemC, the
module does not need to specify the interface of a component within the module. The com-
ponents, of course, must be completely defined elsewhere, perhaps earlier in the same file,
or perhaps in another file. In our SystemC DoorOpener description, the descriptions for the
Inv, AND2, and OR2 components are specified in other SystemC files. In order to use those
components, we must include a statement at the beginning of the current file indicating
where we can find this description. For example, our DoorOpener description includes the
statement "#include "inv.h"", and the description of the component Inv can be found
within this file.
The bolded words in the description represent reserved words, also known as key-
words, in SystemC and C++. We cannot use reserved words for names of modules, ports,
signals, instantiated components, etc., as those words have special meaning that guides
SystemC and C++ tools to understand our descriptions.
Summarizing, the SystemC structural description has: a module that defines the
design name; a list of inputs and outputs of the module specifying their types; a declara-
tion of internal signals; a declaration of components providing the name for each
component; a constructor function instantiating the module's components; and finally, the
components' interconnections.
Combinational Behavior
HDLs typically support the ability to describe the internals of a design as behavior rather
than as a circuit. This ability enables us to describe the bottom-level building-block com-
ponents that we use in a design, such as the behavior of an AND gate or OR gate.
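
The distinction between structure and behavior can be sketched outside any HDL. In the minimal Python sketch below, the three gate functions are behavioral descriptions of bottom-level components, door_opener_structural wires those gates together in the same way as the DoorOpener circuit, and door_opener_behavioral states the same function directly. All function names are our own assumptions for illustration.

```python
from itertools import product


# Behavioral descriptions of the bottom-level building blocks.
def inv(x):
    return 1 - x


def or2(x, y):
    return x | y


def and2(x, y):
    return x & y


def door_opener_structural(c, h, p):
    """Structure: an interconnection of components through wires n1 and n2."""
    n1 = inv(c)
    n2 = or2(h, p)
    return and2(n1, n2)


def door_opener_behavioral(c, h, p):
    """Behavior: the output stated directly as a function of the inputs."""
    return int((not c) and (h or p))


def equivalent(f1, f2, num_inputs):
    """Exhaustively compare two combinational functions on all input values."""
    return all(f1(*bits) == f2(*bits)
               for bits in product((0, 1), repeat=num_inputs))


if __name__ == "__main__":
    print(equivalent(door_opener_structural, door_opener_behavioral, 3))
```

Exhaustive comparison is practical here only because the circuit has three inputs; the number of input combinations doubles with each added input.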
VHDL
Figure 9.7 contains a behavioral description of a 2-input OR gate, which you'll recall
we used as a component in Figure 9.4(c). The description begins with the declarations
necessary to use std_logic. It then declares the entity with the name OR2 as having two
input ports x and y, and having output port F, all of type std_logic, which means bit.
The description then defines an architecture named behavior for OR2. That architecture
consists of a process, which is the VHDL construct that describes behavior. The process
declaration here is "process(x, y)", which means the process should execute from
beginning to end whenever there's a change on x or y; in other words, the process is
sensitive to x and y.

    library ieee;
    use ieee.std_logic_1164.all;

    entity OR2 is
       port (x, y: in std_logic;
             F: out std_logic
       );
    end OR2;

    architecture behavior of OR2 is
    begin
       process (x, y)
       begin
          F <= x or y;
       end process;
    end behavior;

Figure 9.7 Behavioral VHDL description of an OR gate.

A process body (the part between the process's begin and end) can contain sequential
statements, just like sequential statements in C, but with a different syntax. The process
shown has only one such statement, assigning the value of "x or y" to F. "or" happens
to be a built-in operator in VHDL, making the internal description of the OR gate simple.
As another example of a behavioral descripti on, let's revi sit our DoorOpener
example from Figure 9.4(c). for which we created an architecture havi ng a structural
descripti on. We can alternatively create an archit ecture having a behavioral description-
a VHDL entity may have multipl e architecture descripti ons for that same entity.
Assuming the same entity declaration as in Fi gure 9.4(c). we show an alternative archi-
tecture definition in Figure 9.8. The behavior consists of a process that is sensitive to
Input s c, h, and p. When the process executes (which is whenever c. h. or p changes), then
the process executes its one statement, whi ch updates the value off
In designing the DoorOpener circuit, we might start with the behavioral description,
and run a simulation to verify correct behavior. We might then create a structural
description, and run simulation again to verify that the circuit has the same
functionality as the behavior. In fact, tools exist that automatically convert such
behavior to a circuit.

architecture beh of DoorOpener is
begin
   process(c, h, p)
   begin
      f <= not(c) and (h or p);
   end process;
end beh;

Figure 9.8 Behavioral VHDL description of the DoorOpener design.

When writing a VHDL process describing a combinational circuit's behavior, care must
be taken to include all the circuit's inputs in the process's sensitivity list. Omitting
an input is not a VHDL error, but such omission results in different behavior than
combinational behavior: with an input omitted, the output does not change when that
input changes, meaning there must be some storage in the circuit.
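The effect of an incomplete sensitivity list can be sketched in plain C++. The block below is only an illustrative model, not VHDL simulation semantics in full: the struct DoorOpenerProcess, its setter functions, and the flag sensitive_to_p are hypothetical names introduced here. The process body runs only when a listed input changes, so with p omitted the output f keeps its old value when p changes, which is exactly the storage behavior described above.

```cpp
#include <cassert>

// Hypothetical model of a process computing f = not(c) and (h or p).
// sensitive_to_p mimics whether p appears in the sensitivity list.
struct DoorOpenerProcess {
    bool sensitive_to_p;
    bool c, h, p;
    bool f;  // the output holds its last assigned value between executions

    explicit DoorOpenerProcess(bool in_list)
        : sensitive_to_p(in_list), c(false), h(false), p(false), f(false) {}

    // Process body: executes from beginning to end.
    void run() { f = !c && (h || p); }

    // A change on a listed input triggers the process body.
    void set_c(bool v) { if (v != c) { c = v; run(); } }
    void set_h(bool v) { if (v != h) { h = v; run(); } }
    void set_p(bool v) {
        if (v != p) {
            p = v;
            if (sensitive_to_p) run();  // omitted input: body does not run
        }
    }
};

// Returns f after p rises while c = 0 and h = 0.
bool f_after_p_rises(bool sensitive_to_p) {
    DoorOpenerProcess proc(sensitive_to_p);
    proc.set_p(true);
    return proc.f;
}
```

With the complete list, f_after_p_rises(true) yields 1 as the combinational function requires; with p omitted, f stays at its stale value 0 until some listed input changes.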
Verilog
Figure 9.9 contains a behavioral description of a 2-input OR gate, which you'll recall
we used as a component in Figure 9.5. The description begins by declaring the module
named OR2 and specifying that the module has three ports named x, y, and F. The
description then defines that the ports x and y are both inputs and the port F is an
output. The description then defines the output F to be a reg output.

module OR2(x, y, F);
   input x, y;
   output F;
   reg F;

   always @(x or y)
   begin
      F <= x | y;
   end
endmodule

Figure 9.9 Behavioral Verilog description of an OR gate.

In Verilog, all ports are by default assumed to be wires, which do not store
values. Instead, wires can only create connections between components. If we want to
assign a value to an output port, we must define the port to be a reg, which indicates
the output port stores the value we assign to the port. The Verilog code for our design
continues with an always procedure that
defines a block of code that will be repeatedly executed whenever a change occurs on
an input in the block's input list. The always procedure declaration is "always @(x
or y)", which means the procedure should execute from beginning to end whenever
there is a change on x or y; in other words, the procedure is sensitive to x and y. The
always procedure's statements (the part between the procedure's begin and end
statements) can contain sequential statements, just like sequential statements in C, but with
a different syntax. The block shown has only one such statement, assigning the value
of "x | y" to F, where | is a built-in Verilog operation to compute an OR.
As another example of a behavioral description, let's revisit our DoorOpener
example from Figure 9.5(c), for which we created a structural Verilog description. We can
alternatively create a behavioral description. Figure 9.10 presents a behavioral Verilog
description of the DoorOpener circuit. The module declaration is similar to the structural
description of Figure 9.5(c), but in the behavioral description we need to declare the output
f as a reg. The behavior consists of an always procedure sensitive to inputs c, h, and p.
When the procedure executes (which is whenever c, h, or p changes), the procedure
executes a single statement that updates the value of f, by assigning the value
"(~c) & (h | p)", where ~, &, and | perform the invert, AND, and OR operations,
respectively.

module DoorOpener(c, h, p, f);
   input c, h, p;
   output f;
   reg f;

   always @(c or h or p)
   begin
      f <= (~c) & (h | p);
   end
endmodule

Figure 9.10 Behavioral Verilog description of the DoorOpener design.

In designing the DoorOpener circuit, we might start with the behavioral description,
and run a simulation to verify correct behavior. We might then create a structural
description, and run a simulation again to verify that the circuit has the same functionality as the
behavior. In fact, tools exist that automatically convert such behavior to a circuit.
SystemC
Figure 9.11 shows a SystemC behavioral description of a 2-input OR gate, which
you'll recall we used as a component in Figure 9.6(c). The SystemC description
declares the module with the name OR2 and has two input ports x and y and one
output port F, all of type sc_logic, indicating each input and output is an
individual bit. The module defines the constructor function SC_CTOR that consists
of a process named comblogic defined as an SC_METHOD. SC_METHOD is one SystemC
construct that describes behavior. The declaration here is
"SC_METHOD(comblogic); sensitive << x << y;", which means the process will execute
the circuit behavior described in the function comblogic whenever there is a change
on x or y. In other words, the process is sensitive to x and y. The process body is
defined in the function comblogic and is declared as "void comblogic()". The process
function (the part between the open brace "{" and close brace "}") can contain sequential
statements, just like sequential statements in C or C++, but sometimes requires different
syntax. The process shown has only one such statement, writing the value of
"x.read() | y.read()" to F, where | executes an OR operation. In SystemC, one can read
the current value of an input port using the read() function and can write a value to an
output port using the write() function. While we can use other methods of accessing the
input and output ports, the read() and write() functions are recommended.

#include "systemc.h"

SC_MODULE(OR2)
{
   sc_in<sc_logic> x, y;
   sc_out<sc_logic> F;

   SC_CTOR(OR2)
   {
      SC_METHOD(comblogic);
      sensitive << x << y;
   }

   void comblogic()
   {
      F.write(x.read() | y.read());
   }
};

Figure 9.11 Behavioral SystemC description of an OR gate.

As another example of a behavioral description, let's revisit our DoorOpener
example from Figure 9.6(c), for which we created a structural SystemC
description. We can alternatively create a behavioral description. Figure 9.12
presents a behavioral SystemC description of the DoorOpener circuit. The
module declaration is the same as the structural description of Figure 9.6(c).
The behavior consists of a single process, named comblogic, that is sensitive
to inputs c, h, and p. When the process executes (which is whenever c, h, or p
changes), the process executes its one statement, which updates the value
of f by assigning the value "(~c.read()) & (h.read() | p.read())", where ~ performs an
invert operation, & performs an AND operation, and | performs an OR operation.

#include "systemc.h"

SC_MODULE(DoorOpener)
{
   sc_in<sc_logic> c, h, p;
   sc_out<sc_logic> f;

   SC_CTOR(DoorOpener)
   {
      SC_METHOD(comblogic);
      sensitive << c << h << p;
   }

   void comblogic()
   {
      f.write((~c.read()) & (h.read() | p.read()));
   }
};

Figure 9.12 Behavioral SystemC description of the DoorOpener design.

In designing the DoorOpener circuit, we might start with the behavioral description,
and run a simulation to verify correct behavior. We might then create a structural
description, and run simulation again to verify that the circuit has the same functionality as the
behavior. In fact, tools exist that automatically convert such behavior to a circuit.

Testbenches

One of the main uses of an HDL is that of simulating a new design to ensure that the
design is correct. To simulate a design, we need to set the design's inputs to certain
values, and then check that the design's output values are what we expect them to be. A
system that sets input values and checks output values is known as a testbench. We now
show how to create an HDL testbench to test our DoorOpener circuit.
VHDL
Figure 9.13 shows a VHDL testbench for the DoorOpener design of Figure 9.4(c). Notice
that the entity, named Testbench, has no ports; the entity is self-contained, requiring no
inputs and generating no outputs. The architecture declares the component that we plan to
test, namely the DoorOpener component. The architecture instantiates one instance of the
DoorOpener component, which we named DoorOpener1. A single process in the architecture
sets the inputs of the component and checks for correct output. This testbench
tries all possible cases of the three inputs, of which there are eight cases. Many components
have too many inputs to try all possible cases; in that situation, we might try border cases
(e.g., all 0s, all 1s) and then some random cases.

Each case sets the three inputs of the component to a particular input combination,
and waits for those values to
library ieee;
use ieee.std_logic_1164.all;

entity Testbench is
end Testbench;

architecture behavior of Testbench is
   component DoorOpener
      port ( c, h, p: in std_logic;
             f: out std_logic
      );
   end component;
   signal c, h, p, f: std_logic;
begin
   DoorOpener1: DoorOpener port map (c, h, p, f);

   process
   begin
      -- case 0
      c <= '0'; h <= '0'; p <= '0';
      wait for 1 ns;
      assert (f='0') report "Case 0 failed";

      -- case 1
      c <= '0'; h <= '0'; p <= '1';
      wait for 1 ns;
      assert (f='1') report "Case 1 failed";

      -- (cases 2-6 omitted from figure)

      -- case 7
      c <= '1'; h <= '1'; p <= '1';
      wait for 1 ns;
      assert (f='0') report "Case 7 failed";

      wait; -- process does not wake up again
   end process;
end behavior;

Figure 9.13 Behavioral VHDL description of DoorOpener testbench.
propagate through the component. We arbitrarily wait for 1 ns of simulated time, but
could have picked any time, since we didn't actually create a time delay within the
component. But we do have to wait for some time, as VHDL simulation is defined such that
no signal is updated instantaneously, but rather after an infinitely small period of
simulated time. After waiting, each case checks for the correct value on the output f, using an
assert statement. If the condition of the assert statement evaluates to true, simulation
proceeds to the next statement. But if the condition evaluates to false, the corresponding
error message will be reported and the simulation will terminate.
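The set-inputs-then-check pattern of the testbench can be sketched in ordinary C++, without any simulated time. This is only a software analogy under stated assumptions: the functions door_opener and run_testbench and the expected-value table are hypothetical names introduced here, and the table is derived from the function f = not(c) and (h or p) used throughout this section.

```cpp
#include <cassert>

// Design under test: f = not(c) and (h or p).
bool door_opener(bool c, bool h, bool p) {
    return !c && (h || p);
}

// Tries all eight input cases and returns the number of failing cases.
// Case number i encodes the inputs as the 3-bit value (c h p).
int run_testbench() {
    // Expected output for cases 0 through 7.
    const bool expected[8] = {false, true, true, true,
                              false, false, false, false};
    int failures = 0;
    for (int i = 0; i < 8; ++i) {
        bool c = ((i >> 2) & 1) != 0;
        bool h = ((i >> 1) & 1) != 0;
        bool p = (i & 1) != 0;
        if (door_opener(c, h, p) != expected[i]) {
            ++failures;  // an HDL testbench would report "Case i failed"
        }
    }
    return failures;
}
```

Cases 0, 1, and 7 in the table match the three cases shown in Figure 9.13. For a design with many inputs, the loop over all combinations would be replaced by border cases plus some random cases.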
Verilog
Figure 9.14 shows a Verilog testbench for the DoorOpener design of Figure 9.5(c).
Notice that the module, named Testbench, has no ports; the module is self-contained,
requiring no inputs and generating no outputs. The module first declares three registered
signals c, h, and p and a single wire f. The signals c, h, and p are declared as reg because
we must assign values to the signals that will be connected to the inputs of the design we
are testing. However, because we do not need to assign a value to the output we are
monitoring, the signal f is declared as a wire. The testbench then instantiates one
instance of the DoorOpener component, named DoorOpener1, and connects the inputs and
outputs of the component to our internal signals. The testbench then contains an initial
procedure that defines a block of code that will be executed only once when execution of
the testbench begins. The initial procedure sets the inputs of the DoorOpener component
and displays the resulting value of the component's output. This testbench tries all
possible cases of the three inputs, of which there are eight cases. Many components have
too many inputs to try all possible cases; in that situation, we might try border cases
(e.g., all 0s, all 1s) and then some random cases.

module Testbench;
   reg c, h, p;
   wire f;

   DoorOpener DoorOpener1(c, h, p, f);

   initial
   begin
      // case 0
      c <= 0; h <= 0; p <= 0;
      #1 $display("f = %b", f);
      // case 1
      c <= 0; h <= 0; p <= 1;
      #1 $display("f = %b", f);
      // (cases 2-6 omitted from figure)
      // case 7
      c <= 1; h <= 1; p <= 1;
      #1 $display("f = %b", f);
   end
endmodule

Figure 9.14 Behavioral Verilog description of DoorOpener testbench.

Each case sets the three inputs of the component to a particular input combination,
and waits for those values to propagate through the component. We arbitrarily
wait for 1 unit of simulated time using the delay control statement "#1", but we could
have picked any length of time, since we didn't actually create a time delay within
the component. The Verilog language does not define standard time units such as
nanoseconds, but instead simply defines time in terms of time units, which a designer
can use within a simulation environment. We do have to wait for some time, as the
assignments within the testbench are nonblocking statements that are not updated
until the current simulation time completes. After waiting, each case outputs the value
of the output f using a $display statement. The statement $display("f = %b", f)
outputs the value of f in binary. For example, if the value of f is 1, then the
display statement will output "f = 1". The display statement consists of a format
string followed by a comma-separated list of wires, registers, or ports. Within the
format string of our display statement, the %b indicates that the value of the signal
specified after the format string will be displayed in binary. After simulation has
completed, we can compare the values output during simulation to the expected values to
determine if our circuit is working correctly.
SystemC
Figure 9.15 shows a SystemC testbench for the DoorOpener design of Figure 9.6(c).
Notice that the module, Testbench, has three output ports, c_t, h_t, and p_t, and
one input port f_t. In SystemC, we design the testbench circuit as a separate module
that connects to the design we are testing. Therefore, for every input port on the
circuit we are testing, our testbench will have a corresponding output port. Likewise,
for every output port on the circuit we are testing, our testbench will have a
corresponding input port. The testbench module defines a single process named
testbench_proc. The testbench process is defined as an SC_THREAD, which is similar to
an SC_METHOD process except that the SC_THREAD allows us to use the wait() function
within the process body to control the timing behavior of the process. In contrast,
SystemC does not allow us to use the wait() function within an SC_METHOD process. The
testbench process controls the inputs of the circuit we are testing and checks
for correct output. This testbench tries all possible cases of the DoorOpener's
three inputs, of which there are eight cases. Many
#include "systemc.h"

SC_MODULE(Testbench)
{
   sc_out<sc_logic> c_t, h_t, p_t;
   sc_in<sc_logic> f_t;

   SC_CTOR(Testbench)
   {
      SC_THREAD(testbench_proc);
   }

   void testbench_proc()
   {
      // case 0
      c_t.write(SC_LOGIC_0);
      h_t.write(SC_LOGIC_0);
      p_t.write(SC_LOGIC_0);
      wait(1, SC_NS);
      assert( f_t.read() == SC_LOGIC_0 );

      // case 1
      c_t.write(SC_LOGIC_0);
      h_t.write(SC_LOGIC_0);
      p_t.write(SC_LOGIC_1);
      wait(1, SC_NS);
      assert( f_t.read() == SC_LOGIC_1 );

      // (cases 2-6 omitted from figure)

      // case 7
      c_t.write(SC_LOGIC_1);
      h_t.write(SC_LOGIC_1);
      p_t.write(SC_LOGIC_1);
      wait(1, SC_NS);
      assert( f_t.read() == SC_LOGIC_0 );
   }
};

Figure 9.15 Behavioral SystemC description of DoorOpener testbench.
components have too many inputs to try all possible cases; in that situation, we might try
border cases (e.g., all 0s, all 1s) and then some random cases.

Each case sets the three inputs of the DoorOpener circuit to a particular input
combination, and waits for those values to propagate through the component. We
arbitrarily wait for 1 ns of simulated time, but could have picked any time, since we
didn't actually create a time delay within the component. But we do have to wait for
some time, as SystemC simulation is defined such that no signal or port is updated
instantaneously, but rather after an infinitely small period of simulated time. After
waiting, each case checks for the correct output by reading the port f_t using an assert
statement. If the condition of the assert statement evaluates to true, simulation proceeds
to the next statement. But if the condition is false, simulation will stop and
the corresponding error will be reported.
In SystemC, constants such as 0 and 1 are integer values and not logic values. Instead,
SystemC defines the values SC_LOGIC_0 and SC_LOGIC_1 that correspond to the logic
values of 0 and 1, respectively, which we used in our description.
9.3 SEQUENTIAL LOGIC DESCRIPTION USING
HARDWARE DESCRIPTION LANGUAGES
Register
The most basic component in sequential logic is a register. We now show how to model a
basic register in HDLs.

VHDL
Figure 9.16 shows a basic 4-bit register in VHDL. The register is identical to
that described in Figure 3.30. The entity defines the data input I and the data
output Q, as well as the clock input clk. The input I and output Q of this design
correspond to 4-bit values. Instead of using eight individual std_logic inputs and
outputs, the entity's I and Q ports are defined as std_logic_vector. A
std_logic_vector is a vector, or array, of multiple std_logic elements. For example, the
type declaration "std_logic_vector(3 downto 0)" defines a 4-bit vector of
std_logic elements, where the bits within the vector are numbered from 3 to 0.
The downto statement defines the ordering of the elements within the vector, indicating
that element 3 is located in the leftmost position. The statement "I <= "1000"" would
thus assign the value '1' to position 3 of the vector I and the value '0' to the remaining
three positions. When assigning a value to a std_logic_vector, the vector's value is
specified within double quotations. For example, the decimal value 5 would be specified
as a 4-bit std_logic_vector as "0101".

library ieee;
use ieee.std_logic_1164.all;

entity Reg4 is
   port ( I: in std_logic_vector(3 downto 0);
          Q: out std_logic_vector(3 downto 0);
          clk: in std_logic
   );
end Reg4;

architecture behavior of Reg4 is
begin
   process(clk)
   begin
      if (clk='1' and clk'event) then
         Q <= I;
      end if;
   end process;
end behavior;

Figure 9.16 Behavioral VHDL description of a 4-bit register.

The architecture describes the register behaviorally, using a process statement. The
process is sensitive to its clk input only; because the register should only update its output
during a rising clock edge, the process need not execute if input I changes. If clk changes,
the process begins executing its statements. The first statement checks if the process began
executing due to a rising clock edge (0 to 1), as opposed to a falling edge (1 to 0).
The statement checks for a rising edge by checking if the clk input just changed
(clk'event) and that change was to a 1 (clk='1'). If the process began executing due
to a rising clock edge, then the process updates the register's contents using the statement
"Q <= I". For a falling clock edge, the process will begin executing, check the if statement
condition, and then reach the end of the process and hence stop executing, without updating
Q. Ideally, VHDL would have a way to begin executing a process only on a rising clock
edge, but VHDL has no such feature.

In VHDL, output ports are a type of signal, and signals have memory in simulation.
Thus, assigning I to Q causes Q to retain the new value, even when the process stops
executing, thus implementing the storage part of the register.
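The rising-edge check and the retained output can be mimicked in plain C++. The sketch below is an illustrative model only: the struct Reg4Model, its member names, and the helper demo_reg4 are hypothetical, and the previous clock value is stored explicitly because C++ has no counterpart of clk'event.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical software model of the 4-bit register of Figure 9.16.
struct Reg4Model {
    bool prev_clk = false;   // last seen clock value, used to detect an edge
    std::uint8_t q = 0;      // stored contents; retained between calls

    // Called whenever clk changes, since the process is sensitive to clk only.
    void on_clk_change(bool clk, std::uint8_t i) {
        bool rising = clk && !prev_clk;   // plays the role of clk='1' and clk'event
        if (rising) {
            q = i & 0x0F;                 // Q <= I, limited to 4 bits
        }
        prev_clk = clk;                   // a falling edge changes nothing else
    }
};

// Loads 5 on a rising edge, then presents 10 during a falling edge.
std::uint8_t demo_reg4() {
    Reg4Model r;
    r.on_clk_change(true, 0x5);    // rising edge: register loads 0101
    r.on_clk_change(false, 0xA);   // falling edge: contents unchanged
    return r.q;
}
```

The member q plays the role of the signal's memory: nothing assigns it on a falling edge, so it keeps the value loaded on the last rising edge.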
Verilog
Figure 9.17 shows a basic 4-bit register in Verilog. The register is identical to that
described in Figure 3.30. The module defines the data input I and the data output Q,
as well as the clock input clk. The input I and output Q of this design correspond to a
4-bit value. Instead of using eight individual inputs and outputs, the module's
I and Q ports are defined as vectors. For example, the type declaration
"input [3:0] I" defines a 4-bit input vector where the bit positions within the vector
are numbered from 3 to 0. The [3:0] defines the ordering of the elements within the
vector, indicating that element 3 is located in the leftmost position. The
statement "I <= 4'b1000" would thus assign the value 1 to position 3 of the vector I
and the value 0 to the remaining three positions. When assigning a value to a vector,
we must specify the number of bits within the value we are assigning, the base in
which we are specifying the value, and the value itself. For example,
the decimal value 5 would be specified as the 4-bit binary value 4'b0101.

module Reg4(I, Q, clk);
   input [3:0] I;
   input clk;
   output [3:0] Q;
   reg [3:0] Q;

   always @(posedge clk)
   begin
      Q <= I;
   end
endmodule

Figure 9.17 Behavioral Verilog description of a 4-bit register.

The module describes the register behaviorally, using an always procedure. The
procedure block is sensitive to the positive edge of the clk input, specified using the posedge
keyword; because the module should only update its output during a rising clock edge,
the always procedure need not execute if I changes. On the positive edge of the clock, the
procedure updates the register's contents using the statement "Q <= I". Because we
defined the output Q as a reg, assigning I to Q causes Q to retain the new value, even when
the procedure is done executing, thus implementing the storage part of the register.
SystemC
Figure 9.18 shows a basic 4-bit register in SystemC. The register is identical to that described
in Figure 3.30. The module defines the data input I and the data output Q, as well as the clock
input clk. The input I and output Q of this design correspond to a 4-bit value. Instead of using
eight individual sc_logic inputs and outputs, the module's I and Q ports are defined as sc_lv
logic vectors. An sc_lv is a vector of multiple sc_logic elements. For example, the type
declaration "sc_lv<4>" defines a 4-bit vector of sc_logic elements where the bit positions within
the vector are numbered from 3 to 0. In SystemC, the ordering of the elements within the
vector is defined such that the leftmost position is the most significant bit. For example, the
statement "I <= "1000"" would thus assign the value 1 to position 3 of the vector I and
the value 0 to the remaining three positions. When assigning a value to an sc_lv, the vector's
value must be specified within double quotations. For example, the decimal value 5 would be
specified as a 4-bit sc_lv as "0101". Notice that in defining the input port for I, we
included a space between the two closing angle brackets, >, the space being required in
SystemC.

The module consists of a single process, named seq_logic, that is sensitive to the positive
edge of the clk input, specified using the sensitive_pos statement for defining the
sensitivity list; because the module should only update its output during a rising clock
edge, the seq_logic process need not wake up if I changes. On the positive edge of the
clock, the process updates the register's contents using the statement
"Q.write(I.read())".

In SystemC, output ports are a type of signal, and signals have memory. Thus, assigning
I to Q causes Q to retain the new value, even when the process is done executing, thus
implementing the storage part of the register.

#include "systemc.h"

SC_MODULE(Reg4)
{
   sc_in<sc_lv<4> > I;
   sc_out<sc_lv<4> > Q;
   sc_in<sc_logic> clk;

   SC_CTOR(Reg4)
   {
      SC_METHOD(seq_logic);
      sensitive_pos << clk;
   }

   void seq_logic()
   {
      Q.write(I.read());
   }
};

Figure 9.18 Behavioral SystemC description of a 4-bit register.

Oscillator

VHDL
The register presented in Figure 9.16 has a clock input. We thus need to define an oscillator
component that generates a clock signal. Figure 9.19 illustrates an oscillator described in
VHDL. The entity defines one output, clk. The architecture consists of a process, but notice
that process does not have a sensitivity list. By default, such a process executes its
statements as if they were enclosed in an infinite loop. So the process sets the clock to 0, sleeps
until 10 ns of simulated time passes, sets the clock to 1, sleeps another 10 ns of simulated
time, goes back to the first statement in the process that sets the clock to 0, and so on.
The output waveform for such an oscillator will be identical to the waveform shown in
Figure 3.17.

The wait for statement in VHDL tells the simulator the amount of simulated time that
the process should not execute. A process without a sensitivity list must have at least one
wait statement, otherwise the simulator will never finish simulating that process (because
the process is in an implicit infinite loop), and thus the simulator will never get a chance to
update outputs or to simulate other processes. On the other hand, a process with a sensitivity
list cannot include wait statements, because by definition, the sensitivity list defines when
the process should execute.

library ieee;
use ieee.std_logic_1164.all;

entity Osc is
   port ( clk: out std_logic );
end Osc;

architecture behavior of Osc is
begin
   process
   begin
      clk <= '0';
      wait for 10 ns;
      clk <= '1';
      wait for 10 ns;
   end process;
end behavior;

Figure 9.19 VHDL oscillator description.
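The waveform such an oscillator produces can be tabulated with a short C++ sketch. This is a simplified model, not a simulator: the function oscillator_samples and its parameters are hypothetical, the process's implicit infinite loop is bounded by a total time so the function returns, and the clock is sampled once per nanosecond.

```cpp
#include <cassert>
#include <vector>

// Returns the clock value at each nanosecond in [0, total_ns), for a clock
// that is 0 for half_period_ns, then 1 for half_period_ns, and so on.
std::vector<int> oscillator_samples(int total_ns, int half_period_ns = 10) {
    std::vector<int> samples;
    int clk = 0;   // the process first sets the clock to 0
    int t = 0;
    while (t < total_ns) {   // stands in for the implicit infinite loop
        // Holding the value for half_period_ns models "wait for 10 ns".
        for (int k = 0; k < half_period_ns && t < total_ns; ++k, ++t) {
            samples.push_back(clk);
        }
        clk = 1 - clk;       // next assignment: 0 becomes 1, 1 becomes 0
    }
    return samples;
}
```

Calling oscillator_samples(40) gives ten 0s, ten 1s, ten 0s, and ten 1s, that is, a clock with a 20 ns period. If the inner hold loop were removed, t would never advance and the outer loop would never end, which mirrors why a process without a sensitivity list needs at least one wait statement.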
Verilog
The register presented in Figure 9.17 has a clock input. We thus need to define an
oscillator component that generates a clock signal. Figure 9.20 illustrates an oscillator
described in Verilog. The module defines one output, clk. The module consists of an
always procedure, but notice that the always procedure does not have a sensitivity list.
By default, such a procedure executes its statements as if they were enclosed in an
infinite loop. Assuming we are using a time scale of nanoseconds, the always procedure
sets the clock to 0, delays for 10 ns of simulated time, sets the clock to 1, delays for
another 10 ns of simulated time, goes back to the first statement in the procedure that
sets the clock to 0, and so on. The output waveform for such an oscillator will be
identical to the waveform shown in Figure 3.17.

module Osc(clk);
   output clk;
   reg clk;

   always
   begin
      clk <= 0;
      #10;
      clk <= 1;
      #10;
   end
endmodule

Figure 9.20 Verilog oscillator description.

The delay control statement, specified with the # character, tells the simulator the
amount of simulated time that the procedure should not execute. A procedure without a
sensitivity list must have at least one delay control statement, otherwise the simulator will never
finish simulating that procedure (because the procedure is in an implicit infinite loop), and
thus the simulator will never get the chance to update outputs or to simulate other procedures.
On the other hand, a procedure with a sensitivity list cannot include delay control statements,
because by definition the sensitivity list defines when the procedure should awake.
SystemC
The register presented in Figure 9.18 has a clock input. We thus need to define an
oscillator component that generates a clock signal. Figure 9.21 illustrates an oscillator
described in SystemC. The module defines one output, clk. The module consists of a
single process, named seq_logic, implemented as an SC_THREAD. By default, an
SC_THREAD process is only executed once. In order to ensure the process executes
continuously, we enclose the statements within the process in an infinite loop, implemented
using the statement "while (true)". Thus, the loop will execute the statements included
within the braces forever. During execution, the process sets the clock to 0, suspends
execution for 10 ns of simulated time, sets the clock to 1, sleeps another 10 ns of
simulated time, sets the clock to 0, and so on. The output waveform for such an oscillator
will be identical to the waveform in Figure 3.17.

#include "systemc.h"

SC_MODULE(Osc)
{
   sc_out<sc_logic> clk;

   SC_CTOR(Osc)
   {
      SC_THREAD(seq_logic);
   }

   void seq_logic()
   {
      while (true) {
         clk.write(SC_LOGIC_0);
         wait(10, SC_NS);
         clk.write(SC_LOGIC_1);
         wait(10, SC_NS);
      }
   }
};

Figure 9.21 SystemC oscillator description.
The wait() function in SystemC tells the simulator the amount of simulated time that
the process should not execute. For example, the statement "wait(10, SC_NS);" will
suspend the execution of the process for 10 ns. An SC_THREAD process explicitly
implementing an infinite loop must have at least one wait statement, otherwise the
simulator will never finish simulating that process (because the process is in an infinite
loop), and thus the simulator cannot update outputs or simulate other processes.

Controllers

Recall that a common type of sequential circuit is a controller, which implements a
finite-state machine. The controller consists of a state register and combinational
logic.
VHDL
Figure 9.22 shows one way to model a controller in VHDL. The controller modeled is
described by the FSM shown in Figures 3.38 and 3.39. The VHDL entity, named
LaserTimer, defines the controller's inputs and outputs.

The VHDL architecture describes the behavior of the entity. The architecture consists
of two processes, one modeling the state register, the other modeling the
combinational logic, that form the standard controller architecture from Figure 3.47.

The first process describes the controller's state register. That process, named
statereg, is sensitive to inputs clk and rst. If the rst input is enabled, then the
process asynchronously sets the currentstate signal to the FSM's initial state,
S_Off. Otherwise, if the clock is rising, the process updates the state
register with the next state.
library ieee;
use ieee.std_logic_1164.all;

entity LaserTimer is
   port (b: in std_logic;
         x: out std_logic;
         clk, rst: in std_logic
   );
end LaserTimer;

architecture behavior of LaserTimer is
   type statetype is
      (S_Off, S_On1, S_On2, S_On3);
   signal currentstate, nextstate:
      statetype;
begin
   statereg: process(clk, rst)
   begin
      if (rst='1') then -- initial state
         currentstate <= S_Off;
      elsif (clk='1' and clk'event) then
         currentstate <= nextstate;
      end if;
   end process;

   comblogic: process(currentstate, b)
   begin
      case currentstate is
         when S_Off =>
            x <= '0'; -- laser off
            if (b='0') then
               nextstate <= S_Off;
            else
               nextstate <= S_On1;
            end if;
         when S_On1 =>
            x <= '1'; -- laser on
            nextstate <= S_On2;
         when S_On2 =>
            x <= '1'; -- laser still on
            nextstate <= S_On3;
         when S_On3 =>
            x <= '1'; -- laser still on
            nextstate <= S_Off;
      end case;
   end process;
end behavior;

Figure 9.22 Behavioral VHDL description of the LaserTimer controller.
The currentstate and nextstate signals are defined as a user-defined type, named
statetype. The statetype is defined by the type statement and specifies the possible values
a signal of that type can represent. In specifying statetype, which represents the states of
an FSM, the type declaration consists of the names of all the states in our controller,
specifically S_Off, S_On1, S_On2, and S_On3.
The second process describes the controller's combinational logic. That process,
named comblogic, is sensitive to the inputs to the combinational logic of Figure 3.47,
namely, the external inputs (in this case, b), and the state register outputs (currentstate).
When either of those items changes, the process sets the FSM's outputs, in this case x,
with the appropriate value for the current state. The process also determines what the next
state should be, based on the current state and the values of inputs (i.e., the conditions on
the FSM transitions). The next state will be loaded into the state register by the state
register process on the next rising clock edge.
Notice that the architecture declares two signals, currentstate and nextstate. Signals are
visible across all processes in an architecture. The currentstate signal represents the actual
storage of the state register. The nextstate signal represents the value coming from the
combinational logic and going to the state register. Notice also that the architecture
declares those signals as type statetype, defined in the architecture as a type whose value
can be either S_Off, S_On1, S_On2, or S_On3.
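The division of work between the two processes can be sketched as a small software model. The Python below is only an illustrative sketch, not part of the VHDL description: the function names comblogic and simulate are borrowed or hypothetical, the model assumes the controller has just been reset to S_Off, and it applies one value of b per clock cycle.

```python
# Illustrative software model of the two-process controller (an assumption-laden
# sketch, not the VHDL itself). State names mirror the statetype values.
S_OFF, S_ON1, S_ON2, S_ON3 = "S_Off", "S_On1", "S_On2", "S_On3"


def comblogic(currentstate, b):
    """Combinational logic: return (x, nextstate) for the current state and input b."""
    if currentstate == S_OFF:
        return 0, (S_OFF if b == 0 else S_ON1)  # laser off; wait for button
    if currentstate == S_ON1:
        return 1, S_ON2                         # laser on
    if currentstate == S_ON2:
        return 1, S_ON3                         # laser still on
    if currentstate == S_ON3:
        return 1, S_OFF                         # laser still on
    raise ValueError("unknown state: {}".format(currentstate))


def simulate(b_values):
    """Return the value of x seen in each clock cycle, starting from reset."""
    currentstate = S_OFF  # assumed: reset has placed the register in S_Off
    x_values = []
    for b in b_values:
        x, nextstate = comblogic(currentstate, b)
        x_values.append(x)
        currentstate = nextstate  # state register loads nextstate on the clock edge
    return x_values
```

Calling simulate with b held at 1 for a single cycle shows x staying at 1 for the three cycles that follow, which matches the FSM's intended behavior.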
Verilog
Figure 9.23 shows one way to model a controller in Verilog. The controller modeled is
described by the FSM shown in Figures 3.38 and 3.39. The Verilog module, named
LaserTimer, defines the controller's inputs and outputs.
The module consists of two procedures, one modeling the state register, the other
modeling the combinational logic, that together form the standard controller architecture
from Figure 3.47.
The state register procedure is sensitive to the positive edge of the rst input and the
positive edge of the clk input. The state register has an asynchronous reset signal, and in
order to model the asynchronous reset, the state register procedure must be sensitive to the
positive edge of the rst input. On the positive edge of the rst input, the procedure will wake
asynchronously and set the currentstate signal to the FSM's initial state, S_Off. On the rising
edge of the clock input, clk, if the reset input is not enabled, the procedure updates the state
register with the nextstate value determined by the combinational logic procedure.

module LaserTimer(b, x, clk, rst);
   input b, clk, rst;
   output x;
   reg x;

   parameter S_Off = 2'b00,
             S_On1 = 2'b01,
             S_On2 = 2'b10,
             S_On3 = 2'b11;

   reg [1:0] currentstate;
   reg [1:0] nextstate;

   // state register procedure
   always @(posedge rst or posedge clk)
   begin
      if (rst==1) // initial state
         currentstate <= S_Off;
      else
         currentstate <= nextstate;
   end

   // combinational logic procedure
   always @(currentstate or b)
   begin
      case (currentstate)
         S_Off: begin
            x <= 0; // laser off
            if (b==0)
               nextstate <= S_Off;
            else
               nextstate <= S_On1;
         end
         S_On1: begin
            x <= 1; // laser on
            nextstate <= S_On2;
         end
         S_On2: begin
            x <= 1; // laser still on
            nextstate <= S_On3;
         end
         S_On3: begin
            x <= 1; // laser still on
            nextstate <= S_Off;
         end
      endcase
   end
endmodule

Figure 9.23 Behavioral Verilog description of the LaserTimer controller.
In Verilog, we must explicitly specify the size of the state registers as well as define
the values associated with each state within the FSM. Within the LaserTimer module we
declare four parameter values, namely, S_Off, S_On1, S_On2, and S_On3, which specify
the values assigned to each state within the FSM. For example, "S_Off = 2'b00"
defines the state name S_Off and assigns the 2-bit value "00" to this state. We can then
refer to this state throughout the module using S_Off instead of using specific bit values.
While not required to define a state machine, using parameters increases the readability of
our design and makes revisions to the FSM much easier. As the LaserTimer's FSM has
four states, we need a 2-bit state register, and we therefore declare the currentstate and
nextstate signals as 2-bit registers.
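The sizing rule used above (four states need a 2-bit register) generalizes to any number of states under a straightforward binary encoding. The following Python is a minimal sketch of that arithmetic; the helper names state_register_width and binary_encoding are hypothetical and are not part of any Verilog tool.

```python
def state_register_width(num_states):
    """Smallest number of bits that gives every state a distinct binary code."""
    if num_states < 1:
        raise ValueError("an FSM needs at least one state")
    # (n - 1).bit_length() is the bit count needed to represent codes 0 .. n-1.
    return max(1, (num_states - 1).bit_length())


def binary_encoding(state_names):
    """Assign consecutive binary codes to the states, in the order listed."""
    width = state_register_width(len(state_names))
    return {name: format(code, "0{}b".format(width))
            for code, name in enumerate(state_names)}
```

For the four LaserTimer states listed in order, this sketch yields the same codes as the parameter declarations: 00, 01, 10, and 11.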
The second procedure is the combinational procedure implementing the control logic
of the FSM. That procedure is sensitive to the inputs to the combinational logic of Figure
3.47, namely, the external inputs (in this case, b) and the state register outputs
(currentstate). When either of those items changes, the procedure sets the FSM's outputs, in
this case x, with the appropriate value for the current state. The procedure also determines
what the next state should be, based on the current state and the values of inputs (i.e., the
conditions on the FSM transitions). The next state will be loaded into the state register by
the state register procedure on the next positive clock edge.
Notice that the module declares two signals, currentstate and nextstate. Signals are
visible across all procedures in a module. The currentstate signal represents the actual
storage of the state register. The nextstate signal represents the value coming from the
combinational logic and going to the state register.
SystemC
Figure 9.24 shows one way to model a controller in SystemC. The controller modeled is
described by the FSM shown in Figure 3.38 and Figure 3.39. The module, named
LaserTimer, defines the controller's inputs and outputs.
The module consists of two processes, one modeling the state register, named
statereg, the other process modeling the combinational logic, named comblogic, that
together form the standard controller architecture from Figure 3.47.
The state register process is sensitive to the positive edge of the rst input and the
positive edge of the clk input. The state register has an asynchronous reset signal. In order to
model the asynchronous reset, the state register process is sensitive to the positive edge of
the rst input. On the positive edge of the rst input, the process will wake asynchronously
and set the currentstate signal to the FSM's initial state, S_Off. On the rising edge of the
clock input, clk, if the reset input is not enabled, the process updates the state register
with the nextstate value determined by the combinational logic process.
The currentstate and nextstate signals are defined as a user-defined type, named
statetype. The statetype is defined by the enum statement and specifies the possible values a
signal of that type can represent. In specifying statetype, which represents the states of an
FSM, the enum declaration consists of the names of all the states in our controller,
specifically S_Off, S_On1, S_On2, and S_On3.
The second process, named comblogic, is sensitive to the inputs to the combinational
logic of Figure 3.47, namely, the external inputs and the state register outputs. When
either of those items changes, the process sets the FSM's outputs, in this case x, with the
appropriate value for the current state. The process also determines what the next state
should be, based on the current state and the values of inputs (i.e., the conditions on the
FSM transitions). The next state will be loaded into the state register by the state register
process on the next rising clock edge. Within the first state, we determine the next state
depending on the value of input b by performing the comparison
"b.read() == SC_LOGIC_0". Note that the comparison for equality uses the syntax "==".
If we instead accidentally used the syntax "=", which is a valid statement, our design
would function incorrectly.
Notice that the module declares two sc_signals, currentstate and nextstate. Signals are
visible across all processes in a module. The currentstate signal represents the actual
storage of the state register. The nextstate signal represents the value coming from the
combinational logic and going to the state register. Notice also that the module declares
those signals as type statetype, defined as a type whose value can be either S_Off, S_On1,
S_On2, or S_On3.

#include "systemc.h"

enum statetype { S_Off, S_On1, S_On2, S_On3 };

SC_MODULE(LaserTimer)
{
   sc_in<sc_logic> b, clk, rst;
   sc_out<sc_logic> x;
   sc_signal<statetype> currentstate, nextstate;

   SC_CTOR(LaserTimer) {
      SC_METHOD(statereg);
      sensitive_pos << rst << clk;
      SC_METHOD(comblogic);
      sensitive << currentstate << b;
   }

   void statereg() {
      if (rst.read() == SC_LOGIC_1)
         currentstate = S_Off; // initial state
      else
         currentstate = nextstate;
   }

   void comblogic() {
      switch (currentstate) {
         case S_Off:
            x.write(SC_LOGIC_0); // laser off
            if (b.read() == SC_LOGIC_0)
               nextstate = S_Off;
            else
               nextstate = S_On1;
            break;
         case S_On1:
            x.write(SC_LOGIC_1); // laser on
            nextstate = S_On2;
            break;
         case S_On2:
            x.write(SC_LOGIC_1); // laser still on
            nextstate = S_On3;
            break;
         case S_On3:
            x.write(SC_LOGIC_1); // laser still on
            nextstate = S_Off;
            break;
      }
   }
};

Figure 9.24 Behavioral SystemC description of the LaserTimer controller.
9.4 DATAPATH COMPONENT DESCRIPTION USING HARDWARE DESCRIPTION LANGUAGES
Full-Adders
Recall that a full-adder is a combinational circuit that adds three bits (a, b, and ci) and
generates a sum (s) and a carry-out (co) bit. This section shows how to describe a
full-adder behaviorally in an HDL.
VHDL
Figure 9.25 shows a full-adder described behaviorally in VHDL. The full-adder design
corresponds to the full-adder described in Figure 4.31. The VHDL entity, named
FullAdder, defines the full-adder's three inputs a, b, and ci and two outputs s and co.

library ieee;
use ieee.std_logic_1164.all;

entity FullAdder is
   port ( a, b, ci: in std_logic;
          s, co: out std_logic
   );
end FullAdder;

architecture behavior of FullAdder is
begin
   process (a, b, ci)
   begin
      s <= a xor b xor ci;
      co <= (b and ci) or (a and ci) or (a and b);
   end process;
end behavior;

Figure 9.25 Behavioral VHDL description of a full-adder.

The architecture describes the behavior of the full-adder. The architecture consists of a
single process describing the combinational behavior of the full-adder. The process is
sensitive to all three inputs (a, b, and ci) of the full-adder. When any of the inputs change,
the process executes its two statements updating the values for the sum (s) and carry-out (co).
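As a quick check on the two equations, a small software model can confirm that they really add three bits. The Python sketch below is purely illustrative; it uses integer 0/1 values in place of std_logic, and the function name full_adder is an assumption made for the example.

```python
def full_adder(a, b, ci):
    """Return (s, co) for three input bits, using the same two equations."""
    s = a ^ b ^ ci                       # sum: XOR of the three inputs
    co = (b & ci) | (a & ci) | (a & b)   # carry-out: at least two inputs are 1
    return s, co


def truth_table():
    """List every input combination alongside its (s, co) result."""
    rows = []
    for a in (0, 1):
        for b in (0, 1):
            for ci in (0, 1):
                rows.append(((a, b, ci), full_adder(a, b, ci)))
    return rows
```

Reading co as the 2s place and s as the 1s place, each row of the table should equal the arithmetic sum a + b + ci.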
Verilog
Figure 9.26 shows a full-adder described behaviorally in Verilog. The full-adder design
corresponds to the full-adder described in Figure 4.31. The Verilog module, named
FullAdder, defines the full-adder's three inputs a, b, and ci and two outputs s and co.

module FullAdder(a, b, ci, s, co);
   input a, b, ci;
   output s, co;
   reg s, co;

   always @(a or b or ci)
   begin
      s <= a ^ b ^ ci;
      co <= (b & ci) | (a & ci) | (a & b);
   end
endmodule

Figure 9.26 Behavioral Verilog description of a full-adder.

The module describes the behavior of the full-adder and consists of a single always
procedure describing the combinational behavior of the full-adder. The procedure is
sensitive to all three inputs (a, b, and ci) of the full-adder. When any of the inputs change,
the procedure executes its two statements updating the values for the sum (s) and
carry-out (co).
SystemC
Figure 9.27 shows a full-adder described behaviorally in SystemC. The full-adder design
corresponds to the full-adder described in Figure 4.31. The SystemC module, named
FullAdder, defines the full-adder's three inputs a, b, and ci and two outputs s and co.

#include "systemc.h"

SC_MODULE(FullAdder)
{
   sc_in<sc_logic> a, b, ci;
   sc_out<sc_logic> s, co;

   SC_CTOR(FullAdder)
   {
      SC_METHOD(comblogic);
      sensitive << a << b << ci;
   }

   void comblogic()
   {
      s.write(a.read() ^ b.read() ^ ci.read());
      co.write((b.read() & ci.read()) |
               (a.read() & ci.read()) |
               (a.read() & b.read()));
   }
};

Figure 9.27 Behavioral SystemC description of a full-adder.

The module describes the behavior of the full-adder and consists of a single process,
named comblogic, describing the combinational behavior of the full-adder. The process is
sensitive to all three inputs (a, b, and ci) of the full-adder. When any of the inputs change,
the process executes its two statements updating the values for the sum (s) and carry-out (co).
Carry-Ripple Adders
We now show how to structurally describe a 4-bit carry-ripple adder using the
full-adder we designed in the previous section.
VHDL
Figure 9.28 is a VHDL description of a 4-bit carry-ripple adder with a carry-in, as
appeared in Figure 4.33. The VHDL entity, named CarryRippleAdder4, has two 4-bit
inputs, a and b, and a carry-in input, ci. The carry-ripple adder outputs a 4-bit sum, s, and
a final carry-out, co.
The architecture structurally describes the carry-ripple adder composed of four
full-adders. The architecture begins by declaring the component FullAdder, which was
described in the previous section. The design has three internal signals, co1, co2, and co3,
that are used for internal connections between the full-adders. The architecture then
instantiates four FullAdder components. In VHDL, each instantiated component must have
a unique name. The four FullAdder components in this design are uniquely identified by
the names FullAdder1, FullAdder2, FullAdder3, and FullAdder4.

library ieee;
use ieee.std_logic_1164.all;

entity CarryRippleAdder4 is
   port ( a: in std_logic_vector(3 downto 0);
          b: in std_logic_vector(3 downto 0);
          ci: in std_logic;
          s: out std_logic_vector(3 downto 0);
          co: out std_logic
   );
end CarryRippleAdder4;

architecture structure of CarryRippleAdder4 is
   component FullAdder
      port ( a, b, ci: in std_logic;
             s, co: out std_logic
      );
   end component;
   signal co1, co2, co3: std_logic;
begin
   FullAdder1: FullAdder
      port map (a(0), b(0), ci, s(0), co1);
   FullAdder2: FullAdder
      port map (a(1), b(1), co1, s(1), co2);
   FullAdder3: FullAdder
      port map (a(2), b(2), co2, s(2), co3);
   FullAdder4: FullAdder
      port map (a(3), b(3), co3, s(3), co);
end structure;

Figure 9.28 Structural VHDL description of a 4-bit carry-ripple adder.
In VHDL, the std_logic_vector type provides a convenient method of specifying
ports or signals consisting of multiple bits. However, a design may need to access the
individual bits of these vectors. The individual bits of a std_logic_vector can be accessed
by specifying the desired bit position within parentheses after the vector's name. For
example, to access bit 0 of the 4-bit input a of this design, one would use the syntax
"a(0)". In defining the connections to the instantiated components in the carry-ripple
adder, individual bits of the inputs a and b and output s are accessed using this syntax.
The first full-adder, FullAdder1, connects bit 0 of the inputs a and b as well as the
carry-ripple adder's carry-in, ci, to the full-adder's three inputs. The s output of FullAdder1 is
connected to bit 0 of the 4-bit adder's sum output, s, represented as s(0). The design then
connects the carry-out bit of FullAdder1 to the internal signal co1, which is subsequently
connected to the carry-in input of the next full-adder, FullAdder2. The remaining
full-adders are connected in a similar fashion, with the exception of the last full-adder in
the carry-ripple chain. The carry-out from that last full-adder, FullAdder4, is connected to
the carry-out output (co) of the carry-ripple adder.
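The chaining of carries through co1, co2, and co3 can be mirrored in a small software model. The Python below is a sketch for illustration only: bits are plain integers, vectors are lists with index 0 as the least significant bit, and the helper names to_bits and from_bits are hypothetical conveniences that do not appear in the VHDL.

```python
def full_adder(a, b, ci):
    """One-bit full-adder: return (s, co)."""
    s = a ^ b ^ ci
    co = (b & ci) | (a & ci) | (a & b)
    return s, co


def carry_ripple_adder4(a, b, ci):
    """Add two 4-bit vectors (lists, index 0 = bit 0) plus a carry-in.

    Each internal carry feeds the next full-adder, as co1, co2, co3 do
    in the structural description.
    """
    s0, co1 = full_adder(a[0], b[0], ci)
    s1, co2 = full_adder(a[1], b[1], co1)
    s2, co3 = full_adder(a[2], b[2], co2)
    s3, co = full_adder(a[3], b[3], co3)
    return [s0, s1, s2, s3], co


def to_bits(value):
    """Hypothetical helper: 4-bit list for an integer in 0..15, bit 0 first."""
    return [(value >> i) & 1 for i in range(4)]


def from_bits(bits):
    """Hypothetical helper: integer value of a bit list, bit 0 first."""
    return sum(bit << i for i, bit in enumerate(bits))
```

Checking all 512 input combinations against ordinary integer addition is cheap here, which is one practical reason to keep a software reference model alongside a structural design.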
Verilog
Figure 9.29 is a Verilog description of a 4-bit carry-ripple adder with a carry-in, as
appeared in Figure 4.33. The Verilog module, named CarryRippleAdder4, has two 4-bit
inputs, a and b, and a carry-in input, ci. The carry-ripple adder outputs a 4-bit sum, s, and
a final carry-out, co.
The module structurally describes the carry-ripple adder composed of four
full-adders. The design has three internal wires, co1, co2, and co3, that are used for internal
connections between the full-adders. The module instantiates four FullAdder components.
In Verilog, each instantiated component must have a unique name. The four FullAdder
components in this design are uniquely identified by the names FullAdder1, FullAdder2,
FullAdder3, and FullAdder4.

module CarryRippleAdder4(a, b, ci, s, co);
   input [3:0] a;
   input [3:0] b;
   input ci;
   output [3:0] s;
   output co;

   wire co1, co2, co3;

   FullAdder FullAdder1(a[0], b[0], ci,
                        s[0], co1);
   FullAdder FullAdder2(a[1], b[1], co1,
                        s[1], co2);
   FullAdder FullAdder3(a[2], b[2], co2,
                        s[2], co3);
   FullAdder FullAdder4(a[3], b[3], co3,
                        s[3], co);
endmodule

Figure 9.29 Structural Verilog description of a 4-bit carry-ripple adder.

In Verilog, vectors provide a convenient method of specifying ports or signals
consisting of multiple bits. However, a design may need to access the individual bits of these
vectors. The individual bits of a vector can be accessed by specifying the desired bit
position within brackets after the vector's name. For example, to access bit 0 of the 4-bit
input a of this design, one would use the syntax "a[0]". In defining the connections to the
instantiated components in the carry-ripple adder, individual bits of the inputs a and b and
output s are accessed using this syntax. The first full-adder, FullAdder1, connects bit 0 of
the inputs a and b as well as the carry-ripple adder's carry-in, ci, to the full-adder's three
inputs. The s output of FullAdder1 is connected to bit 0 of the 4-bit adder's sum output, s,
represented as s[0]. The design then connects the carry-out bit of FullAdder1 to the internal
signal co1, which is subsequently connected to the carry-in input of the next full-adder,
FullAdder2. The remaining full-adders are connected in a similar fashion, with the
exception of the last full-adder in the carry-ripple chain. The carry-out from the last
full-adder, FullAdder4, is connected to the carry-out output (co) of the carry-ripple adder.
SystemC
Figure 9.30 is a SystemC description of a 4-bit carry-ripple adder with a carry-in, as
appeared in Figure 4.33. The SystemC module, named CarryRippleAdder4, has two 4-bit
inputs, a and b, and a carry-in input, ci. The carry-ripple adder outputs a 4-bit sum, s, and a
final carry-out, co.

#include "systemc.h"
#include "fulladder.h"

SC_MODULE(CarryRippleAdder4)
{
   sc_in<sc_logic> a[4];
   sc_in<sc_logic> b[4];
   sc_in<sc_logic> ci;
   sc_out<sc_logic> s[4];
   sc_out<sc_logic> co;

   sc_signal<sc_logic> co1, co2, co3;

   FullAdder FullAdder_1;
   FullAdder FullAdder_2;
   FullAdder FullAdder_3;
   FullAdder FullAdder_4;

   SC_CTOR(CarryRippleAdder4):
      FullAdder_1("FullAdder_1"),
      FullAdder_2("FullAdder_2"),
      FullAdder_3("FullAdder_3"),
      FullAdder_4("FullAdder_4")
   {
      FullAdder_1.a(a[0]); FullAdder_1.b(b[0]);
      FullAdder_1.ci(ci); FullAdder_1.s(s[0]);
      FullAdder_1.co(co1);

      FullAdder_2.a(a[1]); FullAdder_2.b(b[1]);
      FullAdder_2.ci(co1); FullAdder_2.s(s[1]);
      FullAdder_2.co(co2);

      FullAdder_3.a(a[2]); FullAdder_3.b(b[2]);
      FullAdder_3.ci(co2); FullAdder_3.s(s[2]);
      FullAdder_3.co(co3);

      FullAdder_4.a(a[3]); FullAdder_4.b(b[3]);
      FullAdder_4.ci(co3); FullAdder_4.s(s[3]);
      FullAdder_4.co(co);
   }
};

Figure 9.30 Structural SystemC description of a 4-bit carry-ripple adder.

The module structurally describes the carry-ripple adder composed of four full-adders.
The design has three internal signals, co1, co2, and co3, that are used for internal
connections between the full-adders. The module first instantiates four FullAdder components. In
SystemC, each instantiated component must have a unique name. The four FullAdder
components in this design are uniquely identified by the names FullAdder_1, FullAdder_2,
FullAdder_3, and FullAdder_4.
Previously, we defined multiple-bit inputs as an input vector using the sc_lv type.
However, SystemC does not support connecting individual bits within a signal or port
of type sc_lv in a structural description. In our CarryRippleAdder4 design, we instead
defined the inputs and outputs, a, b, and s, as arrays of sc_logic with four elements
each, rather than using type sc_lv. The individual bits of the array can be accessed by
specifying the desired bit position within brackets after the array's name. For
example, to access bit 0 of the 4-element input array a of this design, one would use
the syntax "a[0]". In defining the connections to the instantiated components in the
carry-ripple adder, individual bits of the inputs a and b and output s are accessed
using this syntax. The first full-adder, FullAdder_1, connects bit 0 of the inputs a and
b as well as the carry-ripple adder's carry-in, ci, to the full-adder's three inputs. The
s output of FullAdder_1 is connected to bit 0 of the 4-bit adder's sum output, s,
represented as s[0]. The design then connects the carry-out bit of FullAdder_1 to the
internal signal co1, which is subsequently connected to the carry-in input of the next
full-adder, FullAdder_2. The remaining full-adders are connected in a similar fashion,
with the exception of the last full-adder in the carry-ripple chain. The carry-out from
the last full-adder, FullAdder_4, is connected to the carry-out output (co) of the
carry-ripple adder.
Up-Counter
VHDL
Figure 9.31 is a VHDL description of a 4-bit up-counter, as appeared in Figure 4.48. The
VHDL entity, named UpCounter, defines the counter's inputs and outputs, consisting of a
clock input clk, a count enable control input cnt, the 4-bit count value C, and a terminal
count output tc.
The UpCounter's architecture structurally describes the design consisting of three
components, namely Reg4, Inc4, and AND4. Reg4 is a 4-bit parallel load register with a load
control input ld. Inc4 is a 4-bit incrementer. AND4 is a four-input AND gate that will output
1 if and only if all four inputs are 1. The architecture further specifies two signals, tempC
and incC, used as internal wires within the structural description.
library ieee;
use ieee.std_logic_1164.all;

entity UpCounter is
   port ( clk: in std_logic;
          cnt: in std_logic;
          C: out std_logic_vector(3 downto 0);
          tc: out std_logic
   );
end UpCounter;

architecture structure of UpCounter is
   component Reg4
      port ( I: in std_logic_vector(3 downto 0);
             Q: out std_logic_vector(3 downto 0);
             clk, ld: in std_logic
      );
   end component;
   component Inc4
      port ( a: in std_logic_vector(3 downto 0);
             s: out std_logic_vector(3 downto 0)
      );
   end component;
   component AND4
      port ( w,x,y,z: in std_logic;
             F: out std_logic
      );
   end component;
   signal tempC: std_logic_vector(3 downto 0);
   signal incC: std_logic_vector(3 downto 0);
begin
   Reg4_1: Reg4 port map (incC, tempC, clk, cnt);
   Inc4_1: Inc4 port map (tempC, incC);
   AND4_1: AND4 port map (tempC(3), tempC(2),
                          tempC(1), tempC(0), tc);

   outputC: process (tempC)
   begin
      C <= tempC;
   end process;
end structure;

Figure 9.31 Structural VHDL description of 4-bit up-counter.
The architecture instantiates each of the three components and specifies the
connections between them. Reg4 is the only sequential component within the up-counter,
and thus the clk input only needs to be connected to the clock input of the register.
We control the up-counter's counting by connecting the count enable input, cnt, to the
load enable, ld, of the register. The output Q of Reg4_1 is connected to the internal
signal tempC, which connects the register's output to both the Inc4_1 and AND4_1
components. Inc4_1 receives the current count from the tempC connection and
outputs the incremented count on its output s, which is connected to the other internal
signal incC. The incC signal connects the incremented count from Inc4_1 to the
parallel load input I of Reg4_1. The current count is also connected to the four inputs of
the AND4_1 component. The AND4_1's output F is then connected to the counter's
terminal count output tc.
In the UpCounter design, we need to connect the output of the 4-bit register to the
incrementer, the AND gate, and the counter's output port C. VHDL does not allow us to
connect multiple signals or ports within the port map of an instantiated component.
Therefore, the architecture uses the tempC signal to connect Reg4_1's output to both the
AND4_1 and Inc4_1 components. We still need to connect the register's output to the
output port C. The architecture makes this connection by specifying a process, named
outputC, that is used to connect the output of the register to the output port C. The
outputC process is sensitive to the signal tempC, previously used as an internal wire
between the three components. Whenever tempC changes, which corresponds to a change
in the up-counter's stored count, the outputC process assigns the new count to the output
port C.
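The connections among the three components can be mirrored by a small software model. The Python below is only a sketch under stated assumptions: the class and function names are chosen for illustration, the register is assumed to hold 0 before the first clock edge (the description above does not specify an initial value), and each call to clock_edge stands for one rising edge of clk.

```python
def inc4(a):
    """4-bit incrementer: add 1 and wrap around from 15 to 0."""
    return (a + 1) & 0xF


def and4(w, x, y, z):
    """Four-input AND gate: 1 only when all four inputs are 1."""
    return w & x & y & z


class UpCounter:
    """Illustrative model of the register, incrementer, and AND gate wiring."""

    def __init__(self):
        self.tempC = 0  # assumed initial register contents

    def outputs(self):
        """Return (C, tc) derived from the register output tempC."""
        bits = [(self.tempC >> i) & 1 for i in range(4)]
        tc = and4(bits[3], bits[2], bits[1], bits[0])
        return self.tempC, tc

    def clock_edge(self, cnt):
        """Rising clock edge: load the incremented count only when cnt is 1."""
        incC = inc4(self.tempC)  # incrementer output feeds the register input
        if cnt == 1:             # cnt drives the register's load enable
            self.tempC = incC
```

After fifteen enabled clock edges the model reports a count of 15 with tc at 1, and the next enabled edge wraps the count back to 0.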
Verilog
Figure 9.32 is a Verilog description of a 4-bit up-counter, as appeared in Figure 4.48.
The Verilog module, named UpCounter, defines the counter's inputs and outputs,
consisting of a clock input clk, a count enable control input cnt, the 4-bit count value C,
and a terminal count output tc.
The UpCounter's module structurally describes the design consisting of three
components, namely Reg4, Inc4, and AND4. Reg4 is a 4-bit parallel load register with a load
control input ld. Inc4 is a 4-bit incrementer. AND4 is a four-input AND gate that will
output 1 if and only if all four inputs are 1. The module further specifies two 4-bit wires,
tempC and incC, used as internal wires within the structural description.

module Reg4(I, Q, clk, ld);
   input [3:0] I;
   input clk, ld;
   output [3:0] Q;
   // details not shown
endmodule

module Inc4(a, s);
   input [3:0] a;
   output [3:0] s;
   // details not shown
endmodule

module AND4(w, x, y, z, F);
   input w, x, y, z;
   output F;
   // details not shown
endmodule

module UpCounter(clk, cnt, C, tc);
   input clk, cnt;
   output [3:0] C;
   reg [3:0] C;
   output tc;

   wire [3:0] tempC;
   wire [3:0] incC;

   Reg4 Reg4_1(incC, tempC, clk, cnt);
   Inc4 Inc4_1(tempC, incC);
   AND4 AND4_1(tempC[3], tempC[2],
               tempC[1], tempC[0], tc);

   always @(tempC)
   begin
      C <= tempC;
   end
endmodule

Figure 9.32 Structural Verilog description of 4-bit up-counter.

The module instantiates each of the three components and specifies the connections
between them. Reg4 is the only sequential component within the up-counter, and thus the
clk input only needs to be connected to the clock input of the register. We control the
up-counter's counting by connecting the count enable input, cnt, to the load enable, ld, of
the register. The output Q of Reg4_1 is connected to the internal signal tempC, which
connects the register's output to both the Inc4_1 and AND4_1 components. Inc4_1
receives the current count from the tempC connection and outputs the incremented count
on its output s, which is connected to the other internal signal incC. The incC signal
connects the incremented count from Inc4_1 to the parallel load input I of Reg4_1. The
current count is also connected to the four inputs of the AND4_1 component. The
AND4_1's output F is then connected to the counter's terminal count output tc.
In the UpCounter design, we need to connect the output of the 4-bit register to the
incrementer, the AND gate, and the counter's output port C. Therefore, the module
uses the tempC signal to connect Reg4_1's output to both the AND4_1 and Inc4_1
components. We still need to connect the register's output to the output port C. The
module makes this connection by specifying a procedure that is used to connect the output
of the register to the output port C. The procedure is sensitive to the signal tempC,
previously used as an internal wire between the three components. Whenever tempC
changes, which corresponds to a change in the up-counter's stored count, the procedure
assigns the new count to the output port C.
SystemC
Figure 9.33 is a SystemC description of a 4-bit up-counter, as appeared in Figure 4.48.
The SystemC module, named UpCounter, defines the counter's inputs and outputs,
consisting of a clock input clk, a count enable control input cnt, the 4-bit count value C,
and a terminal count output tc.
The UpCounter's module structurally describes the design consisting of three
components, namely, Reg4, Inc4, and AND4. Reg4 is a 4-bit parallel load register with a load
control input ld. Inc4 is a 4-bit incrementer. AND4 is a four-input AND gate that will
output 1 if and only if all four inputs are 1. The module further specifies two 4-bit signals,
tempC and incC, used as internal wires within the structural description. Additionally, the
module defines a four-element array of sc_logic signals, named tempC_b, used to access
the individual bits within the 4-bit vector tempC.

#include "systemc.h"
#include "reg4.h"
#include "inc4.h"
#include "and4.h"

SC_MODULE(UpCounter)
{
   sc_in<sc_logic> clk, cnt;
   sc_out<sc_lv<4> > C;
   sc_out<sc_logic> tc;
   sc_signal<sc_lv<4> > tempC, incC;
   sc_signal<sc_logic> tempC_b[4];

   Reg4 Reg4_1;
   Inc4 Inc4_1;
   AND4 AND4_1;

   SC_CTOR(UpCounter) : Reg4_1("Reg4_1"),
                        Inc4_1("Inc4_1"),
                        AND4_1("AND4_1")
   {
      Reg4_1.I(incC); Reg4_1.Q(tempC);
      Reg4_1.clk(clk); Reg4_1.ld(cnt);

      Inc4_1.a(tempC); Inc4_1.s(incC);

      AND4_1.w(tempC_b[0]); AND4_1.x(tempC_b[1]);
      AND4_1.y(tempC_b[2]); AND4_1.z(tempC_b[3]);
      AND4_1.F(tc);

      SC_METHOD(comblogic);
      sensitive << tempC;
   }

   void comblogic()
   {
      tempC_b[0] = tempC.read()[0];
      tempC_b[1] = tempC.read()[1];
      tempC_b[2] = tempC.read()[2];
      tempC_b[3] = tempC.read()[3];
      C.write(tempC);
   }
};

Figure 9.33 Structural SystemC description of 4-bit up-counter.

The module first instantiates each of the three components and then specifies the
connections between them. Reg4 is the only sequential component within the up-counter, and
thus the clk input only needs to be connected to the clock input of the register. We control
the up-counter's counting by connecting the count enable input, cnt, to the load enable,
ld, of the register. The output Q of Reg4_1 is connected to the internal signal tempC,
which connects the register's output to Inc4_1. Inc4_1 receives the current count from the
tempC connection and outputs the incremented count on its output s, which is connected
to the internal signal incC. The incC signal connects the incremented count from Inc4_1
to the parallel load input I of Reg4_1. The current count is also connected to the four
inputs of the AND4_1 component using the tempC_b array to access the individual bits.
The AND4_1's output F is then connected to the counter's terminal count output tc.
In the UpCounter design, we need to connect the output of the 4-bit register to the
incrementer, the AND gate, and the counter's output port C. Therefore, the module uses
the tempC signal to connect Reg4_1's output to the Inc4_1 component and uses the
tempC_b array to connect Reg4_1's output to the AND4_1 component. Thus, we still need
to connect the register's output to the output port C and assign the individual bits of the
register's output to the tempC_b array. The module makes these connections by defining
a process, named comblogic, that is sensitive to the signal tempC. Whenever tempC
changes, which corresponds to a change in the up-counter's stored count, the comblogic
process assigns the new count to the output port C. Additionally, the process assigns the
bits within vector tempC to the individual sc_logic signals within the tempC_b array. In
order to access the individual bits of the vector signal tempC, we use the syntax
"tempC.read()[0]".
9.5 RTL DESIGN USING HARDWARE DESCRIPTION LANGUAGES
We now show how to create RTL descriptions using HDLs. We will show HDL
descriptions of the starting point of RTL design, namely, high-level state machines, and of the
ending point of RTL design, namely, connected controllers and datapaths. RTL designers
will commonly create a testbench to test the high-level state machine description, and
then use that same testbench for the controller/datapath description, thus helping to verify
that the designer created a correct controller/datapath implementation.
High-Level State Machine of the Laser-Based Distance Measurer
VHDL
Figures 9.34 and 9.35 present a VHDL description of a high-level state machine for the
laser-based distance measurer shown in Figure 5.15. The entity, named
LaserDistMeasurer, defines the inputs and outputs, including a user-pressed button input B, a
laser sensor input S, a laser control output L, and a 16-bit output D for the distance
measured.
476 9 Hardware Description Languages
Figure 9.34 Behavioral VHDL description of a high-level state machine of the laser-based
distance measurer.
library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_arith.all;

entity LaserDistMeasurer is
   port ( clk, rst: in std_logic;
          B, S: in std_logic;
          L: out std_logic;
          D: out unsigned(15 downto 0)
   );
end LaserDistMeasurer;

architecture behavior of LaserDistMeasurer is
   type statetype is (S0, S1, S2, S3, S4);
   signal state: statetype;
   signal Dctr: unsigned(15 downto 0);
   constant U_ZERO:
      unsigned(15 downto 0) := "0000000000000000";
   constant U_ONE: unsigned(0 downto 0) := "1";
begin
   statemachine: process(clk, rst)
   begin
      if (rst='1') then
         L <= '0';
         D <= U_ZERO;
         Dctr <= U_ZERO;
         state <= S0; -- initial state
      elsif (clk='1' and clk'event) then
         case state is
            when S0 =>
               L <= '0';      -- laser off
               D <= U_ZERO;   -- clear D
               state <= S1;
(continued in Figure 9.35)
Instead of using a 16-bit std_logic_vector, we defined the output D as unsigned. For
logic operations, an unsigned behaves the same as a std_logic_vector. However, we can
also perform arithmetic operations on unsigned values. Whenever using unsigned, we
must include the statement "use ieee.std_logic_arith.all;" at the top of our
VHDL description. The use statement specifies which package we will use within our
design. The package ieee.std_logic_arith defines the unsigned type as well as a set of
operations and functions we can perform on unsigned values.
The entity also defines a clock input clk and reset input rst. We assume that the clock
input is 300 MHz, as was assumed in the laser-based distance measurer design shown in
Figure 5.19. We omit details of generating the 300 MHz clock (see Section 9.3 for an
example of describing an oscillator).
The VHDL architecture describes the behavior of the entity. Instead of using two
processes as shown in Figure 9.22, the architecture consists of a single process describing
the behavior of our high-level state machine. The high-level state machine process,
named statemachine, is sensitive to inputs clk and rst. If rst is 1, then the process
asynchronously sets the state signal to the state machine's initial state, S0, and initializes
Figure 9.35 Behavioral VHDL description of a high-level state machine of the laser-based
distance measurer (continued).
(continued from Figure 9.34)
            when S1 =>
               Dctr <= U_ZERO;          -- reset count
               if (B='1') then
                  state <= S2;
               else
                  state <= S1;
               end if;
            when S2 =>
               L <= '1';                -- laser on
               state <= S3;
            when S3 =>
               L <= '0';                -- laser off
               Dctr <= Dctr + 1;
               if (S='1') then
                  state <= S4;
               else
                  state <= S3;
               end if;
            when S4 =>
               D <= SHR(Dctr, U_ONE);   -- calculate D
               state <= S1;
         end case;
      end if;
   end process;
end behavior;
the outputs, L and D, and the internal counter Dctr, to their default values. The
default values should correspond to the values assigned to the signals within the initial state
of our high-level state machine. Notice that we defined U_ZERO as an unsigned constant
corresponding to the 16-bit unsigned value of zero. When rst is not enabled, on the rising
clock, the process evaluates the current state, assigns the appropriate outputs for the current
state, determines the next state, and updates the state register signal, state. In our high-level
state machine description, we only need a single state register signal to model the behavior
of our state machine, instead of the two signals currentstate and nextstate we previously
used in the controller design shown in Figure 9.22.
The high-level state machine for the laser-based distance measurer performs two
arithmetic operations, addition and shifting. Because Dctr is declared as unsigned, we can
increment the counter signal Dctr in state S3 using the syntax "Dctr <= Dctr + 1;". This
statement will add one to the current value of Dctr and store the result in Dctr. In state S4, we
calculate the distance, D, by dividing the value of Dctr by 2. However, we perform this
division using a right-shift-by-one operation. To perform the shift and assign the value to
the output D, we use the statement "D <= SHR(Dctr, U_ONE);". The function
SHR(), defined within the ieee.std_logic_arith package, shifts the first parameter,
Dctr, right by the amount specified by the second parameter, U_ONE, where U_ONE is a
constant we defined earlier in the architecture.
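The behavior described above can also be checked against an ordinary software model. The
following is a minimal C++ sketch, not one of the HDL descriptions: the names
LaserDistMeasurerModel, reset(), and clock() are our own hypothetical choices, and each call
to clock() is assumed to stand for one rising clock edge with the inputs B and S sampled at
that edge.

```cpp
#include <cstdint>

// Hypothetical software model of the high-level state machine (illustration only).
enum class State { S0, S1, S2, S3, S4 };

struct LaserDistMeasurerModel {
    State state = State::S0;
    bool L = false;           // laser control output
    std::uint16_t D = 0;      // 16-bit distance output
    std::uint16_t Dctr = 0;   // internal 16-bit cycle counter

    // Models the asynchronous reset: outputs and counter return to defaults.
    void reset() {
        L = false;
        D = 0;
        Dctr = 0;
        state = State::S0;
    }

    // Models one rising clock edge while rst is not enabled.
    void clock(bool B, bool S) {
        switch (state) {
        case State::S0:                       // laser off, clear D
            L = false;
            D = 0;
            state = State::S1;
            break;
        case State::S1:                       // reset count, wait for button
            Dctr = 0;
            state = B ? State::S2 : State::S1;
            break;
        case State::S2:                       // laser on
            L = true;
            state = State::S3;
            break;
        case State::S3:                       // laser off, count until sensor fires
            L = false;
            Dctr = static_cast<std::uint16_t>(Dctr + 1);
            state = S ? State::S4 : State::S3;
            break;
        case State::S4:                       // D = Dctr / 2 via right shift by one
            D = static_cast<std::uint16_t>(Dctr >> 1);
            state = State::S1;
            break;
        }
    }
};
```

For example, if S becomes 1 on the tenth clock edge spent in state S3, this model leaves
Dctr at 10 and sets D to 5 on the following edge.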
Verilog
Figures 9.36 and 9.37 present a Verilog description of a high-level state machine for the
laser-based distance measurer shown in Figure 5.15. The module, named
LaserDistMeasurer, defines the inputs and outputs, including a button input B, a
laser sensor input S, a laser control output L, and a 16-bit output D for the distance
measured.
The module also defines a clock input clk and reset input rst. We assume that the clock
input is 300 MHz, as was assumed in the laser-based distance measurer design shown in
Figure 5.19. We omit details of generating the 300 MHz clock (see Section 9.3 for an
example of describing an oscillator).
The Verilog module behaviorally describes the LaserDistMeasurer's high-level state
machine. Instead of using two procedures as shown in Figure 9.23, the module consists of a
single procedure describing the behavior of our high-level state machine. The high-level
state machine procedure is sensitive to the positive edge of the reset input, rst, and the
positive edge of the clock input, clk. If rst is enabled, then the procedure asynchronously
sets the state register, state, to the state machine's initial state, S0, and initializes the outputs,
module LaserDistMeasurer(clk, rst, B, S, L, D);
   input clk, rst, B, S;
   output L;
   output [15:0] D;
   reg L;
   reg [15:0] D;

   parameter S0 = 3'b000,
             S1 = 3'b001,
             S2 = 3'b010,
             S3 = 3'b011,
             S4 = 3'b100;

   reg [2:0] state;
   reg [15:0] Dctr;

   always @(posedge rst or posedge clk)
   begin
      if (rst==1) begin
         L <= 0;
         D <= 0;
         Dctr <= 0;
         state <= S0;   // initial state
      end
      else begin
         case (state)
            S0: begin
               L <= 0;       // laser off
               D <= 0;       // clear D
               state <= S1;
            end
            S1: begin
               Dctr <= 0;    // reset count
               if (B==1)
                  state <= S2;
               else
                  state <= S1;
            end
            S2: begin
               L <= 1;       // laser on
               state <= S3;
            end
(continued in Figure 9.37)
Figure 9.36 Behavioral Verilog description of a high-level state
machine of the laser-based distance measurer.
L and D, and the internal counter register, Dctr, to their default values. The default values
should correspond to the values assigned to the signals within the initial state of our
high-level state machine. When rst is not enabled, on the rising clock, the procedure
evaluates the current state, assigns the appropriate outputs for the current state, determines the
next state, and updates the state register. In our high-level state machine description, we
only need a single state register signal to model the behavior of our state machine, instead
of the two register signals currentstate and nextstate we previously used in the controller
design shown in Figure 9.23.
The high-level state machine for the laser-based distance measurer performs two
arithmetic operations, addition and shifting. To increment the counter Dctr in state S3, we
use the syntax "Dctr <= Dctr + 1;". This statement will add one to the current value of
Dctr and store the result in Dctr. In state S4, we calculate the distance, D, by dividing the
value of Dctr by 2. However, we perform this division using a right-shift-by-one
operation. To perform the shift and assign the value to the output D, we use the
statement "D <= Dctr >> 1;", where >> performs a right shift operation.
SystemC
Figures 9.38 and 9.39 present a SystemC description of a high-level state machine for
the laser-based distance measurer shown in Figure 5.15. The module, named
LaserDistMeasurer, defines the inputs and outputs, including a user-pressed button input
B, a laser sensor input S, a laser control output L, and a 16-bit output D for the distance
measured.
The module also defines a clock input clk and reset input rst. We assume that the
clock input is 300 MHz, as was assumed in the laser-based distance measurer
design shown in Figure 5.19. We omit details of generating the 300 MHz clock (see
Section 9.3 for an example of describing an oscillator).
#include "systemc.h"
...
      SC_METHOD(statemachine);
      sensitive_pos << rst << clk;
...
      if (rst.read() == SC_LOGIC_1) {
         L.write(SC_LOGIC_0);
         D.write(0);
         state = S0;   // initial state
      }
      else {
         switch (state) {
            case S0:
               L.write(SC_LOGIC_0);   // laser off
               D.write(0);            // clear D
               state = S1;
               break;
            case S1:
               Dctr = 0;              // clear count
               if (B.read() == SC_LOGIC_1)
                  state = S2;
               else
                  state = S1;
               break;
(continued in Figure 9.39)
Figure 9.38 Behavioral SystemC description of a high-level state
machine of the laser-based distance measurer.
The SystemC module behaviorally describes the LaserDistMeasurer's high-level state
machine. Instead of using two processes as shown in Figure 9.24, the module consists of a
single process describing the behavior of our high-level state machine. The high-level
state machine process, named statemachine, is sensitive to the positive edge of the reset
input, rst, and the positive edge of the clock input. If rst is enabled, then the process
asynchronously sets the state signal to the state machine's initial state, S0, and initializes
the outputs,
(continued from Figure 9.38)
            case S2:
               L.write(SC_LOGIC_1);          // laser on
               state = S3;
               break;
            case S3:
               L.write(SC_LOGIC_0);          // laser off
               Dctr = Dctr.read() + 1;
               if (S.read() == SC_LOGIC_1)
                  state = S4;
               else
                  state = S3;
               break;
            case S4:
               D.write(Dctr.read() >> 1);    // calculate D
               state = S1;
               break;
         }
      }
...
Figure 9.39 Behavioral SystemC description of a high-level
state machine of the laser-based distance measurer (continued).
L and D, and the internal counter signal, Dctr, to their default values. The default values
should correspond to the values assigned to the signals within the initial state of our
high-level state machine. When rst is not enabled, on the rising clock, the process evaluates
the current state, assigns the appropriate outputs for the current state, determines the next
state, and updates the state register signal, state. In our high-level state machine
description, we only need a single state register signal to model the behavior of our state
machine, instead of the two signals currentstate and nextstate we previously used in the
controller design shown in Figure 9.24.
The high-level state machine for the laser-based distance measurer performs two
arithmetic operations, addition and shifting. To increment the counter Dctr in state S3, we use
the syntax "Dctr = Dctr.read() + 1;". This statement will add one to the current
value of Dctr and store the result in Dctr. In state S4, we calculate the distance, D, by
dividing the value of Dctr by 2. However, we perform this division using a right-shift-by-one
operation. To perform the shift and assign the value to the output D, we use the statement
"D.write(Dctr.read() >> 1);", where >> performs a right shift operation.
Controller and Datapath of the Laser-Based Distance Measurer
VHDL
Figure 9.40 is a VHDL description of the laser-based distance measurer shown in Figure
5.19. The entity, named LaserDistMeasurer, defines the inputs and outputs, including a
user-pressed button input B, a laser sensor input S, a laser control output L, and a 16-bit
output D for the distance measured. The entity also defines a 300 MHz clock input clk
and reset input rst for the design's controller.
Figure 9.40 Structural description of top-level VHDL description of laser-based
distance measurer.
The architecture structurally describes the connections of the
controller and datapath components. The architecture instantiates two components.
LDM_Controller_1 is the controller for the laser-based distance measurer and
LDM_Datapath_1 is the datapath for this design. The architecture connects the entity's clk,
rst, B, and S inputs to the inputs of LDM_Controller_1 and connects the controller's laser
control output to the corresponding output port L. Additionally, the four signals, Dreg_clr,
Dreg_ld, Dctr_clr, and Dctr_cnt, connect the controller's four control signals to the four
inputs of LDM_Datapath_1. The LaserDistMeasurer datapath has a single output D,
providing the distance measured, that is connected to the output port D of the entity.
Figure 9.41 is a VHDL description of the LaserDistMeasurer's datapath component
shown in Figure 5.17. The entity, named LDM_Datapath, defines a clock input clk, four
control inputs Dreg_clr, Dreg_ld, Dctr_clr, and Dctr_cnt, and a 16-bit distance output D.
The architecture defines three components, a 16-bit up-counter, a 16-bit register,
and a 16-bit right shifter that shifts right by one position. UpCounter16 is a 16-bit
up-counter with a count control input cnt and a count clear input clr. Reg16 is a 16-bit
Figure 9.41 Structural VHDL description of the laser-based distance
measurer's datapath.
library ieee;
use ieee.std_logic_1164.all;

entity LDM_Datapath is
   port ( clk: in std_logic;
          Dreg_clr, Dreg_ld: in std_logic;
          Dctr_clr, Dctr_cnt: in std_logic;
          D: out std_logic_vector(15 downto 0)
   );
end LDM_Datapath;

architecture structure of LDM_Datapath is
   component UpCounter16
      port ( clk: in std_logic;
             clr, cnt: in std_logic;
             C: out std_logic_vector(15 downto 0)
      );
   end component;
   component Reg16
      port ( I: in std_logic_vector(15 downto 0);
             Q: out std_logic_vector(15 downto 0);
             clk, clr, ld: in std_logic
      );
   end component;
   component ShiftRightOne16
      port ( I: in std_logic_vector(15 downto 0);
             S: out std_logic_vector(15 downto 0)
      );
   end component;
   signal tempC: std_logic_vector(15 downto 0);
   signal shiftC: std_logic_vector(15 downto 0);
begin
   Dctr: UpCounter16
      port map (clk, Dctr_clr, Dctr_cnt, tempC);
   ShiftRight: ShiftRightOne16
      port map (tempC, shiftC);
   Dreg: Reg16
      port map (shiftC, D, clk, Dreg_clr, Dreg_ld);
end structure;
parallel load register with a register load control signal ld and a register clear signal clr.
ShiftRightOne16 is a 16-bit right shifter that shifts the input I right by one position and
assigns the shifted value to the output S. The architecture instantiates an UpCounter16
component named Dctr, a Reg16 component named Dreg, and a ShiftRightOne16
component named ShiftRight. Dctr's instantiation connects the datapath's Dctr_clr and
Dctr_cnt inputs to Dctr's clear and count control inputs. Dctr's count output C is then
connected to the architecture's internal signal tempC that connects the count value to
the ShiftRight shifter's input. The shifted count is then connected to the input of the
Dreg register using the internal signal shiftC. The instantiation of the Dreg register
connects the register's clear and load control inputs to the datapath's Dreg_clr and Dreg_ld
input ports. Finally, the register's data output Q is connected to LDM_Datapath's
measured distance output D.
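To see how the three connected components behave together, the datapath can be modeled in
software. The following is a minimal C++ sketch under stated assumptions: the text does not
show the internals of UpCounter16 and Reg16, so we assume here that their clear inputs are
synchronous and take priority over count and load. The name DatapathModel and its clock()
method are hypothetical, and each call to clock() stands for one rising clock edge.

```cpp
#include <cstdint>

// Hypothetical software model of the datapath: counter -> right shifter -> register.
struct DatapathModel {
    std::uint16_t Dctr = 0;   // value held by the UpCounter16 instance
    std::uint16_t Dreg = 0;   // value held by the Reg16 instance

    // The datapath output D is the register's data output Q.
    std::uint16_t D() const { return Dreg; }

    // One rising clock edge. Argument order follows the entity's port list.
    void clock(bool Dreg_clr, bool Dreg_ld, bool Dctr_clr, bool Dctr_cnt) {
        // Combinational path, evaluated from the values held before the edge.
        const std::uint16_t tempC = Dctr;
        const std::uint16_t shiftC = static_cast<std::uint16_t>(tempC >> 1);

        // Reg16: clear takes priority over load (assumption).
        if (Dreg_clr) {
            Dreg = 0;
        } else if (Dreg_ld) {
            Dreg = shiftC;
        }

        // UpCounter16: clear takes priority over count (assumption).
        if (Dctr_clr) {
            Dctr = 0;
        } else if (Dctr_cnt) {
            Dctr = static_cast<std::uint16_t>(tempC + 1);
        }
    }
};
```

Because both registers update from the values held before the edge, asserting Dreg_ld and
Dctr_cnt in the same cycle loads the shifted value of the old count, as the connected
hardware registers would.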
Figure 9.42 and Figure 9.43 are the VHDL description of the laser-based distance
measurer's FSM controller described in Figure 5.21. The entity, named LDM_Controller,
defines a clock input clk, a reset signal rst, a user-pressed button input B, a laser sensor
input S, and five output control signals, L, Dreg_clr, Dreg_ld, Dctr_clr, and Dctr_cnt. The
output L is used to turn the laser on and off, where if L is 1, the laser is on. The four
other output signals are used to control the RTL design's datapath components.
The VHDL architecture describes the behavior of the entity. Similar to the controller
design shown in Figure 9.22, the architecture consists of two processes, one modeling the
state register, the other modeling the combinational logic. The state register process,
named statereg, is sensitive to inputs clk and rst. If rst is enabled, then the process
asynchronously sets the currentstate signal to the FSM's initial state, S0. Otherwise, if the
clock is rising, the process
library ieee;
use ieee.std_logic_1164.all;

entity LDM_Controller is
   port ( clk, rst: in std_logic;
          B, S: in std_logic;
          L: out std_logic;
          Dreg_clr, Dreg_ld: out std_logic;
          Dctr_clr, Dctr_cnt: out std_logic
   );
end LDM_Controller;

architecture behavior of LDM_Controller is
   type statetype is (S0, S1, S2, S3, S4);
   signal currentstate, nextstate: statetype;
begin
   statereg: process(clk, rst)
   begin
      if (rst='1') then
         currentstate <= S0;   -- initial state
      elsif (clk='1' and clk'event) then
         currentstate <= nextstate;
      end if;
   end process;

   comblogic: process(currentstate, B, S)
   begin
      L <= '0';
      Dreg_clr <= '0';
      Dreg_ld <= '0';
      Dctr_clr <= '0';
      Dctr_cnt <= '0';
      case currentstate is
         when S0 =>
            L <= '0';          -- laser off
            Dreg_clr <= '1';   -- clear Dreg
            nextstate <= S1;
         when S1 =>
            Dctr_clr <= '1';   -- clear count
            if (B='1') then
               nextstate <= S2;
            else
               nextstate <= S1;
            end if;
(continued in Figure 9.43)
Figure 9.42 Behavioral VHDL description of laser-based distance measurer's controller.
updates the state register with the next state.
The second process, named comblogic, is sensitive to the inputs to the combinational
logic of Figure 5.21, namely, the external inputs B and S, and the state register
output currentstate. When any of those items changes, the process sets the FSM's outputs,
in this case L, Dreg_clr, Dreg_ld, Dctr_clr, and Dctr_cnt, with the appropriate value
for the current state. In the controller example of Figure 9.22, the FSM's output x was
defined within the case statement for all possible states. With five outputs that must be
defined in the LDM_Controller and five possible states, assigning the values to all outputs
in each state would be cumbersome. Furthermore, finding a mistake and making
corrections or modifications to the controller would become very difficult in a larger FSM
consisting of more states and having many more outputs. The comblogic process uses a
different approach in which a default value for the outputs is first assigned and only the
deviations from the defaults are assigned later. The comblogic process first assigns a default
value of 0 to all five outputs. The process then evaluates the current state and assigns the
values to the outputs only when the output should be 1. The process also assigns the value 0
to several signals within the case statements; however, these assignments are included only
to clearly indicate the behavior of the controller (they are redundant, but help make the
description easier to understand).
(continued from Figure 9.42)
         when S2 =>
            L <= '1';          -- laser on
            nextstate <= S3;
         when S3 =>
            L <= '0';          -- laser off
            Dctr_cnt <= '1';   -- count up
            if (S='1') then
               nextstate <= S4;
            else
               nextstate <= S3;
            end if;
         when S4 =>
            Dreg_ld <= '1';    -- load Dreg
            Dctr_cnt <= '0';   -- stop counting
            nextstate <= S1;
      end case;
   end process;
end behavior;
Figure 9.43 Behavioral VHDL description of laser-based
distance measurer's controller (continued).
The process also determines what the next state should be, based on the current state
and the values of inputs B and S. The next state will be loaded into the state register by
the state register process on the next rising clock edge.
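The default-then-deviate approach used by the comblogic process can be sketched in software
as well. The following minimal C++ sketch is an illustration only, not part of the HDL
description: the struct ControlOutputs and the free function comblogic() are hypothetical
names. All five outputs start at their default of 0, and each state sets only the outputs
that should be 1, along with the next state.

```cpp
// Hypothetical software view of the controller's combinational logic.
enum class State { S0, S1, S2, S3, S4 };

struct ControlOutputs {
    // Defaults: every output is 0 unless a state says otherwise.
    bool L = false;
    bool Dreg_clr = false;
    bool Dreg_ld = false;
    bool Dctr_clr = false;
    bool Dctr_cnt = false;
    State next = State::S0;
};

ControlOutputs comblogic(State current, bool B, bool S) {
    ControlOutputs out;  // starts with the default value 0 on all five outputs
    switch (current) {
    case State::S0:
        out.Dreg_clr = true;                        // clear Dreg
        out.next = State::S1;
        break;
    case State::S1:
        out.Dctr_clr = true;                        // clear count
        out.next = B ? State::S2 : State::S1;
        break;
    case State::S2:
        out.L = true;                               // laser on
        out.next = State::S3;
        break;
    case State::S3:
        out.Dctr_cnt = true;                        // count up
        out.next = S ? State::S4 : State::S3;
        break;
    case State::S4:
        out.Dreg_ld = true;                         // load Dreg
        out.next = State::S1;
        break;
    }
    return out;
}
```

Only the deviations appear inside the switch, so adding a sixth output would require one new
default and changes only in the states where that output is 1.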
Verilog
Figure 9.44 is a Verilog description of the laser-based distance measurer shown in
Figure 5.19. The module, named LaserDistMeasurer, defines the inputs and outputs,
including a user-pressed button input B, a laser sensor input S, a laser control output
L, and a 16-bit output D for the distance measured. The module also defines a 300
MHz clock input clk and reset input rst for the design's controller.
module LaserDistMeasurer(clk, rst, B, S, L, D);
   input clk, rst, B, S;
   output L;
   output [15:0] D;
   wire Dreg_clr, Dreg_ld;
   wire Dctr_clr, Dctr_cnt;

   LDM_Controller
      LDM_Controller_1(clk, rst, B, S, L,
                       Dreg_clr, Dreg_ld,
                       Dctr_clr, Dctr_cnt);
   LDM_Datapath
      LDM_Datapath_1(clk, Dreg_clr, Dreg_ld,
                     Dctr_clr, Dctr_cnt, D);
endmodule
Figure 9.44 Structural description of top-level Verilog
description of laser-based distance measurer.
The LaserDistMeasurer structurally describes the connections of the controller and
datapath components. The module instantiates two components. LDM_Controller_1 is
the controller for the laser-based distance measurer and LDM_Datapath_1 is the datapath
for this design. The module connects its clk, rst, B, and S inputs to the
inputs of LDM_Controller_1 and connects the controller's laser control output to the
corresponding output port L. Additionally, the four internal wires, Dreg_clr, Dreg_ld,
Dctr_clr, and Dctr_cnt, connect the controller's four control signals to the four inputs of
LDM_Datapath_1. The LaserDistMeasurer datapath has a single output D, providing the
distance measured, that is connected to the output port D of the module.
Figure 9.45 is a Verilog description of the LaserDistMeasurer's datapath component shown in
Figure 5.17. The module, named LDM_Datapath, defines a clock input clk, four control inputs
Dreg_clr, Dreg_ld, Dctr_clr, and Dctr_cnt, and a 16-bit distance output D.
The datapath consists of three components, a 16-bit up-counter, a 16-bit register, and a
16-bit right shifter that shifts right by one position. UpCounter16 is a 16-bit up-counter
with a count control input cnt and a count clear input clr. Reg16 is a 16-bit parallel load
register with a register load control signal ld and a register clear signal clr.
ShiftRightOne16 is a 16-bit right shifter that shifts the input I right by one position and
assigns the shifted value to the output S. The datapath module instantiates an UpCounter16
component named Dctr, a Reg16 component named Dreg, and a ShiftRightOne16 component named
ShiftRight. The module connects the datapath's Dctr_clr and Dctr_cnt inputs to Dctr's clear
and count control inputs, respectively. The counter's count output C is then connected to the
16-bit internal wire tempC that connects the count value to the ShiftRight shifter's input.
The shifted count is then connected to the input of the Dreg register using the internal
16-bit wire shiftC. The module connects the Dreg register's clear and load control inputs to
the datapath's Dreg_clr and Dreg_ld input ports. Finally, the register's data output Q is
connected to LDM_Datapath's measured distance output D.
module UpCounter16(clk, clr, cnt, C);
   input clk, clr, cnt;
   output [15:0] C;
   // details not shown
endmodule

module Reg16(I, Q, clk, clr, ld);
   input [15:0] I;
   input clk, clr, ld;
   output [15:0] Q;
   // details not shown
endmodule

module ShiftRightOne16(I, S);
   input [15:0] I;
   output [15:0] S;
   // details not shown
endmodule

module LDM_Datapath(clk, Dreg_clr, Dreg_ld,
                    Dctr_clr, Dctr_cnt, D);
   input clk;
   input Dreg_clr, Dreg_ld;
   input Dctr_clr, Dctr_cnt;
   output [15:0] D;
   wire [15:0] tempC, shiftC;

   UpCounter16 Dctr(clk, Dctr_clr, Dctr_cnt,
                    tempC);
   ShiftRightOne16 ShiftRight(tempC, shiftC);
   Reg16 Dreg(shiftC, D, clk, Dreg_clr, Dreg_ld);
endmodule
Figure 9.45 Structural Verilog description of the laser-based
distance measurer's datapath.
Figures 9.46 and 9.47 are the Verilog description of the laser-based distance measurer's
FSM controller described in Figure 5.21. The module, named LDM_Controller, defines a
clock input clk, a reset signal rst, a user-pressed button input B, a laser sensor input S, and
five output control signals, L, Dreg_clr, Dreg_ld, Dctr_clr, and Dctr_cnt. The output L is
used to turn the laser on and off, where if L is 1, the laser is on. The four other output
signals are used to control the RTL design's datapath components.
Figure 9.46 Behavioral Verilog description of laser-based distance measurer's
controller.
module LDM_Controller(clk, rst, B, S, L, Dreg_clr,
                      Dreg_ld, Dctr_clr,
                      Dctr_cnt);
   input clk, rst, B, S;
   output L;
   output Dreg_clr, Dreg_ld;
   output Dctr_clr, Dctr_cnt;
   reg L;
   reg Dreg_clr, Dreg_ld;
   reg Dctr_clr, Dctr_cnt;

   parameter S0 = 3'b000,
             S1 = 3'b001,
             S2 = 3'b010,
             S3 = 3'b011,
             S4 = 3'b100;

   reg [2:0] currentstate;
   reg [2:0] nextstate;

   always @(posedge rst or posedge clk)
   begin
      if (rst==1)
         currentstate <= S0;   // initial state
      else
         currentstate <= nextstate;
   end

   always @(currentstate or B or S)
   begin
      L <= 0;
      Dreg_clr <= 0;
      Dreg_ld <= 0;
      Dctr_clr <= 0;
      Dctr_cnt <= 0;
      case (currentstate)
         S0: begin
            L <= 0;          // laser off
            Dreg_clr <= 1;   // clear Dreg
            nextstate <= S1;
         end
(continued in Figure 9.47)
The Verilog module behaviorally describes the LaserDistMeasurer's FSM. Similar to
the controller design shown in Figure 9.23, the module consists of two procedures, one
modeling the state register, the other modeling the FSM's control logic. The state register
procedure is sensitive to the positive edge of the reset input, rst, and the positive edge of
the clock input, clk. If the rst input is enabled, then the procedure asynchronously sets the
currentstate signal to the FSM's initial state, S0. Otherwise, on the rising edge of the
clock, the procedure updates the state register with the next state.
The second procedure is sensitive to the inputs to the combinational logic of Figure
5.21, namely, the external inputs B and S, and the state register output currentstate. When
any of those items changes, the procedure sets the FSM's outputs, in this case L, Dreg_clr,
Dreg_ld, Dctr_clr, and Dctr_cnt, with the appropriate value for the current state. In the
controller example of Figure 9.22, the FSM's output x was defined within the case statement
for all possible states. With five outputs that must be defined in the LDM_Controller and
five possible states, assigning the values to all outputs in each state would be cumbersome.
Furthermore, finding a mistake and making corrections or modifications to the controller
would become very difficult in a larger FSM consisting of more states and having many more
outputs. Instead, the procedure uses a different approach in which a default value for all
the outputs is first assigned and only the deviations from the defaults are assigned later.
The procedure first assigns a default value of 0 to all five outputs. The procedure then
evaluates the current state and assigns the values to the outputs only when the output should
be 1. The procedure also assigns the value 0 to several signals within the case statement;
however, these assignments are included only to clearly indicate the behavior of the
controller (they are redundant, but help make the description easier to understand).
(continued from Figure 9.46)
         S1: begin
            Dctr_clr <= 1;   // clear count
            if (B==1)
               nextstate <= S2;
            else
               nextstate <= S1;
         end
         S2: begin
            L <= 1;          // laser on
            nextstate <= S3;
         end
         S3: begin
            L <= 0;          // laser off
            Dctr_cnt <= 1;   // count up
            if (S==1)
               nextstate <= S4;
            else
               nextstate <= S3;
         end
         S4: begin
            Dreg_ld <= 1;    // load Dreg
            Dctr_cnt <= 0;   // stop counting
            nextstate <= S1;
         end
      endcase
   end
endmodule
Figure 9.47 Behavioral Verilog description of laser-based
distance measurer's controller (continued).
The procedure also determines what the next state should be, based on the current
state and the values of inputs B and S. The next state will be loaded into the state register
by the state register procedure on the next positive clock edge.
SystemC
Figure 9.48 is a SystemC description of the laser-based distance measurer shown in Figure
5.19. The module, named LaserDistMeasurer, defines the inputs and outputs, including a
user-pressed button input B, a laser sensor input S, a laser control output L, and a 16-bit
output D for the distance measured. The module also defines a 300 MHz clock input clk and
reset input rst for the design's controller.
The LaserDistMeasurer structurally describes the connections of the controller and datapath
components. The module instantiates two components. LDM_Controller_1 is the controller for
the laser-based distance measurer and LDM_Datapath_1 is the datapath for this design. The
module connects its clk, rst, B, and S inputs to the inputs of LDM_Controller_1 and connects
the controller's laser control output to the corresponding output port L. Additionally, the
four internal signals, Dreg_clr, Dreg_ld, Dctr_clr, and Dctr_cnt, connect the controller's
four control signals to the four inputs of LDM_Datapath_1. The LaserDistMeasurer datapath has
a single output D, providing the distance measured, that is connected to the output port D of
the module.
#include "systemc.h"
#include "LDM_Controller.h"
#include "LDM_Datapath.h"

SC_MODULE(LaserDistMeasurer)
{
   sc_in<sc_logic> clk, rst;
   sc_in<sc_logic> B, S;
   sc_out<sc_logic> L;
   sc_out<sc_lv<16> > D;

   sc_signal<sc_logic> Dreg_clr, Dreg_ld;
   sc_signal<sc_logic> Dctr_clr, Dctr_cnt;

   LDM_Controller LDM_Controller_1;
   LDM_Datapath LDM_Datapath_1;

   SC_CTOR(LaserDistMeasurer) :
      LDM_Controller_1("LDM_Controller_1"),
      LDM_Datapath_1("LDM_Datapath_1")
   {
      LDM_Controller_1.clk(clk);
      LDM_Controller_1.rst(rst);
      LDM_Controller_1.B(B);
      LDM_Controller_1.S(S);
      LDM_Controller_1.L(L);
      LDM_Controller_1.Dreg_clr(Dreg_clr);
      LDM_Controller_1.Dreg_ld(Dreg_ld);
      LDM_Controller_1.Dctr_clr(Dctr_clr);
      LDM_Controller_1.Dctr_cnt(Dctr_cnt);

      LDM_Datapath_1.clk(clk);
      LDM_Datapath_1.Dreg_clr(Dreg_clr);
      LDM_Datapath_1.Dreg_ld(Dreg_ld);
      LDM_Datapath_1.Dctr_clr(Dctr_clr);
      LDM_Datapath_1.Dctr_cnt(Dctr_cnt);
      LDM_Datapath_1.D(D);
   }
};
Figure 9.48 Structural description of top-level SystemC
description of laser-based distance measurer.
Figure 9.49 is a SystemC description of the LaserDistMeasurer's datapath component shown
in Figure 5.17. The module, named LDM_Datapath, defines a clock input clk, four control
inputs Dreg_clr, Dreg_ld, Dctr_clr, and Dctr_cnt, and a 16-bit distance output D.
#include "systemc.h"
#include "upcounter16.h"
#include "reg16.h"
#include "shiftrightone16.h"

SC_MODULE(LDM_Datapath)
{
   sc_in<sc_logic> clk;
   sc_in<sc_logic> Dreg_clr, Dreg_ld;
   sc_in<sc_logic> Dctr_clr, Dctr_cnt;
   sc_out<sc_lv<16> > D;

   sc_signal<sc_lv<16> > tempC;
   sc_signal<sc_lv<16> > shiftC;

   UpCounter16 Dctr;
   Reg16 Dreg;
   ShiftRightOne16 ShiftRight;

   SC_CTOR(LDM_Datapath) :
      Dctr("Dctr"), Dreg("Dreg"),
      ShiftRight("ShiftRight")
   {
      Dctr.clk(clk);
      Dctr.clr(Dctr_clr);
      Dctr.cnt(Dctr_cnt);
      Dctr.C(tempC);

      ShiftRight.I(tempC);
      ShiftRight.S(shiftC);

      Dreg.I(shiftC);
      Dreg.Q(D);
      Dreg.clk(clk);
      Dreg.clr(Dreg_clr);
      Dreg.ld(Dreg_ld);
   }
};
Figure 9.49 Structural SystemC description of the
laser-based distance measurer's datapath.
The datapath consists of three components, a 16-bit up-counter, a 16-bit register, and a
16-bit right shifter that shifts right by one position. UpCounter16 is a 16-bit up-counter
with a count control input cnt and a count clear input clr. Reg16 is a 16-bit parallel load
register with a register load control signal ld and a register clear signal clr.
ShiftRightOne16 is a 16-bit right shifter that shifts the input I right by one position and
assigns the shifted value to the output S. The datapath module instantiates an UpCounter16
component named Dctr, a Reg16 component named Dreg, and a ShiftRightOne16 component named
ShiftRight. The module connects the datapath's Dctr_clr and Dctr_cnt inputs to Dctr's clear
and count control inputs, respectively. The counter's count output C is then connected to the
16-bit internal signal tempC that connects the count value to the ShiftRight shifter's input.
The shifted count value is then connected to the input of the Dreg register using the
internal signal shiftC. The module connects the Dreg register's clear and load control inputs
to the datapath's Dreg_clr and Dreg_ld input ports. Finally, the register's data output Q is
connected to LDM_Datapath's measured distance output D.
Figures 9.50 and 9.51 are the SystemC description of the laser-based distance
measurer's FSM controller described in Figure 5.21. The module, named LDM_Controller,
has a clock input clk, a reset signal rst, a user-pressed button input B, a laser sensor input
S, and five output control signals, L, Dreg_clr, Dreg_ld, Dctr_clr, and Dctr_cnt. The
output L is used to turn the laser on and off, where if L is 1, the laser is on. The four other
output signals are used to control the RTL design's datapath components.
Figure 9.50 Behavioral SystemC description of the laser-based distance
measurer's controller.
#include "systemc.h"

enum statetype { S0, S1, S2, S3, S4 };

SC_MODULE(LDM_Controller)
{
   sc_in<sc_logic> clk, rst, B, S;
   sc_out<sc_logic> L;
   sc_out<sc_logic> Dreg_clr, Dreg_ld;
   sc_out<sc_logic> Dctr_clr, Dctr_cnt;

   sc_signal<statetype> currentstate, nextstate;

   SC_CTOR(LDM_Controller)
   {
      SC_METHOD(statereg);
      sensitive_pos << rst << clk;
      SC_METHOD(comblogic);
      sensitive << currentstate << B << S;
   }

   void statereg() {
      if ( rst.read() == SC_LOGIC_1 )
         currentstate = S0;   // initial state
      else
         currentstate = nextstate;
   }

   void comblogic() {
      L.write(SC_LOGIC_0);
      Dreg_clr.write(SC_LOGIC_0);
      Dreg_ld.write(SC_LOGIC_0);
      Dctr_clr.write(SC_LOGIC_0);
      Dctr_cnt.write(SC_LOGIC_0);

      switch (currentstate) {
         case S0:
            L.write(SC_LOGIC_0);          // laser off
            Dreg_clr.write(SC_LOGIC_1);   // clear Dreg
            nextstate = S1;
            break;
(continued in Figure 9.51)
The SystemC module behaviorally describes the LaserDistMeasurer's FSM. Similar
to the controller design shown in Figure 9.24, the module consists of two processes, one
modeling the state register, the other modeling the FSM's control logic. The state register
process, named statereg, is sensitive to the positive edge of the reset input, rst, and the
positive edge of the clock input, clk. If rst is enabled, then the process asynchronously
sets the currentstate to the FSM's initial state, S0. Otherwise, on the rising edge of the
clock, the process updates the state register with the nextstate.
The second process, named comblogic, is sensitive to the inputs to the combinational
logic of Figure 5.21, namely, the external inputs B and S, and the state register output cur-
rentstate. When any of those signals changes, the process sets the FSM's outputs, in this
case L, Dreg_clr, Dreg_ld, Dctr_clr, and Dctr_cnt, to the appropriate values for the
current state. In the controller example of Figure 9.24, the FSM's output x was defined
within the case statement for all possible states. With five outputs that we must define in
the LDM_Controller and five possible states, assigning the values to all outputs in each
state would be cumbersome. Furthermore, finding a mistake and making corrections or
modifications to the controller would become very difficult in a larger FSM consisting of
more states and having many more outputs. Instead, the process uses a different approach
in which a default value for all the outputs is first assigned and only the deviations from
the defaults are assigned later. The process first assigns a default value of 0 to all five out-
puts. The process then evaluates the current state and assigns the values to the outputs
only when the output should be 1. The process also assigns the value 0 to several signals
within the case statements; however, these assignments are included only to clearly indi-
cate the behavior of the controller (they are redundant, but help make the description
easier to understand).
Figure 9.51 Behavioral SystemC description of the laser-based distance
measurer's controller (continued).
(continued from Figure 9.50)
         case S1:
            Dctr_clr.write(SC_LOGIC_1);   // clear count
            if (B.read() == SC_LOGIC_1)
               nextstate = S2;
            else
               nextstate = S1;
            break;
         case S2:
            L.write(SC_LOGIC_1);          // laser on
            nextstate = S3;
            break;
         case S3:
            L.write(SC_LOGIC_0);          // laser off
            Dctr_cnt.write(SC_LOGIC_1);   // count up
            if (S.read() == SC_LOGIC_1)
               nextstate = S4;
            else
               nextstate = S3;
            break;
         case S4:
            Dreg_ld.write(SC_LOGIC_1);    // load Dreg
            Dctr_cnt.write(SC_LOGIC_0);   // stop counting
            nextstate = S1;
            break;
      }
   }
};
The process also determines what the next state should be, based on the current state
and the values of inputs B and S. The next state will be loaded into the state register by
the state register process on the next positive clock edge.
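The default-then-deviate style of comblogic can also be tried outside a SystemC simulator. The sketch below restates the combinational logic as an ordinary C++ function; the names StateType, ControllerOutputs, and the function signature are assumptions made for this illustration, and state S0 is assumed to assert Dreg_clr, as its "clear Dreg" comment indicates.

```cpp
#include <cassert>

enum StateType { S0, S1, S2, S3, S4 };

// Hypothetical bundle of the five outputs plus the computed next state.
struct ControllerOutputs {
    bool L, Dreg_clr, Dreg_ld, Dctr_clr, Dctr_cnt;
    StateType nextstate;
};

// Plain C++ restatement of comblogic(): assign defaults first,
// then record only the deviations for the current state.
ControllerOutputs comblogic(StateType currentstate, bool B, bool S) {
    assert(currentstate >= S0 && currentstate <= S4);
    // Default: every output is 0 and the state is held.
    ControllerOutputs out = {false, false, false, false, false, currentstate};
    switch (currentstate) {
        case S0: out.Dreg_clr = true; out.nextstate = S1;          break;
        case S1: out.Dctr_clr = true; out.nextstate = B ? S2 : S1; break;
        case S2: out.L = true;        out.nextstate = S3;          break;
        case S3: out.Dctr_cnt = true; out.nextstate = S ? S4 : S3; break;
        case S4: out.Dreg_ld = true;  out.nextstate = S1;          break;
    }
    return out;
}
```

Because the defaults are set once, adding a sixth output would require one new default and only the states that raise it, which is the maintenance benefit the text describes.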
9.6 CHAPTER SUMMARY
In this chapter, we stated that hardware description languages (HDLs) are widely used in
modern digital design. We provided brief introductions to several widely used HDLs,
namely VHDL, Verilog, and SystemC. We introduced those HDLs primarily through the
use of examples, illustrating how each HDL might be used to describe combinational
logic, sequential logic, datapath components, as well as RTL behavior and structure. To
become proficient at the use of HDLs, a more thorough study of a particular HDL might
be helpful. This chapter also illustrates the point that different HDLs have several
commonalities.
9.7 EXERCISES
The following exercises can be completed using any of the HDLs described in this
chapter.
SECTION 9.2: COMBINATIONAL LOGIC DESCRIPTION USING HARDWARE
DESCRIPTION LANGUAGES
9.1 Create a structural HDL description of the binary number to seven-segment display described
in Example 2.23, consisting of the simple logic gates INV, AND2, and OR2. Be sure to include
combinational behavioral descriptions of the simple logic gates.
9.2 Create combinational behavioral HDL descriptions for each of the following two-input logic
gates, where each logic gate has two inputs, a and b, and a single output F:
(a) NAND2,
(b) NOR2,
(c) XOR2,
(d) XNOR2.
9.3 (a) Create a combinational behavioral HDL description of the three 1s pattern detector of
Example 2.24.
(b) Create a testbench that checks that your description works properly.
9.4 (a) Create a combinational behavioral HDL description of the Number-of-1s counter shown
in Figure 2.41, by describing the combinational behavior of both outputs x and y in sum-
of-minterms form.
(b) Create a testbench that checks that your description works properly.
9.5 Create an HDL description of the 2x4 decoder shown in Figure 2.50, as:
(a) combinational behavior.
(b) structure.
(c) Create a testbench to test either description (the same testbench can test either
description).
9.6 Create an HDL description of the 4x1 multiplexer described in Figure 2.55, as:
(a) combinational behavior.
(b) structure.
(c) Create a testbench to test either description (the same testbench can test either
description).
9.7 Create a behavioral HDL description of a 2x1 multiplexor described in Figure 2.54. Then,
create an HDL description that combines three 2x1 multiplexors to create a 4x1 multiplexor
as shown in Figure 9.52.
9.8 Create a combinational behavioral HDL description of an 8-bit 4x1 multiplexor. Be sure to
specify the design input and output ports using a multiple-bit data type.
9.9 Clearly explain the difference between a structural HDL description and a behavioral HDL
description. Explain the benefits of using both kinds of descriptions.
9.10 Explain why a combinational behavioral HDL description must include all the combinational
circuit's inputs in a sensitivity list. In particular, explain why omitting an input actually
describes a sequential circuit.
Figure 9.52 4x1 multiplexor composed of three 2x1 multiplexors.
9.11 Create a behavioral HDL description of a 16x4 priority encoder. The priority encoder has
16 inputs, d15, d14, ..., d1, d0, and four outputs e3, e2, e1, e0. The priority encoder outputs a
4-bit binary number indicating which of the 16 inputs is a 1. If more than one input is a 1, the
priority encoder will output the binary number for the highest numbered input.
SECTION 9.3: SEQUENTIAL LOGIC DESCRIPTION USING HARDWARE DESCRIPTION
LANGUAGES
9.12 (a) Create a behavioral HDL description of a 32-bit parallel load register.
(b) Create a testbench to test the description.
9.13 (a) Create a behavioral HDL description of the FSM controller for the improved code detector
described in Figure 3.46.
(b) Create a testbench to test the description.
9.14 (a) Create a behavioral HDL description of the button press synchronizer described in Figure
3.53.
(b) Create a testbench to test the description.
9.15 (a) Create a behavioral HDL description of the secure car key controller described in Figures
3.57 and 3.58.
(b) Create a testbench to test the description.
SECTION 9.4: DATAPATH COMPONENT DESCRIPTION USING HARDWARE
DESCRIPTION LANGUAGES
9.16 (a) Create a behavioral HDL description of an 8-bit parallel load register with register clear
input clr.
(b) Create a testbench to test the description.
9.17 (a) Create a behavioral HDL description of an 8-bit parallel load register with a clear low
input clr_l and a set high input. When the clr_l input is 1, the register contents
should be cleared to "00000000". When the set high input is 1, the register's contents
should be set to "11111111". If both inputs are 1, the clear low input has priority.
(b) Create a testbench to test the description.
9.18 Create a behavioral HDL description of an 8-bit register with two control inputs s0 and s1
with the control behavior described in Figure 9.53.

s1  s0  Operation
0   0   Maintain present value
0   1   Parallel load
1   0   Shift right
1   1   Rotate right
Figure 9.53 Operation table of the 8-bit register for Exercise 9.18.

9.19 Create a structural HDL description of a half-adder.
9.20 Create a structural HDL description of a 4-bit carry-ripple adder without a carry input. First
create a behavioral description of a full-adder, and then use the full-adder component in your
carry-ripple adder description.
9.21 Create a behavioral HDL description of the approximate Celsius-to-Fahrenheit converter
described in Figure 4.40.
9.22 Create a behavioral HDL description of an approximate Fahrenheit-to-Celsius converter using
the following approximation for the conversion: C = (F - 32)/2.
9.23 (a) Create a behavioral HDL description of a 1-bit comparator.
(b) Create a structural description of a 4-bit comparator, using the 1-bit comparators.
9.24 Create a behavioral HDL description of a 32-bit equality comparator with three 8-bit inputs a,
b, and c.
9.25 Create a structural HDL description of the up-down-counter circuit described in Figure 4.55.
Be sure to first create a behavioral HDL description of each component used in your structural
HDL design.
9.26 Create a structural HDL description of a 4-bit down-counter with parallel load. Be sure to first
create a behavioral HDL description of each component used in your structural HDL design.
9.27 Create a structural HDL description of the RGB to CMYK converter described in Figure 4.68.
Be sure to first create a behavioral HDL description of each component used in your structural
HDL design.
9.28 Create a structural HDL description of a CMYK to RGB converter. Hint: Use the information
presented in Example 4.20 describing the RGB to CMYK converter to assist in designing the
CMYK to RGB converter.
9.29 Create a structural HDL description of a 4-bit adder/subtractor circuit. Be sure to first create a
behavioral HDL description of each component used in your structural HDL design.
SECTION 9.5: RTL DESIGN USING HARDWARE DESCRIPTION LANGUAGES
9.30 Create a behavioral HDL description of the high-level state machine for the simple bus inter-
face shown in Figure 5.24.
9.31 Create a structural HDL description of the controller/datapath for the simple bus interface shown
in Figure 5.26.
9.32 Create a behavioral HDL description of the high-level state machine for the sum-of-absolute-
differences component shown in Figure 5.29.
9.33 Create a structural HDL description of the controller/datapath design of the sum-of-absolute-
differences component shown in Figure 5.30.
9.34 Create an RTL design of a reaction timer circuit that measures the time elapsed between the
illumination of a light and the pressing of a button by the user. The reaction timer has three
inputs, a clock input clk, a reset input rst, and a button input B, and three outputs, a light
enable output len, a 10-bit reaction time output rtime, and a slow output indicating the user
was not fast enough. The reaction timer works as follows. On reset, the reaction timer waits
for 2 seconds before illuminating the light by setting len to 1. The reaction timer then
measures the length of time in milliseconds before the user presses the button B, outputting
the time as a 10-bit binary number on rtime. If the user did not press the button within
1 second (1000 milliseconds), the reaction timer will set the output slow to 1 and output 1000
on rtime. Assume a clock input with a frequency of ... Hz. (a) Start by capturing the design
using a high-level state machine in an HDL. (b) Convert the high-level state machine to a
controller/datapath description in an HDL.
9.35 Starting from the C description shown in Figure 9.54, create an RTL design of a Greatest
Common Divisor (GCD) calculator that has inputs a and b, an enable input go, and a 16-bit
output D. When go is '1', the GCD calculator will compute the greatest common divisor and
output the GCD on the output D. Start with a high-level state machine in an HDL, and then
create an HDL implementation with a datapath, a controller, and all their internal components.
GCD(uint a, uint b) {   // not quite C syntax
   while ( a != b ) {
      if ( a > b ) {
         a = a - b;
      }
      else {
         b = b - a;
      }
   }
   return(a);
}
Figure 9.54 C program description of a greatest common divisor calculator.
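For readers who want to check an RTL design against software, here is a compilable C++ sketch of the same subtraction-based loop. The function name gcd_subtract and the 16-bit operand width are assumptions chosen for this illustration; the sketch also assumes both inputs are nonzero, since the loop would not terminate otherwise.

```cpp
#include <cassert>
#include <cstdint>

// Subtraction-based GCD following the loop of Figure 9.54.
// Assumption: a and b are both nonzero.
std::uint16_t gcd_subtract(std::uint16_t a, std::uint16_t b) {
    assert(a != 0 && b != 0);
    while (a != b) {
        if (a > b) a = static_cast<std::uint16_t>(a - b);  // a = a - b
        else       b = static_cast<std::uint16_t>(b - a);  // b = b - a
    }
    return a;
}
```

Each loop iteration corresponds naturally to one pass through a compare state and a subtract state in a high-level state machine, which is one reasonable starting point for Exercise 9.35.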
A
Boolean Algebras
This appendix is reproduced with permission from the textbook "Introduction to Digital
Systems" by Ercegovac, Lang, and Moreno, ISBN 0471-52799-8, John Wiley and Sons
publishers, 1999.
Boolean algebras are an important class of algebras that have been studied and used exten-
sively for many purposes (see Section A.5). The switching algebra, used in the
description of switching expressions discussed in Section 2.4, is an instance (an element)
of the class of Boolean algebras. Consequently, theorems developed for Boolean algebras
are also applicable to switching algebra, so they can be used for the transformation of
switching expressions. Moreover, certain identities from Boolean algebra are the basis for
the graphical and tabular techniques used for the minimization of switching expressions.
In this appendix, we present the definition of Boolean algebras as well as theorems
that are useful for the transformation of Boolean expressions. We also show the relation-
ship among Boolean and switching algebras; in particular, we show that the switching
algebra satisfies the postulates of a Boolean algebra. We also sketch other examples of
Boolean algebras, which are helpful to further understand the properties of this class of
algebras.
A.1 BOOLEAN ALGEBRA
A Boolean algebra is a tuple {B, +, ·}, where
B is a set of elements;
+ and · are binary operations applied over the elements of B,
satisfying the following postulates:
P1: If a, b ∈ B, then
(i) a + b = b + a
(ii) a · b = b · a
That is, + and · are commutative.
P2: If a, b, c ∈ B, then
(i) a + (b · c) = (a + b) · (a + c)
(ii) a · (b + c) = (a · b) + (a · c)
P3: The set B has two distinct identity elements, denoted as 0 and 1, such that for
every element in B
(i) 0 + a = a + 0 = a
(ii) 1 · a = a · 1 = a
The elements 0 and 1 are called the additive identity element and the multiplicative
identity element, respectively. (These elements should not be confused with the inte-
gers 0 and 1.)
P4: For every element a ∈ B there exists an element a', called the complement of
a, such that
(i) a + a' = 1
(ii) a · a' = 0
The symbols + and · should not be confused with the arithmetic addition and multi-
plication symbols. However, for convenience + and · are often called "plus" and "times,"
and the expressions a + b and a · b are called "sum" and "product," respectively. More-
over, + and · are also called "OR" and "AND," respectively.
The elements of the set B are called constants. Symbols representing arbitrary ele-
ments of B are variables. The symbols a, b, and c in the postulates above are variables,
whereas 0 and 1 are constants.
A precedence ordering is defined on the operators: · has precedence over +. There-
fore, parentheses can be eliminated from products. Moreover, whenever single symbols
are used for variables, the symbol · can be eliminated in products. For example,
a + (b · c) can be written as a + bc
A.2 SWITCHING ALGEBRA
Switching algebra is an algebraic system used to describe switching functions by means
of switching expressions. In this sense, a switching algebra serves the same role for
switching functions as the ordinary algebra does for arithmetic functions.
The switching algebra consists of the set of two elements B = {0, 1}, and two operations AND
and OR defined as follows:

AND  0  1        OR  0  1
 0   0  0         0  0  1
 1   0  1         1  1  1

These operations are used to evaluate switching expressions, as indicated in Section ...
Theorem 1
The switching algebra is a Boolean algebra.
Proof We show that the switching algebra satisfies the postulates of a Boolean algebra.
P1: Commutativity of (+), (·). This is shown by inspection of the operation tables.
The commutativity property holds if a table is symmetric about the main
diagonal.
P2: Distributivity of (+) and (·). Shown by perfect induction, that is, by consid-
ering all possible values for the elements a, b, and c. Consider the following
table:

abc   a + bc   (a + b)(a + c)
000     0            0
001     0            0
010     0            0
011     1            1
100     1            1
101     1            1
110     1            1
111     1            1

Because a + bc = (a + b)(a + c) for all cases, P2(i) is satisfied. A similar
proof shows that P2(ii) is also satisfied.
P3: Existence of additive and multiplicative identity elements. From the operation
tables
0 + 1 = 1 + 0 = 1
Therefore, 0 is the additive identity. Similarly
0 · 1 = 1 · 0 = 0
so that 1 is the multiplicative identity.
P4: Existence of the complement. By perfect induction:

a   a'   a + a'   a · a'
1   0      1        0
0   1      1        0

Consequently, 1 is the complement of 0 and 0 is the complement of 1.
Because all postulates are satisfied, the switching algebra is a Boolean algebra. As a
result, all theorems true for Boolean algebras are also true for the switching algebra.
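Perfect induction is easy to mechanize, because B = {0, 1} has only a handful of value combinations. The following C++ sketch checks postulates P1 through P4 exhaustively; the helper names sw_or, sw_and, and sw_not are hypothetical and simply stand for +, ·, and complement.

```cpp
#include <cassert>

// Switching algebra on B = {0, 1}: sw_or plays the role of +, sw_and of ·.
inline int sw_or(int a, int b)  { return a | b; }
inline int sw_and(int a, int b) { return a & b; }
inline int sw_not(int a) {
    assert(a == 0 || a == 1);  // only elements of B are allowed
    return a ^ 1;
}

// Returns true if postulates P1-P4 hold for every combination of values.
bool switching_algebra_is_boolean() {
    for (int a = 0; a <= 1; ++a) {
        // P3: 0 is the additive identity, 1 the multiplicative identity.
        if (sw_or(0, a) != a || sw_or(a, 0) != a) return false;
        if (sw_and(1, a) != a || sw_and(a, 1) != a) return false;
        // P4: sw_not(a) is a complement of a.
        if (sw_or(a, sw_not(a)) != 1 || sw_and(a, sw_not(a)) != 0) return false;
        for (int b = 0; b <= 1; ++b) {
            // P1: both operations are commutative.
            if (sw_or(a, b) != sw_or(b, a)) return false;
            if (sw_and(a, b) != sw_and(b, a)) return false;
            for (int c = 0; c <= 1; ++c) {
                // P2: each operation distributes over the other.
                if (sw_or(a, sw_and(b, c)) != sw_and(sw_or(a, b), sw_or(a, c))) return false;
                if (sw_and(a, sw_or(b, c)) != sw_or(sw_and(a, b), sw_and(a, c))) return false;
            }
        }
    }
    return true;
}
```

The check covers the same eight rows as the P2 table above, plus the smaller cases for P1, P3, and P4.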
A.3 IMPORTANT THEOREMS IN BOOLEAN ALGEBRA
We now present some important theorems in Boolean algebra; these theorems can be
applied to the transformation of switching expressions.
Theorem 2 Principle of Duality
Every algebraic identity deducible from the postulates of a Boolean algebra remains valid
if
• the operations + and · are interchanged throughout; and
• the identity elements 0 and 1 are also interchanged throughout.
Proof The proof follows at once from the fact that for each of the postulates there is
another one (the dual) that is obtained by interchanging + and · as well as 0 and 1.
This theorem is useful because it reduces the number of different theorems that must
be proven: every theorem has its dual.
Theorem 3
Every element in B has a unique complement.
Proof Let a ∈ B; let us assume that a'_1 and a'_2 are both complements of a. Then,
using the postulates, we can perform the following transformations:

a'_1 = a'_1 · 1                 by P3(ii) (identity)
     = a'_1 · (a + a'_2)        by hypothesis (a'_2 is the complement of a)
     = a'_1 · a + a'_1 · a'_2   by P2(ii) (distributivity)
     = a · a'_1 + a'_1 · a'_2   by P1(ii) (commutativity)
     = 0 + a'_1 · a'_2          by hypothesis (a'_1 is the complement of a)
     = a'_1 · a'_2              by P3(i) (identity)

Changing the index 1 for 2 and vice versa, and repeating all steps for a'_2, we get

a'_2 = a'_2 · a'_1
     = a'_1 · a'_2              by P1(ii)

and therefore a'_2 = a'_1.
The uniqueness of the complement of an element allows considering ' as a unary
operation called complementation.
Theorem 4
For any a ∈ B:
1. a + 1 = 1
2. a · 0 = 0
Proof Using the postulates, we can perform the following transformations:
Case (1):
a + 1 = 1 · (a + 1)        by P3(ii)
      = (a + a')(a + 1)    by P4(i)
      = a + (a' · 1)       by P2(i)
      = a + a'             by P3(ii)
      = 1                  by P4(i)
Case (2):
a · 0 = 0 + (a · 0)        by P3(i)
      = (a · a') + (a · 0) by P4(ii)
      = a · (a' + 0)       by P2(ii)
      = a · a'             by P3(i)
      = 0                  by P4(ii)
Case (2) can also be proven by means of Case (1) and the principle of duality.
Theorem 5
The complement of the element 1 is 0, and vice versa. That is,
1. 0' = 1
2. 1' = 0
Proof By Theorem 4,
0 + 1 = 1 and
0 · 1 = 0
Because, by Theorem 3, the complement of an element is unique, Theorem 5 follows.
Theorem 6 Idempotent Law
For every a ∈ B
1. a + a = a
2. a · a = a
Proof
(1):
a + a = (a + a) · 1        by P3(ii)
      = (a + a)(a + a')    by P4(i)
      = a + (a · a')       by P2(i)
      = a + 0              by P4(ii)
      = a                  by P3(i)
(2): duality
Theorem 7 Involution Law
For every a ∈ B,
(a')' = a
Proof From the definition of complement, (a')' and a are both complements of a'. But,
by Theorem 3, the complement of an element is unique, which proves the theorem.
Theorem 8 Absorption Law
For every pair of elements a, b ∈ B,
1. a + a · b = a
2. a · (a + b) = a
Proof
(1):
a + ab = a · 1 + ab        by P3(ii)
       = a(1 + b)          by P2(ii)
       = a(b + 1)          by P1(i)
       = a · 1             by Theorem 4 (1)
       = a                 by P3(ii)
(2): duality
Theorem 9
For every pair of elements a, b ∈ B,
1. a + a'b = a + b
2. a(a' + b) = ab
Proof
(1):
a + a'b = (a + a')(a + b)  by P2(i)
        = 1 · (a + b)      by P4(i)
        = a + b            by P3(ii)
(2): duality
Theorem 10
In a Boolean algebra, each of the binary operations (+) and (·) is associative. That is, for
every a, b, c ∈ B,
1. a + (b + c) = (a + b) + c
2. a(bc) = (ab)c
The proof of this theorem is quite lengthy. The interested reader should consult the
further readings suggested at the end of this appendix.
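For the switching algebra specifically, the theorems above can be spot-checked by perfect induction. The sketch below does so for Theorems 4, 6, 8, 9, and 10; it is only a check over B = {0, 1}, not a proof for Boolean algebras in general, and the function name theorems_hold is hypothetical.

```cpp
#include <cassert>

// Perfect-induction check of Theorems 4, 6, 8, 9 and 10 over B = {0, 1},
// using | for +, & for ·, and (x ^ 1) for the complement x'.
bool theorems_hold() {
    for (int a = 0; a <= 1; ++a)
        for (int b = 0; b <= 1; ++b)
            for (int c = 0; c <= 1; ++c) {
                const int na = a ^ 1;  // a'
                assert(na == 0 || na == 1);
                // Theorem 4: a + 1 = 1 and a · 0 = 0
                if ((a | 1) != 1 || (a & 0) != 0) return false;
                // Theorem 6 (idempotent law): a + a = a and a · a = a
                if ((a | a) != a || (a & a) != a) return false;
                // Theorem 8 (absorption law): a + ab = a and a(a + b) = a
                if ((a | (a & b)) != a || (a & (a | b)) != a) return false;
                // Theorem 9: a + a'b = a + b and a(a' + b) = ab
                if ((a | (na & b)) != (a | b)) return false;
                if ((a & (na | b)) != (a & b)) return false;
                // Theorem 10 (associativity)
                if ((a | (b | c)) != ((a | b) | c)) return false;
                if ((a & (b & c)) != ((a & b) & c)) return false;
            }
    return true;
}
```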
B
Additional Topics in Binary Number Systems
EXAMPLE B.1
pomt to r e ~ n t the number In
a tinHe number of l ~ auJlablc III
need to be truncated and the b1n.wy
y
B.3 FIXED POINT ARITHMETIC
If we fix the binary point of a real number in a certain position in the number (e.g., after
the 4th bit), we can add or subtract binary real numbers by treating the numbers as integers
and adding or subtracting normally. In the resulting sum or difference, we maintain the
binary point's position. For example, assume we are working with 8-bit numbers with
half of the bits used to represent the fractional part of the number. If we wanted to add
1001.0010 (9.125) and 0011.1111 (3.9375), we can simply add the two numbers as
if they were integers. The sum, shown in Figure B.4, can be converted back to a real
number by maintaining the binary point's position within the sum. Converting the sum to
decimal verifies that the calculation was correct:
$1*2^{3} + 1*2^{2} + 0*2^{1} + 1*2^{0} + 0*2^{-1} + 0*2^{-2} + 0*2^{-3} + 1*2^{-4} = 8 + 4 + 1 + 0.0625 = 13.0625$.

     1001.0010
   + 0011.1111
   -----------
     1101.0001
Figure B.4 Adding two fixed point numbers.
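The addition above can be reproduced with ordinary integer arithmetic. The sketch below stores an 8-bit number with four fraction bits in a uint8_t; the names add_q44, q44_to_real, and kFractionBits are hypothetical, and the sketch deliberately does not handle sums that overflow 8 bits.

```cpp
#include <cassert>
#include <cstdint>

// 8-bit fixed point format with the binary point fixed after the 4th bit.
const int kFractionBits = 4;

// Add two fixed point values exactly as if they were 8-bit integers.
std::uint8_t add_q44(std::uint8_t x, std::uint8_t y) {
    const unsigned sum = static_cast<unsigned>(x) + y;
    assert(sum <= 0xFFu);  // this sketch does not handle overflow out of 8 bits
    return static_cast<std::uint8_t>(sum);
}

// Interpret a bit pattern as a real number by restoring the binary point.
double q44_to_real(std::uint8_t x) {
    return static_cast<double>(x) / (1 << kFractionBits);
}
```

With 1001.0010 written as 0x92 and 0011.1111 as 0x3F, add_q44(0x92, 0x3F) yields 0xD1, which is 1101.0001, and q44_to_real(0xD1) gives 13.0625.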
Multiplying binary real numbers is also straightforward and does not require that the
binary point be fixed. We first multiply the two numbers as if they were integers. Second, we
place a binary point in the product such that the precision of the product is the sum of the
precisions of the multiplicand and multiplier (the two numbers being multiplied), just like
what is done when we multiply two decimal numbers together. Figure B.5 shows how we
might multiply the binary numbers 01.10 (1.5) and 11.01 (3.25) using the partial product
method described in Section 4.7. After we calculate the product of the two numbers, we
place a binary point in the appropriate location. Both the multiplier and multiplicand feature
two bits of precision; therefore the product must have four bits of precision, and we insert
a binary point to reflect this. Converting the product to decimal verifies that the calculation
was correct:
$0*2^{3} + 1*2^{2} + 0*2^{1} + 0*2^{0} + 1*2^{-1} + 1*2^{-2} + 1*2^{-3} + 0*2^{-4} = 4 + 0.5 + 0.25 + 0.125 = 4.875$.

       01.10
     x 11.01
     -------
        0110
       0000
      0110
   + 0110
   ---------
   0100.1110
Figure B.5 Multiplying two fixed point numbers.
The previous example was convenient in that we never had to add four 1s together in a
column when we summed up the partial products. To make the calculations simpler and to
allow for the partial product summation to be implemented using full-adders, which can only
add three 1s at a time, we add the partial products incrementally instead of all at once. For
example, let's multiply 1110.1 (14.5) by 0111.1 (7.5). As seen in Figure B.6, we begin by
generating partial products as we did earlier. However, we add partial products immediately
into partial product sums, labeled pps in the figure. Eventually, we find that the product is
01101100.11, which corresponds to the correct answer, 108.75. You may want to try adding
the five partial products together at once instead of using the intermediate partial product
sums to see why this method is useful.

          1110.1     multiplicand
        x 0111.1     multiplier
        --------
           11101     partial product 1 (pp1)
        + 11101      pp2
        --------
         1010111     pps1 = pp1 + pp2
       + 11101       pp3
       ---------
        11001011     pps2 = pps1 + pp3
      + 11101        pp4
      ----------
       110110011     pps3 = pps2 + pp4
     + 00000         pp5
     -----------
     01101100.11     product = pps3 + pp5
Figure B.6 Multiplying two fixed point numbers using intermediate partial products.
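The incremental method maps directly onto a shift-and-add loop. The following C++ sketch is one way to express it, assuming unsigned operands given as raw integers with the binary point ignored; the function names multiply_incremental and product_to_real are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Multiply two unsigned fixed point numbers given as raw integers (binary
// point ignored), adding each partial product into a running sum (pps).
std::uint32_t multiply_incremental(std::uint16_t multiplicand,
                                   std::uint16_t multiplier) {
    std::uint32_t pps = 0;  // running partial product sum
    for (int bit = 0; bit < 16; ++bit) {
        if ((multiplier >> bit) & 1u) {
            // Partial product: the multiplicand shifted left by the bit position.
            pps += static_cast<std::uint32_t>(multiplicand) << bit;
        }
    }
    return pps;
}

// Restore the binary point: the precision of the product is the sum of the
// precisions (numbers of fraction bits) of the two operands.
double product_to_real(std::uint32_t raw, int fracBitsA, int fracBitsB) {
    assert(fracBitsA >= 0 && fracBitsB >= 0 && fracBitsA + fracBitsB < 32);
    return static_cast<double>(raw) /
           static_cast<double>(1u << (fracBitsA + fracBitsB));
}
```

For 1110.1 times 0111.1 the raw operands are 29 and 15 with one fraction bit each; the raw product is 435, and restoring two fraction bits gives 108.75.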
Before proceeding to binary real number division, we will introduce binary integer
division, which was not discussed in previous chapters.
We can use the familiar process of long division to divide two binary integers. For example,
consider the binary division of 101100 (44) by 10 (2). The full calculation is shown in Figure
B.7. Notice how the procedure is exactly the same as decimal long division except that the
numbers are now in binary.
Figure B.7 Dividing two binary integers using long division (divisor 10, dividend 101100,
quotient 10110, remainder 0).
Dividing binary real numbers, like multiplication, also does not require that the binary point
be fixed. However, to simplify the calculation, we shift both the dividend and divisor's binary
point right until the divisor no longer has a fractional part. For example, consider the division
of $1.01_2$ (1.25) by $0.1_2$ (0.5). The divisor, $0.1_2$, has one digit in its fractional
part; therefore we shift the dividend and divisor binary points right by one digit, changing our
problem to $10.1_2$ divided by $1_2$. We now treat the numbers as integers (ignoring the binary
point) and can divide them using the long division approach. Trivially, $101_2 / 1_2$ is
$101_2$. We then restore the binary point to where it was in the dividend, giving us the
answer $10.1_2$, or 2.5.
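Binary long division can be sketched as a loop that brings down one dividend bit at a time. The code below is an illustrative restoring-style division on 32-bit unsigned integers; the names DivResult and long_divide are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

struct DivResult {
    std::uint32_t quotient;
    std::uint32_t remainder;
};

// Binary long division: bring down one dividend bit at a time, subtract the
// divisor whenever it fits, and record a 1 or a 0 in the quotient.
DivResult long_divide(std::uint32_t dividend, std::uint32_t divisor) {
    assert(divisor != 0);
    DivResult r = {0, 0};
    std::uint64_t rem = 0;  // 64 bits so the shift below cannot overflow
    for (int bit = 31; bit >= 0; --bit) {
        rem = (rem << 1) | ((dividend >> bit) & 1u);  // bring down next bit
        r.quotient <<= 1;
        if (rem >= divisor) {  // divisor fits: subtract and record a 1
            rem -= divisor;
            r.quotient |= 1u;
        }
    }
    r.remainder = static_cast<std::uint32_t>(rem);
    return r;
}
```

Dividing 101100 (44) by 10 (2) gives quotient 10110 (22) and remainder 0. For the real number example, the shifted operands 101 and 1 give the quotient 101; restoring the dividend's single fraction bit yields 10.1, or 2.5.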
Why does shifting the binary point not change the answer? In general, shifting the
radix point right by one digit is the same as multiplying the number by its base. For
binary numbers, shifting the binary point right is equivalent to multiplying the number by
2. Dividing two numbers will give you the ratio of the two numbers to each other. Multi-
plying both numbers by the same number (by means of shifting the binary point) will
not affect that ratio, since doing so is equivalent to multiplying the ratio by 1.
Fixed point numbers are simple to work with, but are limited in the range of numbers
that they can represent. For a fixed number of bits, increasing the precision of a number
comes at the expense of the range of whole numbers that we can use, and vice versa.
Fixed point numbers are suitable for a variety of applications, such as a digital thermom-
eter, but more demanding applications need greater flexibility and range in their real
numbers.
B.4 FLOATING POINT REPRESENTATION
When working with decimal numbers, we often represent very large or very small
numbers by using scientific notation. Rather than writing a googol as a 1 with a hundred
0s after it, we could write $1.0*10^{100}$. Instead of 299,792,458 m/s, we could write the
speed of light as $3.0*10^{8}$ m/s, as $2.998*10^{8}$, or even $299.8*10^{6}$.
If such notation could be translated into binary, we would be able to store a much
greater range of numbers than if the binary point were fixed. What features of this nota-
tion need to be captured in a binary representation?
First is the whole and fractional part of the number being multiplied by a power of 10, which
is called the mantissa (or significand), as shown in Figure B.8. We do not need to store the
whole part of the number if we make sure the number is in a certain form. We call a number
written in scientific notation normalized if the whole part of the number is greater than 0 but
less than the base. In the previous speed of light examples, $3.0*10^{8}$ and $2.998*10^{8}$
are normalized since 3 and 2, respectively, are greater than zero but less than 10. The number
$299.8*10^{6}$, on the other hand, is not normalized. If a binary real number is normalized,
then the whole part of the mantissa can only be a 1. To save bits, we can assume that the whole
part of the significand is 1 and store only the fractional part.
Figure B.8 Parts of a number in scientific notation: in $+3.0*10^{8}$, + is the sign, 3.0 the
mantissa, 10 the base, and 8 the exponent.
Second is the base (sometimes referred to as the radix) and the exponent by which
the mantissa is multiplied, shown in Figure B.8. Calling 10 the base is no accident: the
number is the same as the base of the entire number. In binary, the base is naturally 2.
Knowing this, we do not need to store the 2. We can simply assume that 2 is the base and
store the exponent.
Third, we must capture the sign of the number.
The IEEE 754-1985 Standard
The Institute of Electrical and Electronics Engineers (IEEE) 754-1985 standard specifies a
way in which the three values described above can be represented in a 32-bit or a 64-bit
binary number, referred to as single and double precision, respectively. Though there are
other ways to represent real numbers, the IEEE standard is by far the most widely used.
We refer to these numbers as floating point numbers.
The IEEE standard assigns a certain range of bits for each of the three values. For 32-bit
numbers, the first (most significant) bit specifies the sign, followed by 8 bits for the
exponent, and the remaining 23 bits are used for the mantissa. This arrangement is pictured
in Figure B.9.
Figure B.9 Bit arrangement in a 32-bit floating point number: bit 31 is the sign, bits 30 to 23
are the exponent, and bits 22 to 0 are the mantissa.
The sign bit is set to 0 if the number is positive, and the bit is set to 1 if the number is
negative. The mantissa bits are set to the fractional part of the mantissa in the original
number. For example, if the mantissa is 1.1011, we would store 1011 followed by 19
zeroes in bits 22 to 0. As part of the standard, we add 127 to the exponent we store in the
exponent bits. Therefore, if a floating point number's exponent is 3, we would store 130 in
the exponent bits. If the exponent were -30, we would store 97 in the exponent bits. The
adjusted number is called a biased exponent. Exponent bits containing all 0s or all 1s have
special meanings and cannot be used. Under these conditions, the range of biased exponents
we can write in the exponent bits is 1 to 254, meaning the range of unbiased exponents is
-126 to 127. Why don't we simply store the exponent as a signed, two's complement number
(discussed in Section 4.8)? Because it turns out that biasing the exponent results in simpler
circuitry for comparing the magnitude (absolute value) of two floating point numbers.
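The three fields can be inspected on most machines. The sketch below assumes that the C++ type float is an IEEE 754 32-bit number (checked at compile time); the names FloatFields, split_float, and unbiased_exponent are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

struct FloatFields {
    std::uint32_t sign;             // bit 31
    std::uint32_t biased_exponent;  // bits 30 to 23
    std::uint32_t fraction;         // bits 22 to 0
};

// Split a float into its sign, biased exponent, and stored mantissa bits.
FloatFields split_float(float value) {
    static_assert(std::numeric_limits<float>::is_iec559 && sizeof(float) == 4,
                  "sketch assumes float is an IEEE 754 32-bit number");
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);  // copy the raw bit pattern
    FloatFields f;
    f.sign = bits >> 31;
    f.biased_exponent = (bits >> 23) & 0xFFu;
    f.fraction = bits & 0x7FFFFFu;
    return f;
}

// Remove the bias of 127; meaningful only for normalized numbers.
int unbiased_exponent(const FloatFields& f) {
    assert(f.biased_exponent >= 1 && f.biased_exponent <= 254);
    return static_cast<int>(f.biased_exponent) - 127;
}
```

For 13.5, which is 1.1011 times 2 to the power 3 in binary, the stored fields are sign 0, biased exponent 130, and 1011 followed by 19 zeroes, matching the mantissa and exponent examples in the text.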
EXAMPLE B.2
The IEEE standard defines certain special values if the contents of the exponent bits are uniform. When the exponent bits are all 0s, two possibilities occur:
1. If the mantissa bits are all 0s, then the entire number evaluates to zero.
2. If the mantissa bits are nonzero, then the number is not normalized; that is, the whole part of the mantissa is a binary zero and not a one.
When the exponent bits are all 1s, two possibilities occur:
1. If the mantissa bits are all 0s, then the entire number evaluates to + or - infinity, depending on the sign bit.
2. If the mantissa bits are nonzero, then the entire "number" is classified as not a number (NaN).
There are also specific classes of NaNs, beyond the scope of this appendix, that are used in computations involving NaNs.
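The special-value rules above can be sketched as a small classifier. This is only an illustrative sketch: the function names `split_fields` and `classify` are our own, and the input is assumed to be a 32-character string of 0s and 1s, most significant bit first.

```python
def split_fields(bits: str):
    """Split a 32-bit pattern into (sign, biased exponent, mantissa field)."""
    if len(bits) != 32 or set(bits) - {"0", "1"}:
        raise ValueError("expected a string of exactly 32 binary digits")
    return int(bits[0], 2), int(bits[1:9], 2), int(bits[9:], 2)


def classify(bits: str) -> str:
    """Name the kind of value stored in a 32-bit floating point pattern."""
    sign, biased, mantissa = split_fields(bits)
    sign_text = "-" if sign else "+"
    if biased == 0:                      # exponent bits are all 0s
        return (sign_text + "zero") if mantissa == 0 else "not normalized"
    if biased == 255:                    # exponent bits are all 1s
        return (sign_text + "infinity") if mantissa == 0 else "NaN"
    # Any other biased exponent (1 to 254) is an ordinary normalized number.
    return f"normalized, unbiased exponent {biased - 127}"


print(classify("0" + "1" * 8 + "0" * 23))
print(classify("0" + "10000010" + "0011" + "0" * 19))
```

The second call uses a biased exponent of 130, so the classifier reports an unbiased exponent of 3.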
With this information, we can convert decimal real numbers to floating point numbers. Assuming the decimal number to be converted is not a special value in floating point notation, Table B.2 describes how to perform the conversion.
TABLE B.2 Method for converting real decimal numbers to floating point
Step                                                        Description
1. Convert the number from base 10 to base 2.               Use the method described in Section B.2.
2. Convert the number to normalized scientific notation.    Initially multiply the number by 2^0. Adjust the binary point and exponent so that the whole part of the number is 1 (in binary).
3. Fill in the bit fields.                                  Set the sign, biased exponent, and mantissa bits appropriately.
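As a rough illustration, the three steps of Table B.2 might be sketched in Python as follows. The function name `encode_float32` is our own, and the sketch makes two simplifying assumptions: the value is nonzero and normalized, and it fits exactly in 23 mantissa bits (no rounding is attempted). Step 1 is handled implicitly by `Fraction`, which holds the value exactly.

```python
from fractions import Fraction
import struct

BIAS = 127  # amount added to the exponent for 32-bit numbers


def encode_float32(value) -> str:
    """Return 'sign exponent mantissa' bit fields for an exactly representable value."""
    x = Fraction(value)                  # step 1: exact value, base-independent
    if x == 0:
        raise ValueError("zero is a special value; Table B.2 does not apply")
    sign = 1 if x < 0 else 0
    x = abs(x)

    # Step 2: start from x * 2^0 and move the binary point until 1 <= x < 2.
    exponent = 0
    while x >= 2:
        x /= 2
        exponent += 1
    while x < 1:
        x *= 2
        exponent -= 1
    if not -126 <= exponent <= 127:
        raise ValueError("exponent outside the normalized range")

    # Step 3: keep only the fractional part; the leading 1 is implied.
    fraction = (x - 1) * 2**23
    if fraction.denominator != 1:
        raise ValueError("needs more than 23 mantissa bits; rounding is not handled")
    return f"{sign} {exponent + BIAS:08b} {int(fraction):023b}"


fields = encode_float32(9.5)
print(fields)

# Cross-check against the machine's own single-precision encoding.
(raw,) = struct.unpack(">I", struct.pack(">f", 9.5))
print(int(fields.replace(" ", ""), 2) == raw)
```

Passing `Fraction("-524.0625")` exercises the sign bit and a larger exponent in the same way.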
Converting decimal real numbers to floating point
Convert the following numbers from decimal to IEEE 754 32-bit floating point: 9.5, infinity, and -52406.25 * 10^-2.

Let's follow the procedure in Table B.2 to convert 9.5 to floating point. In step 1, we convert 9.5 to binary. Using the subtraction method, we find that 9.5 is 1001.1 in binary. To convert the number to scientific notation per step 2, we multiply the number by 2^0, giving 1001.1 * 2^0 (for readability purposes, we write the 2^x part in base 10). To normalize the number, we must shift the binary point left by three digits. In order to not change the value of the number after moving the binary point, we change the 2's exponent to 3. After step 2, our number becomes 1.0011 * 2^3. In step 3, we put everything together into the properly formatted sequence of bits. The sign bit is set to 0, indicating a positive number. The exponent bits are set to 3 + 127 = 130 (we must bias the exponent) in binary, and the mantissa bits are set to 0011, which is the fractional part of the mantissa. Remember that the 1 to the left of the binary point is implied since the number is normalized. The properly encoded number is shown in Figure B.10.
B Additional Topics in Binary Number Systems
Step 1: Convert to binary
    9.5 (base 10) <=> 1001.1 (base 2)
Step 2: Convert to normalized scientific notation
    1001.1 <=> 1001.1 * 2^0 <=> 1.0011 * 2^3
    To normalize, move the binary point 3 digits left and add 3 to the exponent.
Step 3: Fill in the bit fields
    0      10000010            00110000000000000000000
    sign   exponent (biased)   mantissa
Figure B.10 Representing 9.5 as a 32-bit floating point number, most significant bit first.

Now let's convert infinity to a floating point number. Since infinity is a special value, we cannot employ the method we used to convert 9.5 to floating point. Rather, we fill in the three bit fields with special values indicating that the number is infinity. From the discussion of special values above, we know that the exponent bits should be all 1s and the mantissa bits should be all 0s. The sign bit should be 0 since infinity is positive. Therefore, the equivalent floating point number is 0 11111111 00000000000000000000000.

Converting -52406.25 * 10^-2 to floating point is straightforward using the method in Table B.2. For step 1, we convert the number to binary. Recall that we represent the sign of the number using a single bit and not using two's complement representation, so we only need to convert 52406.25 * 10^-2 to binary and set the sign bit to indicate that the number is negative. The number 52406.25 * 10^-2 evaluates to 524.0625. Using the subtraction or divide-by-2 method, we know that 524 is 1000001100 in binary. The fractional part, 0.0625, is conveniently 2^-4. Thus 524.0625 is 1000001100.0001 in binary. In step 2, we write the number in scientific notation: 1000001100.0001 * 2^0. We must also normalize the number by shifting the binary point left by 9 digits and compensating for this shift in the exponent: 1.0000011000001 * 2^9. Finally, we combine the sign (1 since the original number is negative), biased exponent (9 + 127 = 136), and fractional part of the mantissa into a floating point number: 1 10001000 00000110000010000000000.

EXAMPLE B.3
Converting floating point numbers to decimal
Convert the number 11001011101010100000000000000000 from IEEE 754 32-bit floating point to decimal.

To perform the conversion, we first split the number into its sign, exponent, and mantissa parts: 1 10010111 01010100000000000000000. We can immediately see from the sign bit that the number is negative.

Next, we convert the 8-bit exponent and 23-bit mantissa from binary to decimal. We find that 10010111 is 151. We unbias the exponent by subtracting 127 from 151, giving an unbiased exponent of 24. Recall that the mantissa in the pattern of bits represents the fractional part of the mantissa and is stored without the leading 1 from the whole part of the mantissa (assuming the original number was normalized). Restoring the 1 and adding a binary point gives us the number 1.01010100000000000000000, which is the same number as 1.010101. By applying weights to each digit, we see that

1.010101 = 1*2^0 + 0*2^-1 + 1*2^-2 + 0*2^-3 + 1*2^-4 + 0*2^-5 + 1*2^-6 = 1.328125

With the original sign, exponent, and mantissa extracted, we can combine them into a single number: -1.328125 * 2^24. We can multiply the number out to -22,282,240, which is equivalent to -2.228224 * 10^7.
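The decoding procedure of this example can be checked with a short sketch. The helper name `decode_float32` is our own; it assumes a normalized number (biased exponent between 1 and 254) and uses Python's standard `struct` module only as an independent cross-check.

```python
import struct


def decode_float32(bits: str):
    """Split a normalized 32-bit pattern and rebuild its decimal value."""
    bits = bits.replace(" ", "")
    sign = int(bits[0], 2)
    biased = int(bits[1:9], 2)
    fraction = int(bits[9:], 2)
    exponent = biased - 127              # remove the bias
    mantissa = 1 + fraction / 2**23      # restore the implied leading 1
    value = (-1) ** sign * mantissa * 2.0**exponent
    return sign, exponent, mantissa, value


pattern = "1 10010111 01010100000000000000000"
sign, exponent, mantissa, value = decode_float32(pattern)
print(sign, exponent, mantissa, value)

# Independent check: let the standard library interpret the same 32 bits.
raw = int(pattern.replace(" ", ""), 2).to_bytes(4, "big")
print(struct.unpack(">f", raw)[0] == value)
```

For this pattern the sketch yields an unbiased exponent of 24, a mantissa of 1.328125, and the value -22282240.0.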
The format for double precision (64-bit) floating point numbers is similar, with three fields having a defined number of bits. The first, most significant bit represents the sign of the number. The next 11 bits hold the biased exponent, and the remaining 52 bits hold the fractional part of the mantissa. Additionally, we add 1023 to the exponent instead of 127 to form the biased exponent. This arrangement is pictured in Figure B.11.

bit:    63  | 62 61 ... 53 52 | 51 50 ... 1 0
field: sign |    exponent     |   mantissa
Figure B.11 Bit arrangement in a 64-bit floating point number.
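To see the 64-bit layout on a concrete value, the following sketch pulls the three fields out of a Python float, which is a double precision number on common platforms. The helper name `double_fields` is our own.

```python
import struct


def double_fields(x: float):
    """Return (sign, biased exponent, mantissa field) of a 64-bit float."""
    (raw,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = raw >> 63                     # bit 63
    biased = (raw >> 52) & 0x7FF         # bits 62..52, 11 bits
    fraction = raw & ((1 << 52) - 1)     # bits 51..0, 52 bits
    return sign, biased, fraction


sign, biased, fraction = double_fields(9.5)
print(sign, biased - 1023, f"{fraction:052b}")
```

For 9.5, which is 1.0011 * 2^3, the biased exponent is 3 + 1023 = 1026 and the mantissa field begins with 0011.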
Floating Point Arithmetic
Floating point arithmetic is beyond the scope of this text, but we'll provide a brief overview of the concept.

Floating point addition and subtraction must be performed by first aligning the two floating point numbers so that their exponents are equal. For example, consider adding the two decimal numbers 2.52*10^2 + 1.44*10^4. Since the exponents differ, we can change 2.52*10^2 to 0.0252*10^4. Adding 0.0252*10^4 and 1.44*10^4 gives us the answer 1.4652*10^4. Similarly, we could have changed 1.44*10^4 to 144*10^2. Adding 144*10^2 and 2.52*10^2 gives us the sum 146.52*10^2, which is the same number as our first set of calculations. An analogous situation occurs when we work with floating point numbers. Typically, hardware that performs floating point arithmetic, often referred to as a floating point unit, will adjust the mantissa of the number with the smaller exponent before adding or subtracting the mantissas (with their implied 1s restored) together and preserving the common exponent. Notice that before the addition or subtraction is performed, the exponents of the two numbers are compared. This comparison is facilitated through the use of the sign bit and the biased exponent, as opposed to representing the exponent in two's complement form.
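The alignment step can be sketched with exact rational arithmetic. This is an illustration of the idea only, not of how a floating point unit is built; the function name `align_and_add` and the (mantissa, exponent) pair representation are assumptions made for the example.

```python
from fractions import Fraction


def align_and_add(m1, e1, m2, e2, base=10):
    """Add m1*base**e1 and m2*base**e2 after aligning to the larger exponent."""
    m1, m2 = Fraction(m1), Fraction(m2)
    if e1 < e2:
        # Shift the first mantissa right so it uses the larger exponent e2.
        m1 /= Fraction(base) ** (e2 - e1)
        e1 = e2
    elif e2 < e1:
        # Shift the second mantissa right so it uses the larger exponent e1.
        m2 /= Fraction(base) ** (e1 - e2)
    return m1 + m2, e1                   # sum of mantissas, common exponent


mantissa, exponent = align_and_add("2.52", 2, "1.44", 4)
print(float(mantissa), exponent)

# The same routine works in base 2: 1.5 * 2^0 + 1 * 2^3.
print(align_and_add(Fraction(3, 2), 0, 1, 3, base=2))
```

The decimal call reproduces the text's first calculation, 0.0252*10^4 + 1.44*10^4.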
Multiplication and division in floating point require no such alignments. As in decimal multiplication and division of numbers in scientific notation, we multiply or divide the mantissas and add or subtract the two exponents, depending on the operation. When multiplying, we add exponents. For example, let's multiply 6.44*10^7 by 5.0*10^-3. Instead of trying to multiply 64,400,000 by 0.005, we multiply the two mantissas together and add the exponents. 6.44*5.0 is 32.2 and 7+(-3) is 4. Thus the answer is 32.2*10^4. When dividing, we subtract the exponent of the divisor from the dividend's exponent. For example, let's divide 31.5*10^-4 (dividend) by 2.0*10^-12 (divisor). Dividing 31.5 by 2.0 gives us 15.75. Subtracting the divisor's exponent from the dividend's gives us -4-(-12)=8. Thus the answer is 15.75*10^8. Floating point division defines results for several boundary cases. Dividing by 0 evaluates to positive or negative infinity, depending on the sign of the dividend. Dividing a nonzero number by infinity is defined as 0; otherwise dividing by infinity is ...
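The multiply and divide rules reduce to a few lines when numbers are kept as (mantissa, exponent) pairs. The sketch below uses exact fractions and hypothetical helper names `multiply` and `divide`; it does not renormalize the result, matching the way the worked examples leave 32.2 and 15.75 as mantissas.

```python
from fractions import Fraction


def multiply(m1, e1, m2, e2):
    """Multiply the mantissas and add the exponents."""
    return Fraction(m1) * Fraction(m2), e1 + e2


def divide(m1, e1, m2, e2):
    """Divide the mantissas and subtract the divisor's exponent."""
    return Fraction(m1) / Fraction(m2), e1 - e2


product_mantissa, product_exponent = multiply("6.44", 7, "5.0", -3)
print(float(product_mantissa), product_exponent)

quotient_mantissa, quotient_exponent = divide("31.5", -4, "2.0", -12)
print(float(quotient_mantissa), quotient_exponent)
```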
B.5 EXERCISES
SECTION B.2: REAL NUMBER REPRESENTATION
1. Convert the following numbers from decimal to binary:
(a) 1.5
(b) 3.125
(c) 8.25
(d) 7.75
2. Convert the following numbers from decimal to binary:
(a) 9.375
(b) 2.4375
(c) 5.65625
(d) 15.5703125
SECTION B.3: FIXED POINT ARITHMETIC
3. Add the following two unsigned binary numbers using binary addition and convert the result to decimal:
(a) 10111.001 + 1010.110
(b) 01101.100 + 10100.101
(c) 10110.1 + 110.011
(d) 1101.111 + 10011.0111
SECTION B.4: FLOATING POINT REPRESENTATION
4. Convert the following decimal numbers to 32-bit floating point:
(a) -50.208
(b) 10'
(c) - 24.55 1.152 10'"
(d) 0
5. Convert the following 32-bit floating point numbers to decimal:
(a) 01001100010110110101100001011000
(b) 01001100010110110101001000000000
(c) 01111111111000110000000000000000
(d) 01001101000110101000101000000000
Extended RTL Design Example
C.l INTRODUCTION
In Chapter 5, we performed RTL design of a soda dispenser processor. We started with a high-level state machine, created the datapath's structure, and then described the controller using a finite-state machine. We did not further design the controller to structure, as such design was the subject of Chapter 3, and we did not wish to clutter Chapter 5's RTL design discussion with too many details of previously learned material. In this appendix, we'll complete the RTL design by designing the controller's FSM down to a state register and gates, resulting in a complete custom-processor implementation of a controller and a datapath. We'll then trace through the behavior of the complete implementation. The purpose of demonstrating this complete design is to give the reader a clear understanding of how the controller and datapath work together.
The block symbol for the soda dispenser processor appears in Figure C.1. Recall that the soda dispenser features three inputs, c, s, and a. The 8-bit input s represents the cost of each bottle of soda. The 1-bit input c is 1 for one clock cycle when a coin is inserted. Additionally, the value on 8-bit input a indicates the value of the coin that was inserted. The soda dispenser features one output, d, used to indicate when soda should be dispensed. The 1-bit output d is 1 for one clock cycle after the value of the coins inserted into the soda dispenser is greater than or equal to s. The soda dispenser does not give change.

Figure C.1 Soda dispenser block symbol.
In Chapter 5, we developed the high-level state machine seen in Figure C.2. We subsequently decomposed the high-level state machine into a controller (represented behaviorally as an FSM) and datapath, shown in Figure C.3. The datapath supports the data operations necessitated by the high-level state machine, including clearing the value of tot (tot = 0 in the Init state), comparing if tot is less than s (for the transitions from the Wait state), and adding tot and a (in the Add state). The controller FSM is similar to the high-level state machine, but is modified to control the datapath and accept status input from the datapath (i.e., tot_lt_s) rather than performing data operations directly. The controller and datapath are shown in Figure C.3.

Inputs: c (bit), a (8 bits), s (8 bits)
Outputs: d (bit)
Local registers: tot (8 bits)
Figure C.2 Soda dispenser high-level state machine.

Inputs: c, tot_lt_s (bit)
Outputs: d, tot_ld, tot_clr (bit)
Figure C.3 Soda dispenser: (a) controller (described behaviorally) and (b) datapath (structure).
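The behavior of the high-level state machine can be sketched in a few lines of Python. This is a behavioral model under assumptions of our own: one call to the hypothetical `step` function represents one clock cycle, state names are plain strings, and the coin value a is assumed to stay on the input during the Add cycle that follows a coin pulse.

```python
def step(state, tot, c, a, s):
    """One clock cycle of the high-level state machine.

    Returns (next_state, next_tot, d), where d is the output during this cycle.
    """
    if state == "Init":
        return "Wait", 0, 0              # d = 0, tot = 0
    if state == "Wait":
        if c:                            # a coin was detected
            return "Add", tot, 0
        # No coin: dispense only once tot is no longer less than s.
        return ("Wait" if tot < s else "Disp"), tot, 0
    if state == "Add":
        return "Wait", tot + a, 0        # tot = tot + a
    if state == "Disp":
        return "Init", tot, 1            # d = 1 for one cycle
    raise ValueError(f"unknown state {state!r}")


def run(inputs, s=60):
    """Apply one (c, a) pair per clock cycle, starting in Init."""
    state, tot, outputs = "Init", 0, []
    for c, a in inputs:
        current = state
        state, tot, d = step(state, tot, c, a, s)
        outputs.append((current, d))
    return outputs


# Two idle cycles, then a quarter, a nickel, a quarter, and a dime.
inputs = [(0, 0), (0, 0)]
for coin in (25, 5, 25, 10):
    inputs += [(1, coin), (0, coin)]     # coin pulse, then the Add cycle
inputs += [(0, 10)] * 3
for cycle, (state, d) in enumerate(run(inputs), start=1):
    print(cycle, state, d)
```

With a 60-cent price, the total reaches 65 after the dime, so the model passes through Disp exactly once.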
C.2 DESIGNING THE SODA DISPENSER CONTROLLER
Using the controller design process introduced in Chapter 3, we can complete the design of the controller. The five steps are as follows:

Capture the FSM. The FSM for the soda dispenser's controller was created during a step of the RTL design method. The controller's FSM is shown in Figure C.3(a).

Capture the Architecture. As indicated by the controller's FSM, the state machine's architecture requires at least 2 inputs (c and tot_lt_s) and 3 outputs (d, tot_ld, and tot_clr). Additionally, we will use two bits to represent the controller's states, which adds an additional two inputs (the current state s1s0) and two outputs (the next state n1n0) to the controller architecture. The corresponding controller architecture is shown in Figure C.4.

Figure C.4 Standard controller architecture for the soda dispenser.
Encode the States. A straightforward encoding of the controller's four states is Init: 00, Wait: 01, Add: 10, and Disp: 11.

Create the State Table. From the controller architecture designed in an earlier step, we know that the state table must account for 4 inputs (s1, s0, c, and tot_lt_s) and 5 outputs (d, tot_ld, tot_clr, n1, and n0). With 4 inputs, the state table will include 2^4 = 16 rows (Figure C.5).
         Inputs                  Outputs
State    s1 s0 c  tot_lt_s       d  tot_ld tot_clr n1 n0
Init     0  0  0  0              0  0      1       0  1
Init     0  0  0  1              0  0      1       0  1
Init     0  0  1  0              0  0      1       0  1
Init     0  0  1  1              0  0      1       0  1
Wait     0  1  0  0              0  0      0       1  1
Wait     0  1  0  1              0  0      0       0  1
Wait     0  1  1  0              0  0      0       1  0
Wait     0  1  1  1              0  0      0       1  0
Add      1  0  0  0              0  1      0       0  1
Add      1  0  0  1              0  1      0       0  1
Add      1  0  1  0              0  1      0       0  1
Add      1  0  1  1              0  1      0       0  1
Disp     1  1  0  0              1  0      0       0  0
Disp     1  1  0  1              1  0      0       0  0
Disp     1  1  1  0              1  0      0       0  0
Disp     1  1  1  1              1  0      0       0  0
Figure C.5 The soda dispenser controller's state table.

By examining the outputs specified in the controller FSM, duplicated for convenience in Figure C.6, we fill in the d, tot_ld, and tot_clr columns in the state table. For example, in Figure C.6, we see that when the controller FSM is in the Init state, d=0, tot_clr=1, and tot_ld is implicitly 0. Thus, for rows in the state table that correspond to the Init state (namely, the four rows where s1s0=00, since we chose "00" as the encoding for the Init state), we set the d column to 0, the tot_clr column to 1, and the tot_ld column to 0.

Inputs: c, tot_lt_s (bit)
Outputs: d, tot_ld, tot_clr (bit)
Figure C.6 Soda dispenser controller FSM with state encodings.

We fill in the next state columns, n1 and n0, based on the transitions specified in the controller FSM and the state encoding we chose in an earlier step. For example, consider the Wait state. As indicated in Figure C.6, the FSM transitions to the Add state when c=1. Thus, for rows where s1s0c=011 (s1s0=01 corresponds to the Wait state), we set the n1 column to 1 and the n0 column to 0 (n1n0=10 corresponds to the Add state). The FSM transitions to the Disp state if c=0 and tot_lt_s=0, or remains in the Wait state if c=0 and tot_lt_s=1. We represent the transition from Wait to Disp in the state table by setting n1 to 1 and n0 to 1 (Disp) in the row where s1s0=01, c=0, and tot_lt_s=0. Similarly, we represent the transition from Wait back to Wait by writing n1n0=01 where s1s0=01, c=0, and tot_lt_s=1. We then examine the remaining transitions in a similar way, filling in the appropriate values for n1 and n0 until all transitions are accounted for. The completed state table is shown in Figure C.5.
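Filling in the state table by hand is mechanical, so it can also be generated from a behavioral description of the controller FSM. The sketch below is one way to do that; the dictionary `ENCODING` and the functions `controller_fsm` and `state_table` are names we introduce for illustration, and the FSM behavior is taken from the description in the text.

```python
ENCODING = {"Init": (0, 0), "Wait": (0, 1), "Add": (1, 0), "Disp": (1, 1)}
STATE_OF = {code: name for name, code in ENCODING.items()}


def controller_fsm(state, c, tot_lt_s):
    """Return (d, tot_ld, tot_clr, next_state) for the controller FSM."""
    if state == "Init":
        return 0, 0, 1, "Wait"
    if state == "Wait":
        next_state = "Add" if c else ("Wait" if tot_lt_s else "Disp")
        return 0, 0, 0, next_state
    if state == "Add":
        return 0, 1, 0, "Wait"
    return 1, 0, 0, "Init"               # Disp


def state_table():
    """Enumerate all 16 input combinations in binary counting order."""
    rows = []
    for s1 in (0, 1):
        for s0 in (0, 1):
            for c in (0, 1):
                for tot_lt_s in (0, 1):
                    d, tot_ld, tot_clr, next_state = controller_fsm(
                        STATE_OF[(s1, s0)], c, tot_lt_s)
                    n1, n0 = ENCODING[next_state]
                    rows.append((s1, s0, c, tot_lt_s, d, tot_ld, tot_clr, n1, n0))
    return rows


print("s1 s0 c tot_lt_s | d tot_ld tot_clr n1 n0")
for row in state_table():
    print(*row[:4], "|", *row[4:])
```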
Implement the Combinational Logic. For each of the state table's outputs, we write the corresponding Boolean equation. From the state table we obtain the following equations:

d = s1s0
tot_ld = s1s0'
tot_clr = s1's0'
n1 = s1's0c'tot_lt_s' + s1's0c
n0 = s1's0' + s1's0c' + s1s0'
n0 = s0' + s1's0c'

Note that we use the first four equations as derived from the state table. The first three are already minimized, while the n1 equation could be reduced slightly further to s1's0tot_lt_s' + s1's0c, since c'x + c = c + x. The fifth equation, corresponding to n0, can be minimized to s0' + s1's0c' through algebraic methods or by using a K-map as shown in Figure C.7. K-maps are discussed in Section 6.2.

Figure C.7 K-map for the initial equation for n0, with columns s1s0 and rows c; the circled terms are s1's0c' and s0'.
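A quick exhaustive check can confirm that the equations agree with the FSM for all 16 input combinations. The sketch below is an illustration with names of our own (`from_equations`, `from_fsm`, and the helper checks); prime (') is written as `1 - x`, AND as `&`, and OR as `|` on 0/1 integers.

```python
from itertools import product


def from_equations(s1, s0, c, lt):
    """Evaluate the five output equations; lt stands for tot_lt_s."""
    ns1, ns0, nc, nlt = 1 - s1, 1 - s0, 1 - c, 1 - lt
    d = s1 & s0
    tot_ld = s1 & ns0
    tot_clr = ns1 & ns0
    n1 = (ns1 & s0 & nc & nlt) | (ns1 & s0 & c)
    n0 = ns0 | (ns1 & s0 & nc)           # minimized form
    return d, tot_ld, tot_clr, n1, n0


def from_fsm(s1, s0, c, lt):
    """Outputs read directly from the FSM description, not from the equations."""
    if (s1, s0) == (0, 0):               # Init: clear tot, go to Wait
        return 0, 0, 1, 0, 1
    if (s1, s0) == (0, 1):               # Wait
        if c:
            return 0, 0, 0, 1, 0         # go to Add
        return (0, 0, 0, 0, 1) if lt else (0, 0, 0, 1, 1)
    if (s1, s0) == (1, 0):               # Add: load tot, go to Wait
        return 0, 1, 0, 0, 1
    return 1, 0, 0, 0, 0                 # Disp: assert d, go to Init


def n0_unminimized(s1, s0, c):
    ns1, ns0, nc = 1 - s1, 1 - s0, 1 - c
    return (ns1 & ns0) | (ns1 & s0 & nc) | (s1 & ns0)


def n1_reduced(s1, s0, c, lt):
    return ((1 - s1) & s0 & (1 - lt)) | ((1 - s1) & s0 & c)


all_inputs = list(product((0, 1), repeat=4))
mismatches = [bits for bits in all_inputs if from_equations(*bits) != from_fsm(*bits)]
n0_agrees = all(n0_unminimized(s1, s0, c) == from_equations(s1, s0, c, lt)[4]
                for s1, s0, c, lt in all_inputs)
n1_agrees = all(n1_reduced(*bits) == from_equations(*bits)[3] for bits in all_inputs)
print("mismatches:", mismatches)
print("n0 forms agree:", n0_agrees, "| reduced n1 agrees:", n1_agrees)
```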
Using techniques discussed in an earlier chapter, we convert the above equations into an equivalent gate-based circuit. This conversion is straightforward since the Boolean equations we are converting are already in sum-of-products form. The final sequential controller circuit and the datapath for the soda dispenser are shown in Figure C.8.

Figure C.8 Final implementation of the soda machine controller (left) with datapath.
C.3 UNDERSTANDING THE BEHAVIOR OF THE SODA DISPENSER CONTROLLER AND DATAPATH
In this section, we will look closely at how the controller and datapath we designed for the soda dispenser interact to form a working implementation of our initial high-level state machine.

Figure C.9 illustrates the behavior of the soda dispenser controller and datapath, including initialization and how the soda dispenser behaves when the user inserts a quarter into the system. The 5 clock cycles shown are labeled 1 through 5 in the figure. We'll assume that the cost of a soda can is 60 cents and that the soda dispenser's controller is in the Init state during the first clock cycle. Let's examine what occurs in each clock cycle:

Initially, in clock cycle 1, the controller is in the Init state, shown in Figure C.9(b). When in state Init, the controller sets d to 0, tot_ld to 0, and tot_clr to 1. Additionally, the controller sets the next state signals n1n0 to 01, corresponding to the Wait state. In the datapath, the values of tot and tot+a are unknown, denoted by "??". Notice that even though the controller sets tot_clr to 1 during this clock cycle, the tot register will not be cleared immediately (asynchronously). Rather, tot will be cleared shortly after the start of the next clock cycle, a synchronous behavior. Finally, notice that the price of the soda, s, is set to 60 cents and the coin input signals, c and a, are initially 0 and 0, respectively.

Figure C.9(c) shows the soda dispenser in clock cycle 2. The controller is now in the Wait state. Accordingly, the controller sets d, tot_ld, and tot_clr to 0. The value of tot is cleared, and shortly afterwards, two signals, tot_lt_s and tot+a, take a known value. The datapath's comparator sets tot_lt_s to 1 since the total, 0, is less than the price of soda, 60. The datapath's adder sets intermediate signal tot+a to 0 since tot and a are now known. The next state signals remain set to 01 (Wait) since c is 0 and tot_lt_s is 1.

Figure C.9(d) shows the soda dispenser in clock cycle 3. During the third clock cycle, the user inserts a quarter into the soda dispenser, as indicated by c becoming 1 and a becoming 25. Shortly after a changes, the adder's output tot+a changes to 25, the sum of tot and a. Since c is 1, the controller sets the next state to 10 (Add). The values of d, tot_ld, and tot_clr remain the same since the controller's state has not changed since the previous (2nd) clock cycle.

In clock cycle 4, shown in Figure C.9(e), the controller is in the Add state and sets tot_ld to 1 while keeping d and tot_clr at 0. As was the case with tot_clr during the Init state, tot will not be updated until the next clock cycle. The controller will unconditionally return to state Wait, setting n1n0 to 01 (Wait).
Figure C.9 Soda dispenser operation from initialization to inserting a quarter: (a) timing diagram, (b)-(e) signal values during clock cycles 1-4.
In clock cycle 5, shown in Figure C.10, the controller sets d, tot_ld, and tot_clr to 0 since the controller is in the Wait state. The tot register loads the value of tot+a, storing 25. Shortly afterwards, tot+a changes to 50 to reflect the new value of tot; however, 50 is not loaded into tot, as tot will only perform a load synchronous to the rising edge of the clock signal.

The addition procedure demonstrated in clock cycles 3 through 5 is repeated for each coin inserted until enough change has been inserted to cover the cost of a soda, as indicated by input signal s.
Figure C.10 Operation of the controller and datapath: clock cycle 5 from Figure C.9(a).
Figure C.11 details the behavior of the soda dispenser when the user has inserted enough change into the machine to merit a soda being dispensed. In the timing diagram shown in Figure C.11(a), we duplicate clock cycle 5 from Figure C.9(a) as a point of reference. During the next few dozen clock cycles, we assume that the user has inserted a nickel followed by a quarter. As a result, the register tot will contain the value 55 (25 + 5 + 25 cents). Let's examine the behavior of the soda dispenser when the user inserts a dime into the machine:

In Figure C.11(b), corresponding to clock cycle 100, the soda dispenser's controller is in the Wait state. Assuming the user inserts a dime into the soda dispenser, the c input will become high for one clock cycle and the a input will change to 10, the value of a dime. Shortly after a changes, the intermediate signal tot+a changes to 65 (55+10). With c asserted, the next state signals n1n0 become 10 (Add).

In clock cycle 101, shown in Figure C.11(c), the controller is in the Add state and asserts tot_ld. The register tot will not load a new total until the rising edge of the next clock cycle. The controller unconditionally sets the next state to 01 (Wait).

Figure C.11(d) shows the status of the soda dispenser in clock cycle 102, where the controller is in the Wait state. As indicated by the arrows in Figure C.11(a), tot_ld being asserted on the rising edge of the clock causes tot to load the value on its input, which is 65. Shortly after tot loads a new value, the comparator's output tot_lt_s changes from 1 to 0 to reflect the fact that tot (65) is not less than s (60). Since the controller is in the Wait state, and since both c and tot_lt_s are 0, the controller sets the next state signals to 11 (Disp). Notice that prior to the next state signals settling on the Disp state, the next state was Wait for a brief period of time. Depending on the time required for signals to propagate through the datapath and controller, certain signals may initially contain unexpected values, but signals will eventually settle to their expected values. We can avoid any problems associated with this period of uncertainty by selecting a clock period that is long enough to allow our circuit's intermediate signals to settle into a stable state and stay stable long enough to comply with any setup times required by our circuit's sequential components.

In Figure C.11(e), the controller is in the Disp state. The controller asserts d, indicating to some outside component that a soda should be dispensed. The controller will unconditionally transition to the Init state, where the initialization procedure shown in Figure C.9 is repeated (partially shown in clock cycle 104 of Figure C.11(a)).

We see that the controller and datapath work together to implement the behavior of the original high-level state machine.
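The cycle-by-cycle interaction described in this section can be reproduced with a small simulation that combines the controller's Boolean equations with a registered tot. This is a sketch under several assumptions of our own: the names `controller_logic` and `simulate` are hypothetical, propagation delays and glitches are ignored, an unknown tot ("??") is modeled as `None`, and the coins are inserted back to back rather than dozens of cycles apart, so the dime arrives in cycle 10 instead of cycle 100.

```python
def controller_logic(s1, s0, c, tot_lt_s):
    """Combinational logic of the controller, from the equations in Section C.2."""
    ns1, ns0, nc, nlt = 1 - s1, 1 - s0, 1 - c, 1 - tot_lt_s
    d = s1 & s0
    tot_ld = s1 & ns0
    tot_clr = ns1 & ns0
    n1 = (ns1 & s0 & nc & nlt) | (ns1 & s0 & c)
    n0 = ns0 | (ns1 & s0 & nc)
    return d, tot_ld, tot_clr, n1, n0


def simulate(inputs, s=60):
    """Apply one (c, a) pair per clock cycle and record the settled signals."""
    s1 = s0 = 0                          # state register starts in Init (00)
    tot = None                           # register contents unknown before the clear
    trace = []
    for c, a in inputs:
        # Datapath comparator; the value is irrelevant while tot is unknown
        # because the Init state ignores tot_lt_s.
        tot_lt_s = 1 if (tot is not None and tot < s) else 0
        d, tot_ld, tot_clr, n1, n0 = controller_logic(s1, s0, c, tot_lt_s)
        trace.append({"state": (s1, s0), "tot": tot, "d": d,
                      "tot_ld": tot_ld, "tot_clr": tot_clr, "next": (n1, n0)})
        # Rising clock edge: registers update synchronously.
        if tot_clr:
            tot = 0
        elif tot_ld:
            tot = tot + a
        s1, s0 = n1, n0
    return trace


# Cycles 1-5 follow Figure C.9; the remaining cycles add a nickel, a quarter, and a dime.
inputs = [(0, 0), (0, 0), (1, 25), (0, 25), (0, 25),
          (1, 5), (0, 5), (1, 25), (0, 25), (1, 10), (0, 10), (0, 10), (0, 10), (0, 10)]
trace = simulate(inputs)
for cycle, row in enumerate(trace, start=1):
    print(cycle, row)
```

In this run tot holds 25 in cycle 5, reaches 65 in cycle 12 where the next state becomes 11 (Disp), and d is asserted for exactly one cycle afterwards.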
Figure C.11 Soda dispenser operation when sufficient change has been inserted: (a) timing diagram, (b)-(e) signal values during clock cycles 100-103.
I nd ex
=.
SdlSpla) , tatement.
6 HC II microprocessor. 2 1
subserie- ICs.
subseries ICs.
,ubseries ICs.
series ICs.
series ICs.
8051 microprocessor. 21. 422
A
Abe." component (AL-extender). 203
Abo\e-mirror display (example):
with 16 32bi l regi sters. 20-+-207
"ith 16,32 regi ster file. 208
\\ Ith parallel-load registers. 155- 156
with shift registers. 159. 160
with up-counters. 183
Absorption Law. 501
Abstraction (in RTL design). 276
Access time (RAM). 263
Active-high input. 136
ACllve-Iol> input. 137
Actuator. 9
Adaptive cruise comro!. 237
Adder(,). 165- 173. 197
building a SUblI3clor using. 197-200
carry-lookahead. 33.\-342
carry-ripple. 166-173.339-340.468-471
carry-;elect.
creallng faster. 333-343
deSIgn examples using. 171-173
.+-bit carry-ri pple. 169-171
full-. 168-169
h.lf-. 167-168
"bu. 165-166
t"'o-Ievel logIC. 334
Adder tree. 215
add ,",tructlon. 434
Addlllve Identu} elcment. 497
Addll1ve '<lund. 211
Addre" (for reg"ter). 205
"'L-extender. ;ee Anthmellcflogic extender
Algebr""1
of logic, 504
of sets. 504
switching, 496. 497
Algebraic methods, in two-level logic size optimi zation.
296-298
Algorithms:
Espresso tool in. 3 15
exact. 308
selection of. 356-357
for state reduct ion. 3 19
Algorithmic state machines (ASMs), 233
Ahemalive minimum-bidwidth binary encoding, 323-324
ALUs. see Arithmetic-logic units
always procedure, 453-454
Amperes, 3 1
Analog ci rcui ts. 5
Analog phenomena, encoding of. 9
Analog signals, 4
Anal og-la-digital converter, 9
AND gates. 43-44, 404-407
AND operator, 38-40
Appli cat ion-Specific Integrated Circuits (ASICs).
38G-388
cell arrays. 383
FPGAs vs., 40 1
gate arrays, 38 1-382
implementing. using NOR gates. 386-388
impl ementing, using onl y AND gates, 384-386
standard cell s. 382-383
structured. 383. 408
Architecture. 447
Arithmetic:
fixed point. 508-509
Roati ng point. 5 13
Arithmeti c/logic extender (AL-extender). 202-203
Arithmeti c/logic instructi ons. 439
Arithmeti c-logic units (ALUs). 20 1-203
multi -function calculator using. 203
operati on. 423-424
ARM microprocessor. 2 1. 422
Arrays. Sec also Field programmable gate arrays (FPGAs)
cell ,383
gate, 38 1-382. 389
programmable logic. 407
ASCII. 10
ASICs, see Appli cati on-Spec ifi c Integrated Circuits
ASMs (a lgorithmi c state machines). 233
Assembler programs, 430
Assembl y code. 431
assert (term). 137
assert statement s, 456, 458-459
Assoc iative propert y. 50. 50 I
Asychronous circuits. 102
Asychronous inputs. 133
Asychronous reset inputs, 135
Asychronous set inputs, 135
Atria (of heart), 138, 139
Audio, digit ized. 6-8
Audi o recordi ng, 5-7
Automation
with Quine- McCluskey method, 3 11 -3 12
of two-level logic size optimi zation, 308-3 15
B
Bardeen. John. 33
Basestati ons (cell phones), 279-28 1
Base ten. 11 - 12
Basic input/output system (BIOS), 431
Ba ic SR latch. 97-99
Beamforrners. 210-213
princi pl e of. 210-2 11
in ultrasound machines. 2 12-2 15
Behavioral-level design. 254-258
Bell . Alexander Graham. 8
Bell Laboratori es. 33
Bell Telephone. 8
BeltWam circuit . 387-388
B-frames. see Bi directional predi cted frames
Biased exponent. 5 10
Bi di rectional predicted frames ( B-frames), 363-364. 369
Binary numbers. 11- 17.505
Binary number systems. 05-5 13
fi xed point ari thmetic in. 50 -509
Roating point represcntation in. 509-513
real number represent ati on in. 505-507
Binary poi nt. 506
Binary rcprc cnl3li ons. 4
Binary sear h. 357
BIOS (basic input/output ,y tem). 431
Bit. 4
Bit file, 399
Bit storage. 96. ec also <pecific types. e.g.: R Intches
Bit wise opcrntion. I
Blinking li ght- (10 computcrs).
Block symbol. 152
Board game,. computcn/cd. 157
Boole. George. ),
BOOlean algebra, 38, 47-55 496-504
e.valualing expressions in '48-49
hterals in. 50 .
operators in. 38-39, 48-49
product terms in, 50
Properties in. 50-55
sum-of-products in, 50
S
W
llchmg, 497-498
terminology, 49-50
theorems in. 498-503
Variables in, 49
Boolean functions, 55-{i7
canonical form, 63-{i5
ClrCUlls fO.r 56
and circuits, 65
conversIon of. 58-{j()
defined, 55
Index
equations for representing, 56
truth tables for representing, 56-58, 62-{i3
BOOlean logIC gates, see Gate(s)
?perators, see Operator(s)
Boollng ' computers, 43 1
Brattain. Walter, 33
Buffers, 206, 272
Bus (i n register files). 206
Bus interface, 238-241
Bus protOCol, 239
Button press synchronizer (example). 123-124
Button sensor. 10
c
C (program language), 19-20, 254-258.388
C++ (program language), 254. 258. 388
Calculators, 200
Calculus. propositional. 504
Cameras. digital. 22-23
CAN (controller area network). 160
Canonical form (Boolean functions). 63-65
Capture (step in combinational logic desiga). 67-{i9 ..
Carry-lookahead adders. 334-342
efficient example. 336-339
half-adders in. 337-339
hierarchical .
inefficient example. 335-336
Carry-ripple adders. 166-173
in dntapath component description. -171
8-bit. 173
-l-bit. 169-17_
fulladders. 168-169
half-adders. 167-168
and hierarchical arry-lookahead adders.
Carry-ripple style magnitude comparator. 17 -I 0
ClUT)-sclc t adders.
528 Index
Cas.ell e '"pes. 5- 6
Cell arrays. 383
Ce ll s (cell phone region<). 279
Cell s. standard (ASIC), 382- 383
ellul ar telephones. 7. 279- 284
components of. 28 1- 284
voice qua lity on. 25 1
Ccb ius, 175
han ncb (in transducers). 2 10
Checkerboard. comput cri led (exampl e). 156- 158
Chips. Sili con chi ps
CincxI componenl (AL-cxlcndcr).
analog. 5
asychronous. 102
and Boolean functi on" 56
building. using gmcs. 44-l7
clock divider. 187
combinati onal. 30, 65. 85. 95
crit ical path in. 252- 25-1
defi ned. 22
digit al. 4-5.2 1- 22. 38-10. 2 13- 2 15
integrated. 33-35
mathemati cal formali sms in design. 130
and notati on simplifi cati on. 69- 72
paniti oning. among lookup tables. 390-394
sense amplifi er. 26 1
sequential. 30. 85- 86. 95
simplifying drawings of. 130
state of, 95
synchronous. 102
CLBs. see Configurabl c logic bl ocks
Clear inputs. 134
Clock di vider. 187
Clock frequency, 103.25 1- 254
Clock gating, 358-360
Clock signal. 102- 105
Clock skew. 359
CMOS transistors. 35-37, 41. 42. 357-358
CMY color space. 192- 194
CMYK color space. 194
Codecs.409
Code detector (example). 11 7- 11 8, 129- 130
Color pace convener-- RGB to CMYK (example). 192-
194
Combinati onal circui ts, 30. 85
multiple-output , 65
output of, 95
Combinational logic descripti on:
gate behavior in. 452-455
structure in. 447-452
test benches in, 455-459
us ing hardware languages. 447-459
Combinati onal logic design, 67-72, 168- 169
Combinati onal logic optimi zati on, 296-317
multil evel logic optimizati on, 3 15-3 17
two- level logic-size optimi zati on. 296-3 15
Combining le nns 10 eliminate a vari able, 297
Combl ogic process, 455, 464-466
Communi cati on:
serial , 160
wireless. 161
Commutative propeny, 50. 498
Comparator(s). 177- 181
equalit y, 177- 178
exampl e using, 180-18 1
magnitude, 178- 180
Compensating wei ght scale (example), 173
Complement (s).4 . 194-1 97, 497
defined, 195
existence of, 499
unique, 499
Compl ementati on. 499
Complement propeny. 51
Compl exit y, managing (RTL design), 275
Complex programmable logic devi ce (CPLD), 407-408
FPGAs vs .. 407-408
SPLDs vs .. 407-40
Component all ocation, 349- 350
Compre sion. 7
and computation of ratios in video, 364, 367, 368
in digital video, 363-369
quantization in, 366-367, 369
and transforming to frequency domain, 364-366
Computers, 4
with blinking lights, 430
booting, 431
Computerized board games (example), 156-158
Computer monitors, 192
Concurrency (in RTL design), 348-349
Concurrent computation, 354-355
Conductors, 36
Configurable logic blocks (CLBs)
grid of, in FPGAs, 398-399
output configuration memory in, 399
as programmable ICs, 396-398
Configuration (in RTL design), 245
Configuration memory, 398, 399
Congestion, 204
Constants, 434, 497
Constructor functions, 451-452
Control-dominated design, 247
Control input, 31, 32, 150. See also Gate(s)
Controller(s), 111, 119-130, 135-140
behavior of, in soda machine dispenser example, 519-525
common pitfalls with, 128-129
connection of datapath to, in RTL design, 236
defined, 111
derivation of FSM for, 237, 238
design examples using, 116-117, 120-121, 123-127
design of, in soda machine dispenser example, 516-518
design process for, 120, 126
and implementation of FSMs, 122
initial state of, 135-136
in laser-based distance measurer example, 480-491
in LED module, 414-416
negative logic in, 136-137
output glitches in, 136
in pacemakers, 138-140
in sequential logic description, 463-466
standard architecture for, 119
Controller area network (CAN), 160
Control unit, 424-428
in six-instruction programmable processors, 435-437
for three-instruction programmable processors, 432-434
Conversion(s), 58
among Boolean functions, 58-60
from any base to any other base, 15-16, 60
from binary to decimal, 12
from circuits to equations, 58-59
from circuits to truth tables, 60
decimal to binary, 13-15
from equations to truth tables, 59
as step in combinational logic design, 67-69, 72
from truth tables to circuits, 60
from truth tables to equations, 60
Converter(s):
analog-to-digital, 9
digital-to-analog, 9
of FSMs to circuits, see Controller(s)
RGB to CMYK (example), 192-194
Core, 411
Cosine waves, 364-366
Counters, 181-188
down, 181, 183
examples using, 183, 184, 186-188
N-bit, 181
parallel load, 185-187
as timers, 187
up, 181-183
up/down, 184
Cover (term), 309
CPLD, see Complex programmable logic device
Critical path (in circuits), 252-254, 317, 333
Cruise control, adaptive, 237
Crystals, piezoelectric, 210
Current (term), I
currentstate signal, 463-466
Custom digital circuits, 21-22
Cycle, clock, 10
D
Data communication, 161
Data-dominated design, 247-250
defined, 247
example using, 248-250
Data input, 150
Data memory, 423
Data movement instructions, 439
Datapath, 423-424
connection of controller to, in RTL design, 236
creation of, in RTL design, 234-236
in laser-based distance measurer (example), 480-491
for programmable processors, 422-424
in six-instruction programmable processors, 435-437
for soda machine dispenser (example), 519-525
for three-instruction programmable processors, 431-433
Datapath component description:
and carry-ripple adders, 468-471
and full-adders, 467-468
up-counters in, 471-475
using hardware languages in, 467-475
Datapath components, 151
and faster adders tradeoff, 333-343
and smaller multipliers tradeoff, 343-345
Datapath operations, 423-424
DCT, see Discrete cosine transform
Dctr, see Local registers
Debugging, 33
Decimal point, 506
Decimal to binary conversion:
divide-by-2 method, 14-15
subtraction method, 13-14
Declaration(s):
enum, 465
process, 452-453
type, 463
Decoders, 77-79, 395
Decoding stage, 426-427
Decrement (in counters), 181
Decrementer, 183
Deep Blue (computer), 157
Delay (in gates), 85
Delay circuits, 213-214
DeMorgan's law, 52, 502, 503
Demultiplexers, 85
Dequeue, 272
Designer profiles, 29, 94, 224, 293, 377-378, 444
logic, 67-72
and circuit notations, 69-72
steps in, 67-69, 72
Design process:
for controllers, 120, 126
for registers, 163
Detector, 17-19, 21
Deterioration, 6
D flip-flops, 103-109
edge-triggered, 104-107
4-bit, 109
and level-sensitive D latch, 103-104
Digital cameras, 22-23
Digital circuits, 4-5, 21-22, 38-40, 213-215
Digital filter, 248. See also Finite impulse response filters (FIR)
Digital phenomena, encoding of, 9-10
Digital signal processing/processors (DSP), 213, 284
Digital signals, 4-7
Digital sound recorder (example), 264-265
Digital systems, 4, 17-18
Digital telephone answering machine (example), 270-271
Digital thermometer converter (example), 175
Digital-to-analog converter, 9
Digital video, 244
Digital video discs (DVDs), 361-363
Digital video player/recorders, 361-370
compression in, 363-369
discrete cosine transform in, 364-367, 369
and DVDs, 361-363
and Huffman coding, 367-369
MPEG-2 encoding and, 363-366, 369-370
Digitized audio, 6-8
Digitized pictures, 8
Digitized video.
DIP, see Dual Inline Package switch
DIP-switch-based calculator (examples):
adding, 171-172
adding/subtracting, 191-192, 198
multi-function without using ALUs, 201
using ALU, 203
Discrete cosine transform (DCT), 364-367, 369
Discrete transistors, 33
Display statements, 457
Disp state, 330
Distance measurer, laser-based, see Laser-based distance measurer
Distributive property, 50, 498
Divide-by-2 method, 14-15, 505
Divide-by-n method, 15-16
D latch, 103-106
master, 105-106
servant, 105-106
Don't care input combinations, 305-307
Down-counters, 181, 183
downto statement, 459
Drain (output), 35, 36
DRAM, see Dynamic random access memory
Drivers, 206
DSP, see Digital signal processing/processors
Dual Inline Package (DIP) switch, 171-172, 402
Duality, principle of, 499
Dual-ported register file, 208
DVDs, see Digital video discs
Dynamic microphone, 5
Dynamic power, 358
Dynamic random access memory (DRAM), 262-263, 271
E
EchoDelay circuits, 214
Economy of scale, 200
EDA (electronic design automation), 409
Edge-triggered D flip-flops, 104-107
defined, 105
master/servant design, 105-106
EEPROM, see Electrically erasable PROM
8-bit carry-ripple adders, 173
Electrically erasable PROM (EEPROM), 268-269, 271
Electromagnetism, 5
Electronics, 31
Electronic design automation (EDA), 409
Electronic focusing (of sound), 211, 212
Embedded systems, 4
Enable (decoders), 77
Enable input, 101
Encoders, 85-86
Encoding, 9-13
of analog phenomena, 9
of digital phenomena, 9-10
entropy, 368
Huffman, 367-369
minimum-bitwidth binary, 323-324
MPEG-2, 363-366
of numbers, 10-13
one-hot, 324-326
output, 327-328
run-length, 367, 369
in sequential logic optimization, 323-328
ENIAC (computer), 33
Enqueue, 272
entity declaration, 447
Entropy encoding, 368
enum declaration, 465
enum statement, 465
EPROM, see Erasable PROM
Equality comparator, 177-178
Equations, 56
Equivalent states, 318
Erasable PROM (EPROM), 267-26
Espresso (heuristic tool), 315
Essential prime implicant, 309-310
Exact algorithm, 308
Excalibur platform (Altera), 409
Executing stage, 426-427
Existence:
of additive identity element, 498
of complement, 499
of multiplicative identity element, 497
Expanding (term), 309
Expand operation, 313
Exponent, biased, 510
F
Fabrication plant (fab), 380
Fahrenheit, 175
Falling edge-triggered flip-flops, 107
Fanout, 204
Fast Fourier Transform (FFT), 364
Feedback, 96-97
Fetching stage, 426-427
FFT (Fast Fourier Transform), 364
Field programmable gate arrays (FPGAs), 377, 388-401
architecture of, 398-401
ASICs vs., 401
configurable logic blocks with, 396-398
CPLDs vs., 407-408
lookup tables with, 389-394
microprocessors vs., 401
programming of, 399-400
SPLDs vs., 407-408
switch matrices with, 394-396
FIFO (first-in first-out), 272
FIFO queues, 272
Filtering (in digital signal processing), 282
Finite impulse response filters (FIR), 282-284
with clock gating, 359-360
example using, 248-250
and pipelining, 347
using operator scheduling, 352-354
Finite induction, 503
Finite-state machines (FSMs), 113-119, 128-130
behavior in, 118-119
controller architecture for, 119
converting circuit to, 126-127
with data (FSMD), 230
defined, 114
derivation of, for controller, 237, 238
examples using
Mealy type, 328-333
Moore type, 328-333
nondeterministic, 128
simplifying notations for, 115-116, 130
FIR filters, see Finite impulse response filters
First-in first-out (FIFO), 272
First-in first-out (FIFO) queues, 272
First pass (state reduction)
Fixed-point arithmetic, 508-509
Flash memory, 269
Flight attendant call button (example)
Flip-flops, 96-111, 130-135
clock signals in, 102-105
D, 103-109
and D latches, 103-104
and feedback in bit storage, 96-97
JK, 131
latches vs., 107
non-ideal behavior in, 131-134
and registers in bit storage, 109-111
reset inputs in, 134-135
set inputs in, 135
SR, 108, 131
and SR latches, 97-101
T, 131
Floating-point arithmetic, 513
Floating point numbers, 510
Floating point representation, 509-513
Floating point unit, 513
Flops, 108. See also Flip-flops
Flow-of-control instructions, 440
Focusing (of sound), 211, 212
4-bit carry-ripple adders, 169-172
4-bit D flip-flops, 109. See also Register(s)
FPGAs, see Field programmable gate arrays
Frames, 241, 361, 363-364, 369
Frequency:
clock, 103, 251-254
sound waves, 210
FSMs, see Finite-state machines
Full-adders, 168-169, 467-468
Full-custom ICs, 379-380
Fuse-based programmable ROM, see One-time programmable (OTP) ROM
G
GAL (generic array logic), 407
Games, computerized board, 157
Gate(s), 35, 36, 41-44, 73-76
AND, 43-44
building circuits using, 44-47
and combinational behavior.
with. 85
and FPGAs, 400-401
low-power, on noncritical paths, 360
NAND, 73-75
NOR, 73-75
NOT, 42
number of possible, 76
OR, 42-43
universal, 75
XNOR, 74, 75
XOR, 74, 75
Gate arrays, 381-382, 389. See also Field programmable gate arrays (FPGAs)
Gating, clock, 358-360
General-purpose processors, 4, 21. See also Programmable processors
Generate (in carry-lookahead adders), 338, 340-341
Generator(s):
1 Hz pulse generator (example), 183, 186-187
sequence generator (example), 124-125, 327-328
Generator, sequence, see Sequence generator
Generic array logic (GAL), 407
Generic variables, 503
GHz (gigahertz), 103
Giant video display (product profile), 412-416
Gigahertz (GHz), 103
Glitches/glitching, 100, 136
Google, 11
H
Haitz's law, 413
Half adders, 167-168
in carry-lookahead scheme, 337-339
implementing on a gate array (example), 382
implementing sum circuit using NAND gates (example), 385
implementing sum circuit using NOR gates (example), 386-387
implementing using standard cells (example), 383-384
Hardware description languages (HDLs), 446-447
Hardware languages:
in combinational logic description, 447-459
in datapath component description, 467-475
in register-transfer level (RTL) design, 475-491
in sequential logic description, 459-466
HDLs, see Hardware description languages
HDTV (high-definition TV), 94
Heart, human, 138
Hertz (Hz), 103
Heuristics, 308, 313-315
Espresso tool in, 315
iterative, 312
Hexadecimal numbers (hex), 16-17
Hierarchical carry-lookahead adders, 339-342
Hierarchy (in RTL design), 275-278
High-definition TV (HDTV), 94
High impedance, 239
High-level state machine(s), 229-233
in laser-based distance measurer (example), 475-480
and Moore vs. Mealy, 354
Highway speed measuring system (example), 187-188
Hold time (in flip-flop inputs), 131, 132
Huffman coding, 367-369
Hz (hertz), 103
ICs, see Integrated circuits
Idempotent Law, 52, 500
Identity comparator, see N-bit equality comparator
Identity elements, 497
Identity property, 50
I-frames, see Intracoded frames
If-then-else statements, 255-256
If-then statements, 255
Impedance, high, 239
Implementation(s):
physical, see Physical implementation
as step in combinational logic design, 67-69, 72
two-level logic, 67
Implicant (term), 309
Implication tables, 318-322
Improvement, iterative, 312
Increment (counters), 181
Incrementer, 182-183
Inductance, 188
Induction:
finite, 503
perfect, 498
Inductive loop, 188
Initial state (controller), 135-136
Init state, 330
Input(s):
active-high, 136
active-low, 137
asynchronous, 133, 135
clear, 134
in combinational logic description, 450
conditions, 114
control, 150
data, 150
enable, 101
reset, 134-135
synchronous, 134-135
Input/output extensions (programmable processors), 440
Instantiation (in RTL design), 234
Instructions, 425-428. See also specific instructions
arithmetic/logic, 439
data movement, 439
flow-of-control, 440
Instruction memory, 425
Instruction register (IR), 426
Instruction set:
in six-instruction programmable processors, 434-435
in three-instruction programmable processors, 428-431
Instruction set extensions (programmable processors), 428, 439-440
Insulators, 36
In-system programmable EPROMs, 268
Integrated circuits (ICs), 33-35
full-custom, 379-380
semicustom (ASICs), 380-388
Integrated circuit (IC) technology(-ies), 379-412
CPLDs as, 407-408
FPGA as, 388-401
FPGA-to-ASIC conversion as, 408
manufactured, 379-388
and Moore's Law, 412
off-the-shelf SSI ICs as, 401-404
and processor varieties, 410-411
programmable, 388
relative popularity of, 409
SOCs as, 408-409
SPLDs as, 404-407
tradeoffs among, 409-410
Intel, 21
Intracoded frames (I-frames), 363-364, 369
Inverse, 48
Inverters, 42
Involution Law, 52, 501
IR (instruction register), 426
Irredundant operations, 315
Iterate (term), 313
Iterative improvement, 312
J
Java (program language), 254, 258
JK flip-flops, 131
jump-if-zero instruction, 435-437
K
Keys, secure car (example), 116-117, 125-126
Keyboards, computer, 71
Kilohertz, 210
K-maps:
four-variable, 302-303
three-variable, 298-299
and two-level logic size optimization, 298-306
L
Lands (on DVDs), 362
Laser(s):
for surgery, 112
in three-cycles-high timer (example), 111-112, 115, 120-122
Laser-based distance measurer (example), 230-238
connecting the datapath to a controller in, 236
controller in, 480-491
datapath in, 234-236, 480-491
derivation of controller's FSM in, 237, 238
high-level state machine in, 475-480
Latches, 97-101
basic SR, 97-99
flip-flops vs., 107
level-sensitive D, 103-106
level-sensitive SR, 99-101
Latency (in pipeline registers), 347
Layout (of transistors on chips), 380
LCoS (Liquid Crystal on Silicon) chip, 94
LED, see Light-emitting diode
Level-sensitive D latch, 103-104
Level-sensitive SR latch, 99-101
Lights, blinking, 430
Light-emitting diode (LED), 171-172, 412-416
Light sensor, 10
Light sequencer (example), 184
Linear search, 356
Liquid Crystal on Silicon (LCoS) chip, 94
Literals, 50, 296-298
load-constant instruction, 434-435, 437
Loading (data), 151
load instruction, 428-431, 434
Load operations, 423-424
Load/shift registers, 160-163
Load-store architecture, 424
Local registers (Dctr), 232-233
Logic:
next-state, 329
output, 329
Logic block, configurable (CLB), 396-398
Logic gates, see Gate(s)
Logic IC, 40_
Lookahead (in computer games), 157
Lookup tables, 389-394
examples using, 392-394
partitioning a circuit among, 390-394
Low-power gates, 360
LT 1000 ventilator, 2, 3
M
MAC (multiply-accumulate) unit, 353
code.
Magnetic RAM (MAGRAM), 271
Magnitude comparators, 178-180
MAGRAM (magnetic RAM), 271
Mantissa, 510
Manufactured integrated circuits (ICs), 379-388
ASICs, 380-388
full-custom ICs, 379-380
Mark II (computer), 33
Mars Climate Orbiter, 175
Mask-programmed ROM, 266
Master latch, 105-106
Maxterm, 64
Mealy FSMs, 328-333
example using, 331
high-level state machines, 354
with Moore FSMs, 332-333
timing issues in, 331-332
Mean time between failures (MTBF), 134
Medium-scale integration (MSI), 34
Megahertz (MHz), 103, 210
Memory, 111. See also Sequential circuits
configuration, 398
data, 423
flash, 269
instruction, 425
MxN, 258
nonvolatile, 265
random access (RAM), 259-265
read-only (ROM), 265-271
in RTL design, 258-271
volatile, 265
Metastability, 131-134
Metastable state, 132
Meucci, Antonio, 8
MHz (megahertz), 103, 210
Microphones, 5, 210
Microprocessors:
defined, 18
digital circuits in, 4-5
FPGAs vs., 401
software in, 18-21
Minimum-bitwidth binary encoding, 323-324
Minterm, 63, 308
MIPS microprocessor, 21, 422
Mnemonic instructions, 430
Module:
in combinational logic description, 450
in LEDs, 414-416
SC, 450-452
Monitor(s):
RGB, 192
in ultrasound machines, 213
Moore, Gordon, 34
Moore FSMs, 328-333
high-level state machines, 354
with Mealy FSMs, 332-333
Moore's Law, 34, 35, 412
MOS (term), 37
Motion-in-the-dark detector application, 17-19, 21, 440
Motion sensor, 9
Motorola, 21
MP3 format, 7
MPEG-1, 363
MPEG-2 encoding, 363-366, 369-370
MSI (medium-scale integration), 34
MTBF (mean time between failures), 134
Multifunction registers, 160-163
Multilevel carry-lookahead adders, 342
Multilevel logic, 360
Multilevel logic optimization, 315-317
Multiple bit storage, 109-111
Multiple-output combinational circuits, 65
Multiplexers (muxes), 79-83
internal design of, 79-80
N-bit Mx1, 81-82
Multiplicative identity element, 497
Multipliers:
in beamformers, 215
in binary numbers, 189-190
sequential, 343-345
Multiply-accumulate (MAC) unit, 353
Multi-ported register file, 208
Muxes, see Multiplexers
MxN memory, 258
MxN register file, 204
N
NAND gates, 73-75, 384-386
Nanosecond (ns), 100
Nanowatts, 360
N-bit adders, 165-166
N-bit arithmetic-logic units, 201
N-bit barrel shifters, 176
N-bit counters, 181
N-bit equality comparator, 177-178
N-bit magnitude comparators, 178-180
N-bit registers, 151
N-bit shifters, 174
N-bit subtractors, 190-191
Negative edge-triggered flip-flops, 107
Negative logic, 136-137
Negative numbers, representing, 194-197
Network router, 92
New Year's Eve countdown display (example), 186
Nexperia platform (Philips), 409
Next-state logic, 329
nextstate signal, 463-466
nMOS transistors, 35, 36, 42-44, 73
Noncritical paths, 360
Nondeterministic FSM, 128
Non-ideal behavior (in flip-flops), 131-134
Nonrecurring engineering (NRE), 200, 380
Nonvolatile memory, 265
Nonvolatile RAM (NVRAM), 271
NOR gates, 73-75, 386-388
Normalized numbers, 510
Notation(s):
in Boolean algebra, 48-49
simplifying circuit, 69-70
simplifying for FSMs, 115-116, 130
NOT gates, 42
NOT operator, 38-40
NRE, see Nonrecurring engineering
ns (nanosecond), 100
Null elements, 52
Numbers:
binary, 11-17
encoding of, 10-13
hexadecimal, 16-17
octal, 17
representing negative, 194-197
subtractors for positive, 190-191
NVRAM (nonvolatile RAM), 271
NxN multipliers, 189
O
Octal numbers, 17
Off-set, 308
Off-the-shelf logic (SSI) IC.
Ohm's Law, 31
1 Hz pulse (example), 183, 186-187
One-hot encoding, 324-326
One's complement, 196
One-time programmable (OTP) ROM, 267, 405
On-set, 308
Opcode, 429
Operands, 429
Operation(s):
bitwise, 201
expand, 313
irredundant, 315
reduce, 315
Operation code, 429
Operator(s):
AND, 38-40
in Boolean algebra, 38-39, 48-49
NOT, 38-40
OR, 38-40
Operator binding, 350-351
Operator scheduling, 351-354
Opticom system, 188
Optimal solution, 308
Optimization(s), 294-296. See also Tradeoff(s)
and algorithm selection, 356
combinational logic, 296-317
criteria for, 295
defined, 294, 295
at higher vs. lower design levels, 355
multilevel logic, 315-317
power, 357-360
RTL design, 345-354
sequential logic, 317-333
two-level logic size, 296-315
OR gates, 42-43
OR operator, 38-40
Orthogonal implementation features, 410-411
Oscillation, 99-100
Oscillators:
defined, 102
quartz, 102
in sequential logic description, 461-463
OTP ROM, see One-time programmable ROM
OutDelay, 213-214
Output(s), 31, 32
in combinational logic description, 450
reading, 246
reg, 453, 454, 460
Output encoding, 327-328
Output glitches, 136
Output logic, 329
Overclocking (in PCs), 253
Overflow detection, 198-200
P
Pacemakers, 137-139
PAL (programmable array logic), 407
Parallel load counters, 185-187
Parallel load registers, 151-152, 160-161
Partitioning, _3 -
PC (program counter), 426
PCI, see Peripheral component interface
Pentium microprocessors, 21
Perfect induction, 498
Performance (in digital systems), 295
Performance extensions (programmable processors), 441
Period (clock signal), 103
Peripherals.
Peripheral component interface (PCI), 141
P-frames, see Predicted frames
Physical design, 387
Physical implementation, 379-417
alternative technologies for, 401-409
comparing technologies for, 409-412
of giant video display, 412-416
and manufactured IC technologies, 379-388
and programmable IC technologies, 388-401
PIC microprocessor, 21, 422
Pictures, digitized, 8
Piezoelectric crystals, 210
Pipeline registers, 346
Pipelining, 345-347
Pixels, 192, 361
Placement (in chip components), 387
PLAs (programmable logic arrays), 407
Platform SOCs, 408-409
PLD, see Programmable logic device
pMOS transistors, 37, 42-44, 73
Pop (in queues), 272
Port(s):
in combinational logic description, 447
read, 205
write, 205
Positive edge-triggered flip-flops, 107
Positive numbers, subtractors for, 190-194
Power:
in digital systems, 295
dynamic, 358
Power optimization, 357-360
Power PC programmable processor, 422
Precharging (RAM bit storage), 261
Predicted frames (P-frames), 363-364, 369
Preset (asynchronous set), 135
Prime (term), 48
Prime implicant, 309
Printers, 192-194
Priority encoders, 86
Process declaration, 452-453
Processor(s):
defined, 225
digital signal, 213
single-purpose, 421
superscalar, 441
Very Large Instruction Word (VLIW), 441
Product, 48
Product-of-maxterms form, 64
Product profiles:
cell phones, 279-284
digital video player/recorders, 361-370
giant video display, 412-416
pacemakers, 137-139
ultrasound machines, 209-216
Product term, 50
Program, 421, 425
Programmable array logic (PAL), 407
Program counter (PC), 426
Programmable integrated circuit (IC) technology, see Field programmable gate arrays (FPGAs)
Programmable interconnects, 394-396
Programmable logic arrays (PLAs), 407
Programmable logic device (PLD), 404-407
Programmable processors, 421-442
control unit for, 424-428
datapath for, 422-424
input/output extensions to, 440
instruction set extensions to, 439-440
performance extensions to, 441
six-instruction, 434-439
three-instruction, 428-434
Programmable ROM, 267
Programmers (ROM), 267
Programming languages, 254-258
PROM, see Programmable ROM
Propagate (in carry-lookahead adders), 338, 340-341
Propagation, 104
Propositional calculus, 504
Pulse width modulation (PWM), 415
Push (in queues), 272
PWM (pulse width modulation), 415
Q
Quantization (in video compression), 366-367, 369
Quartz, 102
Quartz oscillators, 102
Queues, 271-272
Queuing, 271-274
Quine-McCluskey method, 311-312
QWERTY keyboard, 71
R
Race condition, 100
Radix, 510
Random access memory (RAMs):
bit storage in, 260-261
dynamic (DRAM), 262-263
example using, 264-265
in RTL design, 259-265, 271
static, 261-262
read() function, 455
Reading (data), 151
Read-Only Memory (ROMs):
examples using, 269-271
in RTL design, 265-271
types of, 266-269
Read-only memory programming, 265
Read port, 205
Read time, 263
Real numbers, 505-507
Recording, audio, 5-7
Reduce operation, 315
Register(s), 109-111, 151-165
design process for, 163
examples using, 152-160, 164-165
local (Dctr), 232-233
multifunction, 160-163
in multiple bit storage, 109-111
N-bit, 151
parallel load, 151-152
with parallel load and shift, 160-163
pipeline, 346
rotate, 159-160
in sequential logic description, 459-461
shift, 158, 159
updating of, 245-246
Registered data outputs, 246-247
Register files, 204-208
dual-ported, 208
multi-ported, 208
MxN, 204
single-ported, 208
Register-transfer level (RTL) components, 151
Register-transfer level (RTL) design, 225-285
abstraction in, 276
behavioral-level, 254-258
clock frequency, determination of, 251-254
component allocation in, 349-350
concurrency in, 348-349
connection of data path to controller, 236
controller's FSM, derivation of, 237, 238
data-dominated, 247-248
data path, creation of, 234-236
examples of, 238-244, 248-250, 269-271, 279-284
hierarchy in, 275-278
high-level state machine, creation of, 229-233
managing complexity in, 275
memory components in, 258-271
method, 226-238
operator binding in, 350-351
operator scheduling in, 351-354
optimizations and tradeoffs in, 345-354
pipelining in, 345-347
pitfalls in, 245-246
queuing in, 271-274
RAMs in, 259-265, 271
and registered data outputs, 246-247
ROMs in, 265-271
scope of, 225-226
using hardware languages, 475-491
using programming languages in, 254-258
reg output, 453, 454, 460
Relays, 32
Reset inputs, 134-135
Resetting, 98
Resistance, 31
Resource sharing, 351
Respin (in IC fabrication), 380
Reverse engineering, 126
RGB color space, 192-194
RGB monitors, 192
Rising edge-triggered flip-flops, 107
Rolling over, 181
ROMs, see Read-Only Memory
Rotate registers, 159-160
Routing (in chips), 387
RS-6000 SP processor, 157
RTL components, 151
RTL design, see Register-transfer level design
Run-length encoding, 367, 369
S
SAD, see Sum-of-absolute-differences
Sampling, 6
Scale, compensating weight (example), 173
Scan chain, 399
Scan converter, 213
SC_CTOR statement, 451-452, 454
Scheduling, operator, 351-354
Schematic, 445
Schematic capture tool (use in circuits), 84
sc_in statement, 451
SC_METHOD, 454-455, 458
SC_MODULE, 450-452
sc_out statement, 451
sc_signal statement, 451
SC_THREAD testbench process, 458, 462-463
Search(es):
binary, 257
linear, 356
Seat belt warning light (example):
extended, on an FPGA, 395-396
implementing, with a lookup table, 390
using NOR-based gate array, 3 -3
using off-the-shelf 7400 ICs, 403-404
using simple PLD, 406
Second pass (state reduction), 3_1, 321
Secure car key (example), 116-117, 125-126
Selectors, 79. See also Multiplexers (muxes)
Semiconductors, 36
Semicustom ICs, see Application Specific Integrated Circuits (ASICs)
Sense amplifier, 261
Sensitive processes, 453, 455
Sensitivity lists, 461-462
Sensor(s), 9-10
button, 10
light, 10
traffic light, 188
Sequence generator (example), 124-125, 327-328
Sequencer, light (example), 184
Sequential circuits, 30, 85-86, 95, 126-127. See also Finite-state machines (FSMs)
controllers, 111, 119-130, 135-140
converting to FSM (example), 126-127
flip-flops, 96-111, 130-135
Sequential logic description:
controllers in, 463-466
oscillators in, 461-463
registers in, 459-461
using hardware languages in, 459-466
Sequential logic optimization, 317-333
and Moore vs. Mealy FSMs, 328-333
state encoding as, 323-328
state reduction as, 317-323
Sequential multipliers, 343-345
Serial communication, 160
Serial computation, 354-355
Serializing (in computations), 352
Servant D latch, 105-106
Set inputs, synchronous/asynchronous, 135
Setting (in latches), 99
Setup time (in flip-flop inputs), 131, 132
Shannon, Claude, 40
Shifters, 173-176
barrel, 176
examples using, 175
simple, 174
Shift registers, 158, 159
Shockley, William, 33
SHR() function, 477
Signal(s), 448
currentstate, 463-466
digital, 4-7
nextstate, 463-466
state, 476
Signal processor, 213
Sign bit, 196
Signed-magnitude, 194
Significand, 510
Silicon (element), 37
Silicon chips, 33-35. See also Integrated circuits (ICs)
and economy of scale, 200
fabrication of, 380
Silicon Valley (California), 37
Simple programmable logic device (SPLD), 404-407
CPLDs vs., 407-408
FPGAs vs., 407-408
Simulation (in circuits), 84
Simulator, 84
Single-ported register file, 208
Single-purpose processor, 421
Six-instruction programmable processors, 434-439
control unit in, 435-437
datapath in, 435-437
instruction set in, 434-435
Size (in digital systems), 295
Small-scale integration, see SSI
SOC, see System-on-a-chip
Soda machine dispenser (example), 227-229, 515-525
controller, design of, 516-518
understanding behavior of controller and datapath, 519-525
Software, 18
Solid-state transistors, 33
Sound, 210-212
Sound generation circuits, 213-214
Sound waves, 210, 212
Source input, 31, 32, 35, 36
SPG blocks, 339
Spin (in IC fabrication), 380
SPLD, see Simple programmable logic device
Spurious values, 171
SRAM, see Static random access memory
SR flip-flops, 108, 131
SR latches, 97-101
basic, 97-99
level-sensitive, 99-101
SSI (small-scale integration), 34, 401-402
Stages:
pipeline registers, 346
programmable processors, 425-428
Standard architecture (for controllers), 119
Standard cells, 382-383
Standard representation, 62
State(s):
of circuits, 85-86, 95, 111
equivalency between, 318
State diagram, 114
State encoding:
alternative minimum-bitwidth binary, 323-324
one-hot, 324-326
output, 327-328
in sequential logic optimization, 323-328
Statements. See also specific statements
assert, 456, 458-459
display, 457
State minimization, 317-323
State reduction:
algorithm for, 319
example for, using implication table, 321-322
implication tables, 318-320
in sequential logic optimization, 317-323
steps in, 320-321
State signal, 476
Statetype, 463-466
Static random access memory (SRAM), 261-262, 271
Steering (of sound), 211, 212
Stereo speaker, 210
store instruction, 429-431, 434
Store operations, 423-424
Structure (in combinational logic description), 447-452
Structured ASICs, 383, 408
Subsetting (in program languages), 258
subtract instruction, 435-437
Subtraction (using addition), 195-196
Subtraction method, 13-14, 505
Subtractor(s), 190-200
detecting overflow in, 198-200
examples using, 191-194, 198
for positive numbers, 190-194
using adder to build a, 197-200
Sum, 48
Summation circuits, 214-215
Sum-of-absolute-differences (SAD):
with concurrency (example), 348-349
design example, 241-244
examples using C code, 254-258
Sum-of-minterms form, 63-65
Sum-of-products, 50
Superscalar processor, 441
Switch(es), 31-35
and discrete transistors, 33
Dual Inline Package, 171-172
and integrated circuits, 33-34
relays in, 32
sliding (example), 306-307
and vacuum tubes, 32-33
Switching algebra, 496, 497
Switch matrices, 394-396, 398-399. See also Programmable interconnects
Synchronizer, button press (example), 123-124
Synchronous circuit, 102
Synchronous clear, 164
Synchronous clearing, 184
Synchronous reset inputs, 134-135
Synchronous set, 164
Synchronous set inputs, 135
Systems:
17-19, 21
digital, 17-18
embedded, 4
SystemC, 450-452, 454-455, 458-463, 465-466, 468, 470-471, 474-475, 488-491
System-on-a-chip (SOC), 408-409
T
Tables, implication, 318-319
Tabular method, 311-312
Talking doll (example), 269-271
Tap (as mathematical term), 282
Technology mapping, 387
Telephones, 8
Temperature averager (example), 175
Temperature history display (example), 109-111, 154-155
Terms:
combining, to eliminate a variable, 297
product, 50
Terminal count (counter output), 181
Testbenches, 455-459
T flip-flops, 131
Three-cycles-high laser timer (example):
alternative binary encoding for, 324
controller for
first design, poorly done, 111-112
FSM for, 115
using one-hot encoding, 326
3-D images (ultrasound), 216
Three-instruction programmable processors, 428-434
control unit for, 432-434
datapath for, 431-433
first instruction set in, 428-431
Three-state driver, 206
Throughput (in pipeline registers), 347
Timer(s):
as counter type, 187-188
three-cycles-high laser, 111-112, 115, 120-122
Timing analysis, 254
Timing diagrams, 20
Timing issues, with Mealy FSMs, 331-332
Tradeoff(s). See also Optimization(s)
and algorithm selection, 356
among IC technologies, 409-410
datapath component, 333-345
defined, 29-
at higher vs. lower design levels, 355
in RTL design, 345-354
between serial and concurrent computation, 354-355
Traffic light sensors, 188
Transducers, 9, 210
Transformation operations, 423-424
Transistors:
CMOS, 35-37, 41, 42
discrete, 33
nMOS, 35, 36, 73
pMOS, 37, 42-44, 73
Transitions, 114
Transparent latch, see Level-sensitive SR latch
Truth table(s), 42
and Boolean functions, 56-58
as Boolean function standard representation, 62-63
defined, 56
Tubes, vacuum, 32-33
Two-level logic adders, 334
Two-level logic implementations, 67
Two-level logic size optimization, 296-315
automation of, 308-315
and don't care input combinations, 305-307
and K-maps, 298-306
using algebraic methods, 296-298
Two's complement, 196-197
building a subtractor using adders and, 197-200
defined, 196
detecting overflow using, 199-200
type declaration, 463
type statement, 463
Typewriters, 71
U
Ultrasound (term), 210
Ultrasound imaging, 210
Ultrasound machines, 209-216
beamformer in, 210-213
digital circuits in, 213-215
future challenges with, 216
monitor in, 213
scan converter in, 213
signal processor in, 213
transducer in, 210
Unique complement, 499
Uniting theorem, 297
Universal gates, 75, 384
Universal Serial Bus (USB), 161
Upcounters, 181-183, 471-475
Up/down counters, 184
USB (Universal Serial Bus), 161
use statement, 476
V
Vacuum tubes, 32-33
Variable(s), 49
combining terms to eliminate a, 297
generic, 503
Verilog (hardware description language), 254, 258,
449-450, 453-454, 457, 460, 462, 464-465, 467,
469-470, 473-474, 477-479, 484-487
Very Large Instruction Word (VLIW) processor, 441
Very-large scale integration (VLSI), 34
VHDL (hardware description language), 254, 258,
447-449, 452-453, 456, 459-461, 463-464, 467-469,
471-473, 475-477, 480-484
Video, digitized, 8, 244
Video compression (examples):
using C code, 254-256
using sum-of-absolute differences (SAD) design, 241-244
Video display, giant, 412-416
Virtex II Pro platform (Xilinx), 409
VLIW (Very Large Instruction Word) processor, 441
VLSI (very-large scale integration), 34
Volatile memory, 265
Voltage, 31
W
wait for statement, 461
wait() function, 458, 462
Wait state, 330
Watt (unit), 357
Waves, cosine, 364-366
Waveform (of inputs), 84
Weight sampler (example), 153-154
Western Union, 8
While loop statements, 256
Wireless communication, 161
Wire signal, 457
wires output, 450, 453
Word (data item), 258
Wrapping around (counters), 181
Wristwatch, beeping (example):
using combined Moore/Mealy machine, 333
using Mealy machine, 331
write() function, 455
Write port, 205
X
XNOR gates, 74, 75
XOR gates, 74, 75