(Autonomous)
Dundigal, Hyderabad -500 043
UNIT-1
BASIC COMPUTER ORGANIZATION
CPU
Memory subsystem
I/O subsystem
Fig 1.2: Timing diagram for memory read and memory write operations
CPU ORGANIZATION
Central processing unit (CPU) is the electronic
circuitry within a computer that carries out the
instructions of a computer program by performing
the basic arithmetic, logical, control and
input/output (I/O) operations specified by the
instructions.
Fig 1.4: A 16×4 memory subsystem constructed from two 8×2 ROM chips with low-order interleaving
Multi byte Organization
There are two commonly used organizations for multi
byte data.
Big endian
Little endian
In BIG-ENDIAN systems the most significant byte of a
multi-byte data item always has the lowest address,
while the least significant byte has the highest address.
In LITTLE-ENDIAN systems, the least significant byte
of a multi-byte data item always has the lowest address,
while the most significant byte has the highest address
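The two byte orders can be illustrated with a short Python sketch. The value 0x01234567, the base address 0x100 and the helper name `byte_layout` are assumptions chosen for illustration only; they do not come from the slides.

```python
def byte_layout(value: int, size: int, byteorder: str, base_address: int = 0) -> dict:
    """Map each memory address to the byte stored there for a multi-byte value."""
    data = value.to_bytes(size, byteorder)  # byteorder is "big" or "little"
    return {base_address + offset: byte for offset, byte in enumerate(data)}


# Hypothetical 4-byte value stored at a hypothetical base address.
big = byte_layout(0x01234567, 4, "big", base_address=0x100)
little = byte_layout(0x01234567, 4, "little", base_address=0x100)

for address in sorted(big):
    print(f"{address:#05x}: big-endian={big[address]:02X} little-endian={little[address]:02X}")
```

In the big-endian layout the most significant byte (01) sits at the lowest address; in the little-endian layout the least significant byte (67) sits there.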
I/O SUBSYSTEM ORGANIZATION AND INTERFACING
Fig 1.5: An input device: (a) with its interface and (b) the enable logic for the tri-state buffers
OUTPUT DEVICE
The design of the interface circuitry for an output
device, such as a computer monitor, is somewhat
different from that for an input device. The tri-state
buffers are replaced by a register.
Tri-state buffers are used in input device interfaces
to make sure that only one device writes data to the bus
at any time.
Since output devices read data from the bus, rather than
write data to it, they don't need the buffers.
An output device: (a) with its interface and (b) the enable logic for the registers
Fig: A bidirectional I/O device with its interface and enable/load logic
LEVELS OF PROGRAMMING LANGUAGES
Register Transfer and Micro operations
CONTENTS
Register Transfer
Arithmetic Microoperations
Logic Microoperations
Shift Microoperations
Register Transfer Language (RTL)
Digital System: An interconnection of hardware
modules that do a certain task on the information.
Registers + Operations performed on the data stored
in them = Digital Module
Modules are interconnected with common data and
control paths to form a digital computer system
Register Transfer Language cont.
Microoperations: operations executed on data stored
in one or more registers.
For any function of the computer, a sequence of
microoperations is used to describe it
The result of the operation may:
replace the previous binary information of a
register, or
be transferred to another register
Shift Right Operation
101101110011 → 010110111001
Register Transfer Language cont.
The internal hardware organization of a digital
computer is defined by specifying:
The set of registers it contains and their function
The sequence of microoperations performed on the
binary information stored in the registers
The control that initiates the sequence of
microoperations
Registers + Microoperations Hardware + Control
Functions = Digital Computer
Register Transfer Language cont.
Register Transfer Language (RTL) : a symbolic
notation to describe the microoperation transfers
among registers
Next steps:
Define symbols for various types of microoperations,
Describe the hardware that implements these
microoperations
Register Transfer (our first microoperation)
Computer registers are designated by capital
letters (sometimes followed by numerals) to
denote the function of the register
R1: processor register
MAR: Memory Address Register (holds an address for a
memory unit)
PC: Program Counter
IR: Instruction Register
SR: Status Register
4-2 Register Transfer cont.
The individual flip-flops in an n-bit register are
numbered in sequence from 0 to n-1 (from
the right position toward the left position)
R1 7 6 5 4 3 2 1 0
Register Transfer cont.
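Fig: Numbering of bits in the 16-bit register PC. Bits are numbered from 15 (leftmost) down to 0 (rightmost); bits 15-8 form the upper byte PC(H) and bits 7-0 form the lower byte PC(L).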
Register Transfer cont.
Information transfer from one register to another is described
by a replacement operator: R2 ← R1
This statement denotes a transfer of the content of register R1
into register R2
The transfer happens in one clock cycle
The content of R1 (source) does not change
The content of R2 (destination) is lost and replaced
by the new data transferred from R1
We are assuming that circuits are available from the
outputs of the source register to the inputs of the destination
register, and that the destination register has a parallel load
capability
Register Transfer cont.
Conditional transfer occurs only under a
control condition
Register Transfer cont.
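Hardware implementation of a controlled transfer: P: R2 ← R1
Fig: Block diagram. The control circuit output P drives the Load input of R2; R1 supplies the data and R2 receives the clock.
Fig: Timing diagram. Load is synchronized with the clock; P is active during clock period t and the transfer occurs at the clock transition at t+1.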
Register Transfer cont.
Bus and Memory Transfers
Fig: Bus system for four 4-bit registers A, B, C and D. Four 4×1 multiplexers (MUX3 to MUX0) share the select lines S1 and S0; MUX i receives bit i of every register (Di, Ci, Bi, Ai on data inputs 3, 2, 1, 0) and drives bus line i.
Bus and Memory Transfers: Three-State
Bus Buffers
A bus system can be constructed with three-
state buffer gates instead of multiplexers
A three-state buffer is a digital circuit that
exhibits three states: logic-0, logic-1, and high-
impedance (Hi-Z)
Control input C
Three-State Buffer
Bus and Memory Transfers: Three-State
Bus Buffers cont.
C = 1: the gate acts as a normal buffer (B = A)
C = 0: the output is disabled (open circuit, high impedance)
Bus and Memory Transfers: Three-State
Bus Buffers cont.
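Fig: Bus line for bit 0 built with three-state buffers. A 2×4 decoder with select inputs S1, S0 and an enable input E has outputs 0 to 3; each output enables one three-state buffer, and the buffers connect A0, B0, C0, ... to the bus line.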
Bus and Memory Transfers: Memory
Transfer
Memory read : Transfer from memory
Memory write : Transfer to memory
Data being read or written is called a memory word
(called M) (refer to section 2-7)
It is necessary to specify the address of M when
writing to or reading from memory
This is done by enclosing the address in square
brackets following the letter M
Example: M[0016]: the memory contents at address
0x0016
Bus and Memory Transfers: Memory
Transfer cont.
Assume that the address of a memory unit is
stored in a register called the Address Register
AR
Let's represent a Data Register with DR, then:
Read: DR ← M[AR]
Write: M[AR] ← DR
Bus and Memory Transfers: Memory
Transfer cont.
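Example: R1 ← M[AR], with AR = x12 and R1 = 100 before the transfer
RAM contents:
Address  Contents
x0C      19
x0E      34
x10      45
x12      66
x14      0
x16      13
x18      22
After the transfer, R1 = 66 (the word at address x12).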
Arithmetic Microoperations
The microoperations most often encountered
in digital computers are classified into four
categories:
Register transfer microoperations
Arithmetic microoperations (on numeric data
stored in the registers)
Logic microoperations (bit manipulations on non-
numeric data)
Shift microoperations
Arithmetic Microoperations cont.
The basic arithmetic microoperations are:
addition, subtraction, increment, decrement,
and shift
Addition Microoperation:
R3 ← R1 + R2
Subtraction Microoperation:
R3 ← R1 − R2, or, using the 1's complement R2' of R2:
R3 ← R1 + R2' + 1
Arithmetic Microoperations cont.
One's Complement Microoperation:
R2 ← R2'
Two's Complement Microoperation:
R2 ← R2' + 1
Increment Microoperation:
R2 ← R2 + 1
Decrement Microoperation:
R2 ← R2 − 1
Half Adder/Full Adder
Half Adder
x y | c s
0 0 | 0 0
0 1 | 0 1
1 0 | 0 1
1 1 | 1 0
c = xy
s = xy' + x'y = x ⊕ y
Full Adder
x y cn-1 | cn s
0 0 0    | 0  0
0 0 1    | 0  1
0 1 0    | 0  1
0 1 1    | 1  0
1 0 0    | 0  1
1 0 1    | 1  0
1 1 0    | 1  0
1 1 1    | 1  1
cn = xy + xcn-1 + ycn-1
   = xy + (x ⊕ y)cn-1
s = x'y'cn-1 + x'ycn-1' + xy'cn-1' + xycn-1
  = x ⊕ y ⊕ cn-1 = (x ⊕ y) ⊕ cn-1
Arithmetic Micro operations Binary Adder
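Fig: 4-bit binary adder. Four full adders (FA) are cascaded, with inputs A3..A0 and B3..B0, input carry C0, internal carries C1..C3, sum outputs S3..S0 and output carry C4.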
Arithmetic Microoperations Binary Adder-
Subtractor
Fig: 4-bit adder-subtractor. Four full adders (FA) with inputs A3..A0 and B3..B0, carries C0..C4 and sum outputs S3..S0.
Arithmetic Microoperations Binary Adder-
Subtractor
For unsigned numbers, this gives A − B if A ≥ B or the 2's complement of (B − A) if A < B
(example: 3 − 5 = −2 = 1110)
For signed numbers, the result is A − B provided that there is no overflow.
(example: −3 − 5 = −8)
  1101
+ 1011
  ----
  1000
Overflow detection: V = C3 ⊕ C4
V = 1 if overflow, V = 0 if no overflow
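The adder-subtractor behaviour can be sketched bit by bit in Python. This is only an illustrative model: the mode input `m` (0 for add, 1 for subtract), the function names and the use of `m` as the initial carry are assumptions about a typical adder-subtractor, not details recoverable from the figure.

```python
def full_adder(x: int, y: int, c_in: int) -> tuple[int, int]:
    """One full-adder stage: returns (sum, carry_out)."""
    s = x ^ y ^ c_in
    c_out = (x & y) | ((x ^ y) & c_in)
    return s, c_out


def add_sub(a: int, b: int, m: int, n: int = 4) -> tuple[int, int, int]:
    """n-bit ripple adder-subtractor.

    m = 0 computes A + B; m = 1 computes A + B' + 1 (that is, A - B).
    Returns (result, carry_out, overflow) with overflow = C(n) xor C(n-1).
    """
    carries = [m]  # assumed: the mode input also feeds the first carry C0
    result = 0
    for i in range(n):
        a_i = (a >> i) & 1
        b_i = ((b >> i) & 1) ^ m  # complement each bit of B when subtracting
        s, c = full_adder(a_i, b_i, carries[-1])
        result |= s << i
        carries.append(c)
    overflow = carries[n] ^ carries[n - 1]
    return result, carries[n], overflow


print(add_sub(0b0011, 0b0101, 1))  # 3 - 5
print(add_sub(0b1101, 0b0101, 1))  # -3 - 5
print(add_sub(0b0111, 0b0001, 0))  # 7 + 1, overflows in 4-bit signed arithmetic
```

The first call yields the bit pattern 1110 from the unsigned example, the second yields 1000 with no overflow, and the third sets the overflow bit.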
Arithmetic Microoperations Binary Adder-
Subtractor cont.
What is the range of unsigned numbers that
can be represented in 4 bits?
What is the range of signed numbers that can
be represented in 4 bits?
Repeat for n-bit?!
Arithmetic Microoperations Binary
Incrementer
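Fig: 4-bit binary incrementer. Four half adders (HA) are cascaded: the least significant HA adds A0 and the constant 1, and each carry output C feeds the next HA together with A1, A2 and A3. Outputs are S3..S0 and the carry C4.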
Arithmetic Microoperations Binary
Incrementer
Binary Incrementer can also be implemented
using a counter
A binary decrementer can be implemented by
adding 1111 to the desired register each time!
Arithmetic Microoperations Arithmetic
Circuit
This circuit performs seven distinct arithmetic
operations and the basic component of it is
the parallel adder
The output of the binary adder is calculated
from the following arithmetic sum:
D = A + Y + Cin
Arithmetic Microoperations Arithmetic
Circuit cont.
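Fig. A: 4-bit arithmetic circuit. Each stage i is a full adder (FA) with inputs Xi = Ai and Yi, where Yi is the output of a 4×1 multiplexer with select lines S1, S0 and data inputs Bi (input 0), Bi' (input 1), 0 (input 2) and 1 (input 3). The carries C1..C3 ripple between stages starting from the input carry Cin; the outputs are D3..D0 and Cout.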
Logic Microoperations
The four basic microoperations
OR Microoperation
Symbol: ∨, +
Gate: OR gate
Logic Microoperations
The four basic microoperations cont.
Complement (NOT) Microoperation
Symbol: overbar, also written with a prime as in A'
Gate: NOT gate (inverter)
Logic Microoperations
The four basic microoperations cont.
XOR (Exclusive-OR) Microoperation
Symbol: ⊕
Gate: XOR gate
Logic Microoperations
Other Logic Microoperations
Selective-set Operation
Used to force selected bits of a register into
logic-1 by using the OR operation
Logic Microoperations
Other Logic Microoperations cont.
Selective-complement (toggling) Operation
Used to force selected bits of a register to be
complemented by using the XOR operation
Logic Microoperations
Other Logic Microoperations cont.
Insert Operation
Step1: mask the desired bits
Step2: OR them with the desired value
Logic Microoperations
Other Logic Microoperations cont.
NAND Microoperation
Symbols: ∧ and overbar (the complement of AND)
Gate: NAND gate
Logic Microoperations
Other Logic Microoperations cont.
NOR Microoperation
Symbols: ∨ and overbar (the complement of OR)
Gate: NOR gate
Logic Microoperations
Other Logic Microoperations cont.
Set (Preset) Microoperation
Force all bits into 1's by ORing them with a value in
which all bits are set to logic-1
Example: 100110₂ ∨ 111111₂ = 111111₂
Clear (Reset) Microoperation
Force all bits into 0's by ANDing them with a value in
which all bits are set to logic-0
Example: 100110₂ ∧ 000000₂ = 000000₂
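The selective-set, selective-complement, insert, set and clear operations map directly onto bitwise operators. The sketch below assumes an 8-bit register; the function names and the sample bit patterns passed to `insert` are illustrative choices, not values from the slides.

```python
WIDTH = 8                    # assumed register width
MASK = (1 << WIDTH) - 1      # keeps results within the register width


def selective_set(a: int, b: int) -> int:
    """Set to 1 the bits of A where B has a 1 (OR)."""
    return (a | b) & MASK


def selective_complement(a: int, b: int) -> int:
    """Toggle the bits of A where B has a 1 (XOR)."""
    return (a ^ b) & MASK


def insert(a: int, field_mask: int, value: int) -> int:
    """Step 1: mask (clear) the field; step 2: OR in the new value."""
    cleared = a & ~field_mask & MASK
    return cleared | (value & field_mask)


def set_all(a: int) -> int:
    """OR with an all-ones value."""
    return (a | MASK) & MASK


def clear_all(a: int) -> int:
    """AND with an all-zeros value."""
    return a & 0


print(f"{selective_set(0b1010, 0b1100):04b}")
print(f"{selective_complement(0b1010, 0b1100):04b}")
print(f"{insert(0b01101010, 0b11110000, 0b10010000):08b}")
```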
Logic Microoperations
Hardware Implementation
The hardware implementation of logic
microoperations requires that logic gates be
inserted for each bit or pair of bits in the
registers to perform the required logic
function
Most computers use only four (AND, OR, XOR,
and NOT) from which all others can be
derived.
Logic Microoperations
Hardware Implementation cont.
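Figure B: One stage of the logic circuit. A 4×1 MUX with select lines S1, S0 chooses among four gate outputs computed from Ai and Bi to produce Ei.
S1 S0 | Output    | Operation
0  0  | E = A ⊕ B | XOR
0  1  | E = A ∨ B | OR
1  0  | E = A ∧ B | AND
1  1  | E = A'    | Complement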
Shift Microoperations
Used for serial transfer of data
Also used in conjunction with arithmetic, logic, and
other data-processing operations
The contents of the register can be shifted to the left
or to the right
As being shifted, the first flip-flop receives its binary
information from the serial input
Three types of shift: Logical, Circular, and Arithmetic
Shift Microoperations cont.
Shift Left
**Note that the bit ri is the bit at position (i) of the register
Shift Microoperations:
Logical Shifts
Transfers 0 through the serial input
Logical Shift Right: R1 ← shr R1 (a 0 enters at the leftmost position rn-1)
Logical Shift Left: R1 ← shl R1 (a 0 enters at the rightmost position r0)
Shift Microoperations:
Circular Shifts (Rotate Operation)
Circulates the bits of the register around the
two ends without loss of information
Circular Shift Right: R1 ← cir R1 (r0 moves into rn-1)
Circular Shift Left: R1 ← cil R1 (rn-1 moves into r0)
Shift Microoperations
Arithmetic Shifts
Shifts a signed binary number to the left or right
An arithmetic shift-left multiplies a signed binary
number by 2: ashl (00100): 01000
An arithmetic shift-right divides the number by 2
ashr (00100) : 00010
An overflow may occur in arithmetic shift-left, and
occurs when the sign bit is changed (sign reversal)
Shift Microoperations
Arithmetic Shifts cont.
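Arithmetic Shift Right: the sign bit rn-1 is shifted into rn-2 and also keeps its own value
Arithmetic Shift Left: a 0 is shifted into r0 and rn-1 is shifted out, so the sign bit may change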
Shift Microoperations
Arithmetic Shifts cont.
An overflow flip-flop Vs can be used to detect
an arithmetic shift-left overflow
Vs = Rn-1 ⊕ Rn-2
Vs = 1: overflow
Vs = 0: no overflow
Shift Microoperations cont.
Example: Assume R1=11001110, then:
Arithmetic shift right once : R1 = 11100111
Arithmetic shift right twice : R1 = 11110011
Arithmetic shift left once : R1 = 10011100
Arithmetic shift left twice : R1 = 00111000
Logical shift right once : R1 = 01100111
Logical shift left once : R1 = 10011100
Circular shift right once : R1 = 01100111
Circular shift left once : R1 = 10011101
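The results listed in this example can be reproduced with a small Python sketch of the three shift types on an 8-bit register. The function names follow the RTL mnemonics (shr, shl, cir, cil, ashr, ashl); the overflow helper is an assumed way of expressing the sign-reversal check for arithmetic shift left.

```python
N = 8                      # register width used in the example
MASK = (1 << N) - 1


def shl(r: int) -> int:    # logical shift left, 0 enters at r0
    return (r << 1) & MASK


def shr(r: int) -> int:    # logical shift right, 0 enters at rn-1
    return r >> 1


def cil(r: int) -> int:    # circular shift left, rn-1 moves into r0
    return ((r << 1) | (r >> (N - 1))) & MASK


def cir(r: int) -> int:    # circular shift right, r0 moves into rn-1
    return (r >> 1) | ((r & 1) << (N - 1))


def ashl(r: int) -> int:   # arithmetic shift left, same bit movement as shl
    return shl(r)


def ashr(r: int) -> int:   # arithmetic shift right, sign bit is preserved
    return (r >> 1) | (r & (1 << (N - 1)))


def ashl_overflow(r: int) -> int:
    """1 if shifting r left would change the sign bit (rn-1 xor rn-2)."""
    return ((r >> (N - 1)) ^ (r >> (N - 2))) & 1


r1 = 0b11001110
for name, op in [("ashr", ashr), ("ashl", ashl), ("shr", shr),
                 ("shl", shl), ("cir", cir), ("cil", cil)]:
    print(f"{name}: {op(r1):08b}")
```

Applying `ashl` twice to 11001110 gives 00111000; `ashl_overflow` flags the second shift because the sign bit changes from 1 to 0.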
Shift Microoperations
Hardware Implementation cont.
A possible choice for a shift unit would be a
bidirectional shift register with parallel load
(refer to Fig 2-9). This approach has drawbacks:
Needs two pulses (the clock and the shift signal
pulse)
Not efficient in a processor unit where multiple
number of registers share a common bus
It is more efficient to implement the shift
operation with a combinational circuit
Shift Microoperations
Hardware Implementation cont.
Fig: 4-bit combinational circuit shifter with data inputs A3..A0, serial inputs IR and IL, a Select input, and outputs H3..H0.
Arithmetic Logic Shift Unit
Instead of having individual registers
performing the microoperations directly,
computer systems employ a number of
storage registers connected to a common
operational unit called an Arithmetic Logic
Unit (ALU)
Arithmetic Logic Shift Unit cont.
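Fig: One stage of the arithmetic logic shift unit (ALU), controlled by select lines S3, S2, S1, S0. One stage of the arithmetic circuit (Fig. A) takes Ai, Bi and the carry Ci and produces Di and Ci+1; one stage of the logic circuit (Fig. B) takes Ai and Bi and produces Ei. A 4×1 MUX selects the output Fi from its inputs 0 to 3: Di, Ei, Ai+1 (shr) and Ai-1 (shl).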
Basic Definitions
Digital system is a collection of digital hardware modules
Modules are registers, counters, arithmetic elements, etc., connected
via:
- data paths: routes on which information is moved
- control paths: routes on which control signals are
moved
Micro operations (micro-ops) are operations on data stored in
registers
Digital modules (often just called registers) are defined by their
information contents and the set of micro-ops they perform
Register transfer language is a concise and precise means of
describing those operations
Data-paths and Control units
Registers: denoted by
upper case letters, and
optionally followed by
digits or letters
Register transfer
operations: the movement
of data stored in registers
and the processing
performed on the data
What is Register Transfer Language?
Module 6 Counter
Toll Booth Controller
UNIT -3
BASIC COMPUTER ORGANIZATION AND DESIGN
Instruction Codes
Computer Registers
Computer Instructions
Instruction Cycle
Every different processor type has its own design (different registers,
buses, microoperations, machine instructions, etc)
Modern processor is a very complex device
It contains
Many registers
Multiple arithmetic units, for both integer and floating point calculations
The ability to pipeline several consecutive instructions to speed execution
Etc.
However, to understand how processors work, we will start with a
simplified processor model
This is similar to what real processors were like ~25 years ago
M. Morris Mano introduces a simple processor model he calls the Basic
Computer
We will use this to introduce processor organization and the relationship
of the RTL model to the higher level computer processor
THE BASIC COMPUTER
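Fig: The Basic Computer consists of a CPU and a RAM of 4096 words (addresses 0 to 4095), each word 16 bits wide (bits 15 to 0).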
INSTRUCTIONS
Program
A sequence of (machine) instructions
(Machine) Instruction
A group of bits that tells the computer to perform a specific operation (a sequence
of micro-operations)
The instructions of a program, along with any needed data are stored
in memory
The CPU reads the next instruction from memory
It is placed in an Instruction Register (IR)
Control circuitry in control unit then translates the instruction into
the sequence of microoperations necessary to implement it
INSTRUCTION FORMAT
A computer instruction is often divided into two parts
An opcode (Operation Code) that specifies the operation for that instruction
An address that specifies the registers and/or locations in memory to use for that
operation
In the Basic Computer, since the memory contains 4096 (= 2^12) words,
we need 12 bits to specify which memory address this instruction
will use
In the Basic Computer, bit 15 of the instruction specifies the
addressing mode (0: direct addressing, 1: indirect addressing)
Since the memory words, and hence the instructions, are 16 bits long,
that leaves 3 bits for the instruction's opcode
Instruction Format
Bits    15                    14-12     11-0
Field   I (addressing mode)   Opcode    Address
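The three fields of the 16-bit instruction format can be separated with shifts and masks. The following Python sketch is illustrative; the function name `decode` and the sample instruction words are assumptions.

```python
def decode(instruction: int) -> tuple[int, int, int]:
    """Split a 16-bit Basic Computer instruction into (I, opcode, address)."""
    i_bit = (instruction >> 15) & 0x1      # bit 15: addressing mode
    opcode = (instruction >> 12) & 0x7     # bits 14-12: operation code
    address = instruction & 0xFFF          # bits 11-0: address field
    return i_bit, opcode, address


# Hypothetical instruction words: the same opcode and address,
# once with direct addressing (I = 0) and once with indirect (I = 1).
print(decode(0x1457))
print(decode(0x9457))
```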
ADDRESSING MODES
The address field of an instruction can represent either
Direct address: the address in memory of the data to use (the address of the operand), or
Indirect address: the address in memory of the address in memory of the data to use
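Fig: Direct addressing: the address field holds 457, and memory location 457 holds the operand. Indirect addressing: the address field holds 300, memory location 300 holds 1350, and memory location 1350 holds the operand. In both cases the operand is added (+) to AC.
Effective Address (EA)
The address that can be directly used without modification to access an operand for a
computation-type instruction, or as the target address for a branch-type instruction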
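A minimal Python sketch of how the effective address is obtained in the two addressing modes follows. The memory is modelled as a dictionary, and the operand values 11 and 22 are made up for illustration; only the addresses 300, 457 and 1350 come from the figure.

```python
def effective_address(memory: dict, i_bit: int, address: int) -> int:
    """Return the effective address for direct (I = 0) or indirect (I = 1) mode."""
    if i_bit == 0:
        return address                 # direct: the address field is the EA
    return memory[address] & 0xFFF     # indirect: the EA is stored at that address


# Hypothetical memory contents; the operand values are illustrative.
memory = {300: 1350, 457: 11, 1350: 22}

direct_ea = effective_address(memory, 0, 457)
indirect_ea = effective_address(memory, 1, 300)
print(direct_ea, memory[direct_ea])
print(indirect_ea, memory[indirect_ea])
```

Indirect addressing costs one extra memory reference, because the word at address 300 must be read before the operand itself can be fetched.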
PROCESSOR REGISTERS
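Fig: Registers of the Basic Computer and the 4096 x 16 memory. PC and AR are 12 bits wide (bits 11-0); IR, TR, DR and AC are 16 bits wide (bits 15-0); OUTR and INPR are 8 bits wide (bits 7-0).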
List of BC Registers
Register  Bits  Name                  Function
DR        16    Data Register         Holds memory operand
AR        12    Address Register      Holds address for memory
AC        16    Accumulator           Processor register
IR        16    Instruction Register  Holds instruction code
PC        12    Program Counter       Holds address of instruction
TR        16    Temporary Register    Holds temporary data
INPR      8     Input Register        Holds input character
OUTR      8     Output Register       Holds output character
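Fig: Basic Computer registers connected to the 16-bit common bus. Select lines S2, S1, S0 choose the bus source: AR (1), PC (2), DR (3), AC (4), IR (5), TR (6) or the 4096 x 16 memory unit (7). The registers have load (L), increment (I) and clear (C) inputs as applicable, and OUTR has a load (LD) input. INPR and the E flip-flop are connected to the ALU, whose output goes to AC. The memory unit has Read, Write and Address inputs.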
COMMON BUS SYSTEM
Three control lines, S2, S1, and S0 control which register the bus
selects as its input
S2 S1 S0 Register
0 0 0 x
0 0 1 AR
0 1 0 PC
0 1 1 DR
1 0 0 AC
1 0 1 IR
1 1 0 TR
1 1 1 Memory
Either one of the registers will have its load signal activated, or the
memory will have its write signal activated
This determines where the data from the bus gets loaded
The 12-bit registers, AR and PC, have 0s loaded onto the bus in the
high order 4 bit positions
When the 8-bit register OUTR is loaded from the bus, the data
comes from the low order 8 bits on the bus
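The bus selection rules can be modelled in a few lines of Python. This is a sketch under stated assumptions: registers are held in a dictionary, the helper names are invented for illustration, and the memory word to be placed on the bus is passed in directly rather than read through AR.

```python
# Bus source selected by (S2, S1, S0), following the table above.
BUS_SOURCE = {0b001: "AR", 0b010: "PC", 0b011: "DR", 0b100: "AC",
              0b101: "IR", 0b110: "TR", 0b111: "MEMORY"}
REGISTER_BITS = {"AR": 12, "PC": 12, "DR": 16, "AC": 16, "IR": 16, "TR": 16}


def bus_value(select: int, registers: dict, memory_word: int = 0):
    """Return the 16-bit value on the bus, or None when no source is selected."""
    source = BUS_SOURCE.get(select)
    if source is None:
        return None
    if source == "MEMORY":
        return memory_word & 0xFFFF
    width = REGISTER_BITS[source]
    # 12-bit registers appear on the bus with 0s in the high-order 4 bits.
    return registers[source] & ((1 << width) - 1)


def load_outr(bus: int) -> int:
    """OUTR takes only the low-order 8 bits of the bus."""
    return bus & 0xFF


registers = {"AR": 0xABC, "PC": 0x123, "DR": 0xBEEF, "AC": 0x1234, "IR": 0, "TR": 0}
print(hex(bus_value(0b001, registers)))
print(hex(load_outr(bus_value(0b100, registers))))
```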
Instructions
CONTROL UNIT
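Fig: Control unit of the Basic Computer. Instruction register bits 14-12 feed a 3x8 decoder with outputs D0 to D7, and bit 15 is held in flip-flop I. A sequence counter SC (with a CLR input) feeds a 4x16 decoder with timing outputs T0 to T15. The combinational control logic combines D0-D7, I and T0-T15 to produce the control signals.
Fig: Example of control timing signals. T0 to T4 are active in sequence, and D3T4 activates the CLR input of SC.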
INSTRUCTION CYCLE
After an instruction is executed, the cycle starts again at step 1, for the
next instruction
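Fig: Register transfers for the fetch phase. T0 and T1 drive the bus select lines S2, S1, S0. At T0 the bus carries PC (source 2) and AR is loaded (LD). At T1 the memory unit is read (source 7), IR (5) is loaded (LD) and PC is incremented (INR). The registers share the clock and the common bus.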
DETERMINE THE TYPE OF INSTRUCTION
Start: SC ← 0
T0: AR ← PC
T1: IR ← M[AR], PC ← PC + 1
T2: Decode opcode in IR(12-14), AR ← IR(0-11), I ← IR(15)
T3, D7 = 1 (register or I/O):
    I = 1 (I/O): execute input-output instruction, SC ← 0
    I = 0 (register): execute register-reference instruction, SC ← 0
T3, D7 = 0 (memory-reference):
    I = 1 (indirect): AR ← M[AR]
    I = 0 (direct): nothing
T4: Execute memory-reference instruction, SC ← 0

D'7IT3: AR ← M[AR]
D'7I'T3: Nothing
D7I'T3: Execute a register-reference instr.
D7IT3: Execute an input-output instr.
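The decision made at T3 depends only on D7 (opcode 111) and the I bit. A small Python sketch of that classification is given below; the function name and the returned labels are illustrative, and the sample words assume the bit assignments listed for the register-reference and input-output instructions (for example bit 11 for CLA and INP).

```python
def instruction_type(instruction: int) -> str:
    """Classify a 16-bit Basic Computer instruction from D7 and I."""
    i_bit = (instruction >> 15) & 1
    opcode = (instruction >> 12) & 0b111
    d7 = opcode == 0b111
    if d7 and i_bit:
        return "input-output"                 # D7 I T3
    if d7:
        return "register-reference"           # D7 I' T3
    if i_bit:
        return "memory-reference (indirect)"  # D'7 I T3: AR <- M[AR]
    return "memory-reference (direct)"        # D'7 I' T3: nothing


for word in (0x7800, 0xF800, 0x1457, 0x9457):
    print(f"{word:04X}: {instruction_type(word)}")
```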
Instruction Cycle
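BSA:
D5T4: M[AR] ← PC, AR ← AR + 1
D5T5: PC ← AR, SC ← 0
Example: with AR = 135 and a return address of 21 in PC, the return address 21 is stored at location 135 and execution continues at PC = 136, the first instruction of the subroutine.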
Fig: Flowcharts for program-controlled transfer. Input: FGI ← 0, loop while FGI = 0, then AC ← INPR. Output: AC ← Data, loop while FGO = 0, then OUTR ← AC.
D7IT3 = p
IR(i) = Bi, i = 6, ..., 11
      p:     SC ← 0                            Clear SC
INP   pB11:  AC(0-7) ← INPR, FGI ← 0           Input char. to AC
OUT   pB10:  OUTR ← AC(0-7), FGO ← 0           Output char. from AC
SKI   pB9:   if (FGI = 1) then (PC ← PC + 1)   Skip on input flag
SKO   pB8:   if (FGO = 1) then (PC ← PC + 1)   Skip on output flag
ION   pB7:   IEN ← 1                           Interrupt enable on
IOF   pB6:   IEN ← 0                           Interrupt enable off
PROGRAM-CONTROLLED INPUT/OUTPUT
Program-controlled I/O
- Continuous CPU involvement
I/O takes valuable CPU time
- CPU slowed down to I/O speed
- Simple
- Least hardware
Input
Output
LOOP, LDA DATA
LOP, SKO DEV
BUN LOP
OUT DEV
INTERRUPT INITIATED INPUT/OUTPUT
- Open communication only when some data has to be passed --> interrupt.
- The I/O interface, instead of the CPU, monitors the I/O device.
- When the interface finds that the I/O device is ready for data transfer,
it generates an interrupt request to the CPU
Fetch             R'T0:    AR ← PC
                  R'T1:    IR ← M[AR], PC ← PC + 1
Decode            R'T2:    D0, ..., D7 ← Decode IR(12 ~ 14),
                           AR ← IR(0 ~ 11), I ← IR(15)
Indirect          D'7IT3:  AR ← M[AR]
Interrupt
    T'0T'1T'2(IEN)(FGI + FGO):  R ← 1
                  RT0:     AR ← 0, TR ← PC
                  RT1:     M[AR] ← TR, PC ← 0
                  RT2:     PC ← PC + 1, IEN ← 0, R ← 0, SC ← 0
Memory-Reference
    AND           D0T4:    DR ← M[AR]
                  D0T5:    AC ← AC ∧ DR, SC ← 0
    ADD           D1T4:    DR ← M[AR]
                  D1T5:    AC ← AC + DR, E ← Cout, SC ← 0
    LDA           D2T4:    DR ← M[AR]
                  D2T5:    AC ← DR, SC ← 0
    STA           D3T4:    M[AR] ← AC, SC ← 0
    BUN           D4T4:    PC ← AR, SC ← 0
    BSA           D5T4:    M[AR] ← PC, AR ← AR + 1
                  D5T5:    PC ← AR, SC ← 0
    ISZ           D6T4:    DR ← M[AR]
                  D6T5:    DR ← DR + 1
                  D6T6:    M[AR] ← DR, if (DR = 0) then (PC ← PC + 1), SC ← 0
COMPLETE COMPUTER DESCRIPTION Microoperations
Register-Reference
    D7I'T3 = r (Common to all register-reference instr)
    IR(i) = Bi (i = 0,1,2, ..., 11)
          r:     SC ← 0
    CLA   rB11:  AC ← 0
    CLE   rB10:  E ← 0
    CMA   rB9:   AC ← AC'
    CME   rB8:   E ← E'
    CIR   rB7:   AC ← shr AC, AC(15) ← E, E ← AC(0)
    CIL   rB6:   AC ← shl AC, AC(0) ← E, E ← AC(15)
    INC   rB5:   AC ← AC + 1
    SPA   rB4:   If (AC(15) = 0) then (PC ← PC + 1)
    SNA   rB3:   If (AC(15) = 1) then (PC ← PC + 1)
    SZA   rB2:   If (AC = 0) then (PC ← PC + 1)
    SZE   rB1:   If (E = 0) then (PC ← PC + 1)
    HLT   rB0:   S ← 0
Input-Output
    D7IT3 = p (Common to all input-output instructions)
    IR(i) = Bi (i = 6,7,8,9,10,11)
          p:     SC ← 0
    INP   pB11:  AC(0-7) ← INPR, FGI ← 0
    OUT   pB10:  OUTR ← AC(0-7), FGO ← 0
    SKI   pB9:   If (FGI = 1) then (PC ← PC + 1)
    SKO   pB8:   If (FGO = 1) then (PC ← PC + 1)
    ION   pB7:   IEN ← 1
    IOF   pB6:   IEN ← 0
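To see the register transfer statements working together, the fetch, decode and execute phases can be sketched as a small Python interpreter. This is a simplified illustration, not a full model: it collapses the timing signals into one loop iteration per instruction, omits interrupts, input-output instructions and several register-reference instructions, and the sample program and its data values are assumptions chosen for the demonstration.

```python
def run(memory: list, pc: int = 0, max_steps: int = 1000) -> dict:
    """Execute Basic Computer instructions until HLT; return the final state."""
    ac, e = 0, 0
    for _ in range(max_steps):
        ir = memory[pc]                    # fetch: AR <- PC, IR <- M[AR]
        pc = (pc + 1) & 0xFFF              # PC <- PC + 1
        i_bit = (ir >> 15) & 1             # decode: I <- IR(15)
        opcode = (ir >> 12) & 0b111        # opcode in IR(12-14)
        ar = ir & 0xFFF                    # AR <- IR(0-11)
        if opcode == 7:
            if i_bit:
                raise NotImplementedError("input-output instructions are not modeled")
            if ir & 0x800: ac = 0                      # CLA
            if ir & 0x400: e = 0                       # CLE
            if ir & 0x200: ac ^= 0xFFFF                # CMA
            if ir & 0x100: e ^= 1                      # CME
            if ir & 0x020: ac = (ac + 1) & 0xFFFF      # INC
            if ir & 0x001: break                       # HLT
            continue
        if i_bit:
            ar = memory[ar] & 0xFFF        # indirect: AR <- M[AR]
        if opcode == 0:                    # AND
            ac &= memory[ar]
        elif opcode == 1:                  # ADD, E receives the carry out
            total = ac + memory[ar]
            ac, e = total & 0xFFFF, total >> 16
        elif opcode == 2:                  # LDA
            ac = memory[ar]
        elif opcode == 3:                  # STA
            memory[ar] = ac
        elif opcode == 4:                  # BUN
            pc = ar
        elif opcode == 5:                  # BSA
            memory[ar] = pc
            pc = (ar + 1) & 0xFFF
        elif opcode == 6:                  # ISZ
            memory[ar] = (memory[ar] + 1) & 0xFFFF
            if memory[ar] == 0:
                pc = (pc + 1) & 0xFFF
    return {"AC": ac, "E": e, "PC": pc}


# Hypothetical program: LDA 4, ADD 5, STA 6, HLT, followed by data words.
program = [0x2004, 0x1005, 0x3006, 0x7001, 0x0053, 0xFFE9, 0x0000]
memory = program + [0] * (4096 - len(program))
state = run(memory)
print(state, hex(memory[6]))
```

The sample program adds 0x0053 and 0xFFE9 (83 and -23 in 16-bit two's complement), so AC ends at 0x003C with the carry captured in E.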
DESIGN OF BASIC COMPUTER(BC)
Hardware Components of BC
A memory unit: 4096 x 16.
Registers:
AR, PC, DR, AC, IR, TR, OUTR, INPR, and SC
Flip-Flops(Status):
I, S, E, R, IEN, FGI, and FGO
Decoders: a 3x8 Opcode decoder
a 4x16 timing decoder
Common bus: 16 bits
Control logic gates:
Adder and Logic circuit: Connected to AC
Control Logic Gates
- Input Controls of the nine registers
- Read and Write Controls of memory
- Set, Clear, or Complement Controls of the flip-flops
- S2, S1, S0 Controls to select a register for the bus
- AC, and Adder and Logic circuit
CONTROL OF REGISTERS AND MEMORY
Address Register; AR
Scan all of the register transfer statements that change the content of AR:
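R'T0: AR ← PC            LD(AR)
R'T2: AR ← IR(0-11)      LD(AR)
D'7IT3: AR ← M[AR]       LD(AR)
RT0: AR ← 0              CLR(AR)
D5T4: AR ← AR + 1        INR(AR)
Collecting the conditions for each control input gives:
LD(AR) = R'T0 + R'T2 + D'7IT3
CLR(AR) = RT0
INR(AR) = D5T4
Fig: Control gates associated with AR. The 12-bit register AR is loaded from the bus and drives the bus; gates combining D'7, I, T3, T2, R, T0, D5 and T4 generate its LD, INR and CLR inputs.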
Design of Basic Computer
CONTROL OF FLAGS
IEN: Interrupt Enable Flag
pB7: IEN ← 1 (I/O Instruction)
pB6: IEN ← 0 (I/O Instruction)
RT2: IEN ← 0 (Interrupt)
Fig: IEN is a JK flip-flop with J = pB7 and K = pB6 + RT2, where p = D7IT3.
CONTROL OF COMMON BUS
Fig: An encoder with inputs x1 to x7 generates the bus select inputs S2, S1, S0 of the multiplexer.
x1 x2 x3 x4 x5 x6 x7 | S2 S1 S0 | selected register
0  0  0  0  0  0  0  | 0  0  0  | none
1  0  0  0  0  0  0  | 0  0  1  | AR
0  1  0  0  0  0  0  | 0  1  0  | PC
0  0  1  0  0  0  0  | 0  1  1  | DR
0  0  0  1  0  0  0  | 1  0  0  | AC
0  0  0  0  1  0  0  | 1  0  1  | IR
0  0  0  0  0  1  0  | 1  1  0  | TR
0  0  0  0  0  0  1  | 1  1  1  | Memory
For AR:
D4T4: PC ← AR
D5T5: PC ← AR
x1 = D4T4 + D5T5
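The encoder can be written as three OR functions read off the columns of the table: each select output is the OR of the x inputs whose row has a 1 in that column. The Python sketch below is illustrative; the function names are assumptions, and it presumes that at most one x input is active at a time.

```python
def encode_bus_select(x: list) -> tuple:
    """Encode the one-hot inputs [x1, ..., x7] into (S2, S1, S0)."""
    x1, x2, x3, x4, x5, x6, x7 = x
    s0 = x1 | x3 | x5 | x7
    s1 = x2 | x3 | x6 | x7
    s2 = x4 | x5 | x6 | x7
    return s2, s1, s0


def x1_for_ar(d4: int, t4: int, d5: int, t5: int) -> int:
    """x1 = D4T4 + D5T5: AR is placed on the bus for BUN and for BSA at T5."""
    return (d4 & t4) | (d5 & t5)


names = ["AR", "PC", "DR", "AC", "IR", "TR", "Memory"]
for k, name in enumerate(names, start=1):
    one_hot = [1 if i == k else 0 for i in range(1, 8)]
    print(name, encode_bus_select(one_hot))
```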
DESIGN OF ACCUMULATOR LOGIC
Circuits associated with AC
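Fig: Circuits associated with AC. A 16-bit adder and logic circuit takes inputs from DR (16 bits), from INPR (8 bits) and from AC itself; its 16-bit output feeds AC, which drives the bus. Control gates generate the LD, INR and CLR inputs of AC.
Fig: One stage of the adder and logic circuit for bit i. Gates enabled by the control signals AND, ADD, DR, INPR, COM, SHR and SHL select among the AND of AC(i) and DR(i), the full adder (FA) sum with carries Ci and Ci+1, DR(i), INPR bit i, the complement of AC(i), AC(i+1) and AC(i-1). The selected value Ii is loaded (LD) into the AC(i) flip-flop.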
UNIT-4
MEMORY ORGANIZATION
Memory Hierarchy
Main Memory
Auxiliary Memory
Associative Memory
Cache Memory
Virtual Memory
MEMORY HIERARCHY
CPU Cache
memory
Register
Cache
Main Memory
Magnetic Disk
Magnetic Tape
Main Memory
MAIN MEMORY
RAM and ROM Chips
Typical RAM chip
Chip select 1 CS1
Chip select 2 CS2
Read RD 128 x 8 8-bit data bus
RAM
Write WR
7-bit address AD 7
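Fig: Memory connection to the CPU. Four 128 x 8 RAM chips (RAM 1 to RAM 4), each with CS1, CS2, RD, WR, AD7 and Data connections, are selected by outputs 0 to 3 of a decoder. One 512 x 8 ROM has CS1, CS2, AD9 and Data connections. Address lines 1-7 supply the RAM address inputs (AD7); together with lines 8 and 9 they supply the ROM address input (AD9).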
INPUT-OUTPUT ORGANIZATION
Peripheral Devices
Input-Output Interface
Modes of Transfer
Priority Interrupt
Input-Output Processor
Serial Communication
Peripheral Devices
PERIPHERAL DEVICES
INPUT/OUTPUT INTERFACES
- Unit of Information
Peripherals - Byte
CPU or Memory - Word
- Operating Modes
Peripherals - Autonomous, Asynchronous
CPU or Memory - Synchronous
Input/Output Interfaces
Keyboard
and Printer Magnetic
disk
Magnetic
tape
display
terminal
Each peripheral has an interface module associated with it
Interface
- Decodes the device address (device code)
- Decodes the commands (operation)
- Provides signals for the peripheral controller
- Synchronizes the data flow and supervises
the transfer rate between peripheral and CPU or Memory
Typical I/O instruction
Op. code Device address Function code
(Command)
Input/Output Interfaces
Fig: Connection of I/O Bus to One Interface. The I/O bus carries data lines, function code lines, device address lines and sense lines. The interface (device address AD = 1101 in the example) contains interface logic, a buffer register on the data lines, a command decoder driven by the function code, and a status register that drives the sense lines. The interface connects to the peripheral register of the output peripheral device and its controller.
Input/Output Interfaces
Physical Organizations
* Many computers use a common single bus system
for both memory and I/O interface units
- Use one common bus but separate control lines for each function
- Use one common bus with common control lines for both functions
* Some computer systems use two separate buses,
one to communicate with memory and the other with I/O interfaces
I/O Bus
- Communication between CPU and all interface units is via a common
I/O Bus
- An interface connected to a peripheral device may have a number of
data registers , a control register, and a status register
- A command is passed to the peripheral by sending it
to the appropriate interface register
- Function code and sense lines are not needed (Transfer of data, control,
and status information is always via the common I/O Bus)
Input/Output Interfaces
Isolated I/O
- Separate I/O read/write control lines in addition to memory read/write control
lines
- Separate (isolated) memory and I/O address spaces
- Distinct input and output instructions
Memory-mapped I/O
- A single set of read/write control lines
(no distinction between memory and I/O transfer)
- Memory and I/O addresses share the common address space
-> reduces memory address range available
- No specific input or output instruction
-> The same memory reference instructions can
be used for I/O transfers
- Considerable flexibility in handling I/O operations
Input/Output Interfaces
I/O INTERFACE
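Fig: I/O interface unit. On the CPU side, a bidirectional data bus passes through bus buffers, and chip select (CS), register select (RS1, RS0), I/O read (RD) and I/O write (WR) go to the timing and control block. Internally there are a Port A register, a Port B register, a control register and a status register. On the device side, Port A and Port B carry I/O data, the control register drives the control lines and the status register receives the status lines.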
Handshaking
- A control signal is accompanied with each data
being transmitted to indicate the presence of data
- The receiving unit responds with another control
signal to acknowledge receipt of the data
Asynchronous Data Transfer
STROBE CONTROL
Strobe Strobe
HANDSHAKING
Strobe Methods
Source-Initiated
Destination-Initiated
Data valid
Data accepted
Data valid
Fig: Format of an asynchronous serial character: one start bit (1 bit), the character bits, and stop bits (at least 1 bit). Example line pattern: 1 1 0 0 0 1 0 1
A character can be detected by the receiver from the knowledge of 4 rules:
- When data are not being sent, the line is kept in the 1-state (idle state)
- The initiation of a character transmission is detected
by a Start Bit, which is always a 0
- The character bits always follow the Start Bit
- After the last character bit, a Stop Bit is detected when
the line returns to the 1-state for at least 1 bit time
The receiver knows in advance the transfer rate of the
bits and the number of information bits to expect
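The four rules can be turned into a small Python sketch that frames a byte for transmission and recovers it at the receiver. Sending the least significant bit first is an assumption (it is the usual convention for asynchronous serial links, but the slides do not state the bit order), and the function names are illustrative.

```python
def frame(byte: int, stop_bits: int = 1) -> list:
    """Build the line levels for one character: start bit, 8 data bits, stop bits."""
    data = [(byte >> i) & 1 for i in range(8)]   # assumed order: LSB first
    return [0] + data + [1] * stop_bits


def receive(line: list, data_bits: int = 8) -> int:
    """Recover one character from a sequence of line levels (one entry per bit time)."""
    start = line.index(0)                        # the idle line is 1; the first 0 is the start bit
    data = line[start + 1 : start + 1 + data_bits]
    stop_index = start + 1 + data_bits
    if len(data) < data_bits or stop_index >= len(line) or line[stop_index] != 1:
        raise ValueError("framing error: stop bit not found")
    return sum(bit << i for i, bit in enumerate(data))


line = [1, 1] + frame(0xC5) + [1, 1]             # idle, one character, idle
print(line)
print(hex(receive(line)))
```

The receiver raises a framing error when the line is not in the 1-state at the position where the stop bit is expected, which is how a mismatch in character length or transfer rate would typically show up.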
UNIVERSAL ASYNCHRONOUS RECEIVER-TRANSMITTER
- UART -
A typical asynchronous communication interface available as an IC
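Fig: Block diagram of the asynchronous communication interface. A bidirectional data bus passes through bus buffers to an internal bus that connects the transmitter register, control register, status register and receiver register. The transmitter register feeds a shift register that drives the transmit data line under the transmitter control and clock; the receive data line enters a shift register, under the receiver control and clock, that feeds the receiver register. Chip select (CS), register select (RS), I/O read (RD) and I/O write (WR) go to the timing and control block.
CS  RS  Oper.  Register selected
0   x   x      None
1   0   WR     Transmitter register
1   1   WR     Control register
1   0   RD     Receiver register
1   1   RD     Status register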
Transmitter Register
- Accepts a data byte(from CPU) through the data bus
- Transferred to a shift register for serial transmission
Receiver
- Receives serial information into another shift register
- Complete data byte is sent to the receiver register
Status Register Bits
- Used for I/O flags and for recording errors
Control Register Bits
- Define baud rate, no. of bits in each character, whether
to generate and check parity, and no. of stop bits
UNIT-5
PIPELINING AND VECTOR PROCESSING
Parallel Processing
Pipelining
Arithmetic Pipeline
Instruction Pipeline
RISC Pipeline
Vector Processing
- Inter-Instruction level
- Intra-Instruction level
Parallel Processing
PARALLEL COMPUTERS
Architectural Classification
Flynn's classification
Based on the multiplicity of Instruction Streams and Data Streams
Instruction Stream
Sequence of Instructions read from memory
Data Stream
Operations performed on the data in the processor
VLIW
MISD Nonexistence
Systolic arrays
Dataflow
Associative processors
Instruction stream
Characteristics
Limitations
M CU P
M CU P Memory
M CU P Data stream
Instruction stream
Characteristics
- There is no computer at present that can be
classified as MISD
Parallel Processing
SIMD COMPUTER SYSTEMS
Memory
Data bus
Control Unit
Instruction stream
P P P Processor units
Data stream
Alignment network
M M M Memory modules
Characteristics
- Only one copy of the program exists
- A single controller executes one instruction at a time
Parallel Processing
MIMD COMPUTER SYSTEMS
P M P M P M
Interconnection Network
Shared Memory
Characteristics
- Multiple processing units
- Message-passing multicomputers
Pipelining
PIPELINING
A technique of decomposing a sequential process
into suboperations, with each subprocess being
executed in a partial dedicated segment that
operates concurrently with all other segments.
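Example: compute Ai * Bi + Ci for i = 1, 2, 3, ..., 7
Segment 1: R1 ← Ai, R2 ← Bi (load Ai and Bi from memory)
Segment 2: R3 ← R1 * R2, R4 ← Ci (multiply, and load Ci from memory)
Segment 3: R5 ← R3 + R4 (add Ci to the product)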
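The three-segment example can be simulated clock by clock in Python. The sketch below is an illustrative model in which each loop iteration is one clock cycle and every segment works on the values latched in the previous cycle; the function name and the sample operand values are assumptions.

```python
def pipeline(a: list, b: list, c: list) -> list:
    """Compute a[i] * b[i] + c[i] with a three-segment pipeline model."""
    n = len(a)
    r1 = r2 = r3 = r4 = None
    results = []
    for clock in range(n + 2):             # k + n - 1 cycles with k = 3 segments
        # Segment 3 uses the R3, R4 values latched in the previous cycle.
        if r3 is not None:
            results.append(r3 + r4)        # R5 <- R3 + R4
        # Segment 2 uses the R1, R2 values latched in the previous cycle.
        if r1 is not None:
            r3, r4 = r1 * r2, c[clock - 1]  # R3 <- R1 * R2, R4 <- Ci
        else:
            r3 = r4 = None
        # Segment 1 loads the next operand pair, if any remain.
        if clock < n:
            r1, r2 = a[clock], b[clock]    # R1 <- Ai, R2 <- Bi
        else:
            r1 = r2 = None
    return results


a = [1, 2, 3, 4, 5, 6, 7]
b = [2, 2, 2, 2, 2, 2, 2]
c = [10, 10, 10, 10, 10, 10, 10]
print(pipeline(a, b, c))
```

The segments are updated from last to first inside the loop so that each one sees the register contents from the previous clock cycle, mirroring the way all pipeline registers load simultaneously on a clock edge.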
Fig: General four-segment pipeline: Input → S1 → R1 → S2 → R2 → S3 → R3 → S4 → R4
Space-Time Diagram
Clock cycle   1   2   3   4   5   6   7   8   9
Segment 1     T1  T2  T3  T4  T5  T6
Segment 2         T1  T2  T3  T4  T5  T6
Segment 3             T1  T2  T3  T4  T5  T6
Segment 4                 T1  T2  T3  T4  T5  T6
Pipelining
PIPELINE SPEEDUP
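n: Number of tasks to be performed
k: Number of pipeline segments
tp: Clock period of the pipeline (time per segment)
tn: Time to complete one task without pipelining
Sk: Speedup
Sk = n*tn / ((k + n - 1)*tp)
As n → ∞, Sk → tn / tp (= k, if tn = k * tp)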
Pipelining
PIPELINE AND MULTIPLE FUNCTION UNITS
Example
- 4-stage pipeline
- suboperation in each stage; tp = 20nS
- 100 tasks to be executed
- 1 task in non-pipelined system; 20*4 = 80nS
Pipelined System
(k + n - 1)*tp = (4 + 99) * 20 = 2060nS
Non-Pipelined System
n*k*tp = 100 * 80 = 8000nS
Speedup
Sk = 8000 / 2060 = 3.88
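The same calculation can be checked with a short Python function. The function name and the default `tn = k * tp` are assumptions made for this sketch; the numbers are those of the example above.

```python
def pipeline_speedup(n: int, k: int, tp: float, tn: float = None) -> float:
    """Speedup of a k-segment pipeline over a non-pipelined unit for n tasks."""
    if tn is None:
        tn = k * tp                       # assumed: one task takes k suboperation times
    pipelined_time = (k + n - 1) * tp     # k cycles to fill the pipe, then one result per cycle
    non_pipelined_time = n * tn
    return non_pipelined_time / pipelined_time


speedup = pipeline_speedup(n=100, k=4, tp=20)
print(round(speedup, 2))
print(round(pipeline_speedup(n=1_000_000, k=4, tp=20), 4))
```

As n grows, the result approaches k = 4, in line with the limit of Sk.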
Ii I i+1 I i+2 I i+3
4-Stage Pipeline is basically identical to the system
with 4 identical function units
Multiple Functional Units P1 P2 P3 P4
Arithmetic Pipeline
ARITHMETIC PIPELINE
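Floating-point adder
Fig: Pipeline for floating-point addition. The exponents a, b and the mantissas A, B of the two operands pass through registers (R) that separate the segments.
X = A x 2^a
Y = B x 2^b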
Instruction Pipeline
INSTRUCTION CYCLE
Six Phases* in an Instruction Cycle
[1] Fetch an instruction from memory
[2] Decode the instruction
[3] Calculate the effective address of the operand
[4] Fetch the operands from memory
[5] Execute the operation
[6] Store the result in the proper place
Sequential (non-pipelined) execution
Step   1   2   3   4   5   6   7   8   9   10  11  12
i      FI  DA  FO  EX
i+1                    FI  DA  FO  EX
i+2                                    FI  DA  FO  EX
Pipelined execution
Step   1   2   3   4   5   6
i      FI  DA  FO  EX
i+1        FI  DA  FO  EX
i+2            FI  DA  FO  EX
Instruction Pipeline
INSTRUCTION EXECUTION IN A 4-STAGE PIPELINE
Decode instruction
Segment2: and calculate
effective address
Branch?
yes
no
Fetch operand
Segment3: from memory
Interrupt yes
Interrupt?
handling
no
Update PC
Empty pipe
Step:           1   2   3   4   5   6   7   8   9   10  11  12  13
Instruction 1   FI  DA  FO  EX
            2       FI  DA  FO  EX
   (Branch) 3           FI  DA  FO  EX
            4               FI  --  --  FI  DA  FO  EX
            5                   --  --  --  FI  DA  FO  EX
            6                                   FI  DA  FO  EX
            7                                       FI  DA  FO  EX
Instruction Pipeline
MAJOR HAZARDS IN PIPELINED EXECUTION
Structural hazards (Resource Conflicts)
Caused by access to memory by two segments at the same time.
Most of these conflicts can be resolved by using separate instruction
and data memories.
Data hazards (Data Dependency Conflicts)
An instruction scheduled to be executed in the pipeline requires the
result of a previous instruction, which is not yet available.
Fig: Pipeline timing with a bubble (stall) inserted ahead of the stages IF, ID, OF, OE, OS
Interlock
- Hardware detects the data dependencies and delays the scheduling
of the dependent instruction by stalling for enough clock cycles
Forwarding (bypassing, short-circuiting)
- Accomplished by a data path that routes a value from a source
(usually an ALU) to a user, bypassing a designated register. This
allows a produced value to be used at an earlier stage in the
pipeline than would otherwise be possible
Software Technique
The compiler is designed to detect a data conflict and reorder instructions
as necessary to delay the loading of the conflicting data by inserting no-operation
instructions. This method is called DELAYED LOAD.
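The software technique can be sketched as a small compiler pass. The sketch assumes a pipeline in which a loaded value is not yet available to the instruction immediately following the load, so one no-operation slot is enough. The tuple encoding (opcode, destination, sources) and the operand names A, B, C are hypothetical and chosen only for illustration.

```python
NOP = ("NOP", None, ())


def insert_delay_slots(program):
    """Insert a NOP after any LOAD whose destination register is read
    by the instruction that immediately follows it (one delay slot assumed)."""
    scheduled = []
    for instr in program:
        if scheduled:
            prev_op, prev_dest, _ = scheduled[-1]
            _, _, sources = instr
            # Data conflict: the previous LOAD has not delivered its value yet.
            if prev_op == "LOAD" and prev_dest in sources:
                scheduled.append(NOP)
        scheduled.append(instr)
    return scheduled


if __name__ == "__main__":
    program = [
        ("LOAD", "R1", ("A",)),
        ("LOAD", "R2", ("B",)),
        ("ADD", "R3", ("R1", "R2")),   # reads R2 right after it is loaded
        ("STORE", "C", ("R3",)),
    ]
    for op, dest, srcs in insert_delay_slots(program):
        print(op, dest or "", ", ".join(srcs))
```

A real compiler would first try to move an independent instruction into the slot and fall back to a NOP only when none is available; this sketch always uses the NOP.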
CONTROL HAZARDS (Branching Difficulties)
Branch Instructions
Three-segment pipeline timing (I: instruction fetch, A: ALU operation, E: execute instruction)
clock cycle     1  2  3  4  5  6
Load R1         I  A  E
Load R2            I  A  E
Add R1+R2             I  A  E
Store R3                 I  A  E
DO 20 I = 1, 100
20 C(I) = B(I) + A(I)
Conventional computer
   Initialize I = 1
20 Read A(I)
   Read B(I)
   Store C(I) = A(I) + B(I)
   Increment I = I + 1
   If I <= 100 goto 20
Vector computer
   C(1:100) = A(1:100) + B(1:100)
The values of A and B are either in memory or in processor registers. Each floating
point adder and multiplier unit is assumed to have 4 segments. All segment
registers are initialized to zero. Therefore the output of the adder is zero
for the first 8 cycles, until both pipes are full.
Ai and Bi are brought in and multiplied at a rate of one pair per cycle. After 4 cycles
the products begin to be added to the output of the adder. During the next 4 cycles zero is added.
At the end of the 8th cycle the first four products A1B1 through A4B4 are in the four
adder segments and the next four products A5B5 through A8B8 are in the multiplier
segments.
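$$
\begin{aligned}
C ={} & A_1 B_1 + A_5 B_5 + A_9 B_9 + A_{13} B_{13} + \cdots \\
 & + A_2 B_2 + A_6 B_6 + A_{10} B_{10} + A_{14} B_{14} + \cdots \\
 & + A_3 B_3 + A_7 B_7 + A_{11} B_{11} + A_{15} B_{15} + \cdots \\
 & + A_4 B_4 + A_8 B_8 + A_{12} B_{12} + A_{16} B_{16} + \cdots
\end{aligned}
$$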
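The four rows of the sum above correspond to four partial sums that circulate in the four adder segments. The sketch below models only that grouping, not the cycle-by-cycle timing; the function name and the `adder_segments` parameter are illustrative assumptions.

```python
def pipelined_inner_product(a, b, adder_segments=4):
    """Accumulate products A_i * B_i into `adder_segments` interleaved
    partial sums (one per adder segment), then combine them."""
    partial = [0] * adder_segments
    for i, (x, y) in enumerate(zip(a, b)):
        # Product i lands on the partial sum that is leaving the adder
        # pipe at that moment, i.e. every adder_segments-th product.
        partial[i % adder_segments] += x * y
    return partial, sum(partial)


if __name__ == "__main__":
    a = list(range(1, 17))   # A1 .. A16
    b = [1] * 16             # B1 .. B16
    partial, total = pipelined_inner_product(a, b)
    print(partial, total)
```

The first partial sum collects A1B1, A5B5, A9B9, A13B13, the second A2B2, A6B6, and so on; adding the four partial sums at the end gives the inner product.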
Multiprocessors
MULTIPROCESSORS
Characteristics of Multiprocessors
Interconnection Structures
Interprocessor Arbitration
Interprocessor Communication and Synchronization
Cache Coherence
Characteristics of Multiprocessor systems
VLSI circuit technology has reduced the cost of computers to such a low
level that the concept of applying multiple processors to meet system
performance requirements has become an attractive design possibility.
Characteristics of Multiprocessors
Benefits of Multiprocessing:
Loosely coupled systems are more efficient when the interaction between tasks is
minimal, whereas tightly coupled systems can tolerate a higher degree of interaction
between tasks.
GRANULARITY OF PARALLELISM
Coarse-grain
- A task is broken into a handful of pieces, each of which is executed by a powerful processor
- Processors may be heterogeneous
- Computation/communication ratio is very high
Medium-grain
Fine-grain
- Thousands to perhaps millions of small pieces, executed by very small, simple processors or
through pipelines
- Processors typically have instructions broadcast to them
- Computation/communication ratio is often near unity
MEMORY
Shared (Global) Memory
- A Global Memory Space accessible by all processors
- Processors may also have some local memory
Distributed (Local, Message-Passing) Memory
- All memory units are associated with processors
- To retrieve information from another processor's
memory a message must be sent there
Uniform Memory
- All processors take the same time to reach all memory locations
Nonuniform (NUMA) Memory
- Memory access is not uniform
Fig: SHARED MEMORY (processors connected through a network to memory) and DISTRIBUTED MEMORY (processor/memory pairs connected through a network)
SHARED MEMORY MULTIPROCESSORS
Fig: Memory modules M, M, ..., M shared by processors P, P, ..., P
Characteristics
All processors have equally direct access to one large memory address space
Limitations
Fig: Message-passing organization with processors P, P, ..., P, each with its own memory M
Characteristics
- Interconnected computers
- Each processor has its own memory, and processors
communicate via message passing
Limitations
Fig: Local buses connected to a common SYSTEM BUS
Advantages
- Multiple paths -> high transfer rate
Disadvantages
- Memory control logic
- Large number of cables and
connections
Fig: Memory modules MM 1 through MM 4, each with connections to CPU 1 through CPU 4
CROSSBAR SWITCH
Each switch point has control logic to set up
the transfer path between a processor and a
memory module.
It also resolves multiple requests for access to the same memory on a predetermined
priority basis.
Fig: Crossbar switch connecting the CPUs to memory modules MM1, MM2, MM3, MM4
Fig: Block diagram of a crossbar switch point. Multiplexers and arbitration logic select the data, address, and control lines from CPU 1, CPU 2, CPU 3, or CPU 4 and drive the data, address, R/W, and memory enable inputs of the memory module
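The arbitration performed at the switch points can be sketched in software. The slides say only that conflicting requests for the same memory module are resolved by a fixed priority; the rule used here (the lowest-numbered CPU wins) and the function name `crossbar_arbitrate` are assumptions made for illustration.

```python
def crossbar_arbitrate(requests):
    """requests maps a CPU number to the memory module it wants.
    Returns a map from each requested module to the CPU that is granted it.
    Assumed priority rule: the lowest-numbered CPU wins a conflict."""
    grants = {}
    for cpu in sorted(requests):          # visit CPUs in priority order
        module = requests[cpu]
        grants.setdefault(module, cpu)    # first (highest-priority) request wins
    return grants


if __name__ == "__main__":
    # CPU 1 and CPU 2 both want MM1; CPU 3 and CPU 4 want different modules.
    print(crossbar_arbitrate({1: "MM1", 2: "MM1", 3: "MM2", 4: "MM4"}))
```

Requests to different modules are all granted in the same cycle, which is the main advantage of a crossbar; only requests that collide on one module have to wait.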
MULTISTAGE SWITCHING NETWORK
Interstage Switch
A 2 x 2 switch with inputs A and B and outputs 0 and 1 has four possible settings:
- A connected to 0
- A connected to 1
- B connected to 0
- B connected to 1
MULTISTAGE INTERCONNECTION NETWORK
Binary Tree with 2 x 2 Switches
Some requests cannot be satisfied simultaneously.
Ex: if P1 is connected to one of the destinations 000 through 011, P2 can be
connected to only one of the destinations 100 through 111.
Fig: Binary tree of 2 x 2 switches connecting P1 and P2 to destinations 000 through 111
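The blocking condition in the binary tree can be expressed as a one-line check. The sketch assumes that P1 and P2 share the first-level switch and that its two outputs lead to destinations 000 through 011 and 100 through 111 respectively, so the two requests can be served together only when their most significant address bits differ. The function name is illustrative.

```python
def can_connect_simultaneously(dest_p1, dest_p2, address_bits=3):
    """Return True if P1 and P2 can both reach their destinations at once
    in a binary tree of 2 x 2 switches (they must leave the first-level
    switch on different outputs, i.e. the top address bits must differ)."""
    msb = address_bits - 1
    return ((dest_p1 >> msb) & 1) != ((dest_p2 >> msb) & 1)


if __name__ == "__main__":
    print(can_connect_simultaneously(0b001, 0b101))  # different halves
    print(can_connect_simultaneously(0b000, 0b011))  # same half, blocked
```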
8x8 Omega Switching Network
Fig: Inputs 0 through 7 connected through three stages of 2 x 2 switches to destinations 000 through 111
HYPERCUBE INTERCONNECTION
n-dimensional hypercube (binary n-cube)
- p = 2^n processors
- Processors are conceptually on the corners of an n-dimensional hypercube, and each is directly connected to
its n neighboring nodes
- Degree = n
- Routing procedure: source 010, destination 001
XOR: 010 XOR 001 = 011, so the data is transmitted on the y axis and then on the z axis, i.e. 010 to 000 and then 000 to 001
Fig: One-cube (nodes 0, 1), two-cube (nodes 00, 01, 10, 11), and three-cube (nodes 000 through 111)
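The routing procedure can be sketched as follows: XOR the source and destination addresses, then flip one differing bit per hop. The order in which differing bits are corrected (most significant first) is an assumption chosen so that the output matches the 010 to 001 example; the function name is illustrative.

```python
def hypercube_route(source, dest, n):
    """Return the list of node addresses visited when routing from source
    to dest in an n-dimensional hypercube, correcting differing address
    bits from the most significant to the least significant."""
    path = [source]
    current = source
    diff = source ^ dest                  # 1 bits mark the axes to traverse
    for bit in range(n - 1, -1, -1):
        if diff & (1 << bit):
            current ^= 1 << bit           # move to the neighbor along this axis
            path.append(current)
    return path


if __name__ == "__main__":
    route = hypercube_route(0b010, 0b001, n=3)
    print(" -> ".join(format(node, "03b") for node in route))
```

The number of hops equals the number of 1 bits in the XOR result, so it never exceeds n.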
INTERPROCESSOR ARBITRATION
Only one of the CPUs, IOPs, and memory can be granted use of the bus at a time.
An arbitration mechanism is needed to handle multiple requests for the shared resources and to
resolve contention.
SYSTEM BUS:
A bus that connects the major components, such as CPUs, IOPs, and memory.
A typical system bus consists of approximately 100 signal lines divided into three functional groups: data,
address, and control lines. In addition, there are power distribution lines to the components.
e.g. IEEE standard 796 bus
- 86 lines
- Data: 16 (multiple of 8)
- Address: 24
- Control: 26
- Power: 20
Asynchronous Bus
* Strobe pulse - supplied by one of the units to indicate to the other unit when the data transfer
has to occur
BUS SIGNALS
Bus signal allocation
- address
- data
- arbitration control
- interrupt
- timing
- power, ground
Miscellaneous control
  Master clock               CCLK
  System initialization      INIT
  Byte high enable           BHEN
  Memory inhibit (2 lines)   INH1 - INH2
  Bus lock                   LOCK
Bus arbitration
  Bus request                BREQ
  Common bus request         CBRQ
  Bus busy                   BUSY
  Bus clock                  BCLK
  Bus priority in            BPRN
  Bus priority out           BPRO
Power and ground (20 lines)
INTERPROCESSOR ARBITRATION: STATIC ARBITRATION
Fig: Bus arbitration logic built from a 4x2 priority encoder and a 2x4 decoder
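The encoder and decoder pair can be modeled in a few lines. The sketch assumes four request lines with line 0 having the highest fixed priority; that ordering and the function names are illustrative, since the figure itself did not survive extraction.

```python
def priority_encoder_4x2(requests):
    """requests: four booleans, index 0 = highest priority.
    Returns (2-bit code of the winning request, valid flag)."""
    for index, active in enumerate(requests):
        if active:
            return index, True
    return 0, False                       # no request pending


def decoder_2x4(code, enable):
    """Turn a 2-bit code back into four one-hot grant lines."""
    return [enable and i == code for i in range(4)]


def static_arbitrate(requests):
    """Grant the bus to the highest-priority requesting unit."""
    code, valid = priority_encoder_4x2(requests)
    return decoder_2x4(code, valid)


if __name__ == "__main__":
    # Units 1 and 2 request the bus; unit 1 has the higher priority.
    print(static_arbitrate([False, True, True, False]))
```

Because the priority order is wired into the encoder, a low-priority unit can be starved if higher-priority units keep requesting the bus.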
INTERPROCESSOR ARBITRATION: DYNAMIC ARBITRATION
Priorities of the units can be changed dynamically while the system is in operation.
Time Slice
A fixed-length time slice is given sequentially to each processor, in round-robin
fashion.
Polling
Unit address polling: the bus controller advances the address to identify the requesting unit. When the processor
that requires access recognizes its address, it activates the bus busy line and then accesses the bus.
After a number of bus cycles, the polling continues by choosing a different processor.
LRU
The least recently used algorithm gives the highest priority to the requesting device that has not used the bus
for the longest interval.
FIFO
In the first-come, first-served scheme, requests are served in the order received. The bus controller
maintains a queue data structure.
Rotating Daisy Chain
Conventional daisy chain: highest priority goes to the unit nearest to the bus
controller.
Rotating daisy chain: the PO output of the last device is connected to the PI of the first one. Highest
priority goes to the unit that is nearest to the unit that has most recently accessed the bus (which becomes the bus
controller).
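The rotating daisy chain can be sketched as a priority search that starts just after the unit that last held the bus. The class below is an illustrative model under that reading; its name, its methods, and the choice that unit 0 starts with the highest priority are assumptions.

```python
class RotatingArbiter:
    """Rotating daisy chain: the unit next to the most recent bus user
    has the highest priority; the most recent user has the lowest."""

    def __init__(self, units):
        self.units = units
        self.last = units - 1            # assumed start: unit 0 is checked first

    def grant(self, requests):
        """requests: set of unit numbers asking for the bus.
        Returns the granted unit, or None if nobody is requesting."""
        for offset in range(1, self.units + 1):
            unit = (self.last + offset) % self.units
            if unit in requests:
                self.last = unit         # this unit now acts as bus controller
                return unit
        return None


if __name__ == "__main__":
    arbiter = RotatingArbiter(4)
    for _ in range(4):
        print(arbiter.grant({0, 2}))     # units 0 and 2 keep requesting
```

When units 0 and 2 request continuously, the grants alternate between them, so neither is starved. This is the main difference from a conventional daisy chain with fixed priorities.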
INTERPROCESSOR COMMUNICATION
Fig: Interprocessor communication through shared memory. The sending processor places a message in a communication area of shared memory and marks its receiver(s); the receiving processors read the message from that area
Fig: The same arrangement, with the sending processor also issuing an interrupt (by instruction) to the receiving processors
INTERPROCESSOR SYNCHRONIZATION
Synchronization
Communication of control information between processors
- To enforce the correct sequence of processes
- To ensure mutually exclusive access to shared writable data
Hardware Implementation
Semaphore
- A binary variable
- 1: A processor is executing a critical section, which is not available to other processors
- 0: Available to any requesting processor
- A software-controlled flag stored in memory that all processors can access
SEMAPHORE
Testing and Setting the Semaphore
The test and the set are done while locked, so that other processors cannot test and set the semaphore while the current
processor is executing these instructions.
If R = 1 (R being the semaphore value read by the test), another processor is executing the critical section, and the processor that executed this
instruction does not access the shared memory.
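The test-and-set idea can be emulated in software. In the sketch below a `threading.Lock` stands in for the hardware lock that makes the read and the write of the semaphore indivisible; the class name `BinarySemaphore`, its methods, and the worker demo are illustrative assumptions.

```python
import threading
import time


class BinarySemaphore:
    """Semaphore word in shared memory with an indivisible test-and-set."""

    def __init__(self):
        self.value = 0                        # 0 = available, 1 = in use
        self._lock = threading.Lock()         # stands in for the hardware lock

    def test_and_set(self):
        with self._lock:                      # no other thread can interleave here
            r = self.value                    # test: read the old value into R
            self.value = 1                    # set: mark the section as taken
            return r

    def clear(self):
        with self._lock:
            self.value = 0                    # leave the critical section


def run_workers(workers=4, increments=200):
    """Each worker increments a shared counter inside the critical section."""
    sem = BinarySemaphore()
    shared = {"count": 0}

    def worker():
        for _ in range(increments):
            while sem.test_and_set() == 1:    # R = 1: someone else is inside
                time.sleep(0)                 # yield and try again
            shared["count"] += 1              # critical section
            sem.clear()

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["count"]


if __name__ == "__main__":
    print(run_workers())
```

A processor that reads R = 0 knows it has acquired the semaphore, because the same indivisible operation already set it to 1 for everyone else.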
Fig: Three processors P1, P2, P3 with private caches on a common bus. Initially every cache holds X = 52. After P1 writes X = 120, the caches of P2 and P3 still hold X = 52
Software Approaches
* Read-Only Data are Cacheable
- Private Cache is for Read-Only data
- Shared Writable Data are not cacheable
- Compiler tags data as cacheable and noncacheable
- Degrades performance due to software overhead
Hardware Approaches
* Snoopy Cache Controller
- Cache Controllers monitor all the bus requests from CPUs and IOPs
- All caches attached to the bus monitor the write operations
- When a word in a cache is written, memory is also updated (write through)
- Local snoopy controllers in all other caches check their memory to determine if they have
a copy of that word; if they have, that location is marked invalid (a future reference to this
location causes a cache miss)
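The snoopy scheme with write-through and invalidation can be modeled with two small classes. This is a simplified sketch: each cache is a dictionary, the bus is a method call, and the names `SnoopyBus` and `Cache` are illustrative. The values X = 52 and X = 120 are reused here only as demo data.

```python
class SnoopyBus:
    """Shared bus with main memory; every attached cache snoops on writes."""

    def __init__(self, memory):
        self.memory = dict(memory)
        self.caches = []

    def attach(self, cache):
        self.caches.append(cache)

    def broadcast_write(self, writer, address, value):
        self.memory[address] = value              # write-through to main memory
        for cache in self.caches:
            if cache is not writer:
                cache.snoop_invalidate(address)   # other copies become invalid


class Cache:
    def __init__(self, bus):
        self.bus = bus
        self.lines = {}                           # address -> cached value
        self.misses = 0
        bus.attach(self)

    def read(self, address):
        if address not in self.lines:             # miss: fetch from main memory
            self.misses += 1
            self.lines[address] = self.bus.memory[address]
        return self.lines[address]

    def write(self, address, value):
        self.lines[address] = value
        self.bus.broadcast_write(self, address, value)

    def snoop_invalidate(self, address):
        self.lines.pop(address, None)             # drop the stale copy if present


if __name__ == "__main__":
    bus = SnoopyBus({"X": 52})
    p1, p2, p3 = Cache(bus), Cache(bus), Cache(bus)
    print([c.read("X") for c in (p1, p2, p3)])    # every cache loads X
    p1.write("X", 120)                            # P1 updates X
    print([c.read("X") for c in (p1, p2, p3)])    # P2 and P3 miss and reload
```

After the write, P2 and P3 no longer hold X, so their next read misses and fetches the updated value from memory instead of returning the stale copy.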