
INSTITUTE OF AERONAUTICAL ENGINEERING
(Autonomous)
Dundigal, Hyderabad -500 043
COMPUTER SCIENCE AND ENGINEERING
Sub: Computer Organization and Architecture
UNIT-1
BASIC COMPUTER ORGANIZATION
CPU
Memory subsystem
I/O subsystem
Generic computer Organization

1.1.1 System bus:
Physically, the bus is a set of wires. The components
of a computer are connected to the buses.
The system has three buses:
Address bus
Data bus
Control bus
The uppermost bus in the figure is the address
bus; it carries the address of the memory location or I/O device being accessed.
Data is transferred via the data bus.
The control bus carries the control signals.
1.1.2 Instruction cycles:
First the processor fetches or reads the instruction
from memory. Then it decodes the instruction
determining which instruction it has fetched.
Finally, it performs the operations necessary to
execute the instruction.
After fetching, the processor decodes the instruction and
controls the execution procedure. It performs
some operations internally and supplies the
address, data, and control signals needed by
memory and I/O devices to execute the instruction.
The figure below shows the memory read and memory write operations.
Fig 1.2: Timing diagram for memory read and memory write operations
CPU ORGANIZATION
Central processing unit (CPU) is the electronic
circuitry within a computer that carries out the
instructions of a computer program by performing
the basic arithmetic, logical, control and
input/output (I/O) operations specified by the
instructions.
In a computer, all the major components
are connected with the help of the system bus.
The data bus is used to move data between the
various components in a computer system.
Internally, the CPU has three sections, as shown in the figure below.
Fig 1.3: CPU Organization

MEMORY SUBSYSTEM ORGANIZATION AND INTERFACING:
Internal organization of the memory chips:

Memory is usually organized in the form of arrays, in which each
cell is capable of storing one bit of information.
A possible organization is shown in the figure below.
Each row of cells constitutes a memory word, and all cells of a row
are connected to a common line called the word line, which is
driven by the address decoder on the chip.
The cells in each column are connected to a sense/write circuit by two
bit lines.
The sense/write circuits are connected to the data input/output lines
of the chip.
During a read operation these circuits sense, or read, the information
stored in the cells selected by a word line and transmit the information
to the output lines.
During a write operation the sense/write circuits receive the input
information and store it in the cells of the selected word.
Types of Memory
There are two types of memory chips:
1. Read Only Memory (ROM)
2. Random Access Memory (RAM)
ROM chips come in several varieties:
Masked ROM (or simply ROM)
PROM (Programmable Read Only Memory)
EPROM (Erasable Programmable Read Only Memory)
EEPROM (Electrically Erasable PROM)
Flash memory
RAM Chips:
RAM stands for random access memory. It is often referred to
as read/write memory. Unlike ROM, it initially contains no
data.
The digital circuit in which it is used stores data at various
locations in the RAM and retrieves data from these locations.
The data pins are bidirectional, unlike in ROM.
A RAM chip loses its data once power is removed, so it is a
volatile memory.
RAM chips are differentiated based on how they maintain their data:
Dynamic RAM (DRAM)
Static RAM (SRAM)
Memory subsystem configuration
Fig 1.4: A 16 x 4 memory subsystem constructed from two 8 x 2 ROM chips with low-order interleaving
Multi byte Organization
There are two commonly used organizations for multi
byte data.
Big endian
Little endian
In BIG-ENDIAN systems the most significant byte of a
multi-byte data item always has the lowest address,
while the least significant byte has the highest address.
In LITTLE-ENDIAN systems, the least significant byte
of a multi-byte data item always has the lowest address,
while the most significant byte has the highest address
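The two byte orders can be illustrated with a short Python sketch. The 32-bit value 0x12345678 and the base address 0x100 are assumptions chosen for illustration; they do not come from the slides.

```python
# Hypothetical 32-bit data item and starting address (illustrative only).
value = 0x12345678
base = 0x100

def layout(data, start):
    """Map consecutive byte addresses to the bytes stored there."""
    return {start + offset: byte for offset, byte in enumerate(data)}

# Big endian: most significant byte at the lowest address.
big_layout = layout(value.to_bytes(4, byteorder="big"), base)
# Little endian: least significant byte at the lowest address.
little_layout = layout(value.to_bytes(4, byteorder="little"), base)

for name, mem in (("big-endian", big_layout), ("little-endian", little_layout)):
    cells = ", ".join(f"{addr:#x}: {byte:#04x}" for addr, byte in mem.items())
    print(f"{name}: {cells}")
```

In the big-endian layout the byte 0x12 sits at the lowest address, while in the little-endian layout the byte 0x78 does.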
I/O SUBSYSTEM ORGANIZATION AND INTERFACING
The I/O subsystem is treated as an independent
unit in the computer. The CPU initiates I/O
commands generically
(read, write, scan, etc.).
This simplifies the CPU.
INPUT DEVICE:
The generic interface circuitry for an input
device such as a keyboard, together with the enable logic
for its tri-state buffers, is shown in the figure below.
Fig 1.5: An input device: (a) with its interface and (b) the enable logic for the tri-state buffers
OUTPUT DEVICE
The design of the interface circuitry for an output
device, such as a computer monitor, is somewhat
different from that for an input device. Tri-state buffers
are replaced by a register.
The tri-state buffers are used in input device interfaces
to make sure that only one device writes data to the bus at
any time.
Since output devices read data from the bus rather than
write data to it, they don't need
the buffers.
An output device: (a) with its interface and (b) the enable logic for the registers
Fig: A bidirectional I/O device with its interface and enable/load logic
LEVELS OF PROGRAMMING LANGUAGES
Computer programming languages are divided into 3
categories:
High level language
Assembly level language
Machine level language
High level languages are platform independent; that is, these
programs can run on computers with different
microprocessors and operating systems without
modification. Languages such as C++, Java and
FORTRAN are high level languages.
Assembly languages are at a much lower level of abstraction.
Each processor has its own assembly language.
The levels of programming languages are shown in the figure
below.
ASSEMBLY LANGUAGE INSTRUCTIONS:
A memory-reference instruction has an address part of 12
bits. The address part is denoted by three x's, which stand for
the three hexadecimal digits corresponding to the 12-bit
address. The leftmost bit of the instruction (bit 15) is designated by the
symbol I. When I = 0, the leftmost four bits of an instruction
have a hexadecimal digit equivalent from 0 to 6, since the
leftmost bit is 0. When I = 1, the hexadecimal digit equivalent of
the leftmost four bits of the instruction ranges from 8 to E, since
the leftmost bit is 1.
Register-reference instructions use 16 bits to specify an
operation. The leftmost four bits are always 0111, which is
equivalent to hexadecimal 7. The other three hexadecimal
digits give the binary equivalent of the remaining 12 bits.
The input-output instructions also use all 16 bits to specify
an operation. The leftmost four bits are always 1111, equivalent
to hexadecimal F.
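The hexadecimal ranges described above can be checked with a small Python sketch. The function name classify and the sample instruction words are hypothetical; only the digit ranges come from the text.

```python
def classify(instruction):
    """Classify a 16-bit instruction word by its leftmost hexadecimal digit."""
    top_digit = (instruction >> 12) & 0xF
    if top_digit == 0x7:
        return "register-reference"       # leftmost four bits 0111
    if top_digit == 0xF:
        return "input-output"             # leftmost four bits 1111
    i_bit = top_digit >> 3                # bit 15 of the instruction
    if i_bit == 0:
        return "memory-reference, I = 0"  # leftmost digit 0 to 6
    return "memory-reference, I = 1"      # leftmost digit 8 to E

# Sample words (illustrative): the low three hex digits are the 12-bit part.
for word in (0x1457, 0x9300, 0x7800, 0xF400):
    print(f"{word:04X}: {classify(word)}")
```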
A RELATIVELY SIMPLE INSTRUCTION SET ARCHITECTURE:
A relatively simple computer: CPU details only
UNIT-2
Register Transfer and Microoperations
CONTENTS
Register Transfer Language
Register Transfer
Bus and Memory Transfers
Arithmetic Microoperations
Logic Microoperations
Shift Microoperations
Arithmetic Logic Shift Unit
Register Transfer Language (RTL)
Digital System: An interconnection of hardware
modules that do a certain task on the information.
Registers + Operations performed on the data stored
in them = Digital Module
Modules are interconnected with common data and
control paths to form a digital computer system
Register Transfer Language cont.
Microoperations: operations executed on data stored
in one or more registers.
For any function of the computer, a sequence of
microoperations is used to describe it
The result of the operation may be:
replace the previous binary information of a
register or
transferred to another register
Example (shift right operation): 101101110011 becomes 010110111001
The internal hardware organization of a digital
computer is defined by specifying:
The set of registers it contains and their function
The sequence of microoperations performed on the
binary information stored in the registers
The control that initiates the sequence of
microoperations
Registers + Microoperations Hardware + Control
Functions = Digital Computer
Register Transfer Language (RTL) : a symbolic
notation to describe the microoperation transfers
among registers
Next steps:
Define symbols for various types of microoperations,
Describe the hardware that implements these
microoperations
Register Transfer (our first microoperation)
Computer registers are designated by capital
letters (sometimes followed by numerals) to
denote the function of the register
R1: processor register
MAR: Memory Address Register (holds an address for a
memory unit)
PC: Program Counter
IR: Instruction Register
SR: Status Register
Register Transfer cont.
The individual flip-flops in an n-bit register are
numbered in sequence from 0 to n-1 (from
the right position toward the left position)
R1 7 6 5 4 3 2 1 0
Register R1 Showing individual bits
A block diagram of a register
Register Transfer cont.
Other ways of drawing the block diagram of a register:
Numbering of bits: PC drawn with bit 15 at the left and bit 0 at the right
Partitioned into two parts: PC(H), the upper byte (bits 15-8), and PC(L), the lower byte (bits 7-0)
Information transfer from one register to another is described
by a replacement operator: R2 <- R1
This statement denotes a transfer of the content of register R1
into register R2
The transfer happens in one clock cycle
The content of the R1 (source) does not change
The content of the R2 (destination) will be lost and replaced
by the new data transferred from R1
We are assuming that the circuits are available from the
outputs of the source register to the inputs of the destination
register, and that the destination register has a parallel load
capability
Conditional transfer occurs only under a
control condition
Representation of a (conditional) transfer:
P: R2 <- R1
A binary condition (P equals 0 or 1)
determines when the transfer occurs
The content of R1 is transferred into R2 only if
P is 1
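Hardware implementation of a controlled transfer: P: R2 <- R1
Block diagram: the control circuit output P drives the Load input of R2; R1 feeds the inputs of R2, and R2 is clocked
Timing diagram (clock and Load over times t and t+1): Load is synchronized with the clock, and the transfer occurs on the clock transition while Load is active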
Basic Symbols for Register Transfers

Symbol               Description                        Examples
Letters & numerals   Denotes a register                 MAR, R2
Parentheses ( )      Denotes a part of a register       R2(0-7), R2(L)
Arrow <-             Denotes transfer of information    R2 <- R1
Comma ,              Separates two microoperations      R2 <- R1, R1 <- R2
Paths must be provided to transfer information from
one register to another
A Common Bus System is a scheme for transferring
information between registers in a multiple-register
configuration
A bus: set of common lines, one for each bit of a
register, through which binary information is
transferred one at a time
Control signals determine which register is selected
by the bus during each particular register transfer
Figure: 4-line common bus. Four 4-bit registers A, B, C, and D feed four 4 x 1 multiplexers (MUX3 to MUX0) that share the select lines S1 and S0. MUXi receives bit i of each register (Ai, Bi, Ci, Di on inputs 0 to 3), and its output drives bus line i.
The transfer of information from a bus into one of
many destination registers is done by connecting the bus lines
to the inputs of all destination registers and then
activating the load control of the particular destination
register selected
We write: R2 <- C to symbolize that the content of
register C is loaded into the register R2 using the
common system bus
It is equivalent to: BUS <- C (select C), R2 <- BUS (load R2)
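The two-step bus transfer can be sketched in Python. The select encoding (00 for A, 01 for B, 10 for C, 11 for D), the function name bus_transfer, and the register values are assumptions made for illustration.

```python
def bus_transfer(registers, select, dest):
    """Model BUS <- source followed by dest <- BUS in one clock cycle."""
    sources = ["A", "B", "C", "D"]      # MUX inputs 0..3 (assumed encoding)
    bus = registers[sources[select]]    # select lines S1 S0 pick the source
    updated = dict(registers)           # the source register is unchanged
    updated[dest] = bus                 # only dest has its load control active
    return updated

# Hypothetical 4-bit register contents.
regs = {"A": 0b0001, "B": 0b0010, "C": 0b0100, "D": 0b1000, "R2": 0b0000}
after = bus_transfer(regs, select=0b10, dest="R2")   # R2 <- C
print(f"R2 = {after['R2']:04b}")
```

Returning a new dictionary keeps the "before" and "after" register states separate, which mirrors the fact that the source register keeps its contents.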
Bus and Memory Transfers: Three-State Bus Buffers
A bus system can be constructed with three-state
buffer gates instead of multiplexers
A three-state buffer is a digital circuit that
exhibits three states: logic-0, logic-1, and
high-impedance (Hi-Z)
Figure: three-state buffer with normal input A, control input C, and output B
Bus Buffers cont.
C = 1: the gate acts as a buffer, so B = A
C = 0: the output is an open circuit (high impedance), so A is disconnected from B
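Bus Buffers cont.
Figure: bus line with three-state buffers for bit 0 (replaces MUX0 in the previous diagram). The select inputs S1 and S0 feed a 2 x 4 decoder with enable input E; decoder outputs 0 to 3 enable the buffers for A0, B0, C0, and D0, so only one buffer drives the bus line at a time.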
Bus and Memory Transfers: Memory Transfer
Memory read: transfer from memory
Memory write: transfer to memory
The data being read or written is called a memory word
(denoted M) (refer to section 2-7)
It is necessary to specify the address of M when
writing or reading memory
This is done by enclosing the address in square
brackets following the letter M
Example: M[0016]: the memory contents at address
0x0016
Transfer cont.
Assume that the address of a memory unit is
stored in a register called the Address Register
(AR)
Let's represent a Data Register with DR, then:
Read: DR <- M[AR]
Write: M[AR] <- DR
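Transfer cont.
Example: R1 <- M[AR], with AR = x12 and R1 = 100 before the transfer
RAM contents (address: value): x0C: 19, x0E: 34, x10: 45, x12: 66, x14: 0, x16: 13, x18: 22
After the transfer R1 = 66 (R1 changes from 100 to 66)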
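The read example can be replayed in Python using the RAM contents from the figure. The write step and its value 7 are hypothetical additions that illustrate M[AR] <- DR.

```python
# RAM contents from the example (address: value).
memory = {0x0C: 19, 0x0E: 34, 0x10: 45, 0x12: 66, 0x14: 0, 0x16: 13, 0x18: 22}

AR = 0x12      # address register
R1 = 100       # value in R1 before the transfer

R1 = memory[AR]          # read:  R1 <- M[AR]
read_value = R1
print("R1 after read:", read_value)

DR = 7                   # hypothetical value to store
memory[AR] = DR          # write: M[AR] <- DR
print("M[AR] after write:", memory[AR])
```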
Arithmetic Microoperations
The microoperations most often encountered
in digital computers are classified into four
categories:
Register transfer microoperations
Arithmetic microoperations (on numeric data
stored in the registers)
Logic microoperations (bit manipulations on non-
numeric data)
Shift microoperations
Arithmetic Microoperations cont.
The basic arithmetic microoperations are:
addition, subtraction, increment, decrement,
and shift
Addition Microoperation:
R3 <- R1 + R2
Subtraction Microoperation:
R3 <- R1 - R2, or equivalently
R3 <- R1 + R2' + 1 (R2' denotes the 1's complement of R2)
Arithmetic Microoperations cont.
One's Complement Microoperation:
R2 <- R2' (R2' denotes the 1's complement of R2)
Two's Complement Microoperation:
R2 <- R2' + 1
Increment Microoperation:
R2 <- R2 + 1
Decrement Microoperation:
R2 <- R2 - 1
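A minimal Python sketch of these arithmetic microoperations on a fixed-width register follows. The 8-bit width and the function names are assumptions; the slides do not fix a register size.

```python
N = 8                      # assumed register width in bits
MASK = (1 << N) - 1        # keeps results within N bits

def ones_complement(r):
    return ~r & MASK

def twos_complement(r):
    return (ones_complement(r) + 1) & MASK

def add(r1, r2):
    return (r1 + r2) & MASK

def subtract(r1, r2):
    # R1 - R2 computed as R1 + R2' + 1
    return (r1 + ones_complement(r2) + 1) & MASK

def increment(r):
    return (r + 1) & MASK

def decrement(r):
    # adding the all-ones pattern is the same as subtracting 1
    return (r + MASK) & MASK

print(f"5 - 3 = {subtract(5, 3)}")
print(f"3 - 5 = {subtract(3, 5):08b}")   # 2's complement pattern of -2
```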
Half Adder/Full Adder
Half adder truth table:
x y | c s
0 0 | 0 0
0 1 | 0 1
1 0 | 0 1
1 1 | 1 0
c = xy
s = x'y + xy' = x ⊕ y (x' denotes the complement of x)
Full adder truth table:
x y c(n-1) | c(n) s
0 0 0      | 0    0
0 0 1      | 0    1
0 1 0      | 0    1
0 1 1      | 1    0
1 0 0      | 0    1
1 0 1      | 1    0
1 1 0      | 1    0
1 1 1      | 1    1
c(n) = xy + x c(n-1) + y c(n-1) = xy + (x ⊕ y) c(n-1)
s = x'y'c(n-1) + x'y c(n-1)' + xy' c(n-1)' + xy c(n-1) = x ⊕ y ⊕ c(n-1) = (x ⊕ y) ⊕ c(n-1)
Arithmetic Microoperations: Binary Adder
Figure: 4-bit binary adder built from four full adders (FAs). Stage i adds Ai, Bi, and carry Ci to produce sum Si and carry Ci+1; C0 is the input carry and C4 the output carry.
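The chain of full adders can be sketched in Python. The function names are hypothetical; the carry equation is the full-adder form c(n) = xy + (x ⊕ y) c(n-1).

```python
def full_adder(x, y, c_in):
    """Return (sum, carry_out) for one-bit inputs."""
    s = x ^ y ^ c_in
    c_out = (x & y) | ((x ^ y) & c_in)
    return s, c_out

def ripple_add4(a, b, c0=0):
    """Add two 4-bit integers by chaining four full adders; return (S, C4)."""
    carry = c0
    total = 0
    for i in range(4):                       # stage 0 is the least significant
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
    return total, carry

s, c4 = ripple_add4(0b0111, 0b0110)
print(f"S = {s:04b}, C4 = {c4}")
```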
Arithmetic Microoperations: Binary Adder-Subtractor
Figure: 4-bit adder-subtractor built from four full adders, with inputs A3 to A0 and B3 to B0, carries C0 to C4, and sum outputs S3 to S0
Binary Adder-Subtractor cont.
For unsigned numbers, subtraction gives A - B if A >= B, or the 2's complement of (B - A) if A < B
(example: 3 - 5 = -2 = 1110)
For signed numbers, the result is A - B provided that there is no overflow
(example: -3 - 5 = -8, computed as 1101 + 1011 = 1000 with the end carry discarded)
Overflow detector for signed numbers: V = C3 ⊕ C4, where V = 1 if there is an overflow and V = 0 if there is no overflow
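The overflow rule V = C3 ⊕ C4 can be tried out with a short Python sketch. The function name add_sub_4bit and the use of a subtract flag are assumptions for illustration, not the circuit from the slides.

```python
def add_sub_4bit(a, b, subtract=False):
    """Return (result, C4, V) for the 4-bit operation A + B or A - B."""
    if subtract:
        b = ~b & 0xF                 # 1's complement of B
    c0 = 1 if subtract else 0        # the extra 1 completes the 2's complement
    c3 = ((a & 0x7) + (b & 0x7) + c0) >> 3   # carry into the sign position
    total = (a & 0xF) + (b & 0xF) + c0
    c4 = total >> 4                          # carry out of the sign position
    return total & 0xF, c4, c3 ^ c4

print(add_sub_4bit(0b0011, 0b0101, subtract=True))   # 3 - 5
print(add_sub_4bit(0b1101, 0b0101, subtract=True))   # -3 - 5
print(add_sub_4bit(0b0111, 0b0001))                  # 7 + 1 overflows
```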
Subtractor cont.
What is the range of unsigned numbers that
can be represented in 4 bits?
What is the range of signed numbers that can
be represented in 4 bits?
Repeat for n-bit?!
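Arithmetic Microoperations: Binary Incrementer
Figure: 4-bit binary incrementer built from four half adders (HAs). The first HA adds A0 and the constant 1; each carry output C feeds the next HA together with A1, A2, and A3, producing S0 to S3 and the output carry C4.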
Arithmetic Microoperations Binary
Incrementer
Binary Incrementer can also be implemented
using a counter
A binary decrementer can be implemented by
adding 1111 to the desired register each time!
Arithmetic Microoperations Arithmetic
Circuit
This circuit performs seven distinct arithmetic
operations and the basic component of it is
the parallel adder
The output of the binary adder is calculated
from the following arithmetic sum:
D = A + Y + Cin
Arithmetic Microoperations: Arithmetic Circuit cont.
Figure A: 4-bit arithmetic circuit. Four full adders compute D = X + Y + Cin, with Xi = Ai. Each Yi comes from a 4 x 1 MUX controlled by S1 and S0 whose data inputs 0 to 3 are Bi, the complement of Bi, logic 0, and logic 1. Carries C1 to C3 link the stages, and the final carry is Cout.
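The four basic microoperations
OR Microoperation
Symbol: ∨, +
Gate: OR gate
Example: 100110₂ ∨ 1010110₂ = 1110110₂
In P+Q: R1 <- R2 + R3, R4 <- R5 ∨ R6, the + in the control function P+Q is an OR, the + in R2 + R3 is an ADD, and ∨ is an OR microoperation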
The four basic microoperations cont.
AND Microoperation
Symbol: ∧
Gate: AND gate
Example: 100110₂ ∧ 1010110₂ = 0000110₂
Complement (NOT) Microoperation
Symbol: an overbar, written here as a prime (A')
Gate: inverter
Example: (1010110₂)' = 0101001₂
XOR (Exclusive-OR) Microoperation
Symbol: ⊕
Gate: XOR gate
Example: 100110₂ ⊕ 1010110₂ = 1110000₂
Other Logic Microoperations
Selective-set Operation
Used to force selected bits of a register into
logic-1 by using the OR operation
Example: 0100₂ ∨ 1000₂ = 1100₂
Here 0100 is the value in a processor register, and 1000 is loaded into a register from memory to perform the selective-set operation
Other Logic Microoperations cont.
Selective-complement (toggling) Operation
Used to force selected bits of a register to be
complemented by using the XOR operation
Example: 0001₂ ⊕ 1000₂ = 1001₂
Here 0001 is the value in a processor register, and 1000 is loaded into a register from memory to perform the selective-complement operation
Insert Operation
Step 1: mask the desired bits
Step 2: OR them with the desired value
Example: suppose R1 = 0110 1010, and we want to
replace the leftmost 4 bits (0110) with 1001; then:
Step 1: 0110 1010 ∧ 0000 1111 = 0000 1010
Step 2: 0000 1010 ∨ 1001 0000 = 1001 1010
R1 = 1001 1010
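The selective-set, selective-complement, and insert examples map directly onto Python's bitwise operators. The variable names below are illustrative only.

```python
# Selective-set: OR forces the selected bits to 1.
selective_set = 0b0100 | 0b1000

# Selective-complement: XOR toggles the selected bits.
selective_complement = 0b0001 ^ 0b1000

# Insert: mask out the old bits with AND, then OR in the new value.
r1 = 0b0110_1010
masked = r1 & 0b0000_1111          # step 1: clear the leftmost 4 bits
inserted = masked | 0b1001_0000    # step 2: insert 1001

print(f"{selective_set:04b} {selective_complement:04b} {inserted:08b}")
```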
NAND Microoperation
Symbols: ∧ and an overbar (the complement of the AND)
Gate: NAND gate
Example: (100110₂ ∧ 1010110₂)' = 1111001₂
NOR Microoperation
Symbols: ∨ and an overbar (the complement of the OR)
Gate: NOR gate
Example: (100110₂ ∨ 1010110₂)' = 0001001₂
Set (Preset) Microoperation
Force all bits into 1s by ORing them with a value in
which all bits are logic-1
Example: 100110₂ ∨ 111111₂ = 111111₂
Clear (Reset) Microoperation
Force all bits into 0s by ANDing them with a value in
which all bits are logic-0
Example: 100110₂ ∧ 000000₂ = 000000₂
Hardware Implementation
The hardware implementation of logic
microoperations requires that logic gates be
inserted for each bit or pair of bits in the
registers to perform the required logic
function
Most computers use only four (AND, OR, XOR,
and NOT) from which all others can be
derived.
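Hardware Implementation cont.
Figure B: one stage of the logic circuit (for one bit i). A 4 x 1 MUX with select inputs S1 and S0 chooses the output Ei from four gate outputs computed from Ai and Bi:
S1 S0   Output       Operation
0  0    E = A ⊕ B    XOR
0  1    E = A ∨ B    OR
1  0    E = A ∧ B    AND
1  1    E = A'       Complement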
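One stage of this logic circuit can be modelled as a list lookup that plays the role of the 4 x 1 MUX. This is a sketch that follows the select encoding listed for Figure B (00 XOR, 01 OR, 10 AND, 11 complement); the function names and the 4-bit default width are assumptions.

```python
def logic_stage(a, b, s1, s0):
    """One bit of the logic circuit: the MUX picks one gate output."""
    mux_inputs = [a ^ b, a | b, a & b, a ^ 1]   # MUX inputs 0..3
    return mux_inputs[(s1 << 1) | s0]

def logic_unit(a, b, s1, s0, width=4):
    """Apply the same stage to every bit position of two registers."""
    result = 0
    for i in range(width):
        result |= logic_stage((a >> i) & 1, (b >> i) & 1, s1, s0) << i
    return result

print(f"{logic_unit(0b0100, 0b1000, 0, 1):04b}")   # OR
print(f"{logic_unit(0b0110, 0b0011, 1, 0):04b}")   # AND
```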
Shift Microoperations
Used for serial transfer of data
Also used in conjunction with arithmetic, logic, and
other data-processing operations
The contents of the register can be shifted to the left
or to the right
During a shift, the first flip-flop receives its binary
information from the serial input
Three types of shift: Logical, Circular, and Arithmetic
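Shift Microoperations cont.
Figure: shift right and shift left of an n-bit register r(n-1) ... r0. In a shift right the serial input enters at r(n-1) and the serial output leaves from r0; in a shift left the serial input enters at r0 and the serial output leaves from r(n-1). The value applied to the serial input determines the shift type.
Note that the bit ri is the bit at position i of the register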
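Shift Microoperations: Logical Shifts
Transfers 0 through the serial input
Logical Shift Right: R1 <- shr R1
Logical Shift Left: R2 <- shl R2
In each case the source and destination register are the same
Figure: logical shift left of r(n-1) ... r0; a 0 enters at r0 and the bit shifted out of r(n-1) is lost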
Shift Microoperations: Circular Shifts (Rotate Operation)
Circulates the bits of the register around the
two ends without loss of information
Circular Shift Right: R1 <- cir R1
Circular Shift Left: R2 <- cil R2
In each case the source and destination register are the same
Figure: circular shift left of r(n-1) ... r0; the bit leaving r(n-1) re-enters at r0
Arithmetic Shifts
Shifts a signed binary number to the left or right
An arithmetic shift-left multiplies a signed binary
number by 2: ashl (00100): 01000
An arithmetic shift-right divides the number by 2
ashr (00100) : 00010
An overflow may occur in arithmetic shift-left, and
occurs when the sign bit is changed (sign reversal)
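Arithmetic Shifts cont.
Figure: arithmetic shift right; the sign bit r(n-1) keeps its value and is also copied into r(n-2), while the remaining bits move right and r0 is shifted out
Figure: arithmetic shift left; a 0 enters at r0 and the bits move left toward the sign bit r(n-1)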
Arithmetic Shifts cont.
An overflow flip-flop Vs can be used to detect
an arithmetic shift-left overflow
Vs = R(n-1) ⊕ R(n-2)
Vs = 1: overflow; Vs = 0: no overflow
Shift Microoperations cont.
Example: Assume R1=11001110, then:
Arithmetic shift right once : R1 = 11100111
Arithmetic shift right twice : R1 = 11110011
Arithmetic shift left once : R1 = 10011100
Arithmetic shift left twice : R1 = 00111000
Logical shift right once : R1 = 01100111
Logical shift left once : R1 = 10011100
Circular shift right once : R1 = 01100111
Circular shift left once : R1 = 10011101
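The results in this example can be reproduced with a small Python sketch of the shift microoperations on an 8-bit register. The function names follow the RTL mnemonics (shl, shr, cil, cir, ashl, ashr); the helper ashl_overflow is a hypothetical name for the Vs check applied before an arithmetic shift left.

```python
N = 8                        # register width used in the example
MASK = (1 << N) - 1
SIGN = 1 << (N - 1)

def shl(r):  return (r << 1) & MASK                      # logical shift left
def shr(r):  return r >> 1                               # logical shift right
def cil(r):  return ((r << 1) | (r >> (N - 1))) & MASK   # circular shift left
def cir(r):  return (r >> 1) | ((r & 1) << (N - 1))      # circular shift right
def ashl(r): return shl(r)                               # arithmetic shift left
def ashr(r): return (r >> 1) | (r & SIGN)                # sign bit is preserved

def ashl_overflow(r):
    """Vs = R(n-1) XOR R(n-2), evaluated before the shift."""
    return ((r >> (N - 1)) ^ (r >> (N - 2))) & 1

R1 = 0b11001110
for name, op in (("ashr", ashr), ("ashl", ashl), ("shr", shr),
                 ("shl", shl), ("cir", cir), ("cil", cil)):
    print(f"{name}: {op(R1):08b}")
```

Applying ashr or ashl twice gives the "twice" rows of the example, since each row starts from the result of the previous shift.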
A possible choice for a shift unit would be a
bidirectional shift register with parallel load
(refer to Fig 2-9). Has drawbacks:
Needs two pulses (the clock and the shift signal
pulse)
Not efficient in a processor unit where multiple
number of registers share a common bus
It is more efficient to implement the shift
operation with a combinational circuit
Figure: 4-bit combinational circuit shifter. Inputs A3 to A0, serial inputs IR (for shift right) and IL (for shift left), and a select line S (0 for shift right, 1 for shift left) drive four 2 x 1 multiplexers that produce the outputs H3 to H0.
Arithmetic Logic Shift Unit
Instead of having individual registers
performing the microoperations directly,
computer systems employ a number of
storage registers connected to a common
operational unit called an Arithmetic Logic
Unit (ALU)
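Arithmetic Logic Shift Unit cont.
Figure: one stage of the ALU, controlled by select lines S3, S2, S1, and S0. A 4 x 1 MUX produces the output Fi from four inputs: input 0 is Di from one stage of the arithmetic circuit (Fig. A, with carry in Ci and carry out Ci+1); input 1 is Ei from one stage of the logic circuit (Fig. B); input 2 is Ai+1 (shr); input 3 is Ai-1 (shl). Ai and Bi are the data inputs.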
Basic Definitions
A digital system is a collection of digital hardware modules
Modules are registers, counters, arithmetic elements, etc., connected
via:
- data paths: routes on which information is moved
- control paths: routes on which control signals are
moved
Micro-operations (micro-ops) are operations on data stored in
registers
Digital modules (often just called registers) are defined by their
information contents and the set of micro-ops they perform
Register transfer language is a concise and precise means of
describing those operations
Data-paths and Control units
The data-path module comprises processing logic and a
collection of registers that perform data processing
The control unit module is made up of logic that determines
the sequence of data processing operations carried out in
the data-path
Register Transfer Operations
Registers: denoted by upper case letters, optionally followed by digits or letters
Register transfer operations: the movement of data stored in registers and the processing performed on the data
What is Register Transfer Language?
Register Transfer Language (RTL): used to describe CPU
organization in high-level terms
RTL expressions are made up of elements which describe
the registers being manipulated, and the micro-ops being
performed on them
Here are the basic components of RTL expressions:
Instruction Representation
Word size is 16 bits

12 bits to represent a memory address
3-bit opcode
1 bit to distinguish between direct and indirect memory addressing
Instruction Representation (cont.)
When the I (indirect) bit is 0, the value in AD is the actual address of the operand (direct addressing)
When I is 1, AD contains the address of an indirect word, which in turn contains the actual operand address (indirect addressing)
Register Structure
Common Micro-Ops
There are 4 types of Micro-Ops:
Transfer: transfers data from one register to another
R0 <- R1
Arithmetic: performs arithmetic on data in registers
R0 <- R1 + R2
Logic/bit manipulation: performs bit (Boolean) operations on data
R0 <- R1 & R2 ; or R0 <- R1 | R2
Shift: shift data in registers by one or more bit positions
R0 <- R1 << 3; or R0 <- R2 >> 2
Micro-Ops Transfer
Parallel
Parallel transfer is typically used for transfers between registers
Ex: Transfer all contents of A into B on one clock pulse
B <- A
Control function: we can do this by structuring the RTL expression to indicate the controlling condition
Ex: P: B <- A
Micro-Ops Transfer
Serial
Serial transfer is used to specify that a collection of bits are to be moved, but that the transfer is to occur one bit at a time
Ex:
S: A <- B, B <-B
Micro-Ops Transfer
Bus
A bus consists of a set of parallel data lines
To transfer data using a bus: connect the output of the
source register to the bus; connect the input of the
target register to the bus; when the clock pulse arrives,
the transfer occurs
Micro-Ops Transfer
Memory
Memory transfers are similar to register transfers, but
memory-to-register transfers are called read operations,
while register-to-memory transfers are called write
operations
RTL expressions for a read operation, assuming the use of an
address register:
AR <- address
DR <- M[AR]
RTL expressions for a write operation, assuming use of a data
register:
AR <- address
DR <- value
M[AR] <- DR
Micro-Ops Arithmetic & Logic
The CPU typically provides addition, subtraction, increment, and
decrement operations in its ALU (arithmetic-logic unit).
Logic micro-ops are like arithmetic, but treat each bit of the
register(s) separately
Applications of Logic Micro-ops
How are logic operations useful?
- can be used to change bit values
- delete a group of bits
- insert new bits into a register
Micro-Ops Shift
Move the information in a register by one bit position

Shifts come in three varieties:
- Logical
- Arithmetic
- Circular
Using RTL to specify Digital System
Specification of Digital Components
D flip-flop
Specification and Implementation of simple system: complete
design of the system to implement the RTL code using,
Direct connection
Bus and Tri-state buffers
Bus and Multiplexer
Data-path Design
Example Design and Operation

Micro-operation   RTL Expression   X2X1X0
Load              A <- B           010
Add               A <- B+A         000
Subtract          A <- B-A         101
Increment         A <- B+1         110
Decrement         A <- B-1         011
Table: Micro-operation Control Signal Definitions

More Complex Digital Systems & RTL
Two more complex digital systems are specified with RTL:
Modulo-6 counter
Toll booth controller
UNIT -3
BASIC COMPUTER ORGANIZATION AND DESIGN
Instruction Codes
Computer Registers
Computer Instructions
Timing and Control
Instruction Cycle
Memory Reference Instructions
Input-Output and Interrupt
Complete Computer Description
Design of Basic Computer
Design of Accumulator Logic

INTRODUCTION
Every different processor type has its own design (different registers,
buses, microoperations, machine instructions, etc)
Modern processor is a very complex device
It contains
Many registers
Multiple arithmetic units, for both integer and floating point calculations
The ability to pipeline several consecutive instructions to speed execution
Etc.
However, to understand how processors work, we will start with a
simplified processor model
This is similar to what real processors were like ~25 years ago
M. Morris Mano introduces a simple processor model he calls the Basic
Computer
We will use this to introduce processor organization and the relationship
of the RTL model to the higher level computer processor
THE BASIC COMPUTER
The Basic Computer has two components, a processor and memory
The memory has 4096 words in it
4096 = 2^12, so it takes 12 bits to select a word in memory
Each word is 16 bits long
Figure: the CPU connected to a RAM holding words 0 to 4095, each 16 bits wide (bits 15 to 0)
INSTRUCTIONS
Program
A sequence of (machine) instructions
(Machine) Instruction
A group of bits that tells the computer to perform a specific operation (a sequence
of micro-operations)
The instructions of a program, along with any needed data are stored
in memory
The CPU reads the next instruction from memory
It is placed in an Instruction Register (IR)
Control circuitry in control unit then translates the instruction into
the sequence of microoperations necessary to implement it
INSTRUCTION FORMAT
A computer instruction is often divided into two parts
An opcode (Operation Code) that specifies the operation for that instruction
An address that specifies the registers and/or locations in memory to use for that
operation
In the Basic Computer, since the memory contains 4096 (= 2^12) words,
we need 12 bits to specify which memory address this instruction
will use
In the Basic Computer, bit 15 of the instruction specifies the
addressing mode (0: direct addressing, 1: indirect addressing)
Since the memory words, and hence the instructions, are 16 bits long,
that leaves 3 bits for the instruction's opcode
Instruction Format
Bit 15: I (addressing mode)
Bits 14-12: Opcode
Bits 11-0: Address
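The three fields can be extracted from a 16-bit word with shifts and masks. This is a sketch; the function name decode and the sample word 0x9457 are assumptions.

```python
def decode(word):
    """Split a 16-bit instruction into (I, opcode, address)."""
    i_bit = (word >> 15) & 0x1      # bit 15: addressing mode
    opcode = (word >> 12) & 0x7     # bits 14-12: operation code
    address = word & 0xFFF          # bits 11-0: memory address
    return i_bit, opcode, address

i_bit, opcode, address = decode(0x9457)    # hypothetical instruction word
print(f"I = {i_bit}, opcode = {opcode:03b}, address = {address:03X}")
```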
ADDRESSING MODES
The address field of an instruction can represent either
Direct address: the address in memory of the data to use (the address of the operand), or
Indirect address: the address in memory of the address in memory of the data to use
Direct addressing example: the instruction at location 22 is 0 ADD 457 (I = 0); the operand is at location 457 and is added to AC
Indirect addressing example: the instruction at location 35 is 1 ADD 300 (I = 1); location 300 holds the address 1350, and the operand at location 1350 is added to AC
Effective Address (EA)
The address that can be directly used without modification to access an operand for a
computation-type instruction, or as the target address for a branch-type instruction
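The effective address for the two examples (0 ADD 457 and 1 ADD 300) can be computed with a short Python sketch. The opcode value 001 for ADD is taken from the instruction table (ADD is 1xxx for I = 0); the dictionary-based memory, the operand values 11 and 22, and the function names are assumptions for illustration.

```python
ADD = 0b001   # ADD opcode, per the instruction table (1xxx with I = 0)

def encode(i_bit, opcode, address):
    """Pack I, opcode, and a 12-bit address into a 16-bit word."""
    return (i_bit << 15) | (opcode << 12) | (address & 0xFFF)

def effective_address(memory, word):
    """Direct: EA is the address field. Indirect: EA is read from memory."""
    i_bit = (word >> 15) & 1
    address = word & 0xFFF
    return memory[address] & 0xFFF if i_bit else address

# Memory cells from the figure; operand values are hypothetical.
memory = {300: 1350, 457: 11, 1350: 22}

direct = encode(0, ADD, 457)      # instruction stored at location 22
indirect = encode(1, ADD, 300)    # instruction stored at location 35

print(effective_address(memory, direct))
print(effective_address(memory, indirect))
```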
PROCESSOR REGISTERS
A processor has many registers to hold instructions, addresses, data, etc

The processor has a register, the Program Counter (PC) that holds the
memory address of the next instruction to get
Since the memory in the Basic Computer only has 4096 locations, the PC only needs 12
bits
In direct or indirect addressing, the processor needs to keep track of
which locations in memory it is addressing: the Address Register (AR) is
used for this
The AR is a 12 bit register in the Basic Computer
When an operand is found, using either direct or indirect addressing, it is
placed in the Data Register (DR). The processor then uses this value as
data for its operation
The Basic Computer has a single general purpose register the
Accumulator (AC)
PROCESSOR REGISTERS
The significance of a general purpose register is that it can be referred to in

instructions
e.g. load AC with the contents of a specific memory location; store the contents of AC into
a specified memory location
Often a processor will need a scratch register to store intermediate results
or other temporary data; in the Basic Computer this is the Temporary
Register (TR)
The Basic Computer uses a very simple model of input/output (I/O)
operations
Input devices are considered to send 8 bits of character data to the processor
The processor can send 8 bits of character data to output devices
The Input Register (INPR) holds an 8-bit character received from an input
device
The Output Register (OUTR) holds an 8-bit character to be sent to an output
device
BASIC COMPUTER REGISTERS
Registers in the Basic Computer
Figure: the CPU registers with their bit numbering, PC (11-0), AR (11-0), IR (15-0), TR (15-0), DR (15-0), AC (15-0), OUTR (7-0), and INPR (7-0), alongside the 4096 x 16 memory
List of BC Registers
Register   Bits   Name                   Function
DR         16     Data Register          Holds memory operand
AR         12     Address Register       Holds address for memory
AC         16     Accumulator            Processor register
IR         16     Instruction Register   Holds instruction code
PC         12     Program Counter        Holds address of instruction
TR         16     Temporary Register     Holds temporary data
INPR       8      Input Register         Holds input character
OUTR       8      Output Register        Holds output character
Registers
COMMON BUS SYSTEM
The registers in the Basic Computer are connected using a bus

This gives a savings in circuitry over complete connections between
registers
COMMON BUS SYSTEM
Figure: the 16-bit common bus of the Basic Computer. Select lines S2, S1, and S0 choose the bus source: 1 AR, 2 PC, 3 DR, 4 AC, 5 IR, 6 TR, 7 memory unit (4096 x 16, with Read and Write inputs and its address supplied by AR). AR, PC, DR, AC, and TR have LD, INR, and CLR inputs; IR and OUTR have only LD. The ALU, with the E flip-flop, takes inputs from AC, DR, and INPR and feeds AC. All registers share the clock.
COMMON BUS SYSTEM
Three control lines, S2, S1, and S0, control which register the bus
selects as its input
S2 S1 S0   Register
0  0  0    x
0  0  1    AR
0  1  0    PC
0  1  1    DR
1  0  0    AC
1  0  1    IR
1  1  0    TR
1  1  1    Memory
Either one of the registers will have its load signal activated, or the
memory will have its write signal activated; this determines where the
data from the bus gets loaded
The 12-bit registers, AR and PC, have 0s loaded onto the bus in the
high order 4 bit positions
When the 8-bit register OUTR is loaded from the bus, the data
comes from the low order 8 bits on the bus
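The select table and the width rules can be modelled in Python. This is a sketch under stated assumptions: the names BUS_SOURCES, WIDTHS, drive_bus, and load_outr are hypothetical, and the sample values are illustrative.

```python
# Bus source for each value of S2 S1 S0 (000 selects nothing).
BUS_SOURCES = {0b001: "AR", 0b010: "PC", 0b011: "DR", 0b100: "AC",
               0b101: "IR", 0b110: "TR", 0b111: "Memory"}
WIDTHS = {"AR": 12, "PC": 12, "DR": 16, "AC": 16,
          "IR": 16, "TR": 16, "Memory": 16}

def drive_bus(select, values):
    """Return the 16-bit bus value, or None when S2 S1 S0 = 000."""
    name = BUS_SOURCES.get(select)
    if name is None:
        return None
    # 12-bit sources leave 0s in the high order 4 bit positions.
    return values[name] & ((1 << WIDTHS[name]) - 1)

def load_outr(bus):
    """OUTR takes only the low order 8 bits of the bus."""
    return bus & 0xFF

bus = drive_bus(0b010, {"PC": 0xABC})
print(f"bus = {bus:04X}, OUTR would load {load_outr(0x1234):02X}")
```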
Instructions
BASIC COMPUTER INSTRUCTIONS
Basic Computer Instruction Format
Memory-Reference Instructions (opcode = 000 to 110): bit 15 = I, bits 14-12 = opcode, bits 11-0 = address
Register-Reference Instructions (opcode = 111, I = 0): bits 15-12 = 0111, bits 11-0 = register operation
Input-Output Instructions (opcode = 111, I = 1): bits 15-12 = 1111, bits 11-0 = I/O operation
BASIC COMPUTER INSTRUCTIONS
Hex Code
Symbol I = 0 I=1 Description
AND 0xxx 8xxx AND memory word to AC
ADD 1xxx 9xxx Add memory word to AC
LDA 2xxx Axxx Load AC from memory
STA 3xxx Bxxx Store content of AC into memory
BUN 4xxx Cxxx Branch unconditionally
BSA 5xxx Dxxx Branch and save return address
ISZ 6xxx Exxx Increment and skip if zero
CLA 7800 Clear AC

CLE 7400 Clear E
CMA 7200 Complement AC
CME 7100 Complement E
CIR 7080 Circulate right AC and E
CIL 7040 Circulate left AC and E
INC 7020 Increment AC
SPA 7010 Skip next instr. if AC is positive
SNA 7008 Skip next instr. if AC is negative
SZA 7004 Skip next instr. if AC is zero
SZE 7002 Skip next instr. if E is zero
HLT 7001 Halt computer
INP F800 Input character to AC

OUT F400 Output character from AC
SKI F200 Skip on input flag
SKO F100 Skip on output flag
ION F080 Interrupt on
IOF F040 Interrupt off
INSTRUCTION SET COMPLETENESS

A computer should have a set of instructions so that the user can
construct machine language programs to evaluate any function that is known
to be computable.
Instruction Types
Functional Instructions
- Arithmetic, logic, and shift instructions
- ADD, CMA, INC, CIR, CIL, AND, CLA
Transfer Instructions
- Data transfers between the main memory
and the processor registers
- LDA, STA
Control Instructions
- Program sequencing and control
- BUN, BSA, ISZ
Input/Output Instructions
- Input and output
- INP, OUT
Instruction codes
CONTROL UNIT
The control unit (CU) of a processor translates machine instructions
into the control signals for the microoperations that implement them.
Control units are implemented in one of two ways:
Hardwired Control
The CU is made up of sequential and combinational circuits that generate the control
signals.
Microprogrammed Control
A control memory on the processor contains microprograms that activate the
necessary control signals.
We will consider a hardwired implementation of the control unit for
the Basic Computer.
TIMING AND CONTROL
Control unit of Basic Computer
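[Figure: Control unit of the Basic Computer. Bits 12-14 of the instruction register (IR) feed a 3 x 8 decoder that produces D0-D7; bit 15 goes to flip-flop I; bits 0-11 and other inputs go to the combinational control logic. A 4-bit sequence counter (SC) with Increment (INR), Clear (CLR), and Clock inputs drives a 4 x 16 decoder that produces T0-T15. The control logic produces the control signals.]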
TIMING SIGNALS
- Generated by a 4-bit sequence counter and a 4 x 16 decoder
- The SC can be incremented or cleared.
- Example: T0, T1, T2, T3, T4, T0, T1, . . .
Assume: At time T4, SC is cleared to 0 if decoder output D3 is active.
D3T4: SC <- 0
[Figure: Timing diagram showing Clock, T0 through T4, D3, and CLR SC. CLR SC is asserted during T4 while D3 is active, so the next timing signal is T0.]
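The counter behavior in the example can be sketched as a small simulation. The function name `timing_sequence` and its parameters are hypothetical; the sketch assumes the clear condition (such as D3T4) is active whenever the counter reaches the given T index.

```python
def timing_sequence(clear_at, cycles):
    """Return the index of the active timing signal on each clock cycle.

    clear_at: T index at which the control logic clears SC (4 models D3T4).
    """
    sc = 0
    active = []
    for _ in range(cycles):
        active.append(sc)          # the 4 x 16 decoder asserts T<sc>
        if sc == clear_at:
            sc = 0                 # CLR: SC <- 0
        else:
            sc = (sc + 1) & 0xF    # INR: the 4-bit counter wraps after T15
    return active
```

With `clear_at=4`, the sequence is T0, T1, T2, T3, T4, T0, T1, which matches the example above.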
INSTRUCTION CYCLE
In Basic Computer, a machine instruction is executed in the following

cycle:
1. Fetch an instruction from memory
2. Decode the instruction
3. Read the effective address from memory if the instruction has an indirect address
4. Execute the instruction
After an instruction is executed, the cycle starts again at step 1, for the
next instruction
Note: Every different processor has its own (different)

instruction cycle
FETCH and DECODE
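T0: AR <- PC (S2S1S0 = 010, T0 = 1)
T1: IR <- M[AR], PC <- PC + 1 (S2S1S0 = 111, T1 = 1)
T2: D0, . . . , D7 <- Decode IR(12-14), AR <- IR(0-11), I <- IR(15)
[Figure: Register transfers for the fetch phase. T0 and T1 drive the bus select inputs S2, S1, S0. At T0, PC (bus input 2) is placed on the common bus and LD of AR is enabled. At T1, the memory unit (bus input 7) is read at the address in AR, LD of IR (bus input 5) is enabled, and INR of PC is enabled.]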
DETERMINE THE TYPE OF INSTRUCTION
[Flowchart: Start with SC <- 0. T0: AR <- PC. T1: IR <- M[AR], PC <- PC + 1. T2: decode the opcode in IR(12-14), AR <- IR(0-11), I <- IR(15). If D7 = 1 (register or I/O): I = 1 executes an input-output instruction at T3 and I = 0 executes a register-reference instruction at T3, each followed by SC <- 0. If D7 = 0 (memory-reference): I = 1 (indirect) performs AR <- M[AR] at T3 and I = 0 (direct) does nothing at T3; the memory-reference instruction then executes starting at T4 and ends with SC <- 0.]
D'7IT3: AR <- M[AR]
D'7I'T3: Nothing
D7I'T3: Execute a register-reference instruction
D7IT3: Execute an input-output instruction
REGISTER REFERENCE INSTRUCTIONS

Register-reference instructions are identified when
- D7 = 1, I = 0
- The operation is specified in bits b0 ~ b11 of IR
- Execution starts with timing signal T3
r = D7I'T3 => Register-Reference Instruction
Bi = IR(i), i = 0, 1, 2, ..., 11
     r:    SC <- 0
CLA  rB11: AC <- 0
CLE  rB10: E <- 0
CMA  rB9:  AC <- AC'
CME  rB8:  E <- E'
CIR  rB7:  AC <- shr AC, AC(15) <- E, E <- AC(0)
CIL  rB6:  AC <- shl AC, AC(0) <- E, E <- AC(15)
INC  rB5:  AC <- AC + 1
SPA  rB4:  if (AC(15) = 0) then (PC <- PC + 1)
SNA  rB3:  if (AC(15) = 1) then (PC <- PC + 1)
SZA  rB2:  if (AC = 0) then (PC <- PC + 1)
SZE  rB1:  if (E = 0) then (PC <- PC + 1)
HLT  rB0:  S <- 0 (S is a start-stop flip-flop)
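As a rough illustration, the register-reference microoperations can be modeled in software. This sketch assumes exactly one of B11..B0 is set, as in the listed instruction codes; the function name and the `running` flag standing in for the S flip-flop are illustrative.

```python
def exec_register_reference(ir, ac, e, pc, running=True):
    """Apply one register-reference instruction (D7 = 1, I = 0) at T3.

    Assumes a single operation bit B11..B0 is set in ir.
    Returns the updated (ac, e, pc, running).
    """
    def b(i):
        return (ir >> i) & 1

    if b(11):                                   # CLA: AC <- 0
        ac = 0
    elif b(10):                                 # CLE: E <- 0
        e = 0
    elif b(9):                                  # CMA: AC <- AC'
        ac = ~ac & 0xFFFF
    elif b(8):                                  # CME: E <- E'
        e ^= 1
    elif b(7):                                  # CIR: rotate right through E
        ac, e = (ac >> 1) | (e << 15), ac & 1
    elif b(6):                                  # CIL: rotate left through E
        ac, e = ((ac << 1) & 0xFFFF) | e, (ac >> 15) & 1
    elif b(5):                                  # INC: AC <- AC + 1
        ac = (ac + 1) & 0xFFFF
    elif b(4):                                  # SPA: skip if AC(15) = 0
        if not (ac & 0x8000):
            pc = (pc + 1) & 0xFFF
    elif b(3):                                  # SNA: skip if AC(15) = 1
        if ac & 0x8000:
            pc = (pc + 1) & 0xFFF
    elif b(2):                                  # SZA: skip if AC = 0
        if ac == 0:
            pc = (pc + 1) & 0xFFF
    elif b(1):                                  # SZE: skip if E = 0
        if e == 0:
            pc = (pc + 1) & 0xFFF
    elif b(0):                                  # HLT: S <- 0
        running = False
    return ac, e, pc, running
```

The tuple assignments in CIR and CIL evaluate the right-hand side with the old values of AC and E, mirroring the simultaneous transfers in the register transfer statements.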
MEMORY REFERENCE INSTRUCTIONS
Symbol  Operation decoder  Symbolic description
AND     D0    AC <- AC ∧ M[AR]
ADD     D1    AC <- AC + M[AR], E <- Cout
LDA     D2    AC <- M[AR]
STA     D3    M[AR] <- AC
BUN     D4    PC <- AR
BSA     D5    M[AR] <- PC, PC <- AR + 1
ISZ     D6    M[AR] <- M[AR] + 1, if M[AR] + 1 = 0 then PC <- PC + 1
- The effective address of the instruction is in AR and was placed there during
timing signal T2 when I = 0, or during timing signal T3 when I = 1
- Memory cycle is assumed to be short enough to complete in a CPU cycle
- The execution of MR instruction starts with T4
AND to AC
D0T4: DR <- M[AR]                        Read operand
D0T5: AC <- AC ∧ DR, SC <- 0             AND with AC
ADD to AC
D1T4: DR <- M[AR]                        Read operand
D1T5: AC <- AC + DR, E <- Cout, SC <- 0  Add to AC and store carry in E
LDA: Load to AC
D2T4: DR <- M[AR]
D2T5: AC <- DR, SC <- 0
STA: Store AC
D3T4: M[AR] <- AC, SC <- 0
BUN: Branch Unconditionally
D4T4: PC <- AR, SC <- 0
BSA: Branch and Save Return Address
M[AR] <- PC, PC <- AR + 1
[Figure: BSA example. At time T4: location 20 holds the instruction 0 BSA 135, PC = 21 (next instruction), AR = 135, and the subroutine starts at location 136 and ends with 1 BUN 135. After execution: M[135] = 21 and PC = 136.]
BSA:
D5T4: M[AR] <- PC, AR <- AR + 1
D5T5: PC <- AR, SC <- 0
ISZ: Increment and Skip-if-Zero
D6T4: DR <- M[AR]
D6T5: DR <- DR + 1
D6T6: M[AR] <- DR, if (DR = 0) then (PC <- PC + 1), SC <- 0
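The BSA and ISZ sequences can be traced step by step in a small sketch. Memory is modeled as a Python list of 16-bit words; the function names are illustrative, and each comment names the microoperation being mimicked.

```python
def exec_bsa(memory, ar, pc):
    """BSA: store the return address at M[AR], then continue at AR + 1."""
    memory[ar] = pc               # D5T4: M[AR] <- PC
    ar = (ar + 1) & 0xFFF         # D5T4: AR <- AR + 1
    pc = ar                       # D5T5: PC <- AR
    return pc

def exec_isz(memory, ar, pc):
    """ISZ: increment M[AR]; skip the next instruction if the result is 0."""
    dr = memory[ar]               # D6T4: DR <- M[AR]
    dr = (dr + 1) & 0xFFFF        # D6T5: DR <- DR + 1
    memory[ar] = dr               # D6T6: M[AR] <- DR
    if dr == 0:
        pc = (pc + 1) & 0xFFF     # D6T6: PC <- PC + 1
    return pc
```

Using the BSA example values (AR = 135, PC = 21), `exec_bsa` leaves 21 in location 135 and returns 136 as the new PC.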
FLOWCHART FOR MEMORY REFERENCE INSTRUCTIONS
[Flowchart: Memory-reference instructions. AND, ADD, and LDA read DR <- M[AR] at T4 and complete at T5 (AC <- AC ∧ DR; AC <- AC + DR, E <- Cout; AC <- DR). STA performs M[AR] <- AC at T4. BUN performs PC <- AR at T4. BSA performs M[AR] <- PC, AR <- AR + 1 at T4 and PC <- AR at T5. ISZ performs DR <- M[AR] at T4, DR <- DR + 1 at T5, and M[AR] <- DR, if (DR = 0) then (PC <- PC + 1) at T6. Every path ends with SC <- 0.]
INPUT-OUTPUT AND INTERRUPT
A Terminal with a keyboard and a Printer

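Input-Output Configuration
[Figure: The input-output terminal (printer and keyboard) connects through serial communication interfaces (a receiver interface for the printer, a transmitter interface for the keyboard) to the computer registers and flip-flops. OUTR and FGO serve the printer; INPR and FGI serve the keyboard; both registers exchange data with AC. The terminal side is a serial communications path and the AC side is a parallel communications path.]
INPR Input register - 8 bits
OUTR Output register - 8 bits
FGI Input flag - 1 bit
FGO Output flag - 1 bit
IEN Interrupt enable - 1 bit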
- The terminal sends and receives serial information

- The serial info. from the keyboard is shifted into INPR
- The serial info. for the printer is stored in the OUTR
- INPR and OUTR communicate with the terminal
serially and with the AC in parallel.
- The flags are needed to synchronize the timing
difference between I/O device and the computer
PROGRAM CONTROLLED DATA TRANSFER
-- CPU --
/* Input */ /* Initially FGI = 0 */
loop: If FGI = 0 goto loop
AC <- INPR, FGI <- 0
/* Output */ /* Initially FGO = 1 */
loop: If FGO = 0 goto loop
OUTR <- AC, FGO <- 0
-- I/O Device --
/* Input */
loop: If FGI = 1 goto loop
INPR <- new data, FGI <- 1
/* Output */
loop: If FGO = 1 goto loop
consume OUTR, FGO <- 1
[Flowcharts: Input starts with FGI <- 0, waits while FGI = 0, then performs AC <- INPR and repeats while more characters remain. Output starts with AC <- Data, waits while FGO = 0, then performs OUTR <- AC, FGO <- 0 and repeats while more characters remain.]
INPUT-OUTPUT INSTRUCTIONS
D7IT3 = p
IR(i) = Bi, i = 6, ..., 11
     p:    SC <- 0                            Clear SC
INP  pB11: AC(0-7) <- INPR, FGI <- 0          Input char. to AC
OUT  pB10: OUTR <- AC(0-7), FGO <- 0          Output char. from AC
SKI  pB9:  if (FGI = 1) then (PC <- PC + 1)   Skip on input flag
SKO  pB8:  if (FGO = 1) then (PC <- PC + 1)   Skip on output flag
ION  pB7:  IEN <- 1                           Interrupt enable on
IOF  pB6:  IEN <- 0                           Interrupt enable off
PROGRAM-CONTROLLED INPUT/OUTPUT
Program-controlled I/O
- Continuous CPU involvement
I/O takes valuable CPU time
- CPU slowed down to I/O speed
- Simple
- Least hardware
Input
LOOP, SKI DEV

BUN LOOP
INP DEV
Output
LOOP, LDA DATA
LOP, SKO DEV
BUN LOP
OUT DEV
INTERRUPT INITIATED INPUT/OUTPUT
- Open communication only when some data has to be passed --> interrupt.
- The I/O interface, instead of the CPU, monitors the I/O device.
- When the interface finds that the I/O device is ready for data transfer,
it generates an interrupt request to the CPU.
- Upon detecting an interrupt, the CPU momentarily stops the task
it is doing, branches to the service routine to process the data
transfer, and then returns to the task it was performing.
* IEN (Interrupt-enable flip-flop)
- can be set and cleared by instructions

- when cleared, the computer cannot be interrupted
FLOWCHART FOR INTERRUPT CYCLE

R = Interrupt f/f
[Flowchart: If R = 0, run the instruction cycle: fetch and decode the instruction, execute it, then check IEN; if IEN = 1 and either FGI = 1 or FGO = 1, set R <- 1. If R = 1, run the interrupt cycle: store the return address in location 0 (M[0] <- PC), branch to location 1 (PC <- 1), then IEN <- 0 and R <- 0.]
- The interrupt cycle is a HW implementation of a branch

and save return address operation.
- At the beginning of the next instruction cycle, the
instruction that is read from memory is in address 1.
- At memory address 1, the programmer must store a branch instruction
that sends the control to an interrupt service routine
- The instruction that returns the control to the original
program is "indirect BUN 0"
REGISTER TRANSFER OPERATIONS IN INTERRUPT CYCLE
[Figure: Memory before and after the interrupt cycle. Before: location 1 holds 0 BUN 1120, the main program is running with PC = 256, and the I/O program at location 1120 ends with 1 BUN 0. After: location 0 holds 256 and PC = 1.]
Register Transfer Statements for Interrupt Cycle
- R F/F <- 1 if IEN (FGI + FGO) T0'T1'T2'
T0'T1'T2' (IEN)(FGI + FGO): R <- 1
- The fetch and decode phases of the instruction cycle
must be modified: replace T0, T1, T2 with R'T0, R'T1, R'T2
- The interrupt cycle:
RT0: AR <- 0, TR <- PC
RT1: M[AR] <- TR, PC <- 0
RT2: PC <- PC + 1, IEN <- 0, R <- 0, SC <- 0
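The interrupt cycle can be followed in a short sketch that mirrors the three register transfer statements. The function names are hypothetical, and `interrupt_pending` assumes it is evaluated only when none of T0, T1, T2 is active.

```python
def interrupt_pending(ien, fgi, fgo):
    """Condition that sets R: IEN and (FGI or FGO), sampled outside T0-T2."""
    return bool(ien and (fgi or fgo))

def interrupt_cycle(memory, pc):
    """Hardware branch-and-save: store PC at location 0, continue at location 1.

    Returns the new (pc, ien, r).
    """
    ar, tr = 0, pc        # RT0: AR <- 0, TR <- PC
    memory[ar] = tr       # RT1: M[AR] <- TR
    pc = 0                # RT1: PC <- 0
    pc = pc + 1           # RT2: PC <- PC + 1
    ien, r = 0, 0         # RT2: IEN <- 0, R <- 0 (SC <- 0 restarts the timing)
    return pc, ien, r
```

With a return address of 256, the cycle leaves 256 in location 0 and PC = 1, so the next fetch reads the branch instruction stored at address 1.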
FURTHER QUESTIONS ON INTERRUPT
How can the CPU recognize the device

requesting an interrupt ?
Since different devices are likely to require

different interrupt service routines, how can
the CPU obtain the starting address of the
appropriate routine in each case ?
Should any device be allowed to interrupt the

CPU while another interrupt is being serviced ?
How can the situation be handled when two or

more interrupt requests occur simultaneously ?
Description
COMPLETE COMPUTER DESCRIPTION
Flowchart of Operations
[Flowchart: Start with SC <- 0, IEN <- 0, R <- 0. If R = 1 (interrupt cycle): RT0: AR <- 0, TR <- PC; RT1: M[AR] <- TR, PC <- 0; RT2: PC <- PC + 1, IEN <- 0, R <- 0, SC <- 0. If R = 0 (instruction cycle): R'T0: AR <- PC; R'T1: IR <- M[AR], PC <- PC + 1; R'T2: AR <- IR(0~11), I <- IR(15), D0...D7 <- Decode IR(12~14). D7 = 1 selects an I/O instruction (I = 1, D7IT3) or a register-reference instruction (I = 0, D7I'T3). D7 = 0 selects a memory-reference instruction: AR <- M[AR] at D'7IT3 if indirect, idle at D'7I'T3 if direct, then execution from D'7T4.]
COMPLETE COMPUTER DESCRIPTION Microoperations
Fetch       R'T0:   AR <- PC
            R'T1:   IR <- M[AR], PC <- PC + 1
Decode      R'T2:   D0, ..., D7 <- Decode IR(12 ~ 14),
                    AR <- IR(0 ~ 11), I <- IR(15)
Indirect    D'7IT3: AR <- M[AR]
Interrupt
    T0'T1'T2'(IEN)(FGI + FGO): R <- 1
            RT0:    AR <- 0, TR <- PC
            RT1:    M[AR] <- TR, PC <- 0
            RT2:    PC <- PC + 1, IEN <- 0, R <- 0, SC <- 0
Memory-Reference
AND         D0T4:   DR <- M[AR]
            D0T5:   AC <- AC ∧ DR, SC <- 0
ADD         D1T4:   DR <- M[AR]
            D1T5:   AC <- AC + DR, E <- Cout, SC <- 0
LDA         D2T4:   DR <- M[AR]
            D2T5:   AC <- DR, SC <- 0
STA         D3T4:   M[AR] <- AC, SC <- 0
BUN         D4T4:   PC <- AR, SC <- 0
BSA         D5T4:   M[AR] <- PC, AR <- AR + 1
            D5T5:   PC <- AR, SC <- 0
ISZ         D6T4:   DR <- M[AR]
            D6T5:   DR <- DR + 1
            D6T6:   M[AR] <- DR, if (DR = 0) then (PC <- PC + 1), SC <- 0
COMPLETE COMPUTER DESCRIPTION Microoperations
Register-Reference
D7I'T3 = r (Common to all register-reference instructions)
IR(i) = Bi (i = 0, 1, 2, ..., 11)
     r:    SC <- 0
CLA  rB11: AC <- 0
CLE  rB10: E <- 0
CMA  rB9:  AC <- AC'
CME  rB8:  E <- E'
CIR  rB7:  AC <- shr AC, AC(15) <- E, E <- AC(0)
CIL  rB6:  AC <- shl AC, AC(0) <- E, E <- AC(15)
INC  rB5:  AC <- AC + 1
SPA  rB4:  If (AC(15) = 0) then (PC <- PC + 1)
SNA  rB3:  If (AC(15) = 1) then (PC <- PC + 1)
SZA  rB2:  If (AC = 0) then (PC <- PC + 1)
SZE  rB1:  If (E = 0) then (PC <- PC + 1)
HLT  rB0:  S <- 0
Input-Output
D7IT3 = p (Common to all input-output instructions)
IR(i) = Bi (i = 6, 7, 8, 9, 10, 11)
     p:    SC <- 0
INP  pB11: AC(0-7) <- INPR, FGI <- 0
OUT  pB10: OUTR <- AC(0-7), FGO <- 0
SKI  pB9:  If (FGI = 1) then (PC <- PC + 1)
SKO  pB8:  If (FGO = 1) then (PC <- PC + 1)
ION  pB7:  IEN <- 1
IOF  pB6:  IEN <- 0
DESIGN OF BASIC COMPUTER(BC)
Hardware Components of BC
A memory unit: 4096 x 16.
Registers:
AR, PC, DR, AC, IR, TR, OUTR, INPR, and SC
Flip-Flops(Status):
I, S, E, R, IEN, FGI, and FGO
Decoders: a 3x8 Opcode decoder
a 4x16 timing decoder
Common bus: 16 bits
Control logic gates:
Adder and Logic circuit: Connected to AC
Control Logic Gates
- Input Controls of the nine registers
- Read and Write Controls of memory
- Set, Clear, or Complement Controls of the flip-flops
- S2, S1, S0 Controls to select a register for the bus
- AC, and Adder and Logic circuit
CONTROL OF REGISTERS AND MEMORY
Address Register; AR
Scan all of the register transfer statements that change the content of AR:
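R'T0: AR <- PC          LD(AR)
R'T2: AR <- IR(0-11)    LD(AR)
D'7IT3: AR <- M[AR]     LD(AR)
RT0: AR <- 0            CLR(AR)
D5T4: AR <- AR + 1      INR(AR)
LD(AR) = R'T0 + R'T2 + D'7IT3
CLR(AR) = RT0
INR(AR) = D5T4
[Figure: Control gates for AR. The 12-bit AR is loaded from the bus and drives the bus; its LD input is the OR of D'7IT3, R'T2, and R'T0, its CLR input is RT0, and its INR input is D5T4.]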
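The three Boolean expressions for AR translate directly into code. This is a minimal sketch; the function name `ar_controls` and passing each signal as a 0/1 argument are illustrative choices.

```python
def ar_controls(R, T0, T2, T3, T4, D5, D7, I):
    """Return (LD, CLR, INR) for AR from the Boolean expressions above."""
    ld = ((not R) and T0) or ((not R) and T2) or ((not D7) and I and T3)
    clr = R and T0        # CLR(AR) = RT0
    inr = D5 and T4       # INR(AR) = D5T4
    return bool(ld), bool(clr), bool(inr)
```

During the fetch phase (R = 0, T0 = 1), only LD is asserted; during the interrupt cycle (R = 1, T0 = 1), only CLR is asserted.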
Design of Basic Computer
CONTROL OF FLAGS
IEN: Interrupt Enable Flag
pB7: IEN <- 1 (I/O Instruction)
pB6: IEN <- 0 (I/O Instruction)
RT2: IEN <- 0 (Interrupt)
p = D7IT3 (Input/Output Instruction)
[Figure: IEN implemented as a JK flip-flop. J is driven by pB7; K is driven by the OR of pB6 and RT2, where p = D7IT3.]
CONTROL OF COMMON BUS
[Figure: Inputs x1 through x7 feed an encoder whose outputs S2, S1, S0 are the multiplexer bus select inputs.]
x1 x2 x3 x4 x5 x6 x7   S2 S1 S0   Selected register
0  0  0  0  0  0  0    0  0  0    none
1  0  0  0  0  0  0    0  0  1    AR
0  1  0  0  0  0  0    0  1  0    PC
0  0  1  0  0  0  0    0  1  1    DR
0  0  0  1  0  0  0    1  0  0    AC
0  0  0  0  1  0  0    1  0  1    IR
0  0  0  0  0  1  0    1  1  0    TR
0  0  0  0  0  0  1    1  1  1    Memory
For AR:
D4T4: PC <- AR
D5T5: PC <- AR
x1 = D4T4 + D5T5
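The encoder outputs follow from reading each S column of the truth table as an OR of the inputs that set it. The sketch below derives the outputs that way; the function names and the convention of leaving index 0 unused are assumptions for readability.

```python
def bus_select(x):
    """Encode one-hot inputs x[1]..x[7] into (S2, S1, S0).

    x is a sequence of eight 0/1 values; x[0] is unused so that the
    indices match the names x1..x7 in the table.
    """
    s0 = x[1] | x[3] | x[5] | x[7]
    s1 = x[2] | x[3] | x[6] | x[7]
    s2 = x[4] | x[5] | x[6] | x[7]
    return s2, s1, s0

def x1(D4, T4, D5, T5):
    """x1 places AR on the bus: x1 = D4T4 + D5T5."""
    return (D4 & T4) | (D5 & T5)
```

For example, asserting only x1 yields S2S1S0 = 001, which selects AR as the bus source for the BUN and BSA transfers PC <- AR.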
DESIGN OF ACCUMULATOR LOGIC
Circuits associated with AC
[Figure: AC is a 16-bit register with LD, INR, CLR, and Clock inputs driven by control gates. The adder and logic circuit takes 16 bits from DR, 8 bits from INPR, and the 16-bit AC output; its 16-bit output loads AC, and AC also goes to the bus.]
All the statements that change the content of AC:
D0T5: AC <- AC ∧ DR                 AND with DR
D1T5: AC <- AC + DR                 Add with DR
D2T5: AC <- DR                      Transfer from DR
pB11: AC(0-7) <- INPR               Transfer from INPR
rB9:  AC <- AC'                     Complement
rB7:  AC <- shr AC, AC(15) <- E     Shift right
rB6:  AC <- shl AC, AC(0) <- E      Shift left
rB11: AC <- 0                       Clear
rB5:  AC <- AC + 1                  Increment
CONTROL OF AC REGISTER
Gate structures for controlling

the LD, INR, and CLR of AC
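[Figure: Gate structure for AC. LD is asserted for AND (D0T5), ADD (D1T5), DR (D2T5), INPR (pB11), COM (rB9), SHR (rB7), and SHL (rB6); INR is asserted for INC (rB5); CLR is asserted for CLR (rB11). The 16-bit input comes from the adder and logic circuit, and the 16-bit output goes to the bus.]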
ALU (ADDER AND LOGIC CIRCUIT)
One stage of Adder and Logic circuit

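[Figure: One stage i of the adder and logic circuit. The gated inputs are AND (DR(i) with AC(i)), ADD (full adder FA with carry-in Ci and carry-out Ci+1), DR (DR(i)), INPR (bit i from INPR), COM (complement of AC(i)), SHR (AC(i+1)), and SHL (AC(i-1)). They are combined into Ii, which is loaded into the JK flip-flop AC(i) when LD is asserted.]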
UNIT-4
MEMORY ORGANIZATION
Memory Hierarchy
Main Memory
Auxiliary Memory
Associative Memory
Cache Memory
Virtual Memory
Memory Management Hardware

MEMORY HIERARCHY
The purpose of the memory hierarchy is to obtain the highest possible
access speed while minimizing the total cost of the memory system.
[Figure: Auxiliary memory (magnetic tapes and magnetic disks) connects through an I/O processor to main memory; the CPU accesses main memory through cache memory.]
Hierarchy levels: Register, Cache, Main Memory, Magnetic Disk, Magnetic Tape
Main Memory
MAIN MEMORY
RAM and ROM Chips
Typical RAM chip
[Figure: 128 x 8 RAM chip with inputs Chip select 1 (CS1), Chip select 2 (CS2), Read (RD), Write (WR), and a 7-bit address (AD 7), connected to an 8-bit data bus.]
CS1 CS2 RD WR   Memory function   State of data bus
0   0   x  x    Inhibit           High-impedance
1   0   0  0    Inhibit           High-impedance
1   0   0  1    Write             Input data to RAM
1   0   1  x    Read              Output data from RAM
Typical ROM chip
[Figure: 512 x 8 ROM chip with inputs Chip select 1 (CS1), Chip select 2 (CS2), and a 9-bit address (AD 9), connected to an 8-bit data bus.]
Main Memory
MEMORY ADDRESS MAP

Address space assignment to each memory chip
Example: 512 bytes RAM and 512 bytes ROM
             Hexadecimal     Address bus
Component    address         10 9 8 7 6 5 4 3 2 1
RAM 1        0000 - 007F     0  0 0 x x x x x x x
RAM 2        0080 - 00FF     0  0 1 x x x x x x x
RAM 3        0100 - 017F     0  1 0 x x x x x x x
RAM 4        0180 - 01FF     0  1 1 x x x x x x x
ROM          0200 - 03FF     1  x x x x x x x x x
Memory Connection to CPU

- RAM and ROM chips are connected to a CPU
through the data and address buses
- The low-order lines in the address bus select

the byte within the chips and other lines in the
address bus select a particular chip through
its chip select inputs
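The address map can be checked with a short decoding sketch. It assumes the 10 address lines are numbered 1 (least significant) to 10, as in the table; the function name `select_chip` and the returned labels are illustrative.

```python
def select_chip(address):
    """Map a 10-bit address (lines 10..1) to (component, offset within chip)."""
    address &= 0x3FF
    if address & 0x200:                         # line 10 = 1 selects the ROM
        return "ROM", address & 0x1FF           # lines 9..1 address 512 bytes
    ram_index = (address >> 7) & 0b11           # lines 9 and 8 pick one of 4 RAMs
    return f"RAM {ram_index + 1}", address & 0x7F   # lines 7..1 address 128 bytes
```

For example, address 0180 (hex) falls in RAM 4 at offset 0, and 03FF (hex) is the last byte of the ROM.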
Main Memory
CONNECTION OF MEMORY TO CPU

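[Figure: Memory connection to the CPU. The CPU provides address lines 16-11, 10, 9, 8, and 7-1, the RD and WR control lines, and the data bus. Lines 1-7 go to the AD7 inputs of the four 128 x 8 RAM chips; lines 8 and 9 feed a decoder whose outputs 0-3 select RAM 1 through RAM 4; line 10 selects between the RAM chips and the 512 x 8 ROM; lines 1-9 go to the AD9 input of the ROM. RD and WR go to each RAM chip, and all chips share the data bus.]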
INPUT-OUTPUT ORGANIZATION
Peripheral Devices
Input-Output Interface
Asynchronous Data Transfer
Modes of Transfer
Priority Interrupt
Direct Memory Access
Input-Output Processor
Serial Communication
Peripheral Devices
PERIPHERAL DEVICES
Input Devices
Keyboard
Optical input devices
- Card Reader
- Paper Tape Reader
- Bar code reader
- Digitizer
- Optical Mark Reader
Magnetic Input Devices
- Magnetic Stripe Reader
Screen Input Devices
- Touch Screen
- Light Pen
- Mouse
Analog Input Devices
Output Devices
Card Puncher, Paper Tape Puncher
CRT
Printer (Impact, Ink Jet, Laser, Dot Matrix)
Plotter
Analog
Voice
Input/Output Interfaces
INPUT/OUTPUT INTERFACES
* Provides a method for transferring information between internal storage

(such as memory and CPU registers) and external I/O devices
* Resolves the differences between the computer and peripheral devices
- Peripherals - Electromechanical Devices

CPU or Memory - Electronic Device
- Data Transfer Rate

Peripherals - Usually slower
CPU or Memory - Usually faster than peripherals
Some kinds of Synchronization mechanism may be needed
- Unit of Information
Peripherals - Byte
CPU or Memory - Word
- Operating Modes
Peripherals - Autonomous, Asynchronous
CPU or Memory - Synchronous
I/O BUS AND INTERFACE MODULES

[Figure: The processor connects to an I/O bus made up of data, address, and control lines. Four interface modules attach to the bus, one each for the keyboard and display terminal, the printer, the magnetic disk, and the magnetic tape.]
Each peripheral has an interface module associated with it
Interface
- Decodes the device address (device code)
- Decodes the commands (operation)
- Provides signals for the peripheral controller
- Synchronizes the data flow and supervises
the transfer rate between peripheral and CPU or Memory
Typical I/O instruction
Op. code Device address Function code
(Command)
CONNECTION OF I/O BUS

Connection of I/O Bus to CPU
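[Figure: In the CPU, the I/O instruction fields (op. code, device address, function code), the accumulator register, and the computer I/O control connect to the I/O bus, which consists of sense lines, data lines, function code lines, and device address lines.]
Connection of I/O Bus to One Interface
[Figure: In the interface, the device address lines feed an address decoder (here AD = 1101), the function code lines feed a command decoder, the data lines connect to a buffer register, and the sense lines come from a status register. Interface logic ties these to the peripheral register and to the output peripheral device and controller.]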
I/O BUS AND MEMORY BUS

Functions of Buses
* MEMORY BUS is for information transfers between CPU and the MM

* I/O BUS is for information transfers between CPU
and I/O devices through their I/O interface
Physical Organizations
* Many computers use a common single bus system
for both memory and I/O interface units
- Use one common bus but separate control lines for each function
- Use one common bus with common control lines for both functions
* Some computer systems use two separate buses,
one to communicate with memory and the other with I/O interfaces
I/O Bus
- Communication between CPU and all interface units is via a common
I/O Bus
- An interface connected to a peripheral device may have a number of
data registers , a control register, and a status register
- A command is passed to the peripheral by sending it
to the appropriate interface register
- Function code and sense lines are not needed (Transfer of data, control,
and status information is always via the common I/O Bus)
ISOLATED vs MEMORY MAPPED I/O
Isolated I/O
- Separate I/O read/write control lines in addition to memory read/write control
lines
- Separate (isolated) memory and I/O address spaces
- Distinct input and output instructions
Memory-mapped I/O
- A single set of read/write control lines
(no distinction between memory and I/O transfer)
- Memory and I/O addresses share the common address space
-> reduces memory address range available
- No specific input or output instruction
-> The same memory reference instructions can
be used for I/O transfers
- Considerable flexibility in handling I/O operations
I/O INTERFACE
[Figure: I/O interface unit. On the CPU side, a bidirectional data bus passes through bus buffers, and Chip select (CS), Register select (RS1, RS0), I/O read (RD), and I/O write (WR) enter the timing and control block. On the device side, the Port A register and Port B register carry I/O data, the control register carries control, and the status register carries status to and from the I/O device.]
CS RS1 RS0   Register selected
0  x   x     None - data bus in high-impedance
1  0   0     Port A register
1  0   1     Port B register
1  1   0     Control register
1  1   1     Status register
Programmable Interface
- Information in each port can be assigned a meaning
depending on the mode of operation of the I/O device
-> Port A = Data; Port B = Command; Port C = Status
- CPU initializes(loads) each port by transferring a byte to the Control Register
-> Allows the CPU to define the mode of operation of each port
-> Programmable Port: By changing the bits in the control register, it is
possible to change the interface characteristics
ASYNCHRONOUS DATA TRANSFER
Synchronous and Asynchronous Operations

Synchronous - All devices derive the timing
information from common clock line
Asynchronous - No common clock

Asynchronous data transfer between two independent units requires that
control signals be transmitted between the communicating units to
indicate the time at which data is being transmitted
Two Asynchronous Data Transfer Methods
Strobe pulse
- A strobe pulse is supplied by one unit to indicate
to the other unit when the transfer has to occur
Handshaking
- A control signal accompanies each data item
being transmitted to indicate the presence of data
- The receiving unit responds with another control
signal to acknowledge receipt of the data
STROBE CONTROL
* Employs a single control line to time each transfer

* The strobe may be activated by either the source or the destination unit
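Source-Initiated Strobe for Data Transfer
[Figure: Block diagram with a data bus and a strobe line between the source unit and the destination unit; timing diagram showing the strobe pulse within the valid-data interval.]
Destination-Initiated Strobe for Data Transfer
[Figure: Block diagram with a data bus and a strobe line between the source unit and the destination unit; timing diagram showing valid data on the bus during the strobe.]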
HANDSHAKING
Strobe Methods
Source-Initiated
The source unit that initiates the transfer has
no way of knowing whether the destination unit
has actually received the data
Destination-Initiated
The destination unit that initiates the transfer has
no way of knowing whether the source has
actually placed the data on the bus
To solve this problem, the HANDSHAKE method

introduces a second control signal to provide a Reply
to the unit that initiates the transfer
SOURCE-INITIATED TRANSFER USING HANDSHAKE

Block Diagram
[Figure: Source unit and destination unit joined by a data bus, a data valid line, and a data accepted line.]
Timing Diagram
[Figure: Waveforms for the data bus (valid data), data valid, and data accepted.]
Sequence of Events
Source unit: Place data on bus. Enable data valid.
Destination unit: Accept data from bus. Enable data accepted.
Source unit: Disable data valid. Invalidate data on bus.
Destination unit: Disable data accepted. Ready to accept data (initial state).
* Allows arbitrary delays from one state to the next.
* Permits each unit to respond at its own data transfer rate
* The rate of transfer is determined by the slower unit
DESTINATION-INITIATED TRANSFER USING HANDSHAKE

Block Diagram
[Figure: Source unit and destination unit joined by a data bus, a data valid line, and a ready for data line.]
Timing Diagram
[Figure: Waveforms for ready for data, data valid, and the data bus (valid data).]
Sequence of Events
Destination unit: Ready to accept data. Enable ready for data.
Source unit: Place data on bus. Enable data valid.
Destination unit: Accept data from bus. Disable ready for data.
Source unit: Disable data valid. Invalidate data on bus (initial state).
* Handshaking provides a high degree of flexibility and reliability because the
successful completion of a data transfer relies on active participation by both units
* If one unit is faulty, data transfer will not be completed
-> Can be detected by means of a timeout mechanism
ASYNCHRONOUS SERIAL TRANSFER

Four Different Types of Transfer
- Asynchronous serial transfer
- Synchronous serial transfer
- Asynchronous parallel transfer
- Synchronous parallel transfer
Asynchronous Serial Transfer
- Employs special bits which are inserted at both
ends of the character code
- Each character consists of three parts: start bit, data bits, stop bits.
[Figure: Serial frame made of a start bit (1 bit), the character bits 1 1 0 0 0 1 0 1, and stop bits (at least 1 bit).]
A character can be detected by the receiver from the knowledge of 4 rules:
- When data are not being sent, the line is kept in the 1-state (idle state)
- The initiation of a character transmission is detected
by a start bit, which is always a 0
- The character bits always follow the start bit
- After the last character bit, a stop bit is detected when
the line returns to the 1-state for at least 1 bit time
The receiver knows in advance the transfer rate of the
bits and the number of information bits to expect
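The four rules can be exercised with a small framing sketch. It models the line as a list with one sample per bit time, which is an idealization (a real receiver samples against its own clock); the function names are illustrative.

```python
def frame_character(bits, stop_bits=1):
    """Wrap character bits with a start bit (0) and stop bit(s) (1)."""
    return [0] + list(bits) + [1] * stop_bits

def receive_character(line, n_bits):
    """Recover one character from a sampled line (list of 0/1 values).

    The line idles at 1; the first 0 is taken as the start bit.
    Returns the character bits, or None if no valid frame is found.
    """
    try:
        start = line.index(0)
    except ValueError:
        return None                          # the line stayed idle
    data = line[start + 1 : start + 1 + n_bits]
    stop = line[start + 1 + n_bits : start + 2 + n_bits]
    if len(data) < n_bits or stop != [1]:
        return None                          # truncated frame or framing error
    return data
```

The receiver needs `n_bits` in advance, which corresponds to the requirement that it knows the number of information bits to expect.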
UNIVERSAL ASYNCHRONOUS RECEIVER-TRANSMITTER
- UART -
A typical asynchronous communication interface available as an IC
[Figure: UART block diagram. The bidirectional data bus passes through bus buffers onto an internal bus. Chip select (CS), Register select (RS), I/O read (RD), and I/O write (WR) enter the timing and control block. The transmitter register feeds a shift register that drives transmit data, under transmitter control and clock. Receive data enters a shift register that feeds the receiver register, under receiver control and clock. The control register and status register sit on the internal bus.]
CS RS Oper.  Register selected
0  x  x      None
1  0  WR     Transmitter register
1  1  WR     Control register
1  0  RD     Receiver register
1  1  RD     Status register
Transmitter Register
- Accepts a data byte(from CPU) through the data bus
- Transferred to a shift register for serial transmission
Receiver
- Receives serial information into another shift register
- Complete data byte is sent to the receiver register
Status Register Bits
- Used for I/O flags and for recording errors
Control Register Bits
- Define baud rate, no. of bits in each character, whether
to generate and check parity, and no. of stop bits
UNIT-5
PIPELINING AND VECTOR PROCESSING
Parallel Processing
Pipelining
Arithmetic Pipeline
Instruction Pipeline
RISC Pipeline
Vector Processing
Array Processors(refer book)

Parallel Processing
PARALLEL PROCESSING
Execution of Concurrent Events in the computing

process to achieve faster Computational Speed
Levels of Parallel Processing
- Job or Program level
- Task or Procedure level
- Inter-Instruction level
- Intra-Instruction level
PARALLEL COMPUTERS
Architectural Classification
Flynn's classification
Based on the multiplicity of Instruction Streams and Data Streams
Instruction Stream
Sequence of Instructions read from memory
Data Stream
Operations performed on the data in the processor
                                    Number of Data Streams
                                    Single     Multiple
Number of Instruction   Single      SISD       SIMD
Streams                 Multiple    MISD       MIMD
COMPUTER ARCHITECTURES FOR PARALLEL PROCESSING
Von Neumann based
  SISD: Superscalar processors, Superpipelined processors, VLIW
  MISD: Nonexistence
  SIMD: Array processors, Systolic arrays, Associative processors
  MIMD: Shared-memory multiprocessors (Bus based, Crossbar switch based)
Dataflow
Reduction
Parallel Processing
SISD COMPUTER SYSTEMS
[Figure: A control unit, a processor unit, and a memory, linked by one instruction stream and one data stream.]
Characteristics
- Standard von Neumann machine

- Instructions and data are stored in memory
- One operation at a time
Limitations
Von Neumann bottleneck
Maximum speed of the system is limited by the

Memory Bandwidth (bits/sec or bytes/sec)
- Limitation on Memory Bandwidth

- Memory is shared by CPU and I/O
Parallel Processing
MISD COMPUTER SYSTEMS
[Figure: Several memory (M), control unit (CU), and processor (P) chains, each with its own instruction stream, sharing a single data stream from memory.]
Characteristics
- There is no computer at present that can be
classified as MISD
Parallel Processing
SIMD COMPUTER SYSTEMS
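[Figure: A control unit receives a single instruction stream from memory over the data bus and drives several processor units (P); each processor unit has its own data stream through an alignment network to the memory modules (M).]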
Characteristics
- Only one copy of the program exists
- A single controller executes one instruction at a time
Parallel Processing
MIMD COMPUTER SYSTEMS
[Figure: Processor-memory pairs (P M) connected through an interconnection network to a shared memory.]
Characteristics
- Multiple processing units
- Execution of multiple instructions on multiple data
Types of MIMD computer systems

- Shared memory multiprocessors
- Message-passing multicomputers
Pipelining
PIPELINING
A technique of decomposing a sequential process
into suboperations, with each subprocess being
executed in a special dedicated segment that
operates concurrently with all other segments.
Example: Ai * Bi + Ci for i = 1, 2, 3, ... , 7
[Figure: Segment 1 loads Ai into R1 and Bi into R2. Segment 2 feeds R1 and R2 through the multiplier into R3 and loads Ci from memory into R4. Segment 3 feeds R3 and R4 through the adder into R5.]
R1 <- Ai, R2 <- Bi             Load Ai and Bi
R3 <- R1 * R2, R4 <- Ci        Multiply and load Ci
R5 <- R3 + R4                  Add
OPERATIONS IN EACH PIPELINE STAGE
Clock pulse   Segment 1    Segment 2          Segment 3
number        R1    R2     R3         R4      R5
1             A1    B1     -          -       -
2             A2    B2     A1 * B1    C1      -
3             A3    B3     A2 * B2    C2      A1 * B1 + C1
4             A4    B4     A3 * B3    C3      A2 * B2 + C2
5             A5    B5     A4 * B4    C4      A3 * B3 + C3
6             A6    B6     A5 * B5    C5      A4 * B4 + C4
7             A7    B7     A6 * B6    C6      A5 * B5 + C5
8             -     -      A7 * B7    C7      A6 * B6 + C6
9             -     -      -          -       A7 * B7 + C7
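The table can be reproduced with a clock-by-clock sketch of the three segments. This is an illustrative model: the function name `run_pipeline` is hypothetical, `None` stands for an empty register, and all registers are updated together from their previous values to mimic a common clock edge.

```python
def run_pipeline(a, b, c):
    """Clock a 3-segment pipeline computing a[i] * b[i] + c[i].

    Returns (results, number of clock pulses used).
    """
    n = len(a)
    r1 = r2 = r3 = r4 = r5 = None
    results = []
    clock = 0
    while len(results) < n:
        clock += 1
        # Compute every new register value from the old values first,
        # because all registers load on the same clock edge.
        new_r5 = r3 + r4 if r3 is not None else None        # segment 3: add
        new_r3 = r1 * r2 if r1 is not None else None        # segment 2: multiply
        new_r4 = c[clock - 2] if r1 is not None else None   # segment 2: load Ci
        i = clock - 1
        new_r1, new_r2 = (a[i], b[i]) if i < n else (None, None)  # segment 1
        r1, r2, r3, r4, r5 = new_r1, new_r2, new_r3, new_r4, new_r5
        if r5 is not None:
            results.append(r5)
    return results, clock
```

For seven operand triples the sketch takes 9 clock pulses, in agreement with the table: the first result appears at pulse 3 and one more result follows on each later pulse.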
GENERAL PIPELINE
General Structure of a 4-Segment Pipeline
[Figure: Input -> S1 -> R1 -> S2 -> R2 -> S3 -> R3 -> S4 -> R4, with the registers driven by a common clock.]
Space-Time Diagram
Clock cycles:  1   2   3   4   5   6   7   8   9
Segment 1      T1  T2  T3  T4  T5  T6
Segment 2          T1  T2  T3  T4  T5  T6
Segment 3              T1  T2  T3  T4  T5  T6
Segment 4                  T1  T2  T3  T4  T5  T6
PIPELINE SPEEDUP
n: Number of tasks to be performed
Conventional Machine (Non-Pipelined)
tn: Clock cycle
t1: Time required to complete the n tasks
t1 = n * tn
Pipelined Machine (k stages)
tp: Clock cycle (time to complete each suboperation)
tk: Time required to complete the n tasks
tk = (k + n - 1) * tp
Speedup
Sk: Speedup
Sk = (n * tn) / ((k + n - 1) * tp)
As n -> infinity, Sk -> tn / tp (= k, if tn = k * tp)
PIPELINE AND MULTIPLE FUNCTION UNITS
Example
- 4-stage pipeline
- Suboperation in each stage: tp = 20 ns
- 100 tasks to be executed
- 1 task in the non-pipelined system: 20 * 4 = 80 ns
Pipelined System
(k + n - 1) * tp = (4 + 99) * 20 = 2060 ns
Non-Pipelined System
n * k * tp = 100 * 80 = 8000 ns
Speedup
Sk = 8000 / 2060 = 3.88
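The example arithmetic can be reproduced with a few lines of code. The function name `pipeline_speedup` is illustrative; when `tn` is not given, the sketch assumes tn = k * tp, as the example does.

```python
def pipeline_speedup(n, k, tp, tn=None):
    """Return (pipelined time, non-pipelined time, speedup) for n tasks."""
    if tn is None:
        tn = k * tp                 # assumption used in the example: tn = k * tp
    t_pipe = (k + n - 1) * tp       # tk = (k + n - 1) * tp
    t_seq = n * tn                  # t1 = n * tn
    return t_pipe, t_seq, t_seq / t_pipe
```

Calling `pipeline_speedup(100, 4, 20)` gives 2060 and 8000 for the two times and a speedup of about 3.88; increasing n moves the speedup toward the limit k = 4.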
A 4-stage pipeline is basically identical to a system
with 4 identical function units
[Figure: Multiple functional units P1, P2, P3, P4 handling instructions Ii, Ii+1, Ii+2, Ii+3.]
Arithmetic Pipeline
ARITHMETIC PIPELINE
Floating-point adder
X = A x 2^a
Y = B x 2^b
[1] Compare the exponents
[2] Align the mantissas
[3] Add/subtract the mantissas
[4] Normalize the result
[Figure: Exponents a, b and mantissas A, B enter through registers R; registers R also separate the segments.]
Segment 1: Compare exponents by subtraction (produces the difference)
Segment 2: Choose exponent; align mantissas
Segment 3: Add or subtract mantissas
Segment 4: Adjust exponent; normalize result
INSTRUCTION CYCLE
Six Phases* in an Instruction Cycle
[1] Fetch an instruction from memory
[2] Decode the instruction
[3] Calculate the effective address of the operand
[4] Fetch the operands from memory
[5] Execute the operation
[6] Store the result in the proper place
* Some instructions skip some phases

* Effective address calculation can be done as
part of the decoding phase
* Storage of the operation result into a register
is done automatically in the execution phase
==> 4-Stage Pipeline
[1] FI: Fetch an instruction from memory

[2] DA: Decode the instruction and calculate
the effective address of the operand
[3] FO: Fetch the operand
[4] EX: Execute the operation
INSTRUCTION PIPELINE
Execution of Three Instructions in a 4-Stage Pipeline

Conventional
i     FI DA FO EX
i+1               FI DA FO EX
i+2                           FI DA FO EX
Pipelined
i     FI DA FO EX
i+1      FI DA FO EX
i+2         FI DA FO EX
INSTRUCTION EXECUTION IN A 4-STAGE PIPELINE
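[Flowchart: Segment 1 fetches the instruction from memory. Segment 2 decodes the instruction and calculates the effective address; if the instruction is a branch, control goes to Update PC and Empty pipe. Otherwise Segment 3 fetches the operand from memory and Segment 4 executes the instruction. After execution, a pending interrupt leads to interrupt handling, then Update PC and Empty pipe; with no interrupt, the next instruction is fetched.]
Step:           1   2   3   4   5   6   7   8   9   10  11  12  13
Instruction 1   FI  DA  FO  EX
            2       FI  DA  FO  EX
   (Branch) 3           FI  DA  FO  EX
            4               FI  -   -   FI  DA  FO  EX
            5                   -   -   -   FI  DA  FO  EX
            6                                   FI  DA  FO  EX
            7                                       FI  DA  FO  EX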
MAJOR HAZARDS IN PIPELINED EXECUTION
Structural hazards(Resource Conflicts)
caused by access to memory by two segments at the same time.
Most of these conflicts can be resolved by using separate instruction
and data memories.
Data hazards (Data Dependency Conflicts)
An instruction scheduled to be executed in the pipeline requires the
result of a previous instruction, which is not yet available
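R1 <- B + C
R1 <- R1 + 1
[Figure: Data dependency. ADD goes through DA, fetches B and C, and adds; the following INC goes through DA and then waits one cycle (bubble) before it can read R1 and add 1.]
Control hazards
Branches and other instructions that change the PC
delay the fetch of the next instruction
[Figure: Branch address dependency. While JMP is decoded (ID) and the new PC is computed, the next instruction waits (bubble) before going through IF, ID, OF, OE, OS.]
Hazards in pipelines may make it necessary to stall the pipeline
Pipeline Interlock: detect hazards and stall until they are cleared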
STRUCTURAL HAZARDS
Structural Hazards (Resource conflicts)
Occur when some resource has not been
duplicated enough to allow all combinations
of instructions in the pipeline to execute
Example: With one memory, a data fetch and an instruction fetch
cannot be initiated in the same clock cycle
i      FI    DA    FO    EX
i+1          FI    DA    FO    EX
i+2                stall stall FI    DA    FO    EX
The pipeline is stalled for a resource conflict
<- Two loads with a one-port memory
-> A two-port memory will serve without stall
DATA HAZARDS
Data Hazards
Occur when the execution of an instruction
depends on the results of a previous instruction
ADD R1, R2, R3
SUB R4, R1, R5
Data hazards can be dealt with by either hardware
techniques or software techniques
Hardware Technique
Interlock
- Hardware detects the data dependencies and delays the scheduling
of the dependent instruction by stalling enough clock cycles
Forwarding (bypassing, short-circuiting)
- Accomplished by a data path that routes a value from a source
(usually an ALU) to a user, bypassing a designated register. This
allows the value to be used at an earlier stage in the
pipeline than would otherwise be possible
Software Technique
The compiler is designed to detect a data conflict and reorder instructions
as necessary to delay the loading of the conflicting data, inserting no-operation
instructions where needed. This method is called DELAYED LOAD
CONTROL HAZARDS (Branching Difficulties)
Branch Instructions
- Branch target address is not known until
the branch instruction is decoded.
Branch instruction   FI  DA  FO  EX
Next instruction             FI  DA  FO  EX
(the target address becomes available after the DA segment of the branch, so the next fetch waits for it)
- Stall -> wasted clock cycles
Dealing with Control Hazards
* Prefetch Target Instruction

* Branch Target Buffer
* Loop Buffer
* Branch Prediction
* Delayed Branch
CONTROL HAZARDS
Prefetch Target Instruction
Fetch instructions in both streams: the instruction to be executed if the branch
is not taken and the instruction to be executed if the branch is taken.
Both are saved until the branch is executed. Then select the right
instruction stream and discard the wrong stream
Branch Target Buffer (BTB; Associative Memory)
Present in the fetch segment of the pipeline. Each entry holds the address
of a previously executed branch, its target instruction, and
the next few instructions after the target
When fetching an instruction, search the BTB.
If found, fetch the instruction stream from the BTB;
if not, fetch the new stream and update the BTB
Loop Buffer (High Speed Register file)
A variation of the BTB. A register file maintained by the instruction fetch segment
of the pipeline.
The register file stores an entire loop, which allows the loop to be executed
without accessing memory
Branch Prediction
Uses additional logic to guess the outcome of the branch condition before it is executed.
The instruction is fetched based on the guess. Correct guess eliminates the branch penalty
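As an illustration of the "additional logic" used for branch prediction, the sketch below models a 2-bit saturating counter. This particular scheme is an assumption chosen for the example; the slides do not say which prediction logic is meant, and the class and function names are hypothetical.

```python
class TwoBitPredictor:
    """2-bit saturating counter: states 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self, state=0):
        self.state = state

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        # Move one step toward the observed outcome, saturating at 0 and 3.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def count_correct(outcomes, predictor):
    """Count how many branch outcomes the predictor guesses correctly."""
    correct = 0
    for taken in outcomes:
        if predictor.predict() == taken:
            correct += 1
        predictor.update(taken)
    return correct


# A loop-closing branch: taken nine times, then not taken on exit.
loop_branch = [True] * 9 + [False]
print(count_correct(loop_branch, TwoBitPredictor()))
```

Each correct guess avoids the branch penalty; each wrong guess costs the same pipeline flush as an unpredicted branch.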
Delayed Branch
Compiler detects the branch and rearranges the instruction sequence
by inserting useful instructions that keep the pipeline busy
in the presence of a branch instruction
RISC PIPELINE
RISC
- Machine with a very fast clock cycle that executes at the rate of one
instruction per cycle
<- Simple Instruction Set
   Fixed Length Instruction Format
   Register-to-Register Operations
Instruction Cycles of Three-Stage Instruction Pipeline
Data Manipulation Instructions
I: Instruction Fetch
A: Decode, Read Registers, ALU Operations
E: Write a Register
Load and Store Instructions
I: Instruction Fetch
A: Decode, Evaluate Effective Address
E: Register-to-Memory or Memory-to-Register
Program Control Instructions
I: Instruction Fetch
A: Decode, Evaluate Branch Address
E: Write Register (PC)
DELAYED LOAD IN RISC PIPELINE
LOAD:  R1 <- M[address 1]
LOAD:  R2 <- M[address 2]
ADD:   R3 <- R1 + R2
STORE: M[address 3] <- R3
Three-segment pipeline timing
Pipeline timing with data conflict
clock cycle  1  2  3  4  5  6
Load R1      I  A  E
Load R2         I  A  E
Add R1+R2          I  A  E
Store R3              I  A  E
(Add needs R2 in its A segment in cycle 4, but Load R2 delivers it only in its E segment in the same cycle)
Pipeline timing with delayed load
clock cycle  1  2  3  4  5  6  7
Load R1      I  A  E
Load R2         I  A  E
NOP                I  A  E
Add R1+R2             I  A  E
Store R3                 I  A  E
The data dependency is taken care of by the compiler rather than the hardware
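The compiler's role in delayed load can be sketched as a pass that inserts a NOP after any load whose result is read by the very next instruction. The tuple encoding of instructions and the function name are assumptions made for this example; a real compiler would first try to move a useful instruction into the slot.

```python
def insert_delayed_load_nops(program):
    """program: list of (opcode, destination, sources) tuples.

    Inserts a NOP after a LOAD when the next instruction reads the loaded
    register, matching the three-segment I/A/E timing shown above.
    """
    scheduled = []
    for index, instruction in enumerate(program):
        scheduled.append(instruction)
        opcode, destination, _ = instruction
        if opcode == "LOAD" and index + 1 < len(program):
            _, _, next_sources = program[index + 1]
            if destination in next_sources:
                scheduled.append(("NOP", None, ()))
    return scheduled


program = [
    ("LOAD", "R1", ("M[address 1]",)),
    ("LOAD", "R2", ("M[address 2]",)),
    ("ADD", "R3", ("R1", "R2")),
    ("STORE", "M[address 3]", ("R3",)),
]

for opcode, destination, sources in insert_delayed_load_nops(program):
    print(opcode, destination, sources)
```

Applied to the four-instruction example, the pass places a single NOP between the second load and the add, which is the schedule in the second timing table.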
DELAYED BRANCH
Compiler analyzes the instructions before and after
the branch and rearranges the program sequence by
inserting useful instructions in the delay steps
Using no-operation instructions
Clock cycles:    1  2  3  4  5  6  7  8  9  10
1. Load          I  A  E
2. Increment        I  A  E
3. Add                 I  A  E
4. Subtract               I  A  E
5. Branch to X               I  A  E
6. NOP                          I  A  E
7. NOP                             I  A  E
8. Instr. in X                        I  A  E
Rearranging the instructions
Clock cycles:    1  2  3  4  5  6  7  8
1. Load          I  A  E
2. Increment        I  A  E
3. Branch to X         I  A  E
4. Add                    I  A  E
5. Subtract                  I  A  E
6. Instr. in X                  I  A  E
VECTOR PROCESSING
Vector Processing Applications
Problems that can be efficiently formulated in terms of vectors
- Long-range weather forecasting
- Petroleum explorations
- Seismic data analysis
- Medical diagnosis
- Aerodynamics and space flight simulations
- Artificial intelligence and expert systems
- Mapping the human genome
- Image processing
Vector Processor (computer)
Ability to process vectors, and related data structures such as matrices
and multi-dimensional arrays, much faster than conventional computers
Vector Processors may also be pipelined
VECTOR PROGRAMMING
DO 20 I = 1, 100
20 C(I) = B(I) + A(I)
Conventional computer
Initialize I = 1
20 Read A(I)
Read B(I)
Store C(I) = A(I) + B(I)
Increment I = I + 1
If I <= 100 goto 20
Vector computer
C(1:100) = A(1:100) + B(1:100)
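The contrast between the two forms can be sketched as follows. Plain Python lists stand in for the vectors, so the second function only mirrors the notation of a vector instruction and gains none of the hardware speedup; the function names are illustrative.

```python
def scalar_add(a, b):
    """Element-by-element loop, as on a conventional computer."""
    c = [0] * len(a)
    i = 0                       # Python lists are 0-indexed, unlike the Fortran loop
    while i < len(a):
        c[i] = a[i] + b[i]      # read A(I), read B(I), store C(I)
        i += 1                  # increment and test
    return c


def vector_add(a, b):
    """Whole-vector form, analogous to C(1:100) = A(1:100) + B(1:100)."""
    return [x + y for x, y in zip(a, b)]


a = list(range(1, 101))
b = list(range(100, 0, -1))
print(scalar_add(a, b) == vector_add(a, b))
```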

VECTOR INSTRUCTION FORMAT
Operation code | Base address source 1 | Base address source 2 | Base address destination | Vector length
Pipeline for Inner Product of Matrix Multiplication
[Figure: Source A and Source B feed the multiplier pipeline; its output feeds the adder pipeline, whose output is fed back to the adder input]
C = A1 B1 + A2 B2 + A3 B3 + ... + Ak Bk
k may be equal to 100 or even 1000
The values of A and B are either in memory or in processor registers. Each floating
point adder and multiplier unit is assumed to have 4 segments. All segment
registers are initially set to zero. Therefore the output of the adder is zero
for the first 8 cycles until both the pipes are full.
Ai and Bi are brought in and multiplied at a rate of one pair per cycle. After 4 cycles
the products are added to the output of the adder. During the next 4 cycles zero is added.
At the end of the 8th cycle the first four products A1 B1 through A4 B4 are in the four
adder segments and the next four products A5 B5 through A8 B8 are in the multiplier
segments.
The sum is therefore accumulated in four sections:
C = A1 B1 + A5 B5 + A9 B9 + A13 B13 + ...
+ A2 B2 + A6 B6 + A10 B10 + A14 B14 + ...
+ A3 B3 + A7 B7 + A11 B11 + A15 B15 + ...
+ A4 B4 + A8 B8 + A12 B12 + A16 B16 + ...
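The four-section accumulation can be mimicked in software. The sketch below does not model the pipeline cycle by cycle; it only shows that products landing in four interleaved partial sums, one per adder segment, add up to the ordinary inner product. The function name and the `adder_segments` parameter are assumptions for the example.

```python
def pipelined_inner_product(a, b, adder_segments=4):
    """Accumulate products into interleaved partial sums, then combine them.

    Product i goes to partial sum i mod adder_segments, which reproduces the
    grouping A1B1 + A5B5 + ..., A2B2 + A6B6 + ..., and so on.
    """
    partial = [0.0] * adder_segments
    for i, (x, y) in enumerate(zip(a, b)):
        partial[i % adder_segments] += x * y
    return partial, sum(partial)


a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [1, 1, 1, 1, 1, 1, 1, 1]
partial_sums, c = pipelined_inner_product(a, b)
print(partial_sums, c)
```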
MULTIPROCESSORS
Characteristics of Multiprocessors
Interconnection Structures
Interprocessor Arbitration
Interprocessor Communication and Synchronization
Cache Coherence
Characteristics of Multiprocessor systems
A multiprocessor system is an interconnection of two or more CPUs with
memory and input-output equipment.
Multiprocessor systems are classified as multiple instruction stream, multiple
data stream (MIMD) systems.
Multiprocessors and multicomputers are distinct,
though both support concurrent operations. In a multicomputer system several
autonomous computers are connected through a network and they may or
may not communicate, but in a multiprocessor system there is a single operating
system that controls the interaction between processors, and all the components
of the system cooperate in the solution of a problem.
VLSI circuit technology has reduced the cost of computers to such a low
level that the concept of applying multiple processors to meet system
performance requirements has become an attractive design possibility.
Characteristics of Multiprocessors
Benefits of Multiprocessing:
1. Multiprocessing increases the reliability of the system so that a failure or
error in one part has limited effect on the rest of the system. If a fault causes
one processor to fail, a second processor can be assigned to perform the
functions of the disabled one.
2. Improved system performance. The system derives high performance from the
fact that computations can proceed in parallel in one of two ways:
a) Multiple independent jobs can be made to operate in parallel.
b) A single job can be partitioned into multiple parallel tasks. This can be
achieved in two ways:
- The user explicitly declares that the tasks of the program be
executed in parallel
- The compiler, provided with multiprocessor software, can
automatically detect parallelism in the program. In practice it checks
for data dependency.
COUPLING OF PROCESSORS
Tightly Coupled System/Shared Memory
- Tasks and/or processors communicate in a highly synchronized fashion
- Communicate through a common global shared memory
- Shared memory system. This doesn't preclude each processor from
having its own local memory (cache memory)
Loosely Coupled System/Distributed Memory
- Tasks or processors do not communicate in a synchronized fashion.
- Communicate by passing message packets consisting of an address, the data content, and
some error detection code.
- Overhead for data exchange is high
- Distributed memory system
Loosely coupled systems are more efficient when the interaction between tasks is
minimal, whereas tightly coupled system can tolerate a higher degree of interaction
between tasks.
GRANULARITY OF PARALLELISM
Coarse-grain
- A task is broken into a handful of pieces, each of which is executed by a powerful processor
- Processors may be heterogeneous
- Computation/communication ratio is very high
Medium-grain
- Tens to a few thousand pieces
- Processors typically run the same code
- Computation/communication ratio is often hundreds or more
Fine-grain
- Thousands to perhaps millions of small pieces, executed by very small, simple processors or
through pipelines
- Processors typically have instructions broadcast to them
- Computation/communication ratio is often near unity
MEMORY
Shared (Global) Memory
- A Global Memory Space accessible by all processors
- Processors may also have some local memory
Distributed (Local, Message-Passing) Memory
- All memory units are associated with processors
- To retrieve information from another processor's
memory a message must be sent there
Uniform Memory
- All processors take the same time to reach all memory locations
Nonuniform (NUMA) Memory
- Memory access time is not uniform
[Figure: SHARED MEMORY: processors connected through a network to memory. DISTRIBUTED MEMORY: processor/memory pairs connected through a network]
SHARED MEMORY MULTIPROCESSORS
M   M  ...  M
Interconnection Network: Buses, Multistage IN, Crossbar Switch
P   P  ...  P
Characteristics
- All processors have equally direct access to one large memory address space
Limitations
- Memory access latency; Hot spot problem
MESSAGE-PASSING MULTIPROCESSORS
Message-Passing Network (point-to-point connections)
P   P  ...  P
M   M  ...  M
Characteristics
- Interconnected computers
- Each processor has its own memory, and processors
communicate via message passing
Limitations
- Communication overhead; hard to program
INTERCONNECTION STRUCTURES
The interconnection between the components of a multiprocessor
system can have different physical configurations depending on the number
of transfer paths that are available between the processors and memory in a shared memory
system and among the processing elements in a loosely coupled system.
Some of the schemes are:
* Time-Shared Common Bus
* Multiport Memory
* Crossbar Switch
* Multistage Switching Network
* Hypercube System
Time-Shared Common Bus
All processors (and memory) are connected to a common bus or buses
- Memory access is fairly uniform, but not very scalable
BUS
- A collection of signal lines that carry module-to-module communication
- Data highways connecting several digital system elements
Operations of Bus Devices
Devices on the bus: M3, S7, M6, S5, M4, S2 (M = master, S = slave)
M3 wishes to communicate with S5
[1] M3 sends signals (address) on the bus that cause
S5 to respond
[2] M3 sends data to S5 or S5 sends data to
M3 (determined by the command line)
Master Device: Device that initiates and controls the communication
Slave Device: Responding device
Multiple-master buses
-> Bus conflict
-> need bus arbitration
SYSTEM BUS STRUCTURE FOR MULTIPROCESSORS
[Figure: each node has a local bus connecting a CPU, an IOP, and local memory; a system bus controller links each local bus to the system bus, which also connects to the common shared memory]
MULTIPORT MEMORY
Multiport Memory Module
- Each port serves a CPU
Memory Module Control Logic
- Each memory module has control logic
- Resolves memory module conflicts by fixed priority among CPUs
Advantages
- Multiple paths -> high transfer rate
Disadvantages
- Memory control logic
- Large number of cables and connections
[Figure: CPU 1 through CPU 4, each connected to memory modules MM 1 through MM 4]
CROSSBAR SWITCH
Each switch point has control logic to set up
the transfer path between a processor and a
memory module.
It also resolves multiple requests for access to the same memory module on a predetermined
priority basis.
This organization supports simultaneous transfers from all memory modules because
there is a separate path associated with each
module. However, the hardware required to implement the
switch can become quite large and complex
[Figure: crossbar with memory modules MM1 through MM4 along one side and CPU1 through CPU4 along the other, with a switch point at each crossing]
Block Diagram of Crossbar Switch
[Figure: multiplexers and arbitration logic receive data, address, and control lines from each of CPU 1 through CPU 4 and drive the data, address, R/W, and memory enable inputs of the memory module]
MULTISTAGE SWITCHING NETWORK
Interstage Switch (2 x 2: inputs A and B, outputs 0 and 1)
The switch has four settings:
- A connected to 0
- A connected to 1
- B connected to 0
- B connected to 1
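MULTISTAGE INTERCONNECTION NETWORK
Binary Tree with 2 x 2 Switches
[Figure: processors P1 and P2 feed a root 2 x 2 switch; two further levels of 2 x 2 switches lead to destinations 000 through 111, with one destination address bit (0 or 1) selecting the switch output at each level]
Some requests cannot be satisfied simultaneously.
Ex: if P1 is connected to one of the destinations 000 through 011, P2 can be
connected to only one of the destinations 100 through 111
8x8 Omega Switching Network
[Figure: inputs 0 through 7 connected through three stages of 2 x 2 switches to outputs 000 through 111]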
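Destination-bit routing through an omega network can be sketched as below. The sketch assumes the standard omega wiring, in which a perfect shuffle precedes every stage and each 2 x 2 switch picks its upper (0) or lower (1) output from the next destination bit, most significant bit first; the exact wiring of the slide's figure is not recoverable, and the function name is illustrative.

```python
def omega_route(source, destination, bits=3):
    """Return the line number the message occupies after each stage."""
    mask = (1 << bits) - 1
    line = source
    path = []
    for stage in range(bits):
        # Perfect shuffle: rotate the line number left by one bit.
        line = ((line << 1) | (line >> (bits - 1))) & mask
        # The switch output is selected by the next destination bit.
        dest_bit = (destination >> (bits - 1 - stage)) & 1
        line = (line & ~1) | dest_bit
        path.append(line)
    return path


print(omega_route(0b010, 0b110))
```

After the last stage the line number equals the destination address, whatever the source was. Two messages conflict when they need the same switch output in the same stage, which is why some requests cannot be satisfied simultaneously.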
HYPERCUBE INTERCONNECTION
n-dimensional hypercube (binary n-cube)
- p = 2^n processors
- Processors are conceptually on the corners of an n-dimensional hypercube, and each is directly connected to
its n neighboring nodes
- Degree = n
- Routing procedure: source 010, destination 001
Exclusive-OR: 010 XOR 001 = 011, so the addresses differ in the two low-order bits. Data is transmitted along the y axis and then along the z axis, i.e. 010 to 000 and then 000 to 001
[Figure: one-cube (nodes 0, 1), two-cube (nodes 00, 01, 10, 11), three-cube (nodes 000 through 111)]
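The routing procedure can be written out as a short sketch: XOR the source and destination addresses, then cross one dimension for every 1 bit in the result. Correcting the higher-order differing bit first is an assumption made to reproduce the path in the example (010 to 000 to 001); the function name is illustrative.

```python
def hypercube_route(source, destination, dimensions=3):
    """Return the list of nodes visited from source to destination."""
    path = [source]
    node = source
    difference = source ^ destination          # 1 bits mark the axes to cross
    for bit in reversed(range(dimensions)):    # higher-order axis first
        if difference & (1 << bit):
            node ^= 1 << bit                   # move to the neighbor along this axis
            path.append(node)
    return path


route = hypercube_route(0b010, 0b001)
print([format(node, "03b") for node in route])
```

The number of hops equals the number of 1 bits in the XOR, so no route in an n-cube needs more than n hops.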
INTERPROCESSOR ARBITRATION
Only one of the CPUs, IOPs, and memory units can be granted use of the bus at a time
An arbitration mechanism is needed to handle multiple requests for the shared resources and to
resolve contention.
SYSTEM BUS:
A bus that connects the major components such as CPUs, IOPs and memory
A typical system bus consists of approximately 100 signal lines divided into three functional groups: data,
address and control lines. In addition there are power distribution lines to the components.
e.g. IEEE standard 796 bus
- 86 lines
Data: 16 (multiple of 8)
Address: 24
Control: 26
Power: 20
SYNCHRONOUS & ASYNCHRONOUS DATA TRANSFER
Synchronous Bus
* Each data item is transferred during a time slice
known to both the source and destination units
- Common clock source
- Or separate clocks, with a synchronization signal transmitted periodically to synchronize
the clocks in the system
Asynchronous Bus
* Each data item is transferred by a handshake mechanism
- The unit that transmits the data sends a control signal that indicates the presence of data
- The unit receiving the data responds with another control signal to acknowledge receipt
of the data
* Strobe pulse - supplied by one of the units to indicate to the other unit when the data transfer
has to occur
BUS SIGNALS
Bus signal allocation
- address
- data
- control
- arbitration
- interrupt
- timing
- power, ground
IEEE Standard 796 Multibus Signals
Data and address
  Data lines (16 lines)        DATA0 - DATA15
  Address lines (24 lines)     ADRS0 - ADRS23
Data transfer
  Memory read                  MRDC
  Memory write                 MWTC
  IO read                      IORC
  IO write                     IOWC
  Transfer acknowledge         TACK (XACK)
Interrupt control
  Interrupt request            INT0 - INT7
  Interrupt acknowledge        INTA
Miscellaneous control
  Master clock                 CCLK
  System initialization        INIT
  Byte high enable             BHEN
  Memory inhibit (2 lines)     INH1 - INH2
  Bus lock                     LOCK
Bus arbitration
  Bus request                  BREQ
  Common bus request           CBRQ
  Bus busy                     BUSY
  Bus clock                    BCLK
  Bus priority in              BPRN
  Bus priority out             BPRO
Power and ground (20 lines)
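INTERPROCESSOR ARBITRATION: STATIC ARBITRATION
Serial Arbitration Procedure
[Figure: bus arbiters 1 through 4 connected in a daisy chain. The PI input of arbiter 1 is held at 1 (highest priority); the PO output of each arbiter feeds the PI input of the next, and the PO of arbiter 4 goes to the next arbiter. All arbiters share the bus busy line]
Parallel Arbitration Procedure
[Figure: bus arbiters 1 through 4 each have a Req output and an Ack input. The four Req lines feed a 4x2 priority encoder whose output drives a 2x4 decoder; the decoder outputs return as the Ack lines. All arbiters share the bus busy line]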
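The parallel arbitration path (request lines into a priority encoder, encoder output into a decoder, decoder outputs back as acknowledge lines) can be sketched in software. The sketch assumes arbiter 1 (index 0) has the highest priority and ignores the bus busy line; the function names are illustrative.

```python
def priority_encode(requests):
    """4x2 priority encoder: index of the highest-priority active request.

    Index 0 is assumed to have the highest priority. Returns None when
    no arbiter is requesting the bus.
    """
    for index, active in enumerate(requests):
        if active:
            return index
    return None


def decode(index, width=4):
    """2x4 decoder: exactly one acknowledge line is set."""
    return [1 if i == index else 0 for i in range(width)]


def arbitrate(requests):
    """Map the request lines to acknowledge lines."""
    winner = priority_encode(requests)
    if winner is None:
        return [0] * len(requests)
    return decode(winner, len(requests))


print(arbitrate([0, 1, 0, 1]))
```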
INTERPROCESSOR ARBITRATION: DYNAMIC ARBITRATION
Priorities of the units can be changed dynamically while the system is in operation
Time Slice
A fixed-length time slice is given sequentially to each processor in round-robin
fashion
Polling
Unit address polling - The bus controller advances the address to identify the requesting unit. When the processor
that requires access recognizes its address, it activates the bus busy line and then accesses the bus.
After a number of bus cycles, polling continues by choosing a different processor.
LRU
The least recently used algorithm gives the highest priority to the requesting device that has not used the bus
for the longest interval.
FIFO
In the first-come, first-served scheme, requests are served in the order received. The bus controller
maintains a queue data structure.
Rotating Daisy Chain
Conventional Daisy Chain - Highest priority goes to the unit nearest to the bus
controller
Rotating Daisy Chain - The PO output of the last device is connected to the PI of the first one. Highest
priority goes to the unit that is nearest to the unit that has most recently accessed the bus (it becomes the bus
controller)
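The rotating daisy chain rule can be sketched as a search that starts just after the unit that last held the bus and wraps around the ring. The list encoding of the request lines and the function name are assumptions for this example.

```python
def rotating_grant(requests, last_granted):
    """Return the index of the unit granted the bus, or None if idle.

    Priority is highest for the unit immediately after `last_granted` in
    the ring and lowest for `last_granted` itself.
    """
    n = len(requests)
    for offset in range(1, n + 1):
        unit = (last_granted + offset) % n
        if requests[unit]:
            return unit
    return None


# Units 0 and 2 request the bus; unit 0 used it most recently.
print(rotating_grant([1, 0, 1, 0], last_granted=0))
```

Because the starting point moves with every grant, no unit can hold the highest priority permanently, unlike the conventional daisy chain.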
INTERPROCESSOR COMMUNICATION
Interprocessor Communication through Shared Memory
[Figure: the sending processor writes a mark, the receiver(s), and the message into a communication area in shared memory; the receiving processors read the communication area]
[Figure: the same arrangement, with the sending processor also issuing an interrupt (by instruction) to the receiving processors]
INTERPROCESSOR SYNCHRONIZATION
Synchronization
Communication of control information between processors
- To enforce the correct sequence of processes
- To ensure mutually exclusive access to shared writable data
Hardware Implementation
Mutual Exclusion with a Semaphore
Mutual Exclusion
- Allows one processor to exclude or lock out access to a shared resource by
other processors when it is in a Critical Section
- A Critical Section is a program sequence that, once begun, must complete execution before
another processor accesses the same shared resource
Semaphore
- A binary variable
- 1: A processor is executing a critical section, so the resource is not available to other processors
  0: Available to any requesting processor
- Software-controlled flag stored in a memory location that all processors can access
SEMAPHORE
Testing and Setting the Semaphore
- Two or more processors must not test or set the same semaphore at the same time
- Otherwise two or more processors may enter the same critical section at the same time
- Must be implemented with an indivisible operation
R <- M[SEM]     / Test semaphore /
M[SEM] <- 1     / Set semaphore /
These are done while locked, so that other processors cannot test and set while the current
processor is executing these instructions
If R=1, another processor is executing the critical section, and the processor that executed this
instruction does not access the shared memory
If R=0, the resource is available: the semaphore has been set to 1 and the processor may access it
The last instruction in the program must clear the semaphore
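The test-and-set protocol can be imitated with threads standing in for processors. In this sketch a `threading.Lock` plays the role of the hardware lock that makes the two micro-operations indivisible; the class, the worker function, and the shared counter are assumptions for illustration, not a model of real bus hardware.

```python
import threading
import time


class SharedMemory:
    def __init__(self):
        self.sem = 0
        self._bus_lock = threading.Lock()   # stands in for the hardware lock signal

    def test_and_set(self):
        with self._bus_lock:                # both steps happen while locked
            r = self.sem                    # R <- M[SEM]   (test)
            self.sem = 1                    # M[SEM] <- 1   (set)
        return r

    def clear(self):
        self.sem = 0                        # last instruction: release the semaphore


def worker(memory, counter, iterations):
    for _ in range(iterations):
        while memory.test_and_set() == 1:   # R = 1: another processor is inside
            time.sleep(0)                   # busy-wait, yielding to other threads
        counter[0] += 1                     # critical section on shared data
        memory.clear()


def run(num_processors=4, iterations=100):
    memory = SharedMemory()
    counter = [0]
    threads = [
        threading.Thread(target=worker, args=(memory, counter, iterations))
        for _ in range(num_processors)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter[0]


print(run())
```

If the test and the set were not indivisible, two workers could both read R = 0 and enter the critical section together.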

CACHE COHERENCE
Caches are coherent
Main memory:   X = 52
Caches:        P1: X = 52     P2: X = 52     P3: X = 52
Cache incoherency in write-through policy
Main memory:   X = 120
Caches:        P1: X = 120    P2: X = 52     P3: X = 52
Cache incoherency in write-back policy
Main memory:   X = 52
Caches:        P1: X = 120    P2: X = 52     P3: X = 52
(In each case the processors P1, P2, P3 reach main memory through their caches over a common bus)

MAINTAINING CACHE COHERENCY
Shared Cache
- Disallow private cache
- Access time delay
Software Approaches
* Read-Only Data are Cacheable
- Private Cache is for Read-Only data
- Shared Writable Data are not cacheable
- Compiler tags data as cacheable and noncacheable
- Degrade performance due to software overhead
* Centralized Global Table

- Status of each memory block is maintained in CGT: RO(Read-Only); RW(Read and Write)
- All caches can have copies of RO blocks
- Only one cache can have a copy of RW block
Hardware Approaches
* Snoopy Cache Controller
- Cache Controllers monitor all the bus requests from CPUs and IOPs
- All caches attached to the bus monitor the write operations
- When a word in a cache is written, memory is also updated (write through)
- Local snoopy controllers in all other caches check their memory to determine if they have
a copy of that word; if they have, that location is marked invalid (a future reference to this
location causes a cache miss)
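The snoopy write-through scheme can be sketched with a few small classes. The model below is a simplification: one word per cache line, a dictionary as main memory, and invalidation implemented by deleting the entry. All class and method names are hypothetical.

```python
class Bus:
    """Shared bus with main memory and the caches that snoop on it."""

    def __init__(self, memory):
        self.memory = memory
        self.caches = []

    def write_through(self, writer, address, value):
        self.memory[address] = value              # memory is updated on every write
        for cache in self.caches:
            if cache is not writer:
                cache.snoop_write(address)        # other caches see the bus write


class SnoopyCache:
    def __init__(self, bus):
        self.lines = {}                           # address -> value; absent = invalid
        self.bus = bus
        bus.caches.append(self)

    def read(self, address):
        if address not in self.lines:             # cache miss: fetch from memory
            self.lines[address] = self.bus.memory[address]
        return self.lines[address]

    def write(self, address, value):
        self.lines[address] = value
        self.bus.write_through(self, address, value)

    def snoop_write(self, address):
        self.lines.pop(address, None)             # mark the local copy invalid


bus = Bus({"X": 52})
p1, p2, p3 = SnoopyCache(bus), SnoopyCache(bus), SnoopyCache(bus)
for cache in (p1, p2, p3):
    cache.read("X")                               # all three caches hold X = 52
p1.write("X", 120)                                # P1 writes; P2 and P3 are invalidated
print(bus.memory["X"], p2.read("X"), p3.read("X"))
```

After P1's write, the next read by P2 or P3 misses and fetches the new value from memory, so the stale copies in the incoherency examples above cannot be used.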
