
INSTITUTE OF AERONAUTICAL ENGINEERING

(Autonomous)
Dundigal, Hyderabad -500 043

COMPUTER SCIENCE AND ENGINEERING

Sub: Computer Organization and Architecture

UNIT-1
BASIC COMPUTER ORGANIZATION

CPU
Memory subsystem
I/O subsystem

Generic computer Organization


1.1.1 System bus:
Physically, a bus is a set of wires. The components
of a computer are connected to the buses.
The system has three buses:
Address bus
Data bus
Control bus
The uppermost bus in the figure is the address bus;
it carries the address of the memory location or I/O device being accessed.
Data is transferred via the data bus.
The control bus carries the control signals.
1.1.2 Instruction cycles:
First the processor fetches, or reads, the instruction
from memory. Then it decodes the instruction,
determining which instruction it has fetched.
Finally, it performs the operations necessary to
execute the instruction.
During execution the processor performs
some operations internally and supplies the
address, data and control signals needed by
memory and I/O devices to carry out the instruction.
The figure below shows the memory read and memory write operations.

Fig 1.2: Timing diagram for memory read and memory write operations
CPU ORGANIZATION
Central processing unit (CPU) is the electronic
circuitry within a computer that carries out the
instructions of a computer program by performing
the basic arithmetic, logical, control and
input/output (I/O) operations specified by the
instructions.

In the computer, all the major components
are connected with the help of the system bus.
The data bus is used to move data between the
various components in a computer system.
Internally, the CPU has three sections, as shown in the figure below.

Fig 1.3: CPU Organization


MEMORY SUBSYSTEM ORGANIZATION AND INTERFACING:

Internal organization of the memory chips:


Memory is usually organized in the form of arrays, in which each
cell is capable of storing one bit of information.
A possible organization is shown in the figure below.
Each row of cells constitutes a memory word, and all cells of a row
are connected to a common line called the word line, which is
driven by the address decoder on the chip.
The cells in each column are connected to a sense/write circuit by two
bit lines.
The sense/write circuits are connected to the data input/output lines
of the chip.
During a read operation these circuits sense, or read, the information
stored in the cells selected by a word line and transmit the information
to the output lines.
During a write operation the sense/write circuits receive the input
information and store it in the cells of the selected word.
Types of Memory
There are two types of memory chips:
1. Read Only Memory (ROM)
2. Random Access Memory (RAM)
ROM chips come in several varieties:
Masked ROM (or simply ROM)
PROM (Programmable Read Only Memory)
EPROM (Erasable Programmable Read Only
Memory)
EEPROM (Electrically Erasable PROM)
Flash Memory
RAM Chips:

RAM stands for random access memory. It is often referred to
as read/write memory. Unlike ROM, it initially contains no
data.
The digital circuit in which it is used stores data at various
locations in the RAM and retrieves data from these locations.
The data pins are bidirectional, unlike in ROM.
A RAM chip loses its data once power is removed, so it is a
volatile memory.
RAM chips are differentiated by how they maintain their data:
Dynamic RAM (DRAM)
Static RAM (SRAM)
Memory subsystem configuration

Fig 1.4: A 16×4 memory subsystem constructed from two 8×2 ROM chips with low-order interleaving
Multi byte Organization
There are two commonly used organizations for multi-byte
data:
Big endian
Little endian
In BIG-ENDIAN systems, the most significant byte of a
multi-byte data item always has the lowest address,
while the least significant byte has the highest address.
In LITTLE-ENDIAN systems, the least significant byte
of a multi-byte data item always has the lowest address,
while the most significant byte has the highest address.
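The two byte orders can be seen by packing one value both ways. This is only a small sketch: the 32-bit value 0x12345678 and the helper name dump are made up for illustration, and Python's standard struct module stands in for the memory layout.

```python
import struct

value = 0x12345678  # hypothetical 32-bit data item

# ">" asks for big-endian order, "<" for little-endian order.
big = struct.pack(">I", value)     # most significant byte at offset 0
little = struct.pack("<I", value)  # least significant byte at offset 0

def dump(label, data):
    # Show each byte next to its offset (offset 0 = lowest address).
    cells = ", ".join(f"[{i}]=0x{b:02X}" for i, b in enumerate(data))
    print(f"{label}: {cells}")

dump("big-endian", big)
dump("little-endian", little)
```

The same four bytes appear in both cases; only the order of the addresses they occupy differs.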
I/O SUBSYSTEM ORGANIZATION AND INTERFACING

The I/O subsystem is treated as an independent
unit in the computer. The CPU initiates I/O
commands generically
(read, write, scan, etc.).
This simplifies the CPU.
INPUT DEVICE:
The generic interface circuitry for an input
device such as a keyboard, together with the enable logic
for its tri-state buffers, is shown in the figure below.

Fig 1.5: An input device (a) with its interface and (b) the enable logic for the tri-state buffers
OUTPUT DEVICE
The design of the interface circuitry for an output
device, such as a computer monitor, is somewhat
different from that for an input device. The tri-state buffers
are replaced by a register.
Tri-state buffers are used in input device interfaces
to make sure that only one device writes data to the bus at
any time.
Since output devices read data from the bus, rather than
writing data to it, they don't need
the buffers.
An output device: (a) with its interface and (b) the enable logic for the registers
Fig: A bidirectional I/O device with its interface and enable/load logic
LEVELS OF PROGRAMMING LANGUAGES

Computer programming languages are divided into 3
categories:
High level language
Assembly level language
Machine level language
High level languages are platform independent; that is, these
programs can run on computers with different
microprocessors and operating systems without
modification. Languages such as C++, Java and
FORTRAN are high level languages.
Assembly languages are at a much lower level of abstraction.
Each processor has its own assembly language.
The levels of programming languages are shown in the figure
below.
ASSEMBLY LANGUAGE INSTRUCTIONS:

A memory-reference instruction has an address part of 12
bits. The address part is denoted by three x's, which stand for
the three hexadecimal digits corresponding to the 12-bit
address. The leftmost bit of the instruction (bit 15) is designated by the
symbol I. When I = 0, the leftmost four bits of an instruction
have a hexadecimal digit equivalent from 0 to 6, since the
leftmost bit is 0. When I = 1, the hexadecimal digit equivalent of
the leftmost four bits of the instruction ranges from 8 to E, since
the leftmost bit is 1.
Register-reference instructions use 16 bits to specify an
operation. The leftmost four bits are always 0111, which is
equivalent to hexadecimal 7. The other three hexadecimal
digits give the binary equivalent of the remaining 12 bits.
The input-output instructions also use all 16 bits to specify
an operation. The leftmost four bits are always 1111, equivalent
to hexadecimal F.
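The three instruction classes can be told apart from the leading hexadecimal digit alone. The following is a minimal sketch of that rule, assuming a 16-bit word with I in bit 15, a 3-bit opcode in bits 14-12 and the address in bits 11-0; the function name classify is hypothetical.

```python
def classify(instruction):
    """Split a 16-bit instruction word into (kind, I, opcode, low 12 bits)."""
    if not 0 <= instruction <= 0xFFFF:
        raise ValueError("instruction must fit in 16 bits")
    i_bit = (instruction >> 15) & 1        # leftmost bit
    opcode = (instruction >> 12) & 0b111   # next three bits
    low12 = instruction & 0x0FFF           # address or operation bits
    if opcode != 0b111:
        kind = "memory-reference"          # leading hex digit 0-6 or 8-E
    elif i_bit == 0:
        kind = "register-reference"        # leading hex digit 7
    else:
        kind = "input-output"              # leading hex digit F
    return kind, i_bit, opcode, low12

print(classify(0x2123))
print(classify(0xA123))
print(classify(0x7800))
print(classify(0xF400))
```

For a memory-reference word the last field is the 12-bit address; for the other two classes it holds the bits that select the operation.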
A RELATIVELY SIMPLE INSTRUCTION SET ARCHITECTURE:
A relatively simple computer: CPU details only
UNIT-2

Register Transfer
and
Micro operations

CONTENTS

Register Transfer Language

Register Transfer

Bus and Memory Transfers

Arithmetic Microoperations

Logic Microoperations

Shift Microoperations

Arithmetic Logic Shift Unit

Register Transfer Language (RTL)
Digital System: An interconnection of hardware
modules that do a certain task on the information.
Registers + Operations performed on the data stored
in them = Digital Module
Modules are interconnected with common data and
control paths to form a digital computer system

Register Transfer Language cont.
Microoperations: operations executed on data stored
in one or more registers.
For any function of the computer, a sequence of
microoperations is used to describe it
The result of the operation may be:
replace the previous binary information of a
register or
transferred to another register
Shift Right Operation
101101110011 → 010110111001

Register Transfer Language cont.
The internal hardware organization of a digital
computer is defined by specifying:
The set of registers it contains and their function
The sequence of microoperations performed on the
binary information stored in the registers
The control that initiates the sequence of
microoperations
Registers + Microoperations Hardware + Control
Functions = Digital Computer

Register Transfer Language cont.
Register Transfer Language (RTL) : a symbolic
notation to describe the microoperation transfers
among registers
Next steps:
Define symbols for various types of microoperations,
Describe the hardware that implements these
microoperations

Register Transfer (our first microoperation)
Computer registers are designated by capital
letters (sometimes followed by numerals) to
denote the function of the register
R1: processor register
MAR: Memory Address Register (holds an address for a
memory unit)
PC: Program Counter
IR: Instruction Register
SR: Status Register

4-2 Register Transfer cont.
The individual flip-flops in an n-bit register are
numbered in sequence from 0 to n-1 (from
the right position toward the left position)

R1 7 6 5 4 3 2 1 0

Register R1 Showing individual bits

A block diagram of a register

Register Transfer cont.

Other ways of drawing the block diagram of a register:

15 0
PC

Numbering of bits

15 87 0
Upper byte PC(H) PC(L) Lower byte

Partitioned into two parts

Register Transfer cont.
Information transfer from one register to another is described
by a replacement operator: R2 ← R1
This statement denotes a transfer of the content of register R1
into register R2
The transfer happens in one clock cycle
The content of the R1 (source) does not change
The content of the R2 (destination) will be lost and replaced
by the new data transferred from R1
We are assuming that the circuits are available from the
outputs of the source register to the inputs of the destination
register, and that the destination register has a parallel load
capability

Register Transfer cont.
Conditional transfer occurs only under a
control condition

Representation of a (conditional) transfer


P: R2 ← R1
A binary condition (P equal to 0 or 1)
determines when the transfer occurs
The content of R1 is transferred into R2 only if
P is 1

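Register Transfer cont.
Hardware implementation of a controlled transfer: P: R2 ← R1
Fig: Block diagram. The control circuit output P drives the Load input of R2; R2 is clocked and receives its data from R1.
Fig: Timing diagram showing Clock and Load between times t and t+1. Load is synchronized with the clock, and the point where the transfer occurs is marked.
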
Register Transfer cont.

Basic Symbols for Register Transfers

Symbol               Description                       Examples
Letters & numerals   Denotes a register                MAR, R2
Parentheses ( )      Denotes a part of a register      R2(0-7), R2(L)
Arrow ←              Denotes transfer of information   R2 ← R1
Comma ,              Separates two microoperations     R2 ← R1, R1 ← R2

Bus and Memory Transfers
Paths must be provided to transfer information from
one register to another
A Common Bus System is a scheme for transferring
information between registers in a multiple-register
configuration
A bus: set of common lines, one for each bit of a
register, through which binary information is
transferred one at a time
Control signals determine which register is selected
by the bus during each particular register transfer

Bus and Memory Transfers
Fig: 4-line common bus. Four 4-bit registers A, B, C and D feed four 4×1 multiplexers, MUX0 to MUX3. MUXi receives bit i of each register (Ai, Bi, Ci, Di on inputs 0 to 3), and the shared select lines S1 S0 choose which register drives the four bus lines.

Bus and Memory Transfers
The transfer of information from a bus into one of
many destination registers is done:
By connecting the bus lines to the inputs of all destination
registers and then:
activating the load control of the particular destination
register selected
We write: R2 ← C to symbolize that the content of
register C is loaded into the register R2 using the
common system bus
It is equivalent to: BUS ← C (select C),
R2 ← BUS (load R2)

Bus and Memory Transfers: Three-State
Bus Buffers
A bus system can be constructed with three-
state buffer gates instead of multiplexers
A three-state buffer is a digital circuit that
exhibits three states: logic-0, logic-1, and high-
impedance (Hi-Z)
Control input C

Normal input A Output B

Three-State Buffer
Bus and Memory Transfers: Three-State
Bus Buffers cont.
If C = 1, the gate acts as a buffer: output B = A.
If C = 0, the output is an open circuit (high impedance): B is disconnected from A.

Bus and Memory Transfers: Three-State
Bus Buffers cont.
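Fig: Bus line with three-state buffers (replaces MUX0 in the previous diagram). Select lines S1 S0 and an enable input E drive a 2×4 decoder; decoder outputs 0 to 3 enable the three-state buffers for A0, B0, C0 and D0, whose outputs are tied together to form the bus line for bit 0.
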
Bus and Memory Transfers: Memory
Transfer
Memory read : Transfer from memory
Memory write : Transfer to memory
The data being read or written is called a memory word
(denoted M; refer to section 2-7)
It is necessary to specify the address of M when
writing to or reading from memory
This is done by enclosing the address in square
brackets following the letter M
Example: M[0016]: the memory contents at address
0x0016

Bus and Memory Transfers: Memory
Transfer cont.
Assume that the address of a memory unit is
stored in a register called the Address Register
AR
Let's represent a data register with DR; then:
Read: DR ← M[AR]
Write: M[AR] ← DR

Bus and Memory Transfers: Memory
Transfer cont.
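Fig: Memory read example, R1 ← M[AR]
Before the read: AR = x12, R1 = 100
RAM contents:
Address  Contents
x0C      19
x0E      34
x10      45
x12      66
x14      0
x16      13
x18      22
After R1 ← M[AR]: R1 = 66 (previously 100)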
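The read in this example, R1 ← M[AR] with AR = x12, can be traced with a dictionary standing in for the RAM. This is a sketch under simple assumptions: the helper names mem_read and mem_write are hypothetical, and the registers are plain dictionary entries rather than clocked hardware.

```python
# RAM contents from the example, keyed by address.
memory = {0x0C: 19, 0x0E: 34, 0x10: 45, 0x12: 66, 0x14: 0, 0x16: 13, 0x18: 22}

# Register values before the transfer.
registers = {"AR": 0x12, "R1": 100}

def mem_read(dest):
    # dest <- M[AR]: copy the word addressed by AR into a register.
    registers[dest] = memory[registers["AR"]]

def mem_write(src):
    # M[AR] <- src: copy a register into the word addressed by AR.
    memory[registers["AR"]] = registers[src]

mem_read("R1")
print(registers["R1"])  # 66, replacing the old value 100
```

The memory word itself is unchanged by the read; only the destination register loses its previous content.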

Arithmetic Microoperations
The microoperations most often encountered
in digital computers are classified into four
categories:
Register transfer microoperations
Arithmetic microoperations (on numeric data
stored in the registers)
Logic microoperations (bit manipulations on non-
numeric data)
Shift microoperations
Arithmetic Microoperations cont.
The basic arithmetic microoperations are:
addition, subtraction, increment, decrement,
and shift
Addition Microoperation:
R3 ← R1 + R2
Subtraction Microoperation:
R3 ← R1 − R2, or equivalently
R3 ← R1 + R2' + 1   (R2' is the 1's complement of R2)
Arithmetic Microoperations cont.
One's Complement Microoperation:
R2 ← R2'
Two's Complement Microoperation:
R2 ← R2' + 1
Increment Microoperation:
R2 ← R2 + 1
Decrement Microoperation:
R2 ← R2 − 1

Half Adder/Full Adder
Half Adder
x y | c s
0 0 | 0 0
0 1 | 0 1
1 0 | 0 1
1 1 | 1 0
c = xy
s = x'y + xy' = x ⊕ y

Full Adder
x y cn-1 | cn s
0 0  0   | 0  0
0 0  1   | 0  1
0 1  0   | 0  1
0 1  1   | 1  0
1 0  0   | 0  1
1 0  1   | 1  0
1 1  0   | 1  0
1 1  1   | 1  1
cn = xy + x·cn-1 + y·cn-1
   = xy + (x ⊕ y)·cn-1
s = x'y'·cn-1 + x'y·cn-1' + xy'·cn-1' + xy·cn-1
  = x ⊕ y ⊕ cn-1 = (x ⊕ y) ⊕ cn-1
Arithmetic Microoperations: Binary Adder
Fig: 4-bit binary adder (connection of FAs). Four full adders in cascade: stage i adds Ai, Bi and carry Ci to produce sum Si and carry Ci+1; C0 is the input carry and C4 the output carry.
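The cascade of full adders can be imitated bit by bit. The sketch below is illustrative only: the function names are hypothetical, bits are held in Python lists with the least significant bit first, and the full adder uses the standard equations s = x ⊕ y ⊕ c and carry = xy + (x ⊕ y)c.

```python
def full_adder(x, y, c_in):
    # One FA stage: sum bit and carry out.
    s = x ^ y ^ c_in
    c_out = (x & y) | ((x ^ y) & c_in)
    return s, c_out

def ripple_add(a_bits, b_bits, c0=0):
    # Chain the stages; each carry out feeds the next stage's carry in.
    carry = c0
    sums = []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        sums.append(s)
    return sums, carry  # (S0..S3, C4) for 4-bit operands

def to_bits(value, width=4):
    # Least significant bit first.
    return [(value >> i) & 1 for i in range(width)]

def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))

s_bits, c4 = ripple_add(to_bits(0b0110), to_bits(0b0111))
print(from_bits(s_bits), c4)  # 13 0
```

Because each stage waits for the carry of the stage before it, the delay of such an adder grows with the number of bits.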

Arithmetic Microoperations: Binary Adder-Subtractor
Fig: 4-bit adder-subtractor. Four full adders in cascade with inputs Ai and Bi, carries C0 to C4 and sum outputs S0 to S3.

Arithmetic Microoperations: Binary Adder-Subtractor
For unsigned numbers, this gives A − B if A ≥ B, or the 2's complement of (B − A) if A < B
(example: 3 − 5 = −2 = 1110)
For signed numbers, the result is A − B provided that there is no overflow
(example: −3 − 5 = −8):
  1101
+ 1011
------
  1000

V = C3 ⊕ C4
V = 1 if overflow, V = 0 if no overflow
Overflow detector for signed numbers
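The two subtraction examples and the overflow rule V = C3 ⊕ C4 can be checked with a small model. This sketch assumes the usual adder-subtractor arrangement, which the lost figure presumably showed: a mode input M that is XORed with each B bit and also used as the input carry C0. The function name add_sub_4bit is hypothetical.

```python
def add_sub_4bit(a, b, m):
    """Return (result, C4, V) for 4-bit patterns a and b.

    m = 0 computes A + B; m = 1 computes A + B' + 1 = A - B.
    """
    carry = m                      # C0 = M
    carries = [carry]
    result = 0
    for i in range(4):
        x = (a >> i) & 1
        y = ((b >> i) & 1) ^ m     # XOR gate complements B when M = 1
        s = x ^ y ^ carry
        carry = (x & y) | ((x ^ y) & carry)
        carries.append(carry)
        result |= s << i
    c3, c4 = carries[3], carries[4]
    return result, c4, c3 ^ c4     # V = C3 xor C4

# 3 - 5 on unsigned operands: 2's complement of (5 - 3)
print(format(add_sub_4bit(0b0011, 0b0101, 1)[0], "04b"))  # 1110
# -3 - 5 on signed operands: 1101 + 1011 = 1000 (-8), no overflow
print(add_sub_4bit(0b1101, 0b0101, 1))                    # (8, 1, 0)
```

Adding 0111 and 0001 in the same model gives 1000 with V = 1, since +8 does not fit in 4-bit signed form.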

Arithmetic Microoperations Binary Adder-
Subtractor cont.
What is the range of unsigned numbers that
can be represented in 4 bits?
What is the range of signed numbers that can
be represented in 4 bits?
Repeat for n-bit?!

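Arithmetic Microoperations: Binary Incrementer
Fig: 4-bit binary incrementer. Four half adders in cascade: the first adds A0 and the constant 1, and each later stage adds Ai and the carry from the previous stage, producing S0 to S3 and the output carry C4.
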
Arithmetic Microoperations Binary
Incrementer
Binary Incrementer can also be implemented
using a counter
A binary decrementer can be implemented by
adding 1111 to the desired register each time!

Arithmetic Microoperations Arithmetic
Circuit
This circuit performs seven distinct arithmetic
operations and the basic component of it is
the parallel adder
The output of the binary adder is calculated
from the following arithmetic sum:
D = A + Y + Cin

Arithmetic Microoperations: Arithmetic Circuit cont.
Fig A: 4-bit arithmetic circuit. For each bit i, a 4×1 MUX with select lines S1 S0 chooses Yi from Bi (input 0), Bi' (input 1), 0 (input 2) or 1 (input 3). A full adder then adds Xi = Ai, Yi and the carry, producing D0 to D3 and Cout, with Cin as the input carry.

Logic Microoperations
The four basic microoperations
OR Microoperation
Symbol: ∨, +
Gate: (OR gate symbol)
Example: 100110₂ ∨ 1010110₂ = 1110110₂
P + Q: R1 ← R2 + R3, R4 ← R5 ∨ R6
(the + in P + Q and the ∨ denote OR; the + in R2 + R3 denotes ADD)
Logic Microoperations
The four basic microoperations cont.
AND Microoperation
Symbol: ∧

Gate: (AND gate symbol)

Example: 100110₂ ∧ 1010110₂ = 0000110₂

Logic Microoperations
The four basic microoperations cont.
Complement (NOT) Microoperation
Symbol: overbar (written here as ')

Gate: (inverter symbol)

Example: (1010110₂)' = 0101001₂

Logic Microoperations
The four basic microoperations cont.
XOR (Exclusive-OR) Microoperation
Symbol: ⊕

Gate: (XOR gate symbol)

Example: 100110₂ ⊕ 1010110₂ = 1110000₂

Logic Microoperations
Other Logic Microoperations
Selective-set Operation
Used to force selected bits of a register into
logic-1 by using the OR operation

Example: 0100₂ ∨ 1000₂ = 1100₂
(0100 is in a processor register; 1000 is loaded into a register from
memory to perform the selective-set operation)

Logic Microoperations
Other Logic Microoperations cont.
Selective-complement (toggling) Operation
Used to force selected bits of a register to be
complemented by using the XOR operation

Example: 0001₂ ⊕ 1000₂ = 1001₂
(0001 is in a processor register; 1000 is loaded into a register from
memory to perform the selective-complement operation)

Logic Microoperations
Other Logic Microoperations cont.
Insert Operation
Step1: mask the desired bits
Step2: OR them with the desired value

Example: suppose R1 = 0110 1010, and we desire to
replace the leftmost 4 bits (0110) with 1001; then:
Step1: 0110 1010 ∧ 0000 1111 = 0000 1010
Step2: 0000 1010 ∨ 1001 0000 = 1001 1010
R1 = 1001 1010
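The selective-set, selective-complement and insert operations map directly onto bitwise operators. The sketch below reuses the numbers from the examples; the function names are hypothetical and Python integers stand in for the registers.

```python
def selective_set(a, b):
    # OR: every 1 in b forces the matching bit of a to 1.
    return a | b

def selective_complement(a, b):
    # XOR: every 1 in b toggles the matching bit of a.
    return a ^ b

def insert(register, mask, value):
    # Step 1: AND with the mask clears the field to be replaced.
    # Step 2: OR places the new value into the cleared field.
    return (register & mask) | value

print(format(selective_set(0b0100, 0b1000), "04b"))               # 1100
print(format(selective_complement(0b0001, 0b1000), "04b"))        # 1001
print(format(insert(0b01101010, 0b00001111, 0b10010000), "08b"))  # 10011010
```

For the insert to behave as described, the new value must have 0s in every position that the mask keeps.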

Logic Microoperations
Other Logic Microoperations cont.
NAND Microoperation

Symbols: ∧ and overbar (AND followed by complement)

Gate: (NAND gate symbol)

Example: (100110₂ ∧ 1010110₂)' = 1111001₂

Logic Microoperations
Other Logic Microoperations cont.
NOR Microoperation

Symbols: ∨ and overbar (OR followed by complement)

Gate: (NOR gate symbol)

Example: (100110₂ ∨ 1010110₂)' = 0001001₂

Logic Microoperations
Other Logic Microoperations cont.
Set (Preset) Microoperation
Force all bits to 1 by ORing them with a value in
which all bits are set to logic-1
Example: 100110₂ ∨ 111111₂ = 111111₂
Clear (Reset) Microoperation
Force all bits to 0 by ANDing them with a value in
which all bits are set to logic-0
Example: 100110₂ ∧ 000000₂ = 000000₂

Logic Microoperations
Hardware Implementation
The hardware implementation of logic
microoperations requires that logic gates be
inserted for each bit or pair of bits in the
registers to perform the required logic
function
Most computers use only four (AND, OR, XOR,
and NOT) from which all others can be
derived.

Logic Microoperations
Hardware Implementation cont.
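Figure B: One stage (bit i) of the logic circuit. A 4×1 MUX with select lines S1 S0 chooses the output Ei from four gate outputs formed from Ai and Bi.

S1 S0   Output       Operation
0  0    E = A ⊕ B    XOR
0  1    E = A ∨ B    OR
1  0    E = A ∧ B    AND
1  1    E = A'       Complement
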
Shift Microoperations
Used for serial transfer of data
Also used in conjunction with arithmetic, logic, and
other data-processing operations
The contents of the register can be shifted to the left
or to the right
As being shifted, the first flip-flop receives its binary
information from the serial input
Three types of shift: Logical, Circular, and Arithmetic

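Shift Microoperations cont.
Fig: Shift right. The serial input enters at rn-1 and bits move toward r0, which feeds the serial output; the value applied to the serial input determines the shift type.
Fig: Shift left. The serial input enters at r0 and bits move toward rn-1, which feeds the serial output.
**Note that the bit ri is the bit at position (i) of the register
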
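Shift Microoperations:
Logical Shifts
Transfers 0 through the serial input
Logical Shift Right: R1 ← shr R1
Logical Shift Left: R2 ← shl R2
(the source and destination register are the same in each case)
Fig: Logical shift left. A 0 enters at r0, every bit moves one position toward rn-1, and the old rn-1 is shifted out.
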
Shift Microoperations:
Circular Shifts (Rotate Operation)
Circulates the bits of the register around the
two ends without loss of information
Circular Shift Right: R1 ← cir R1
Circular Shift Left: R2 ← cil R2
(the source and destination register are the same in each case)
Fig: Circular shift left. Every bit moves one position toward rn-1, and the old rn-1 re-enters at r0.

Shift Microoperations
Arithmetic Shifts
Shifts a signed binary number to the left or right
An arithmetic shift-left multiplies a signed binary
number by 2: ashl (00100): 01000
An arithmetic shift-right divides the number by 2
ashr (00100) : 00010
An overflow may occur in arithmetic shift-left, and
occurs when the sign bit is changed (sign reversal)

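Shift Microoperations
Arithmetic Shifts cont.
Fig: Arithmetic shift right. The sign bit rn-1 is kept unchanged and is also copied into rn-2; the remaining bits move one position toward r0.
Fig: Arithmetic shift left. A 0 enters at r0 and every bit moves one position toward rn-1, so the sign bit receives the old rn-2.
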
Shift Microoperations
Arithmetic Shifts cont.
An overflow flip-flop Vs can be used to detect
an arithmetic shift-left overflow

Vs = Rn-1 ⊕ Rn-2

Vs = 1: overflow
Vs = 0: no overflow

Shift Microoperations cont.
Example: Assume R1=11001110, then:
Arithmetic shift right once : R1 = 11100111
Arithmetic shift right twice : R1 = 11110011
Arithmetic shift left once : R1 = 10011100
Arithmetic shift left twice : R1 = 00111000
Logical shift right once : R1 = 01100111
Logical shift left once : R1 = 10011100
Circular shift right once : R1 = 01100111
Circular shift left once : R1 = 10011101
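The results listed for R1 = 11001110 can be reproduced with a few lines of bit manipulation. This is a sketch for an 8-bit register only; the function names follow the shr, shl, cir, cil, ashr and ashl mnemonics used in these notes, and ashl additionally returns the overflow indication Vs = Rn-1 ⊕ Rn-2 evaluated before the shift.

```python
WIDTH = 8
MASK = (1 << WIDTH) - 1
SIGN = 1 << (WIDTH - 1)

def shr(r):   # logical shift right: 0 enters on the left
    return r >> 1

def shl(r):   # logical shift left: 0 enters on the right
    return (r << 1) & MASK

def cir(r):   # circular shift right: r0 re-enters at the left end
    return (r >> 1) | ((r & 1) << (WIDTH - 1))

def cil(r):   # circular shift left: the leftmost bit re-enters at r0
    return ((r << 1) & MASK) | (r >> (WIDTH - 1))

def ashr(r):  # arithmetic shift right: the sign bit is preserved
    return (r >> 1) | (r & SIGN)

def ashl(r):  # arithmetic shift left, with overflow flag Vs
    vs = ((r >> (WIDTH - 1)) ^ (r >> (WIDTH - 2))) & 1
    return shl(r), vs

r1 = 0b11001110
print(format(ashr(r1), "08b"))        # 11100111
print(format(ashr(ashr(r1)), "08b"))  # 11110011
print(format(cil(r1), "08b"))         # 10011101
once, vs1 = ashl(r1)
twice, vs2 = ashl(once)
print(format(twice, "08b"), vs1, vs2) # 00111000 0 1
```

The second arithmetic shift left reports Vs = 1 because the sign bit changes from 1 to 0, which is the sign reversal described earlier.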

Shift Microoperations
Hardware Implementation cont.
A possible choice for a shift unit would be a
bidirectional shift register with parallel load
(refer to Fig 2-9). This has drawbacks:
It needs two pulses (the clock and the shift signal
pulse)
It is not efficient in a processor unit where multiple
registers share a common bus
It is more efficient to implement the shift
operation with a combinational circuit

Shift Microoperations
Hardware Implementation cont.
Fig: 4-bit combinational circuit shifter. Four 2×1 multiplexers with a common select line S produce outputs H0 to H3 from inputs A0 to A3 and the serial inputs IR and IL; S = 0 selects shift right and S = 1 selects shift left.

Arithmetic Logic Shift Unit
Instead of having individual registers
performing the microoperations directly,
computer systems employ a number of
storage registers connected to a common
operational unit called an Arithmetic Logic
Unit (ALU)

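Arithmetic Logic Shift Unit cont.
Fig: One stage of the ALU. One stage of the arithmetic circuit (Fig. A), with carry in Ci and carry out Ci+1, produces Di, and one stage of the logic circuit (Fig. B) produces Ei, both from inputs Ai and Bi. A 4×1 MUX selects the output Fi from Di (input 0), Ei (input 1), Ai+1 for shr (input 2) or Ai-1 for shl (input 3). S3 to S0 are the selection inputs.
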
Basic Definitions
A digital system is a collection of digital hardware modules
Modules are registers, counters, arithmetic elements, etc., connected
via:
- data paths: routes on which information is moved
- control paths: routes on which control signals are
moved
Micro operations (micro-ops) are operations on data stored in
registers
Digital modules (often just called registers) are defined by their
information contents and the set of micro-ops they perform
Register transfer language is a concise and precise means of
describing those operations
Data-paths and Control units

The data-path module comprises processing logic and a
collection of registers that perform data processing
The control unit module is made up of logic that determines
the sequence of data processing operations carried out in
the data-path
Register Transfer Operations

Registers: denoted by
upper case letters, and
optionally followed by
digits or letters
Register transfer
operations: the movement
of data stored in registers
and the processing
performed on the data
What is Register Transfer Language?

Register Transfer Language (RTL): used to describe CPU
organization in high-level terms
RTL expressions are made up of elements which describe
the registers being manipulated, and the micro-ops being
performed on them
Here are the basic components of RTL expressions:
Instruction Representation

Word size is 16 bits


12 bits to represent a memory address
3-bit opcode
1 bit to distinguish between direct and indirect memory addressing
Instruction Representation
(cont.)

When the I (indirect) bit is
0, the value in AD is the
actual address of the
operand (direct
addressing)
When I is 1, AD contains the
address of an indirect
word, which in turn
contains the actual operand
address (indirect
addressing)
Register Structure
Common Micro-Ops
There are 4 types of Micro-Ops:
Transfer: transfers data from one register to another
R0 <- R1
Arithmetic: performs arithmetic on data in registers
R0 <- R1 + R2
Logic/bit manipulation: performs bit (Boolean) operations on data
R0 <- R1 & R2 ; or R0 <- R1 | R2
Shift: shift data in registers by one or more bit positions
R0 <- R1 << 3; or R0 <- R2 >> 2
Micro-Ops Transfer
Parallel
Parallel transfer is typically
used for transfers between
registers
Ex: Transfer all contents of B
into A on one clock pulse
A <- B
Control function: we can make a
transfer conditional by structuring the RTL
expression to indicate the
controlling condition
Ex: P: A <- B
Micro-Ops Transfer
Serial

Serial transfer is used to
specify that a collection
of bits is to be moved,
but that the transfer is
to occur one bit at a
time
Ex:
S: A <- B, B <-B
Micro-Ops Transfer
Bus
A bus consists of a set of parallel data lines
To transfer data using a bus: connect the output of the
source register to the bus; connect the input of the
target register to the bus; when the clock pulse arrives,
the transfer occurs
Micro-Ops Transfer
Memory
Memory transfers are similar to register transfers, but
memory-to-register transfers are called read operations,
while register-to-memory transfers are called write
operations
RTL expressions for a read operation, assuming the use of an
address register:
AR <- address
DR <- M[AR]
RTL expressions for a write operation, assuming the use of a data
register:
AR <- address
DR <- value
M[AR] <- DR
Micro-Ops Arithmetic & Logic

The CPU typically provides addition, subtraction, increment, and
decrement operations in its ALU (arithmetic-logic unit).
Logic micro-ops are like arithmetic micro-ops, but treat each bit of the
register(s) separately
Applications of Logic Micro-ops
How are logic operations useful?
- can be used to change bit values
- delete a group of bits
- insert new bits into a register
Micro-Ops Shift

Move the information in a register by one bit position


Shifts come in three varieties:
- Logical
- Arithmetic
- Circular
Using RTL to specify Digital System
Specification of Digital Components
D flip-flop
Specification and Implementation of simple system: complete
design of the system to implement the RTL code using,
Direct connection
Bus and Tri-state buffers
Bus and Multiplexer
Data-path Design

Example Design and Operation

Micro-operation   RTL Expression   X2X1X0
Load              A ← B            010
Add               A ← B + A        000
Subtract          A ← B - A        101
Increment         A ← B + 1        110
Decrement         A ← B - 1        011

Table: Micro-operation Control Signal Definitions


More Complex Digital System & RTL
Two more complex digital systems are specified in RTL:

Modulo-6 Counter
Toll Booth Controller
UNIT -3
BASIC COMPUTER ORGANIZATION AND DESIGN

Instruction Codes

Computer Registers

Computer Instructions

Timing and Control

Instruction Cycle

Memory Reference Instructions

Input-Output and Interrupt

Complete Computer Description

Design of Basic Computer

Design of Accumulator Logic


INTRODUCTION

Every different processor type has its own design (different registers,
buses, microoperations, machine instructions, etc)
A modern processor is a very complex device
It contains
Many registers
Multiple arithmetic units, for both integer and floating point calculations
The ability to pipeline several consecutive instructions to speed execution
Etc.
However, to understand how processors work, we will start with a
simplified processor model
This is similar to what real processors were like ~25 years ago
M. Morris Mano introduces a simple processor model he calls the Basic
Computer
We will use this to introduce processor organization and the relationship
of the RTL model to the higher level computer processor
THE BASIC COMPUTER

The Basic Computer has two components, a processor and memory
The memory has 4096 words in it
4096 = 2^12, so it takes 12 bits to select a word in memory
Each word is 16 bits long
Fig: The CPU connected to a RAM of 4096 words, addressed 0 to 4095, each word 16 bits wide (bits 15 to 0).
INSTRUCTIONS

Program
A sequence of (machine) instructions
(Machine) Instruction
A group of bits that tell the computer to perform a specific operation (a sequence
of micro-operations)
The instructions of a program, along with any needed data are stored
in memory
The CPU reads the next instruction from memory
It is placed in an Instruction Register (IR)
Control circuitry in control unit then translates the instruction into
the sequence of microoperations necessary to implement it
INSTRUCTION FORMAT
A computer instruction is often divided into two parts
An opcode (Operation Code) that specifies the operation for that instruction
An address that specifies the registers and/or locations in memory to use for that
operation
In the Basic Computer, since the memory contains 4096 (= 2^12) words,
we need 12 bits to specify which memory address this instruction
will use
In the Basic Computer, bit 15 of the instruction specifies the
addressing mode (0: direct addressing, 1: indirect addressing)
Since the memory words, and hence the instructions, are 16 bits long,
that leaves 3 bits for the instruction's opcode

Instruction Format

Bits    15                    14-12    11-0
Field   I (addressing mode)   Opcode   Address
ADDRESSING MODES
The address field of an instruction can represent either
Direct address: the address in memory of the data to use (the address of the operand), or
Indirect address: the address in memory of the address in memory of the data to use

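Fig: Direct and indirect addressing
Direct addressing: the instruction at address 22 is "0 ADD 457" (I = 0); the operand is in location 457 and is added to AC
Indirect addressing: the instruction at address 35 is "1 ADD 300" (I = 1); location 300 contains 1350, and the operand in location 1350 is added to AC
Effective Address (EA)
The address that can be directly used, without modification, to access an operand for a
computation-type instruction, or as the target address for a branch-type instruction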
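The effective address for the two cases in the figure (instructions at addresses 22 and 35) can be computed as follows. This is a sketch under stated assumptions: the helper names encode and effective_address are hypothetical, the figure's addresses are taken to be decimal, and ADD is given opcode 001 to match the hex codes 1xxx and 9xxx in the instruction table.

```python
ADD = 0b001  # assumed opcode for ADD (hex code 1xxx / 9xxx)

def encode(i_bit, opcode, address):
    # Build a 16-bit word: I in bit 15, opcode in bits 14-12, address in 11-0.
    return (i_bit << 15) | (opcode << 12) | (address & 0x0FFF)

def effective_address(memory, instruction_address):
    word = memory[instruction_address]
    i_bit = (word >> 15) & 1
    address = word & 0x0FFF
    if i_bit == 0:
        return address                 # direct: the address field is the EA
    return memory[address] & 0x0FFF    # indirect: the addressed word holds the EA

memory = {
    22: encode(0, ADD, 457),   # 0 ADD 457
    35: encode(1, ADD, 300),   # 1 ADD 300
    300: 1350,                 # pointer to the operand
}

print(effective_address(memory, 22))  # 457
print(effective_address(memory, 35))  # 1350
```

Indirect addressing costs one extra memory reference, since the pointer word must be read before the operand can be fetched.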
PROCESSOR REGISTERS

A processor has many registers to hold instructions, addresses, data, etc


The processor has a register, the Program Counter (PC) that holds the
memory address of the next instruction to get
Since the memory in the Basic Computer only has 4096 locations, the PC only needs 12
bits
In direct or indirect addressing, the processor needs to keep track of
which location in memory it is addressing: the Address Register (AR) is
used for this
The AR is a 12 bit register in the Basic Computer
When an operand is found, using either direct or indirect addressing, it is
placed in the Data Register (DR). The processor then uses this value as
data for its operation
The Basic Computer has a single general purpose register the
Accumulator (AC)
PROCESSOR REGISTERS

The significance of a general purpose register is that it can be referred to in


instructions
e.g. load AC with the contents of a specific memory location; store the contents of AC into
a specified memory location
Often a processor will need a scratch register to store intermediate results
or other temporary data; in the Basic Computer this is the Temporary
Register (TR)
The Basic Computer uses a very simple model of input/output (I/O)
operations
Input devices are considered to send 8 bits of character data to the processor
The processor can send 8 bits of character data to output devices
The Input Register (INPR) holds an 8 bit character received from an input
device
The Output Register (OUTR) holds an 8 bit character to be sent to an output
device
BASIC COMPUTER REGISTERS
Registers in the Basic Computer
Fig: The CPU registers PC (bits 11-0), AR (11-0), IR (15-0), TR (15-0), DR (15-0), OUTR (7-0), INPR (7-0) and AC (15-0), next to a 4096 x 16 memory.

List of BC Registers
Symbol  Bits  Name                  Function
DR      16    Data Register         Holds memory operand
AR      12    Address Register      Holds address for memory
AC      16    Accumulator           Processor register
IR      16    Instruction Register  Holds instruction code
PC      12    Program Counter       Holds address of instruction
TR      16    Temporary Register    Holds temporary data
INPR    8     Input Register        Holds input character
OUTR    8     Output Register       Holds output character

COMMON BUS SYSTEM

The registers in the Basic Computer are connected using a bus
This saves circuitry compared with complete connections between
registers
Fig: Common bus system of the Basic Computer. The 16-bit common bus is driven by the source chosen with select lines S2 S1 S0: AR (1), PC (2), DR (3), AC (4), IR (5), TR (6) or the memory unit (7). The 4096 x 16 memory has Read and Write inputs and takes its address from AR. AR, PC, DR, AC and TR have LD, INR and CLR inputs; IR and OUTR have LD only. The ALU, with INPR and the E flip-flop attached, feeds AC, and the registers share a common clock.
COMMON BUS SYSTEM

Three control lines, S2, S1, and S0 control which register the bus
selects as its input
S2 S1 S0   Register
0  0  0    x
0  0  1    AR
0  1  0    PC
0  1  1    DR
1  0  0    AC
1  0  1    IR
1  1  0    TR
1  1  1    Memory

Either one of the registers will have its load signal activated, or the
memory will have its write signal activated
This determines where the data from the bus gets loaded
The 12-bit registers, AR and PC, have 0s loaded onto the bus in the
high order 4 bit positions
When the 8-bit register OUTR is loaded from the bus, the data
comes from the low order 8 bits on the bus
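
The bus selection rules above can be sketched in a few lines of Python. This is only an illustration: the names BUS_SOURCES, bus_value and load_outr are assumptions made for the example, and registers are modelled as plain integers in a dictionary.

```python
# Bus source codes taken from the S2 S1 S0 table (code 0 selects nothing).
BUS_SOURCES = {1: "AR", 2: "PC", 3: "DR", 4: "AC", 5: "IR", 6: "TR", 7: "MEM"}
WIDTH = {"AR": 12, "PC": 12}  # every other source is 16 bits wide


def bus_value(s2, s1, s0, regs, mem):
    """Return the 16-bit word placed on the common bus for a select code."""
    code = (s2 << 2) | (s1 << 1) | s0
    if code == 0:
        return None  # no source selected
    src = BUS_SOURCES[code]
    if src == "MEM":
        return mem[regs["AR"]] & 0xFFFF  # memory word addressed by AR
    # 12-bit registers appear with 0s in the four high-order bit positions
    mask = (1 << WIDTH.get(src, 16)) - 1
    return regs[src] & mask


def load_outr(bus):
    """OUTR receives only the low-order 8 bits of the bus."""
    return bus & 0xFF
```

For example, with PC = 0xFFF the call bus_value(0, 1, 0, regs, mem) yields 0x0FFF, which shows the zero fill in the upper four bits.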
Instructions

BASIC COMPUTER INSTRUCTIONS

Basic Computer Instruction Format

Memory-Reference Instructions (OP-code = 000 ~ 110)
Bit 15: I        Bits 14-12: Opcode        Bits 11-0: Address

Register-Reference Instructions (OP-code = 111, I = 0)
Bits 15-12: 0 1 1 1        Bits 11-0: Register operation

Input-Output Instructions (OP-code = 111, I = 1)
Bits 15-12: 1 1 1 1        Bits 11-0: I/O operation
BASIC COMPUTER INSTRUCTIONS
Hex Code
Symbol I = 0 I=1 Description
AND 0xxx 8xxx AND memory word to AC
ADD 1xxx 9xxx Add memory word to AC
LDA 2xxx Axxx Load AC from memory
STA 3xxx Bxxx Store content of AC into memory
BUN 4xxx Cxxx Branch unconditionally
BSA 5xxx Dxxx Branch and save return address
ISZ 6xxx Exxx Increment and skip if zero

CLA 7800 Clear AC


CLE 7400 Clear E
CMA 7200 Complement AC
CME 7100 Complement E
CIR 7080 Circulate right AC and E
CIL 7040 Circulate left AC and E
INC 7020 Increment AC
SPA 7010 Skip next instr. if AC is positive
SNA 7008 Skip next instr. if AC is negative
SZA 7004 Skip next instr. if AC is zero
SZE 7002 Skip next instr. if E is zero
HLT 7001 Halt computer

INP F800 Input character to AC


OUT F400 Output character from AC
SKI F200 Skip on input flag
SKO F100 Skip on output flag
ION F080 Interrupt on
IOF F040 Interrupt off
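
As an illustration of the three instruction formats and the hex codes in the table, the following Python sketch splits a 16-bit instruction word into I, opcode and address and classifies it. The function name decode is an assumption made for this example.

```python
def decode(word):
    """Split a 16-bit instruction word into (I, opcode, address, kind)."""
    i = (word >> 15) & 1          # bit 15: indirect bit I
    opcode = (word >> 12) & 0b111 # bits 14-12: operation code
    address = word & 0x0FFF       # bits 11-0: address or operation bits
    if opcode != 0b111:
        kind = "memory-reference"
    elif i == 0:
        kind = "register-reference"
    else:
        kind = "input-output"
    return i, opcode, address, kind
```

Under this sketch 0x2135 decodes as a direct LDA of address 0x135, 0xA135 as the indirect form of the same instruction, 0x7800 (CLA) as register-reference and 0xF400 (OUT) as input-output.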

INSTRUCTION SET COMPLETENESS


A computer should have a set of instructions so that the user can
construct machine language programs to evaluate any function that is known
to be computable.
Instruction Types
Functional Instructions
- Arithmetic, logic, and shift instructions
- ADD, CMA, INC, CIR, CIL, AND, CLA
Transfer Instructions
- Data transfers between the main memory
and the processor registers
- LDA, STA
Control Instructions
- Program sequencing and control
- BUN, BSA, ISZ
Input/Output Instructions
- Input and output
- INP, OUT
Instruction codes

CONTROL UNIT

Control unit (CU) of a processor translates from machine instructions
to the control signals for the microoperations that implement them

Control units are implemented in one of two ways

Hardwired Control
CU is made up of sequential and combinational circuits to generate the control
signals
Microprogrammed Control
A control memory on the processor contains microprograms that activate the
necessary control signals

We will consider a hardwired implementation of the control unit for
the Basic Computer
TIMING AND CONTROL

Control unit of Basic Computer

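Fig: Control unit of the Basic Computer. IR bits 14-12 feed a 3x8 decoder that produces D0 ... D7; IR bit 15 goes to flip-flop I; IR bits 11-0 and other inputs go to the combinational control logic, which generates the control signals. A 4-bit sequence counter (SC) with Increment (INR), Clear (CLR) and Clock inputs drives a 4x16 decoder that produces the timing signals T0 ... T15.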
TIMING SIGNALS
- Generated by the 4-bit sequence counter and the 4x16 decoder
- The SC can be incremented or cleared.

- Example: T0, T1, T2, T3, T4, T0, T1, . . .

Assume: At time T4, SC is cleared to 0 if decoder output D3 is active.
D3T4: SC <- 0
Fig: Timing diagram showing Clock, T0, T1, T2, T3, T4, D3 and CLR SC over the sequence T0 T1 T2 T3 T4 T0.
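
The behaviour of the sequence counter and timing decoder can be sketched as follows. This is a simplified model, assuming the clear condition (such as D3T4 above) is reduced to a single timing index clear_at; the function name timing_sequence is hypothetical.

```python
def timing_sequence(clear_at, pulses):
    """List the active timing signal for each clock pulse.

    SC counts up on every pulse and is cleared to 0 after the timing
    signal with index clear_at has been active.
    """
    sc = 0
    active = []
    for _ in range(pulses):
        active.append(f"T{sc}")     # the 4x16 decoder activates one T line
        if sc == clear_at:
            sc = 0                  # e.g. D3T4: SC <- 0
        else:
            sc = (sc + 1) & 0xF     # 4-bit counter wraps after T15
    return active
```

Calling timing_sequence(4, 7) reproduces the example sequence T0, T1, T2, T3, T4, T0, T1.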
INSTRUCTION CYCLE

In Basic Computer, a machine instruction is executed in the following
cycle:
1. Fetch an instruction from memory
2. Decode the instruction
3. Read the effective address from memory if the instruction has an indirect address
4. Execute the instruction

After an instruction is executed, the cycle starts again at step 1, for the
next instruction

Note: Every different processor has its own (different)
instruction cycle
Instruction Cycle

FETCH and DECODE

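Fetch and Decode
T0: AR <- PC (S2S1S0 = 010, T0 = 1)
T1: IR <- M[AR], PC <- PC + 1 (S2S1S0 = 111, T1 = 1)
T2: D0, . . . , D7 <- Decode IR(12-14), AR <- IR(0-11), I <- IR(15)

Fig: Register transfers for the fetch phase. T0 places PC on the common bus (bus source 2) and enables LD of AR; T1 places the memory word on the bus (bus source 7, Read), enables LD of IR and INR of PC.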
DETERMINE THE TYPE OF INSTRUCTION
Flowchart:
Start: SC <- 0
T0: AR <- PC
T1: IR <- M[AR], PC <- PC + 1
T2: Decode opcode in IR(12-14), AR <- IR(0-11), I <- IR(15)
T3: branch on D7 and I
  D7 = 1 (register or I/O), I = 1 (I/O): execute input-output instruction, SC <- 0
  D7 = 1 (register or I/O), I = 0 (register): execute register-reference instruction, SC <- 0
  D7 = 0 (memory-reference), I = 1 (indirect): AR <- M[AR]
  D7 = 0 (memory-reference), I = 0 (direct): nothing
T4 onward: execute memory-reference instruction, SC <- 0

D'7IT3: AR <- M[AR]
D'7I'T3: Nothing
D7I'T3: Execute a register-reference instr.
D7IT3: Execute an input-output instr.
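
The steps T0 to T3 can be traced with a small Python model. It is a sketch only: the state dictionary, the memory dictionary and the function name fetch_decode are assumptions for this example, and execution of the instruction itself is left out.

```python
def fetch_decode(state, mem):
    """Run T0..T3 for one instruction and return the instruction type."""
    # T0: AR <- PC
    state["AR"] = state["PC"]
    # T1: IR <- M[AR], PC <- PC + 1
    state["IR"] = mem[state["AR"]]
    state["PC"] = (state["PC"] + 1) & 0xFFF
    # T2: decode IR(12-14), AR <- IR(0-11), I <- IR(15)
    ir = state["IR"]
    opcode = (ir >> 12) & 0b111
    state["AR"] = ir & 0xFFF
    state["I"] = (ir >> 15) & 1
    d7 = opcode == 0b111
    # T3: choose one of the four cases
    if d7:
        return "input-output" if state["I"] else "register-reference"
    if state["I"]:
        state["AR"] = mem[state["AR"]] & 0xFFF  # D'7IT3: AR <- M[AR]
    return "memory-reference"                   # D'7I'T3: nothing to do
```

With an indirect LDA (0xA020) stored at address 0x010 and the word 0x0135 stored at 0x020, the model ends T3 with AR = 0x135 and PC = 0x011.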

REGISTER REFERENCE INSTRUCTIONS


Register Reference Instructions are identified when
- D7 = 1, I = 0
- Register Ref. Instr. is specified in b0 ~ b11 of IR
- Execution starts with timing signal T3

r = D7I'T3 => Register Reference Instruction
Bi = IR(i), i = 0, 1, 2, ..., 11

     r:    SC <- 0
CLA  rB11: AC <- 0
CLE  rB10: E <- 0
CMA  rB9:  AC <- AC'
CME  rB8:  E <- E'
CIR  rB7:  AC <- shr AC, AC(15) <- E, E <- AC(0)
CIL  rB6:  AC <- shl AC, AC(0) <- E, E <- AC(15)
INC  rB5:  AC <- AC + 1
SPA  rB4:  if (AC(15) = 0) then (PC <- PC + 1)
SNA  rB3:  if (AC(15) = 1) then (PC <- PC + 1)
SZA  rB2:  if (AC = 0) then (PC <- PC + 1)
SZE  rB1:  if (E = 0) then (PC <- PC + 1)
HLT  rB0:  S <- 0 (S is a start-stop flip-flop)
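
The two circulate operations are the least obvious entries in this list, so here is a short Python sketch of them. The function names cir and cil are chosen for the example; AC is modelled as a 16-bit integer and E as 0 or 1.

```python
MASK16 = 0xFFFF


def cir(ac, e):
    """CIR: AC <- shr AC, AC(15) <- E, E <- AC(0). Returns (AC, E)."""
    new_e = ac & 1                       # bit 0 leaves AC and enters E
    new_ac = (ac >> 1) | (e << 15)       # old E enters at bit 15
    return new_ac & MASK16, new_e


def cil(ac, e):
    """CIL: AC <- shl AC, AC(0) <- E, E <- AC(15). Returns (AC, E)."""
    new_e = (ac >> 15) & 1               # bit 15 leaves AC and enters E
    new_ac = ((ac << 1) & MASK16) | e    # old E enters at bit 0
    return new_ac, new_e
```

Together AC and E behave as a 17-bit circular register, so applying cir and then cil returns the original pair.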
MEMORY REFERENCE INSTRUCTIONS

Symbol  Operation decoder  Symbolic description
AND     D0                 AC <- AC ∧ M[AR]
ADD     D1                 AC <- AC + M[AR], E <- Cout
LDA     D2                 AC <- M[AR]
STA     D3                 M[AR] <- AC
BUN     D4                 PC <- AR
BSA     D5                 M[AR] <- PC, PC <- AR + 1
ISZ     D6                 M[AR] <- M[AR] + 1, if M[AR] + 1 = 0 then PC <- PC + 1
- The effective address of the instruction is in AR and was placed there during
timing signal T2 when I = 0, or during timing signal T3 when I = 1
- Memory cycle is assumed to be short enough to complete in a CPU cycle
- The execution of MR instruction starts with T4
AND to AC
D0T4: DR <- M[AR]                        Read operand
D0T5: AC <- AC ∧ DR, SC <- 0             AND with AC
ADD to AC
D1T4: DR <- M[AR]                        Read operand
D1T5: AC <- AC + DR, E <- Cout, SC <- 0  Add to AC and store carry in E
LDA: Load to AC
D2T4: DR <- M[AR]
D2T5: AC <- DR, SC <- 0
STA: Store AC
D3T4: M[AR] <- AC, SC <- 0
BUN: Branch Unconditionally
D4T4: PC <- AR, SC <- 0
BSA: Branch and Save Return Address
M[AR] <- PC, PC <- AR + 1
D5T4: M[AR] <- PC, AR <- AR + 1
D5T5: PC <- AR, SC <- 0

Example of BSA execution (BSA 135 stored at address 20):
Address   Memory, PC, AR at time T4      Memory, PC after execution
20        0 BSA 135                      0 BSA 135
21        Next instruction (PC = 21)     Next instruction
135       (AR = 135)                     21
136       Subroutine                     Subroutine (PC = 136)
...       1 BUN 135                      1 BUN 135

ISZ: Increment and Skip-if-Zero
D6T4: DR <- M[AR]
D6T5: DR <- DR + 1
D6T6: M[AR] <- DR, if (DR = 0) then (PC <- PC + 1), SC <- 0
FLOWCHART FOR MEMORY REFERENCE INSTRUCTIONS
Memory-reference instruction (one path per opcode decoder output):
AND: D0T4: DR <- M[AR]
     D0T5: AC <- AC ∧ DR, SC <- 0
ADD: D1T4: DR <- M[AR]
     D1T5: AC <- AC + DR, E <- Cout, SC <- 0
LDA: D2T4: DR <- M[AR]
     D2T5: AC <- DR, SC <- 0
STA: D3T4: M[AR] <- AC, SC <- 0
BUN: D4T4: PC <- AR, SC <- 0
BSA: D5T4: M[AR] <- PC, AR <- AR + 1
     D5T5: PC <- AR, SC <- 0
ISZ: D6T4: DR <- M[AR]
     D6T5: DR <- DR + 1
     D6T6: M[AR] <- DR, if (DR = 0) then (PC <- PC + 1), SC <- 0
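
BSA and ISZ are the two memory-reference instructions that modify both memory and PC. The Python sketch below follows their microoperations step by step; the function names bsa and isz and the dictionary used as memory are assumptions made for illustration.

```python
def bsa(mem, ar, pc):
    """Branch and save return address. Returns the new PC."""
    mem[ar] = pc                 # D5T4: M[AR] <- PC, AR <- AR + 1
    ar = (ar + 1) & 0xFFF
    return ar                    # D5T5: PC <- AR, SC <- 0


def isz(mem, ar, pc):
    """Increment and skip if zero. Returns the new PC."""
    dr = mem[ar]                 # D6T4: DR <- M[AR]
    dr = (dr + 1) & 0xFFFF       # D6T5: DR <- DR + 1 (16-bit wrap)
    mem[ar] = dr                 # D6T6: M[AR] <- DR
    if dr == 0:
        pc = (pc + 1) & 0xFFF    # skip the next instruction
    return pc
```

Using the numbers from the BSA example, bsa(mem, 135, 21) stores 21 at address 135 and returns 136 as the new PC.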
I/O and Interrupt

INPUT-OUTPUT AND INTERRUPT

A Terminal with a keyboard and a Printer

Input-Output Configuration
Fig: An input-output terminal (printer and keyboard) is connected through a serial communication interface to the computer registers and flip-flops. OUTR (with flag FGO) feeds the printer through the receiver interface; the keyboard feeds INPR (with flag FGI) through the transmitter interface. OUTR and INPR connect to AC. The terminal side is a serial communications path and the AC side is a parallel communications path.

INPR Input register - 8 bits
OUTR Output register - 8 bits
FGI Input flag - 1 bit
FGO Output flag - 1 bit
IEN Interrupt enable - 1 bit

- The terminal sends and receives serial information
- The serial info. from the keyboard is shifted into INPR
- The serial info. for the printer is stored in the OUTR
- INPR and OUTR communicate with the terminal
serially and with the AC in parallel.
- The flags are needed to synchronize the timing
difference between I/O device and the computer
PROGRAM CONTROLLED DATA TRANSFER
-- CPU --                          -- I/O Device --
/* Input */ /* Initially FGI = 0 */
loop: If FGI = 0 goto loop         loop: If FGI = 1 goto loop
AC <- INPR, FGI <- 0               INPR <- new data, FGI <- 1

/* Output */ /* Initially FGO = 1 */
loop: If FGO = 0 goto loop         loop: If FGO = 1 goto loop
OUTR <- AC, FGO <- 0               consume OUTR, FGO <- 1

Fig: Flowcharts for program-controlled input and output. Input (initially FGI = 0): loop while FGI = 0; then AC <- INPR, FGI <- 0; repeat if there are more characters, otherwise end. Output (initially FGO = 1): AC <- data; loop while FGO = 0; then OUTR <- AC, FGO <- 0; repeat if there are more characters, otherwise end.
INPUT-OUTPUT INSTRUCTIONS

D7IT3 = p
IR(i) = Bi, i = 6, ..., 11

     p:    SC <- 0                              Clear SC
INP  pB11: AC(0-7) <- INPR, FGI <- 0            Input char. to AC
OUT  pB10: OUTR <- AC(0-7), FGO <- 0            Output char. from AC
SKI  pB9:  if (FGI = 1) then (PC <- PC + 1)     Skip on input flag
SKO  pB8:  if (FGO = 1) then (PC <- PC + 1)     Skip on output flag
ION  pB7:  IEN <- 1                             Interrupt enable on
IOF  pB6:  IEN <- 0                             Interrupt enable off
PROGRAM-CONTROLLED INPUT/OUTPUT

Program-controlled I/O
- Continuous CPU involvement
I/O takes valuable CPU time
- CPU slowed down to I/O speed
- Simple
- Least hardware

Input

LOOP,  SKI DEV
       BUN LOOP
       INP DEV

Output

LOOP,  LDA DATA
LOP,   SKO DEV
       BUN LOP
       OUT DEV
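
To make the cost of the input polling loop concrete, here is a small Python simulation of the SKI / BUN LOOP / INP sequence. The device model (a character that becomes available after a fixed number of polls) and the name poll_input are assumptions made for this sketch.

```python
def poll_input(device_ready_after, char):
    """Simulate program-controlled input; returns (AC, FGI, wasted polls)."""
    fgi, inpr, polls = 0, 0, 0
    while True:
        if polls == device_ready_after:
            inpr, fgi = char & 0xFF, 1   # device: INPR <- new data, FGI <- 1
        if fgi == 1:                     # SKI: skip BUN LOOP when FGI = 1
            break
        polls += 1                       # BUN LOOP: go back and test again
    ac = inpr                            # INP: AC(0-7) <- INPR, FGI <- 0
    fgi = 0
    return ac, fgi, polls
```

The third value counts loop iterations in which the CPU did no useful work, which is the continuous CPU involvement listed above as the drawback of this method.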
INTERRUPT INITIATED INPUT/OUTPUT
- Open communication only when some data has to be passed --> interrupt.

- The I/O interface, instead of the CPU, monitors the I/O device.

- When the interface finds that the I/O device is ready for data transfer,
it generates an interrupt request to the CPU

- Upon detecting an interrupt, the CPU momentarily stops the task
it is doing, branches to the service routine to process the data
transfer, and then returns to the task it was performing.

* IEN (Interrupt-enable flip-flop)
- can be set and cleared by instructions
- when cleared, the computer cannot be interrupted

FLOWCHART FOR INTERRUPT CYCLE

R = Interrupt f/f
Flowchart:
R = 0 (instruction cycle): fetch and decode instructions, execute instructions; then, if IEN = 1 and (FGI = 1 or FGO = 1), R <- 1
R = 1 (interrupt cycle): store return address in location 0 (M[0] <- PC); branch to location 1 (PC <- 1); IEN <- 0, R <- 0

- The interrupt cycle is a HW implementation of a branch
and save return address operation.
- At the beginning of the next instruction cycle, the
instruction that is read from memory is in address 1.
- At memory address 1, the programmer must store a branch instruction
that sends the control to an interrupt service routine
- The instruction that returns the control to the original
program is "indirect BUN 0"
REGISTER TRANSFER OPERATIONS IN INTERRUPT CYCLE
Address   Memory before interrupt     Memory after interrupt cycle
0                                     256
1         0 BUN 1120                  0 BUN 1120 (PC = 1)
...       Main Program                Main Program
255
256       (PC = 256)
1120      I/O Program                 I/O Program
...       1 BUN 0                     1 BUN 0
Register Transfer Statements for Interrupt Cycle
- R F/F <- 1 if IEN (FGI + FGO) T0'T1'T2'
T0'T1'T2'(IEN)(FGI + FGO): R <- 1

- The fetch and decode phases of the instruction cycle
must be modified: replace T0, T1, T2 with R'T0, R'T1, R'T2
- The interrupt cycle:
RT0: AR <- 0, TR <- PC
RT1: M[AR] <- TR, PC <- 0
RT2: PC <- PC + 1, IEN <- 0, R <- 0, SC <- 0
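
The three interrupt-cycle steps can be checked with a short Python sketch. The function names and the dictionary used as memory are assumptions for this example; the register transfers follow the RT0, RT1 and RT2 statements above.

```python
def interrupt_requested(ien, fgi, fgo):
    """Condition that sets R (evaluated when T0, T1 and T2 are inactive)."""
    return bool(ien and (fgi or fgo))


def interrupt_cycle(mem, pc):
    """Run RT0..RT2 and return the new (PC, IEN, R, SC)."""
    ar, tr = 0, pc     # RT0: AR <- 0, TR <- PC
    mem[ar] = tr       # RT1: M[AR] <- TR, PC <- 0
    pc = 0
    pc = pc + 1        # RT2: PC <- PC + 1, IEN <- 0, R <- 0, SC <- 0
    return pc, 0, 0, 0
```

With PC = 256 before the interrupt, the cycle leaves 256 in location 0 and PC = 1, matching the memory picture above.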
FURTHER QUESTIONS ON INTERRUPT

How can the CPU recognize the device
requesting an interrupt?

Since different devices are likely to require
different interrupt service routines, how can
the CPU obtain the starting address of the
appropriate routine in each case?

Should any device be allowed to interrupt the
CPU while another interrupt is being serviced?

How can the situation be handled when two or
more interrupt requests occur simultaneously?
COMPLETE COMPUTER DESCRIPTION
Flowchart of Operations
start: SC <- 0, IEN <- 0, R <- 0
R = 0 (instruction cycle):
  R'T0: AR <- PC
  R'T1: IR <- M[AR], PC <- PC + 1
  R'T2: AR <- IR(0~11), I <- IR(15), D0...D7 <- Decode IR(12 ~ 14)
  D7 = 1 (register or I/O):
    I = 1 (I/O):       D7IT3: execute I/O instruction
    I = 0 (register):  D7I'T3: execute RR instruction
  D7 = 0 (memory ref):
    I = 1 (indirect):  D'7IT3: AR <- M[AR]
    I = 0 (direct):    D'7I'T3: idle
    D'7T4: execute MR instruction
R = 1 (interrupt cycle):
  RT0: AR <- 0, TR <- PC
  RT1: M[AR] <- TR, PC <- 0
  RT2: PC <- PC + 1, IEN <- 0, R <- 0, SC <- 0
COMPLETE COMPUTER DESCRIPTION Microoperations

Fetch       R'T0:    AR <- PC
            R'T1:    IR <- M[AR], PC <- PC + 1
Decode      R'T2:    D0, ..., D7 <- Decode IR(12 ~ 14),
                     AR <- IR(0 ~ 11), I <- IR(15)
Indirect    D'7IT3:  AR <- M[AR]
Interrupt
  T0'T1'T2'(IEN)(FGI + FGO):  R <- 1
            RT0:     AR <- 0, TR <- PC
            RT1:     M[AR] <- TR, PC <- 0
            RT2:     PC <- PC + 1, IEN <- 0, R <- 0, SC <- 0
Memory-Reference
  AND       D0T4:    DR <- M[AR]
            D0T5:    AC <- AC ∧ DR, SC <- 0
  ADD       D1T4:    DR <- M[AR]
            D1T5:    AC <- AC + DR, E <- Cout, SC <- 0
  LDA       D2T4:    DR <- M[AR]
            D2T5:    AC <- DR, SC <- 0
  STA       D3T4:    M[AR] <- AC, SC <- 0
  BUN       D4T4:    PC <- AR, SC <- 0
  BSA       D5T4:    M[AR] <- PC, AR <- AR + 1
            D5T5:    PC <- AR, SC <- 0
  ISZ       D6T4:    DR <- M[AR]
            D6T5:    DR <- DR + 1
            D6T6:    M[AR] <- DR, if (DR = 0) then (PC <- PC + 1), SC <- 0
COMPLETE COMPUTER DESCRIPTION Microoperations

Register-Reference
D7I'T3 = r (Common to all register-reference instr)
IR(i) = Bi (i = 0, 1, 2, ..., 11)
       r:     SC <- 0
  CLA  rB11:  AC <- 0
  CLE  rB10:  E <- 0
  CMA  rB9:   AC <- AC'
  CME  rB8:   E <- E'
  CIR  rB7:   AC <- shr AC, AC(15) <- E, E <- AC(0)
  CIL  rB6:   AC <- shl AC, AC(0) <- E, E <- AC(15)
  INC  rB5:   AC <- AC + 1
  SPA  rB4:   If (AC(15) = 0) then (PC <- PC + 1)
  SNA  rB3:   If (AC(15) = 1) then (PC <- PC + 1)
  SZA  rB2:   If (AC = 0) then (PC <- PC + 1)
  SZE  rB1:   If (E = 0) then (PC <- PC + 1)
  HLT  rB0:   S <- 0
Input-Output
D7IT3 = p (Common to all input-output instructions)
IR(i) = Bi (i = 6, 7, 8, 9, 10, 11)
       p:     SC <- 0
  INP  pB11:  AC(0-7) <- INPR, FGI <- 0
  OUT  pB10:  OUTR <- AC(0-7), FGO <- 0
  SKI  pB9:   If (FGI = 1) then (PC <- PC + 1)
  SKO  pB8:   If (FGO = 1) then (PC <- PC + 1)
  ION  pB7:   IEN <- 1
  IOF  pB6:   IEN <- 0
DESIGN OF BASIC COMPUTER(BC)
Hardware Components of BC
A memory unit: 4096 x 16.
Registers:
AR, PC, DR, AC, IR, TR, OUTR, INPR, and SC
Flip-Flops(Status):
I, S, E, R, IEN, FGI, and FGO
Decoders: a 3x8 Opcode decoder
a 4x16 timing decoder
Common bus: 16 bits
Control logic gates:
Adder and Logic circuit: Connected to AC
Control Logic Gates
- Input Controls of the nine registers
- Read and Write Controls of memory
- Set, Clear, or Complement Controls of the flip-flops
- S2, S1, S0 Controls to select a register for the bus
- AC, and Adder and Logic circuit
CONTROL OF REGISTERS AND MEMORY
Address Register; AR
Scan all of the register transfer statements that change the content of AR:
R'T0:    AR <- PC          LD(AR)
R'T2:    AR <- IR(0-11)    LD(AR)
D'7IT3:  AR <- M[AR]       LD(AR)
RT0:     AR <- 0           CLR(AR)
D5T4:    AR <- AR + 1      INR(AR)

LD(AR) = R'T0 + R'T2 + D'7IT3
CLR(AR) = RT0
INR(AR) = D5T4

Fig: Control gates associated with AR. AR is a 12-bit register loaded from the bus and driving the bus; its LD, INR and CLR inputs are generated from D'7, I, T3, T2, R, T0, D5 and T4 according to the equations above.

CONTROL OF FLAGS
IEN: Interrupt Enable Flag
pB7: IEN <- 1 (I/O Instruction)
pB6: IEN <- 0 (I/O Instruction)
RT2: IEN <- 0 (Interrupt)

p = D7IT3 (Input/Output Instruction)

Fig: Control inputs for IEN, a JK flip-flop. J = pB7 and K = pB6 + RT2, where p is formed from D7, I and T3.
CONTROL OF COMMON BUS

Fig: Encoder for bus selection. Inputs x1 ... x7 feed an encoder whose outputs S2, S1, S0 are the multiplexer bus select inputs.

x1 x2 x3 x4 x5 x6 x7   S2 S1 S0   Selected register
0  0  0  0  0  0  0    0  0  0    none
1  0  0  0  0  0  0    0  0  1    AR
0  1  0  0  0  0  0    0  1  0    PC
0  0  1  0  0  0  0    0  1  1    DR
0  0  0  1  0  0  0    1  0  0    AC
0  0  0  0  1  0  0    1  0  1    IR
0  0  0  0  0  1  0    1  1  0    TR
0  0  0  0  0  0  1    1  1  1    Memory

For AR:
D4T4: PC <- AR
D5T5: PC <- AR

x1 = D4T4 + D5T5
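
The encoder truth table can be reduced to three OR expressions, which the Python sketch below evaluates. The expressions are read off the table rows; the helper names bus_select, one_hot and x1 are assumptions made for this example.

```python
def bus_select(x):
    """x maps 1..7 to encoder inputs x1..x7 (at most one of them is 1)."""
    s0 = x[1] | x[3] | x[5] | x[7]   # rows of the table where S0 = 1
    s1 = x[2] | x[3] | x[6] | x[7]   # rows of the table where S1 = 1
    s2 = x[4] | x[5] | x[6] | x[7]   # rows of the table where S2 = 1
    return s2, s1, s0


def one_hot(i):
    """Build encoder inputs with only x_i active (i = 0 means none)."""
    return {k: int(k == i) for k in range(1, 8)}


def x1(d4, t4, d5, t5):
    """x1 = D4T4 + D5T5: AR drives the bus for BUN and for BSA at T5."""
    return (d4 & t4) | (d5 & t5)
```

For instance, bus_select(one_hot(4)) gives (1, 0, 0), the select code for AC.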
DESIGN OF ACCUMULATOR LOGIC
Circuits associated with AC
Fig: The adder and logic circuit takes 16 bits from DR, 8 bits from INPR and the 16-bit AC output, and feeds the 16-bit AC; AC drives the bus. Control gates generate LD, INR and CLR for AC.

All the statements that change the content of AC

D0T5:  AC <- AC ∧ DR                 AND with DR
D1T5:  AC <- AC + DR                 Add with DR
D2T5:  AC <- DR                      Transfer from DR
pB11:  AC(0-7) <- INPR               Transfer from INPR
rB9:   AC <- AC'                     Complement
rB7:   AC <- shr AC, AC(15) <- E     Shift right
rB6:   AC <- shl AC, AC(0) <- E      Shift left
rB11:  AC <- 0                       Clear
rB5:   AC <- AC + 1                  Increment
CONTROL OF AC REGISTER

Gate structures for controlling the LD, INR, and CLR of AC

Fig: AC is loaded from the adder and logic circuit and drives the bus. LD is the OR of D0T5 (AND), D1T5 (ADD), D2T5 (DR), pB11 (INPR), rB9 (COM), rB7 (SHR) and rB6 (SHL). INR = rB5 (INC). CLR = rB11 (CLR).
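ALU (ADDER AND LOGIC CIRCUIT)

One stage of Adder and Logic circuit

Fig: Stage i of the adder and logic circuit. The input Ii of the AC(i) JK flip-flop, gated by LD, is selected from: DR(i) AND AC(i) (AND); the sum output of a full adder FA with inputs DR(i), AC(i) and carry Ci, producing carry Ci+1 (ADD); DR(i) (DR); bit i from INPR (INPR); the complement of AC(i) (COM); AC(i+1) (SHR); AC(i-1) (SHL).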
UNIT-4
MEMORY ORGANIZATION

Memory Hierarchy

Main Memory

Auxiliary Memory

Associative Memory

Cache Memory

Virtual Memory

Memory Management Hardware


Memory Hierarchy

MEMORY HIERARCHY

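The purpose of the memory hierarchy is to obtain the highest possible
access speed while minimizing the total cost of the memory system
Fig: Memory hierarchy in a computer system. Auxiliary memory (magnetic tapes, magnetic disks) connects through an I/O processor to main memory; the CPU accesses main memory through cache memory.

Levels of the hierarchy, from fastest to slowest:
Register
Cache
Main Memory
Magnetic Disk
Magnetic Tape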
Main Memory

MAIN MEMORY
RAM and ROM Chips
Typical RAM chip
Fig: 128 x 8 RAM chip with inputs chip select 1 (CS1), chip select 2 (CS2), read (RD), write (WR), a 7-bit address (AD 7) and an 8-bit data bus.

CS1 CS2 RD WR   Memory function   State of data bus
0   0   x  x    Inhibit           High-impedance
0   1   x  x    Inhibit           High-impedance
1   0   0  0    Inhibit           High-impedance
1   0   0  1    Write             Input data to RAM
1   0   1  x    Read              Output data from RAM
1   1   x  x    Inhibit           High-impedance
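
The function table can be expressed directly as a small Python function. This is a sketch with a hypothetical name, ram_function; it assumes, as the table rows indicate, that the chip is selected only when CS1 = 1 and CS2 = 0.

```python
def ram_function(cs1, cs2, rd, wr):
    """Return the memory function for one row of the RAM chip table."""
    if not (cs1 == 1 and cs2 == 0):
        return "inhibit"   # chip not selected: data bus is high-impedance
    if rd == 1:
        return "read"      # RD takes priority; WR is a don't-care
    if wr == 1:
        return "write"
    return "inhibit"       # selected but neither RD nor WR is active
```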

Typical ROM chip
Fig: 512 x 8 ROM chip with inputs chip select 1 (CS1), chip select 2 (CS2), a 9-bit address (AD 9) and an 8-bit data bus.

MEMORY ADDRESS MAP


Address space assignment to each memory chip

Example: 512 bytes RAM and 512 bytes ROM

           Hexa           Address bus
Component  address        10 9 8 7 6 5 4 3 2 1
RAM 1      0000 - 007F    0  0 0 x x x x x x x
RAM 2      0080 - 00FF    0  0 1 x x x x x x x
RAM 3      0100 - 017F    0  1 0 x x x x x x x
RAM 4      0180 - 01FF    0  1 1 x x x x x x x
ROM        0200 - 03FF    1  x x x x x x x x x

Memory Connection to CPU

- RAM and ROM chips are connected to a CPU
through the data and address buses

- The low-order lines in the address bus select
the byte within the chips and other lines in the
address bus select a particular chip through
its chip select inputs
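
The address map for the 512-byte RAM plus 512-byte ROM example can be decoded in software as a check. In this Python sketch the function name select_chip is an assumption; address line 1 of the table corresponds to bit 0 of the integer address.

```python
def select_chip(address):
    """Map a 10-bit address to (chip name, offset within that chip)."""
    address &= 0x3FF                   # only address lines 1-10 are decoded
    if address & 0x200:                # line 10 = 1 selects the ROM
        return "ROM", address & 0x1FF  # lines 1-9 address 512 bytes
    ram_index = (address >> 7) & 0b11  # lines 9 and 8 pick one of four RAMs
    return f"RAM {ram_index + 1}", address & 0x7F  # lines 1-7: byte in chip
```

For example, address 0x00FF falls in RAM 2 at offset 0x7F, and 0x0200 is the first byte of the ROM.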
CONNECTION OF MEMORY TO CPU

Fig: Memory connection to the CPU. Address lines 1-7 go to the AD7 inputs of the four 128 x 8 RAM chips; lines 8 and 9 drive a decoder whose outputs 0 to 3 select RAM 1 to RAM 4; line 10 distinguishes RAM from ROM through the chip select inputs; lines 1-9 go to the AD9 inputs of the 512 x 8 ROM. RD and WR go to the RAM chips, and all chips share the 8-bit data bus.
INPUT-OUTPUT ORGANIZATION

Peripheral Devices

Input-Output Interface

Asynchronous Data Transfer

Modes of Transfer

Priority Interrupt

Direct Memory Access

Input-Output Processor

Serial Communication
Peripheral Devices

PERIPHERAL DEVICES

Input Devices
Keyboard
Optical input devices
- Card Reader
- Paper Tape Reader
- Bar code reader
- Digitizer
- Optical Mark Reader
Magnetic Input Devices
- Magnetic Stripe Reader
Screen Input Devices
- Touch Screen
- Light Pen
- Mouse
Analog Input Devices

Output Devices
Card Puncher, Paper Tape Puncher
CRT
Printer (Impact, Ink Jet, Laser, Dot Matrix)
Plotter
Analog
Voice
Input/Output Interfaces

INPUT/OUTPUT INTERFACES

* Provides a method for transferring information between internal storage
(such as memory and CPU registers) and external I/O devices

* Resolves the differences between the computer and peripheral devices

- Peripherals - Electromechanical Devices
CPU or Memory - Electronic Device

- Data Transfer Rate
Peripherals - Usually slower
CPU or Memory - Usually faster than peripherals
Some kind of synchronization mechanism may be needed

- Unit of Information
Peripherals - Byte
CPU or Memory - Word

- Operating Modes
Peripherals - Autonomous, Asynchronous
CPU or Memory - Synchronous
I/O BUS AND INTERFACE MODULES

Fig: The processor drives an I/O bus (data, address and control lines) shared by four interfaces, one each for the keyboard and display terminal, the printer, the magnetic disk and the magnetic tape.
Each peripheral has an interface module associated with it

Interface
- Decodes the device address (device code)
- Decodes the commands (operation)
- Provides signals for the peripheral controller
- Synchronizes the data flow and supervises
the transfer rate between peripheral and CPU or Memory
Typical I/O instruction
Op. code Device address Function code
(Command)
CONNECTION OF I/O BUS

Connection of I/O Bus to CPU
Fig: The I/O instruction fields (op. code, device address, function code), the accumulator register and the computer I/O control in the CPU drive the I/O bus, which consists of sense lines, data lines, function code lines and device address lines.

Connection of I/O Bus to One Interface
Fig: Interface logic between the I/O bus and an output peripheral device and controller. Data lines go to a buffer register and on to the peripheral register; device address lines are checked against the device address (AD = 1101); function code lines go to a command decoder; sense lines come from a status register.

I/O BUS AND MEMORY BUS


Functions of Buses

* MEMORY BUS is for information transfers between CPU and the MM


* I/O BUS is for information transfers between CPU
and I/O devices through their I/O interface

Physical Organizations
* Many computers use a common single bus system
for both memory and I/O interface units
- Use one common bus but separate control lines for each function
- Use one common bus with common control lines for both functions
* Some computer systems use two separate buses,
one to communicate with memory and the other with I/O interfaces
I/O Bus
- Communication between CPU and all interface units is via a common
I/O Bus
- An interface connected to a peripheral device may have a number of
data registers , a control register, and a status register
- A command is passed to the peripheral by sending it
to the appropriate interface register
- Function code and sense lines are not needed (Transfer of data, control,
and status information is always via the common I/O Bus)

ISOLATED vs MEMORY MAPPED I/O

Isolated I/O
- Separate I/O read/write control lines in addition to memory read/write control
lines
- Separate (isolated) memory and I/O address spaces
- Distinct input and output instructions

Memory-mapped I/O
- A single set of read/write control lines
(no distinction between memory and I/O transfer)
- Memory and I/O addresses share the common address space
-> reduces memory address range available
- No specific input or output instruction
-> The same memory reference instructions can
be used for I/O transfers
- Considerable flexibility in handling I/O operations
I/O INTERFACE
Fig: I/O interface unit. On the CPU side: bidirectional data bus with bus buffers, chip select (CS), register selects (RS1, RS0), I/O read (RD) and I/O write (WR) into timing and control. On the device side: Port A register (I/O data), Port B register (I/O data), control register (control) and status register (status).

CS RS1 RS0   Register selected
0  x   x     None - data bus in high-impedance
1  0   0     Port A register
1  0   1     Port B register
1  1   0     Control register
1  1   1     Status register

Programmable Interface
- Information in each port can be assigned a meaning
depending on the mode of operation of the I/O device
-> Port A = Data; Port B = Command; Port C = Status
- CPU initializes (loads) each port by transferring a byte to the Control Register
-> Allows the CPU to define the mode of operation of each port
-> Programmable Port: By changing the bits in the control register, it is
possible to change the interface characteristics
Asynchronous Data Transfer

ASYNCHRONOUS DATA TRANSFER

Synchronous and Asynchronous Operations


Synchronous - All devices derive the timing
information from common clock line
Asynchronous - No common clock

Asynchronous Data Transfer


Asynchronous data transfer between two independent units requires that
control signals be transmitted between the communicating units to
indicate the time at which data is being transmitted
Two Asynchronous Data Transfer Methods
Strobe pulse
- A strobe pulse is supplied by one unit to indicate to
the other unit when the transfer has to occur

Handshaking
- A control signal accompanies each data item
being transmitted to indicate the presence of data
- The receiving unit responds with another control
signal to acknowledge receipt of the data
STROBE CONTROL

* Employs a single control line to time each transfer
* The strobe may be activated by either the source or the destination unit

Source-Initiated Strobe for Data Transfer
Fig: Block diagram (source unit and destination unit joined by a data bus and a strobe line) and timing diagram (Data carrying valid data, Strobe).

Destination-Initiated Strobe for Data Transfer
Fig: Block diagram (source unit and destination unit joined by a data bus and a strobe line) and timing diagram (Data carrying valid data, Strobe).
HANDSHAKING

Strobe Methods

Source-Initiated
The source unit that initiates the transfer has
no way of knowing whether the destination unit
has actually received data

Destination-Initiated
The destination unit that initiates the transfer has
no way of knowing whether the source has
actually placed the data on the bus

To solve this problem, the HANDSHAKE method
introduces a second control signal to provide a Reply
to the unit that initiates the transfer
SOURCE-INITIATED TRANSFER USING HANDSHAKE

Fig: Block diagram: source unit and destination unit joined by the data bus, a data valid line and a data accepted line. Timing diagram: data bus (valid data), data valid, data accepted.

Sequence of Events
Source unit:       Place data on bus. Enable data valid.
Destination unit:  Accept data from bus. Enable data accepted.
Source unit:       Disable data valid. Invalidate data on bus.
Destination unit:  Disable data accepted. Ready to accept data (initial state).

* Allows arbitrary delays from one state to the next
* Permits each unit to respond at its own data transfer rate
* The rate of transfer is determined by the slower unit
DESTINATION-INITIATED TRANSFER USING HANDSHAKE

Fig: Block diagram: source unit and destination unit joined by the data bus, a data valid line and a ready for data line. Timing diagram: ready for data, data valid, data bus (valid data).

Sequence of Events
Destination unit:  Ready to accept data. Enable ready for data.
Source unit:       Place data on bus. Enable data valid.
Destination unit:  Accept data from bus. Disable ready for data.
Source unit:       Disable data valid. Invalidate data on bus (initial state).

* Handshaking provides a high degree of flexibility and reliability because the
successful completion of a data transfer relies on active participation by both units
* If one unit is faulty, data transfer will not be completed
-> Can be detected by means of a timeout mechanism
ASYNCHRONOUS SERIAL TRANSFER

Four Different Types of Transfer
- Asynchronous serial transfer
- Synchronous serial transfer
- Asynchronous parallel transfer
- Synchronous parallel transfer
Asynchronous Serial Transfer
- Employs special bits which are inserted at both
ends of the character code
- Each character consists of three parts: start bit, data bits, stop bits.

Fig: Character frame. Start bit (1 bit), character bits (1 1 0 0 0 1 0 1 in the example), stop bits (at least 1 bit).
A character can be detected by the receiver from the knowledge of 4 rules:
- When data are not being sent, the line is kept in the 1-state (idle state)
- The initiation of a character transmission is detected
by a Start Bit, which is always a 0
- The character bits always follow the Start Bit
- After the last character bit, a Stop Bit is detected when
the line returns to the 1-state for at least 1 bit time
The receiver knows in advance the transfer rate of the
bits and the number of information bits to expect
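
The four rules can be exercised with a short Python sketch that frames a character and recovers it from the line. The function names frame and deframe are hypothetical, and sending the least significant bit first is an assumption of this example (a common convention, not something the rules above specify).

```python
def frame(char, data_bits=8, stop_bits=1):
    """Return the line levels for one character: start, data, stop."""
    bits = [(char >> i) & 1 for i in range(data_bits)]  # LSB first (assumed)
    return [0] + bits + [1] * stop_bits                 # start bit is always 0


def deframe(line, data_bits=8):
    """Recover one character from line levels that may begin with idle 1s."""
    start = line.index(0)                    # first 0 after idle is the start bit
    bits = line[start + 1:start + 1 + data_bits]
    if line[start + 1 + data_bits] != 1:     # line must return to the 1-state
        raise ValueError("framing error: stop bit not found")
    return sum(bit << i for i, bit in enumerate(bits))
```

As the last rule says, the receiver needs to know data_bits in advance; here it is simply passed as a parameter.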
UNIVERSAL ASYNCHRONOUS RECEIVER-TRANSMITTER
- UART -
A typical asynchronous communication interface available as an IC
Fig: UART block diagram. Bidirectional data bus and bus buffers; internal bus; transmitter register and shift register (transmit data) with transmitter control and clock; receiver register and shift register (receive data) with receiver control and clock; control register; status register; timing and control with inputs chip select (CS), register select (RS), I/O read (RD) and I/O write (WR).

CS RS Oper.   Register selected
0  x  x       None
1  0  WR      Transmitter register
1  1  WR      Control register
1  0  RD      Receiver register
1  1  RD      Status register

Transmitter Register
- Accepts a data byte (from CPU) through the data bus
- Transferred to a shift register for serial transmission
Receiver
- Receives serial information into another shift register
- Complete data byte is sent to the receiver register
Status Register Bits
- Used for I/O flags and for recording errors
Control Register Bits
- Define baud rate, no. of bits in each character, whether
to generate and check parity, and no. of stop bits
UNIT-5
PIPELINING AND VECTOR PROCESSING

Parallel Processing

Pipelining

Arithmetic Pipeline

Instruction Pipeline

RISC Pipeline

Vector Processing

Array Processors(refer book)


Parallel Processing
PARALLEL PROCESSING

Execution of Concurrent Events in the computing
process to achieve faster Computational Speed

Levels of Parallel Processing
- Job or Program level
- Task or Procedure level
- Inter-Instruction level
- Intra-Instruction level
PARALLEL COMPUTERS
Architectural Classification

Flynn's classification
Based on the multiplicity of Instruction Streams and Data Streams
Instruction Stream
Sequence of Instructions read from memory
Data Stream
Operations performed on the data in the processor

                                  Number of Data Streams
                                  Single     Multiple
Number of            Single       SISD       SIMD
Instruction Streams  Multiple     MISD       MIMD
COMPUTER ARCHITECTURES FOR PARALLEL PROCESSING

Von Neumann based
  SISD: Superscalar processors, Superpipelined processors, VLIW
  MISD: Nonexistent
  SIMD: Array processors, Systolic arrays, Associative processors
  MIMD: Shared-memory multiprocessors (Bus based, Crossbar switch based)
Dataflow
Reduction
SISD COMPUTER SYSTEMS

Fig: Control unit, processor unit and memory. A data stream flows between the processor unit and memory, and an instruction stream flows from memory to the control unit.

Characteristics
- Standard von Neumann machine
- Instructions and data are stored in memory
- One operation at a time

Limitations
Von Neumann bottleneck
Maximum speed of the system is limited by the
Memory Bandwidth (bits/sec or bytes/sec)
- Limitation on Memory Bandwidth
- Memory is shared by CPU and I/O
MISD COMPUTER SYSTEMS

Fig: Several memory (M), control unit (CU) and processor (P) chains, each with its own instruction stream, operating on a single data stream from memory.

Characteristics
- There is no computer at present that can be
classified as MISD
SIMD COMPUTER SYSTEMS
Fig: A memory feeds one control unit over a data bus; the control unit sends a single instruction stream to the processor units (P), which exchange data streams with the memory modules (M) through an alignment network.

Characteristics
- Only one copy of the program exists
- A single controller executes one instruction at a time
MIMD COMPUTER SYSTEMS

Fig: Processor-memory pairs (P, M) connected through an interconnection network to a shared memory.

Characteristics
- Multiple processing units
- Execution of multiple instructions on multiple data

Types of MIMD computer systems
- Shared memory multiprocessors
- Message-passing multicomputers
Pipelining
PIPELINING
A technique of decomposing a sequential process
into suboperations, with each subprocess being
executed in a special dedicated segment that
operates concurrently with all other segments.
Example: Ai * Bi + Ci for i = 1, 2, 3, ... , 7
Fig: Three-segment pipeline. Segment 1: R1 and R2 load Ai and Bi from memory. Segment 2: a multiplier forms R3 from R1 and R2 while R4 loads Ci. Segment 3: an adder forms R5 from R3 and R4.

R1 <- Ai, R2 <- Bi               Load Ai and Bi
R3 <- R1 * R2, R4 <- Ci          Multiply and load Ci
R5 <- R3 + R4                    Add
OPERATIONS IN EACH PIPELINE STAGE

Clock pulse   Segment 1     Segment 2            Segment 3
number        R1    R2      R3         R4        R5
1             A1    B1
2             A2    B2      A1 * B1    C1
3             A3    B3      A2 * B2    C2        A1 * B1 + C1
4             A4    B4      A3 * B3    C3        A2 * B2 + C2
5             A5    B5      A4 * B4    C4        A3 * B3 + C3
6             A6    B6      A5 * B5    C5        A4 * B4 + C4
7             A7    B7      A6 * B6    C6        A5 * B5 + C5
8                           A7 * B7    C7        A6 * B6 + C6
9                                                A7 * B7 + C7
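
The register contents in this table can be reproduced with a small Python simulation of the three segments. The function name run_pipeline and the use of None for an empty register are assumptions made for this sketch.

```python
def run_pipeline(a, b, c):
    """Compute a[i] * b[i] + c[i] with a 3-segment pipeline model.

    Returns the values loaded into R5, one per clock pulse from pulse 3 on.
    """
    n = len(a)
    r1 = r2 = r3 = r4 = None
    results = []
    for pulse in range(1, n + 3):            # n + 2 pulses drain the pipe
        # Segment 3 uses the R3 and R4 values held before this pulse
        if r3 is not None:
            results.append(r3 + r4)          # R5 <- R3 + R4
        # Segment 2 uses the R1 and R2 values held before this pulse
        if r1 is not None:
            r3, r4 = r1 * r2, c[pulse - 2]   # R3 <- R1 * R2, R4 <- Ci
        else:
            r3 = r4 = None
        # Segment 1 loads the next operand pair, if any remain
        if pulse <= n:
            r1, r2 = a[pulse - 1], b[pulse - 1]
        else:
            r1 = r2 = None
    return results
```

For seven operand triples the loop runs for nine pulses, the same number of rows as in the table.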
GENERAL PIPELINE
General Structure of a 4-Segment Pipeline
Fig: Input -> S1 -> R1 -> S2 -> R2 -> S3 -> R3 -> S4 -> R4, with a common clock driving all registers.

Space-Time Diagram

Clock cycles   1   2   3   4   5   6   7   8   9
Segment 1      T1  T2  T3  T4  T5  T6
Segment 2          T1  T2  T3  T4  T5  T6
Segment 3              T1  T2  T3  T4  T5  T6
Segment 4                  T1  T2  T3  T4  T5  T6
PIPELINE SPEEDUP
n: Number of tasks to be performed

Conventional Machine (Non-Pipelined)
tn: Clock cycle (time to complete each task)
t1: Time required to complete the n tasks
t1 = n * tn

Pipelined Machine (k stages)
tp: Clock cycle (time to complete each suboperation)
tk: Time required to complete the n tasks
tk = (k + n - 1) * tp

Speedup
Sk: Speedup
Sk = (n * tn) / ((k + n - 1) * tp)
As n -> infinity: lim Sk = tn / tp (= k, if tn = k * tp)
PIPELINE AND MULTIPLE FUNCTION UNITS
Example
- 4-stage pipeline
- suboperation in each stage; tp = 20 ns
- 100 tasks to be executed
- 1 task in non-pipelined system; 20 * 4 = 80 ns

Pipelined System
(k + n - 1) * tp = (4 + 99) * 20 = 2060 ns

Non-Pipelined System
n * k * tp = 100 * 80 = 8000 ns

Speedup
Sk = 8000 / 2060 = 3.88

A 4-stage pipeline is basically identical to a system
with 4 identical function units
Fig: Multiple functional units P1, P2, P3, P4 processing instructions Ii, Ii+1, Ii+2, Ii+3 in parallel.
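
The worked example can be reproduced with a few lines of Python. The function name pipeline_speedup is an assumption; by default it takes the non-pipelined task time as tn = k * tp, as in the example.

```python
def pipeline_speedup(n, k, tp, tn=None):
    """Return (non-pipelined time, pipelined time, speedup) for n tasks."""
    if tn is None:
        tn = k * tp                   # one task takes k suboperations
    t_nonpipe = n * tn                # n * tn
    t_pipe = (k + n - 1) * tp         # (k + n - 1) * tp
    return t_nonpipe, t_pipe, t_nonpipe / t_pipe
```

With n = 100, k = 4 and tp = 20 this gives 8000 and 2060 time units and a speedup of about 3.88; increasing n moves the speedup toward the limit k = 4.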
Arithmetic Pipeline
ARITHMETIC PIPELINE
Floating-point adder
X = A x 2^a
Y = B x 2^b

[1] Compare the exponents
[2] Align the mantissas
[3] Add/sub the mantissas
[4] Normalize the result

Fig: Four-segment floating-point adder pipeline with registers (R) between segments. Inputs: exponents a, b and mantissas A, B. Segment 1: compare exponents by subtraction (difference). Segment 2: choose exponent, align mantissas. Segment 3: add or subtract mantissas. Segment 4: adjust exponent, normalize result.
Instruction Pipeline
INSTRUCTION CYCLE
Six Phases* in an Instruction Cycle
[1] Fetch an instruction from memory
[2] Decode the instruction
[3] Calculate the effective address of the operand
[4] Fetch the operands from memory
[5] Execute the operation
[6] Store the result in the proper place

* Some instructions skip some phases
* Effective address calculation can be done as
  part of the decoding phase
* Storage of the operation result into a register
  is done automatically in the execution phase

==> 4-Stage Pipeline

[1] FI: Fetch an instruction from memory
[2] DA: Decode the instruction and calculate
    the effective address of the operand
[3] FO: Fetch the operand
[4] EX: Execute the operation
INSTRUCTION PIPELINE

Execution of Three Instructions in a 4-Stage Pipeline

Conventional

i      FI  DA  FO  EX
i+1                    FI  DA  FO  EX
i+2                                    FI  DA  FO  EX

Pipelined

i      FI  DA  FO  EX
i+1        FI  DA  FO  EX
i+2            FI  DA  FO  EX
INSTRUCTION EXECUTION IN A 4-STAGE PIPELINE

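Segment 1: Fetch instruction from memory
Segment 2: Decode instruction and calculate effective address
           Branch?    yes -> Update PC, empty pipe, return to Segment 1
                      no  -> continue to Segment 3
Segment 3: Fetch operand from memory
Segment 4: Execute instruction
           Interrupt? yes -> Interrupt handling, update PC, empty pipe, return to Segment 1
                      no  -> return to Segment 1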
Step:              1   2   3   4   5   6   7   8   9   10  11  12  13
Instruction  1     FI  DA  FO  EX
             2         FI  DA  FO  EX
   (Branch)  3             FI  DA  FO  EX
             4                 FI  -   -   FI  DA  FO  EX
             5                     -   -   -   FI  DA  FO  EX
             6                                     FI  DA  FO  EX
             7                                         FI  DA  FO  EX
MAJOR HAZARDS IN PIPELINED EXECUTION
Structural hazards (Resource Conflicts)
Caused by access to memory by two segments at the same time.
Most of these conflicts can be resolved by using separate instruction
and data memories.
Data hazards (Data Dependency Conflicts)
An instruction scheduled to be executed in the pipeline requires the
result of a previous instruction, which is not yet available

R1 <- B + C       ADD   DA   B,C      +
R1 <- R1 + 1      INC        DA       bubble   R1 + 1     (data dependency)

Control hazards
Branches and other instructions that change the PC
delay the fetch of the next instruction

JMP      ID   PC + PC                           (branch address dependency)
bubble        IF   ID   OF   OE   OS

Hazards in pipelines may make it necessary to stall the pipeline
Pipeline Interlock: Detect hazards and stall until they are cleared
STRUCTURAL HAZARDS
Structural Hazards (Resource conflicts)

Occur when some resource has not been duplicated enough to allow
all combinations of instructions in the pipeline to execute

Example: With one memory, a data fetch and an instruction fetch
cannot be initiated in the same clock cycle

i      FI  DA  FO  EX
i+1        FI  DA  FO  EX
i+2            stall  stall  FI  DA  FO  EX

The pipeline is stalled for a resource conflict
<- Two loads with a one-port memory
-> A two-port memory will serve without a stall
DATA HAZARDS
Data Hazards

Occur when the execution of an instruction
depends on the result of a previous instruction
    ADD R1, R2, R3
    SUB R4, R1, R5
Data hazards can be dealt with by either hardware
techniques or software techniques

Hardware Techniques

Interlock
- Hardware detects the data dependencies and delays the scheduling
  of the dependent instruction by stalling enough clock cycles
Forwarding (bypassing, short-circuiting)
- Accomplished by a data path that routes a value from a source
  (usually an ALU) to a user, bypassing a designated register. This
  allows the value to be used at an earlier stage in the
  pipeline than would otherwise be possible

Software Technique
The compiler is designed to detect a data conflict and reorder instructions
as necessary to delay the loading of the conflicting data by inserting no-operation
instructions. This method is called DELAYED LOAD
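
The dependency check that an interlock or a compiler performs can be sketched as follows. This is only an illustration: it assumes a three-operand format with the destination register written first (which is what makes R1 the conflicting register in the ADD/SUB example), and the helper names `parse` and `raw_hazards` are hypothetical.

```python
def parse(instr):
    """Split 'ADD R1, R2, R3' into (opcode, destination, sources).
    Assumes the destination register is the first operand."""
    op, rest = instr.split(None, 1)
    regs = [r.strip() for r in rest.split(",")]
    return op, regs[0], regs[1:]


def raw_hazards(program):
    """Return (producer index, consumer index, register) for every pair of
    adjacent instructions where the second reads what the first writes."""
    hazards = []
    for i in range(len(program) - 1):
        _, dest, _ = parse(program[i])
        _, _, sources = parse(program[i + 1])
        if dest in sources:
            hazards.append((i, i + 1, dest))
    return hazards


program = ["ADD R1, R2, R3", "SUB R4, R1, R5"]
print(raw_hazards(program))   # [(0, 1, 'R1')]
```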
CONTROL HAZARDS (Branching Difficulties)

Branch Instructions

- Branch target address is not known until
  the branch instruction is decoded.

Branch instruction     FI  DA  FO  EX
Next instruction               FI  DA  FO  EX
(Target address is available after the DA segment of the branch instruction)

- Stall -> wasted clock cycles

Dealing with Control Hazards

* Prefetch Target Instruction
* Branch Target Buffer
* Loop Buffer
* Branch Prediction
* Delayed Branch
CONTROL HAZARDS
Prefetch Target Instruction
Fetch instructions in both streams: the instruction to be executed if the branch
is not taken and the instruction to be executed if the branch is taken.
Both are saved until the branch is executed. Then, select the right
instruction stream and discard the wrong stream
Branch Target Buffer (BTB; Associative Memory)
Present in the fetch segment of the pipeline. Each entry holds the address
of a previously executed branch, its target instruction, and
the next few instructions after the target
When fetching an instruction, search the BTB.
If found, fetch the instruction stream from the BTB;
if not, fetch the new stream and update the BTB
Loop Buffer (High-Speed Register File)
A variation of the BTB. A register file maintained by the instruction fetch segment
of the pipeline.
The register file stores an entire loop, which allows the loop to be executed
without accessing memory
Branch Prediction
Uses additional logic to guess the outcome of the branch condition before it is executed.
The instruction is fetched based on the guess. A correct guess eliminates the branch penalty
Delayed Branch
The compiler detects the branch and rearranges the instruction sequence
by inserting useful instructions that keep the pipeline busy
in the presence of a branch instruction
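
The lookup-then-update behavior of a branch target buffer can be sketched in a few lines. This is a simplified model, not the hardware described above: it stores only the target address of each branch (not the target instructions themselves), and the class name, method names and the 4-byte instruction size are assumptions made for the illustration.

```python
class BranchTargetBuffer:
    """Toy BTB: maps the address of a branch to its last known target."""

    def __init__(self):
        self.entries = {}            # branch address -> target address

    def lookup(self, pc):
        """Return the stored target, or None if this address is not in the BTB."""
        return self.entries.get(pc)

    def update(self, pc, target):
        """Record the target once the branch has actually been executed."""
        self.entries[pc] = target


def next_fetch_address(btb, pc, instr_size=4):
    """Fetch from the stored target on a BTB hit, else fall through."""
    target = btb.lookup(pc)
    return target if target is not None else pc + instr_size


btb = BranchTargetBuffer()
first = next_fetch_address(btb, 100)     # miss: sequential fetch
btb.update(100, 400)                     # branch at 100 resolved to 400
second = next_fetch_address(btb, 100)    # hit: fetch from the target
print(first, second)                     # 104 400
```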
RISC Pipeline
RISC PIPELINE
RISC
- Machine with a very fast clock cycle that executes at the rate of one
  instruction per cycle
  <- Simple Instruction Set
     Fixed-Length Instruction Format
     Register-to-Register Operations
Instruction Cycles of Three-Stage Instruction Pipeline
Data Manipulation Instructions
    I: Instruction Fetch
    A: Decode, Read Registers, ALU Operations
    E: Write a Register

Load and Store Instructions
    I: Instruction Fetch
    A: Decode, Evaluate Effective Address
    E: Register-to-Memory or Memory-to-Register

Program Control Instructions
    I: Instruction Fetch
    A: Decode, Evaluate Branch Address
    E: Write Register (PC)
DELAYED LOAD IN RISC PIPELINE
LOAD:   R1 <- M[address 1]
LOAD:   R2 <- M[address 2]
ADD:    R3 <- R1 + R2
STORE:  M[address 3] <- R3
Three-segment pipeline timing
Pipeline timing with data conflict

clock cycle    1  2  3  4  5  6
Load  R1       I  A  E
Load  R2          I  A  E
Add   R1+R2          I  A  E
Store R3                I  A  E

Pipeline timing with delayed load

clock cycle    1  2  3  4  5  6  7
Load  R1       I  A  E
Load  R2          I  A  E
NOP                  I  A  E
Add   R1+R2             I  A  E
Store R3                   I  A  E

The data dependency is taken care of by the compiler rather
than the hardware
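
The compiler pass behind the delayed load can be sketched as follows. This is a minimal illustration that handles only the case shown in the timing diagram (a LOAD whose destination is read by the very next instruction); the tuple format and the function name `insert_delay_nops` are assumptions, not part of the original slides.

```python
def insert_delay_nops(program):
    """program is a list of (opcode, destination, sources) tuples.
    Insert a NOP after any LOAD whose destination register is used
    as a source by the instruction that immediately follows it."""
    scheduled = []
    for i, instr in enumerate(program):
        scheduled.append(instr)
        op, dest, _ = instr
        if op == "LOAD" and i + 1 < len(program) and dest in program[i + 1][2]:
            scheduled.append(("NOP", None, ()))
    return scheduled


program = [
    ("LOAD", "R1", ("M1",)),
    ("LOAD", "R2", ("M2",)),
    ("ADD", "R3", ("R1", "R2")),
    ("STORE", "M3", ("R3",)),
]
scheduled = insert_delay_nops(program)
opcodes = [op for op, _, _ in scheduled]
print(opcodes)   # ['LOAD', 'LOAD', 'NOP', 'ADD', 'STORE']
```

Five instructions in a three-segment pipeline take 5 + 3 - 1 = 7 clock cycles, which agrees with the delayed-load timing diagram.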
DELAYED BRANCH
The compiler analyzes the instructions before and after
the branch and rearranges the program sequence by
inserting useful instructions in the delay steps

Using no-operation instructions

Clock cycles:      1  2  3  4  5  6  7  8  9  10
1. Load            I  A  E
2. Increment          I  A  E
3. Add                   I  A  E
4. Subtract                 I  A  E
5. Branch to X                 I  A  E
6. NOP                            I  A  E
7. NOP                               I  A  E
8. Instr. in X                          I  A  E

Rearranging the instructions

Clock cycles:      1  2  3  4  5  6  7  8
1. Load            I  A  E
2. Increment          I  A  E
3. Branch to X           I  A  E
4. Add                      I  A  E
5. Subtract                    I  A  E
6. Instr. in X                    I  A  E
Vector Processing
VECTOR PROCESSING

Vector Processing Applications


Problems that can be efficiently formulated in terms of vectors
Long-range weather forecasting
Petroleum explorations
Seismic data analysis
Medical diagnosis
Aerodynamics and space flight simulations
Artificial intelligence and expert systems
Mapping the human genome
Image processing

Vector Processor (computer)


Ability to process vectors, and related data structures such as matrices
and multi-dimensional arrays, much faster than conventional computers

Vector Processors may also be pipelined


VECTOR PROGRAMMING

DO 20 I = 1, 100
20 C(I) = B(I) + A(I)

Conventional computer
    Initialize I = 1
20  Read A(I)
    Read B(I)
    Store C(I) = A(I) + B(I)
    Increment I = I + 1
    If I <= 100 goto 20

Vector computer

C(1:100) = A(1:100) + B(1:100)


VECTOR INSTRUCTION FORMAT
Vector Instruction Format
Operation Base address Base address Base address Vector
code source 1 source 2 destination length

Pipeline for Inner Product of Matrix Multiplication

[Figure: Source A and Source B feed a multiplier pipeline, whose output feeds an adder pipeline]

C = A1 B1 + A2 B2 + A3 B3 + A4 B4 + ... + Ak Bk

k may be equal to 100 or even 1000

The values of A and B are either in memory or in processor registers. Each floating-point
adder and multiplier unit is assumed to have 4 segments. All segment
registers are initialized to zero. Therefore the output of the adder is zero
for the first 8 cycles, until both pipes are full.

Ai and Bi are brought in and multiplied at a rate of one pair per cycle. After the first 4 cycles
the products begin to be added to the output of the adder. During the next 4 cycles zero is added.
At the end of the 8th cycle the first four products A1 B1 through A4 B4 are in the four
adder segments and the next four products A5 B5 through A8 B8 are in the multiplier
segments. From then on each new product is added to the adder output produced four cycles
earlier, so the sum accumulates in four sections:
C =   A1 B1 + A5 B5 + A9 B9   + A13 B13 + ...
    + A2 B2 + A6 B6 + A10 B10 + A14 B14 + ...
    + A3 B3 + A7 B7 + A11 B11 + A15 B15 + ...
    + A4 B4 + A8 B8 + A12 B12 + A16 B16 + ...
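
The four-section accumulation can be mimicked in software. The sketch below is an illustration of the summation order only, not a cycle-accurate model of the pipelines; the function name `interleaved_inner_product` and the sample data are assumptions.

```python
def interleaved_inner_product(a, b, segments=4):
    """Accumulate the products a[i] * b[i] into `segments` partial sums,
    product i going to section i mod segments, then add the sections."""
    partial = [0] * segments
    for i, (x, y) in enumerate(zip(a, b)):
        partial[i % segments] += x * y
    return sum(partial), partial


a = list(range(1, 17))     # A1 .. A16 = 1 .. 16
b = [2] * 16               # B1 .. B16 = 2
total, partial = interleaved_inner_product(a, b)
print(partial)             # [56, 64, 72, 80]
print(total)               # 272
```

The first section holds A1 B1 + A5 B5 + A9 B9 + A13 B13, the second A2 B2 + A6 B6 + ..., and so on; adding the four sections gives the same result as the plain sum.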
Multiprocessors
MULTIPROCESSORS

Characteristics of Multiprocessors

Interconnection Structures

Interprocessor Arbitration

Interprocessor Communication
and Synchronization

Cache Coherence
Characteristics of Multiprocessor Systems

A multiprocessor system is an interconnection of two or more CPUs with
memory and input-output equipment.

Multiprocessor systems are classified as multiple instruction stream, multiple
data stream (MIMD) systems.

There is a distinction between multiprocessors and multicomputers,
though both support concurrent operations. In a multicomputer, several
autonomous computers are connected through a network and they may or
may not communicate, but in a multiprocessor system there is a single OS
that controls the interaction between processors, and all the components
of the system cooperate in the solution of a problem.

VLSI circuit technology has reduced the cost of computers to such a low
level that the concept of applying multiple processors to meet system
performance requirements has become an attractive design possibility.
Characteristics of Multiprocessors
Benefits of Multiprocessing:

1. Multiprocessing increases the reliability of the system so that a failure or
error in one part has limited effect on the rest of the system. If a fault causes
one processor to fail, a second processor can be assigned to perform the
functions of the disabled one.

2. Improved system performance. The system derives high performance from the
fact that computations can proceed in parallel in one of two ways:
a) Multiple independent jobs can be made to operate in parallel.
b) A single job can be partitioned into multiple parallel tasks. This can be
achieved in two ways:
- The user explicitly declares that the tasks of the program be
  executed in parallel
- The compiler, provided with multiprocessor software, can
  automatically detect parallelism in the program by checking
  for data dependencies.
COUPLING OF PROCESSORS
Tightly Coupled System / Shared Memory
- Tasks and/or processors communicate in a highly synchronized fashion
- Processors communicate through a common global shared memory
- Shared memory system. This does not preclude each processor from
  having its own local memory (cache memory)
Loosely Coupled System / Distributed Memory
- Tasks or processors do not communicate in a synchronized fashion
- Processors communicate by passing message packets consisting of an address, the data content, and
  some error detection code
- Overhead for data exchange is high
- Distributed memory system

Loosely coupled systems are more efficient when the interaction between tasks is
minimal, whereas tightly coupled systems can tolerate a higher degree of interaction
between tasks.
GRANULARITY OF PARALLELISM
Granularity of Parallelism

Coarse-grain
- A task is broken into a handful of pieces, each of which is executed by a powerful processor
- Processors may be heterogeneous
- Computation/communication ratio is very high

Medium-grain
- Tens to a few thousand pieces
- Processors typically run the same code
- Computation/communication ratio is often hundreds or more

Fine-grain
- Thousands to perhaps millions of small pieces, executed by very small, simple processors or
  through pipelines
- Processors typically have instructions broadcast to them
- Computation/communication ratio is often near unity
MEMORY
Shared (Global) Memory
- A Global Memory Space accessible by all processors
- Processors may also have some local memory
Distributed (Local, Message-Passing) Memory
- All memory units are associated with processors
- To retrieve information from another processor's
memory a message must be sent there
Uniform Memory
- All processors take the same time to reach all memory locations
Nonuniform (NUMA) Memory
- Memory access is not uniform

[Figure: SHARED MEMORY: processors connected through a network to memory. DISTRIBUTED MEMORY: processor/memory pairs connected to one another through a network]
SHARED MEMORY MULTIPROCESSORS
[Figure: Memory modules (M ... M) connected to processors (P ... P) through an interconnection network: buses, multistage IN, or crossbar switch]

Characteristics
- All processors have equally direct access to one large memory address space

Limitations
- Memory access latency; hot spot problem


MESSAGE-PASSING MULTIPROCESSORS
[Figure: Processors (P ... P), each with its own memory (M ... M), connected by a message-passing network with point-to-point connections]

Characteristics
- Interconnected computers
- Each processor has its own memory, and processors
  communicate via message passing

Limitations
- Communication overhead; hard to program


INTERCONNECTION STRUCTURES
The interconnection between the components of a multiprocessor
system can have different physical configurations depending on the number
of transfer paths that are available between the processors and memory in a shared memory
system and among the processing elements in a loosely coupled system.

Some of the schemes are:

* Time-Shared Common Bus
* Multiport Memory
* Crossbar Switch
* Multistage Switching Network
* Hypercube System

Time-Shared Common Bus

All processors (and memory) are connected to a common bus or buses
- Memory access is fairly uniform, but not very scalable
BUS
- A collection of signal lines that carry module-to-module communication
- Data highways connecting several digital system elements
Operations of Bus Devices

[Figure: Master devices M3, M6, M4 and slave devices S7, S5, S2 attached to a common bus]

M3 wishes to communicate with S5

[1] M3 sends signals (an address) on the bus that cause
    S5 to respond
[2] M3 sends data to S5, or S5 sends data to
    M3 (determined by the command line)

Master Device: Device that initiates and controls the communication
Slave Device: Responding device
Multiple-master buses
-> Bus conflict
-> Need bus arbitration
SYSTEM BUS STRUCTURE FOR MULTIPROCESSORS

[Figure: Several local buses, each connecting a CPU, an optional IOP, local memory and a system bus controller. The system bus controllers link the local buses to the system bus, which also connects the common shared memory]

MULTIPORT MEMORY

Multiport Memory Module
- Each port serves a CPU

Memory Module Control Logic
- Each memory module has control logic
- Resolves memory module conflicts by fixed priority among CPUs

Advantages
- Multiple paths -> high transfer rate

Disadvantages
- Memory control logic
- Large number of cables and connections

[Figure: Memory modules MM 1 to MM 4, each with a separate port connected to each of CPU 1 to CPU 4]
CROSSBAR SWITCH
Each switch point has control logic to set up
the transfer path between a processor and a
memory module.
It also resolves multiple requests for access to the same memory module on a predetermined
priority basis.
This organization supports simultaneous transfers from all memory modules because
there is a separate path associated with each
module. However, the hardware required to implement the
switch can become quite large and complex.

[Figure: Crossbar switch connecting CPU1 to CPU4 with memory modules MM1 to MM4]

Block Diagram of Crossbar Switch

[Figure: Data, address, and control lines from CPU 1 through CPU 4 enter multiplexers and arbitration logic, which drive the data, address, R/W, and memory enable lines of one memory module]
MULTISTAGE SWITCHING NETWORK

Interstage Switch

A 2 x 2 switch with inputs A and B and outputs 0 and 1 has four settings:
    A connected to 0        A connected to 1
    B connected to 0        B connected to 1
MULTISTAGE INTERCONNECTION NETWORK
Binary Tree with 2 x 2 Switches

[Figure: Processors P1 and P2 connected through a binary tree of 2 x 2 switches to destinations 000 through 111]

Some requests cannot be satisfied simultaneously.
Ex: if P1 is connected to one of the destinations 000 through 011, P2 can be
connected to only one of the destinations 100 through 111

8x8 Omega Switching Network

[Figure: Inputs 0 through 7 connected through three stages of 2 x 2 switches to outputs 000 through 111]
HYPERCUBE INTERCONNECTION
n-dimensional hypercube (binary n-cube)
- p = 2^n processors
- Processors are conceptually on the corners of an n-dimensional hypercube, and each is directly connected to
  its n neighboring nodes
- Degree = n
- Routing procedure: source 010, destination 001
  Exclusive-OR of the two addresses: 011. The addresses differ in the second and third bits, so the data is
  transmitted along the y axis and then along the z axis, i.e. from 010 to 000 and then from 000 to 001

[Figure: One-cube (nodes 0, 1), two-cube (nodes 00, 01, 10, 11), and three-cube (nodes 000 through 111)]
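
The routing procedure can be written as a short function. This is a sketch under one assumption: differing bits are corrected from the high-order bit to the low-order bit (x, then y, then z), which reproduces the path in the example; the function name `hypercube_route` is illustrative.

```python
def hypercube_route(src, dst, n):
    """Return the list of nodes visited when routing from src to dst in a
    binary n-cube, correcting one differing address bit per hop."""
    path = [src]
    diff = src ^ dst                 # 1 bits mark the axes that differ
    node = src
    for bit in range(n - 1, -1, -1): # high-order bit first
        if diff & (1 << bit):
            node ^= 1 << bit         # move along this axis
            path.append(node)
    return path


route = hypercube_route(0b010, 0b001, 3)
labels = [format(node, "03b") for node in route]
print(labels)   # ['010', '000', '001']
```

The number of hops equals the number of 1 bits in the exclusive-OR of the two addresses, so it is never more than n.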
INTERPROCESSOR ARBITRATION
Only one of the CPUs, IOPs, and memory can be granted use of the bus at a time.
An arbitration mechanism is needed to handle multiple requests for the shared resources and to
resolve contention.

SYSTEM BUS:
A bus that connects the major components such as CPUs, IOPs and memory

A typical system bus consists of approximately 100 signal lines divided into three functional groups: data,
address and control lines. In addition there are power distribution lines to the components.
e.g. IEEE standard 796 bus
- 86 lines
  Data: 16 (multiple of 8)
  Address: 24
  Control: 26
  Power: 20

SYNCHRONOUS & ASYNCHRONOUS DATA TRANSFER


Synchronous Bus
* Each data item is transferred during a time slice
  known to both the source and destination units
  - Common clock source
  - Or separate clocks, with a synchronization signal transmitted periodically to synchronize
    the clocks in the system

Asynchronous Bus
* Each data item is transferred by a handshake mechanism
  - The unit that transmits the data sends a control signal that indicates the presence of data
  - The unit that receives the data responds with another control signal to acknowledge receipt
    of the data
* Strobe pulse - supplied by one of the units to indicate to the other unit when the data transfer
  has to occur
BUS SIGNALS
Bus signal allocation
- address
- data
- control
- arbitration
- interrupt
- timing
- power, ground

IEEE Standard 796 Multibus Signals

Data and address
    Data lines (16 lines)          DATA0 - DATA15
    Address lines (24 lines)       ADRS0 - ADRS23
Data transfer
    Memory read                    MRDC
    Memory write                   MWTC
    IO read                        IORC
    IO write                       IOWC
    Transfer acknowledge           TACK (XACK)
Interrupt control
    Interrupt request              INT0 - INT7
    Interrupt acknowledge          INTA
BUS SIGNALS

IEEE Standard 796 Multibus Signals (Contd)

Miscellaneous control
    Master clock                   CCLK
    System initialization          INIT
    Byte high enable               BHEN
    Memory inhibit (2 lines)       INH1 - INH2
    Bus lock                       LOCK
Bus arbitration
    Bus request                    BREQ
    Common bus request             CBRQ
    Bus busy                       BUSY
    Bus clock                      BCLK
    Bus priority in                BPRN
    Bus priority out               BPRO
Power and ground (20 lines)
INTERPROCESSOR ARBITRATION: STATIC ARBITRATION

Serial Arbitration Procedure

[Figure: Daisy chain of bus arbiters 1 to 4. The PI input of arbiter 1 (highest priority) is held at 1; the PO output of each arbiter drives the PI input of the next; the PO of arbiter 4 goes to the next arbiter. All arbiters share the bus busy line]

Parallel Arbitration Procedure

[Figure: Bus arbiters 1 to 4 send their Req lines to a 4 x 2 priority encoder; its output drives a 2 x 4 decoder, which returns an Ack line to each arbiter. All arbiters share the bus busy line]
INTERPROCESSOR ARBITRATION: DYNAMIC ARBITRATION

Priorities of the units can be changed dynamically while the system is in operation

Time Slice
A fixed-length time slice is given sequentially to each processor in round-robin
fashion
Polling
Unit address polling - The bus controller advances the address to identify the requesting unit. When a processor
that requires access recognizes its address, it activates the bus busy line and then accesses the bus.
After a number of bus cycles, the polling continues by choosing a different processor.
LRU
The least recently used algorithm gives the highest priority to the requesting device that has not used the bus
for the longest interval.
FIFO
In the first-come, first-served scheme, requests are served in the order received. The bus controller
maintains a queue data structure.
Rotating Daisy Chain
Conventional daisy chain - Highest priority goes to the unit nearest to the bus
controller
Rotating daisy chain - The PO output of the last device is connected to the PI of the first one. Highest
priority goes to the unit that is nearest to the unit that has most recently accessed the bus (it becomes the bus
controller)
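
The difference between the conventional and the rotating daisy chain can be sketched as two grant functions. This is a behavioral illustration only (no PI/PO signal timing); units are numbered from 0, unit 0 is assumed to be nearest the bus controller, and the function names are hypothetical.

```python
def fixed_priority_grant(requests):
    """Conventional daisy chain: grant the requesting unit nearest the
    bus controller (lowest index). Returns None if nobody is requesting."""
    for unit, requesting in enumerate(requests):
        if requesting:
            return unit
    return None


def rotating_grant(requests, last_granted):
    """Rotating daisy chain: the search starts at the unit that follows
    the one that most recently used the bus and wraps around."""
    n = len(requests)
    for offset in range(1, n + 1):
        unit = (last_granted + offset) % n
        if requests[unit]:
            return unit
    return None


requests = [True, False, True, True]
print(fixed_priority_grant(requests))            # 0
print(rotating_grant(requests, last_granted=0))  # 2
print(rotating_grant(requests, last_granted=2))  # 3
```

With fixed priority, unit 0 wins every time it requests the bus; with the rotating scheme the unit that just used the bus drops to the lowest priority.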
INTERPROCESSOR COMMUNICATION
Interprocessor Communication through Shared Memory

[Figure: Shared memory with a communication area holding Mark, Receiver(s), and Message fields; a sending processor and several receiving processors access the area]

Interprocessor Communication through Shared Memory with Interrupt

[Figure: The same communication area in shared memory, with the sending processor also issuing an interrupt (by instruction) to the receiving processors]
INTERPROCESSOR SYNCHRONIZATION
Synchronization
Communication of control information between processors
- To enforce the correct sequence of processes
- To ensure mutually exclusive access to shared writable data

Hardware Implementation

Mutual Exclusion with a Semaphore


Mutual Exclusion
- One processor excludes or locks out access to a shared resource by
  other processors while it is in a critical section
- A critical section is a program sequence that, once begun, must complete execution before
  another processor accesses the same shared resource

Semaphore
- A binary variable
- 1: A processor is executing a critical section, so the resource is not available to other processors
  0: Available to any requesting processor
- A software-controlled flag stored in memory that all processors can access
SEMAPHORE
Testing and Setting the Semaphore

- Two or more processors must not test or set the same semaphore at the same time
- Otherwise two or more processors may enter the same critical section at the same time
- Testing and setting must be implemented as an indivisible operation

R <- M[SEM]      / Test semaphore /
M[SEM] <- 1      / Set semaphore /

These two operations are performed while locked, so that other processors cannot test and set the semaphore while the current
processor is executing them

If R = 1, another processor is executing the critical section, and the processor that executed this
instruction does not access the shared memory

If R = 0, the shared memory is available; the semaphore has now been set to 1 and the processor may access it

The last instruction in the program must clear the semaphore
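
The test-and-set protocol can be imitated with threads standing in for processors. This is a software sketch, not the hardware mechanism: a `threading.Lock` plays the role of the lock that makes the test and the set indivisible, and the names `BinarySemaphore`, `worker` and `shared` are assumptions made for the illustration.

```python
import threading
import time


class BinarySemaphore:
    """Flag in 'memory' with an indivisible test-and-set operation."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()   # stands in for the hardware lock

    def test_and_set(self):
        with self._lock:
            r = self._value             # R <- M[SEM]   (test)
            self._value = 1             # M[SEM] <- 1   (set)
        return r

    def clear(self):
        with self._lock:
            self._value = 0             # release the critical section


sem = BinarySemaphore()
shared = {"count": 0}                   # the shared writable data


def worker(iterations):
    for _ in range(iterations):
        while sem.test_and_set() == 1:  # R = 1: someone else is inside
            time.sleep(0)               # give up the CPU and retry
        shared["count"] += 1            # critical section (R was 0)
        sem.clear()                     # last step: clear the semaphore


threads = [threading.Thread(target=worker, args=(200,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(shared["count"])                  # 800
```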


CACHE COHERENCE
Caches are Coherent
    Main memory:   X = 52
    Caches:        P1: X = 52     P2: X = 52    P3: X = 52

Cache Incoherency in Write Through Policy (after P1 writes X = 120)
    Main memory:   X = 120
    Caches:        P1: X = 120    P2: X = 52    P3: X = 52

Cache Incoherency in Write Back Policy (after P1 writes X = 120)
    Main memory:   X = 52
    Caches:        P1: X = 120    P2: X = 52    P3: X = 52

(In each case processors P1, P2 and P3 reach main memory through their caches over a common bus)



MAINTAINING CACHE COHERENCY
Shared Cache
- Disallow private caches
- Increases access time

Software Approaches
* Read-Only Data are Cacheable
- Private cache is for read-only data
- Shared writable data are not cacheable
- Compiler tags data as cacheable or noncacheable
- Degrades performance due to software overhead

* Centralized Global Table
- Status of each memory block is maintained in the CGT: RO (Read-Only); RW (Read and Write)
- All caches can have copies of RO blocks
- Only one cache can have a copy of an RW block

Hardware Approaches
* Snoopy Cache Controller
- Cache controllers monitor all the bus requests from CPUs and IOPs
- All caches attached to the bus monitor the write operations
- When a word in a cache is written, memory is also updated (write through)
- Local snoopy controllers in all other caches check their contents to determine whether they have
  a copy of that word; if they have, that location is marked invalid (a future reference to this
  location causes a cache miss)
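
The snoopy write-through scheme can be simulated with the X = 52 / X = 120 values used in the coherence example. This is a toy model under stated assumptions: each cache is a dictionary, a missing entry means "invalid or not present", and the class names `Cache` and `Bus` are illustrative.

```python
class Cache:
    """Private cache: address -> value. A missing key is an invalid line."""

    def __init__(self):
        self.lines = {}


class Bus:
    """Shared bus connecting the caches to main memory."""

    def __init__(self, memory, caches):
        self.memory = memory
        self.caches = caches

    def read(self, cpu, addr):
        cache = self.caches[cpu]
        if addr not in cache.lines:               # miss: fetch from memory
            cache.lines[addr] = self.memory[addr]
        return cache.lines[addr]

    def write(self, cpu, addr, value):
        self.caches[cpu].lines[addr] = value
        self.memory[addr] = value                 # write through
        for i, other in enumerate(self.caches):   # other caches snoop the write
            if i != cpu:
                other.lines.pop(addr, None)       # mark their copy invalid


memory = {"X": 52}
bus = Bus(memory, [Cache(), Cache(), Cache()])
for cpu in range(3):
    bus.read(cpu, "X")            # all three caches now hold X = 52
bus.write(0, "X", 120)            # P1 writes X = 120
print(bus.read(1, "X"), bus.read(2, "X"), memory["X"])   # 120 120 120
```

Without the invalidation loop, P2 and P3 would keep returning the stale value 52, which is the incoherency shown for the write-through policy.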
