
UNIT IV

Memory organization

Memory management units (MMU):


Translates addresses: the CPU issues a logical address; the memory management unit translates it into a physical address, which is used to access main memory.

[Figure: CPU -> (logical address) -> memory management unit -> (physical address) -> main memory.]

Address translation:
Requires some sort of register/table to allow arbitrary mappings of logical to physical addresses.
Two basic schemes:
segmented
paged
Segmentation and paging can be combined (x86).

Segmentation
Segment registers: Segment selectors
Selectors may be loaded into any of the six segment registers (CS, DS, ES, SS, FS & GS).
Segment selectors point to segment descriptors.
Segment descriptors contain addresses (the segment's linear base address, to which an offset is added) and control information.

Segment selectors
A selector indexes either the GDT (global descriptor table) or the LDT (local descriptor table).
13-bit index: 2^13 = 8192 segment descriptors per table.

Protected mode operation

There is only one GDT in protected mode.
The GDT is located in memory through use of the GDTR (global descriptor table register). The GDTR is 6 bytes (48 bits) long.
This register holds the 32-bit base address and the 16-bit segment limit for the global descriptor table (GDT).
Load GDT: LGDT (instruction). Store GDT: SGDT (instruction).
Protected mode tasks may have their own LDT.
LDTR (local descriptor table register): 16 bits long.
LLDT (instruction), SLDT (instruction).
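As a concrete illustration, here is a minimal sketch of how the 48-bit GDTR image could be declared and loaded with LGDT from C; the packed struct and the GCC-style inline assembly are assumptions for illustration, not from the text.

    #include <stdint.h>

    /* 48-bit GDTR image: 16-bit limit followed by 32-bit base address,
       exactly the layout the LGDT instruction expects in memory. */
    struct gdtr {
        uint16_t limit;   /* size of the GDT in bytes, minus 1 */
        uint32_t base;    /* 32-bit linear base address of the GDT */
    } __attribute__((packed));

    static inline void load_gdt(const struct gdtr *g)
    {
        __asm__ volatile ("lgdt %0" : : "m"(*g));   /* LGDT instruction */
    }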

Segment descriptors

Base: Defines the location of the segment within the 4-gigabyte (GB) physical address space. Base address = 32 bits (0-31).
Limit: Defines the size of the segment. Segment limit = 20 bits (0-19).
Granularity bit:
1. If the Granularity bit is clear, the segment size ranges from 1 byte to 1 megabyte, in increments of 1 byte.
2. If the Granularity bit is set, the segment size ranges from 4 kilobytes to 4 gigabytes, in increments of 4 KB.
P bit: Indicates whether the segment is present in memory.
DPL bits: Descriptor Privilege Level. Specify the privilege level required to access the segment.
Four levels: 00, 01, 10, 11; 00 is the highest, 11 the lowest.
S bit: Determines whether a given segment is a system segment or a code or data segment.
If the S bit is set: a code or data segment.
If it is clear: a system segment.
Type: The interpretation of this field depends on whether the segment descriptor is for an application segment or a system segment.

When the S bit is set (code or data segment), the 4-bit Type field holds E, ED/C, R/W, A:

E - Executable: 0 = data segment, 1 = code segment
ED - Expansion Direction (data segments): 0 = expand up (data segment), 1 = expand down (stack segment)
C - Conforming (code segments): controls the privilege-level check when execution transfers into the segment
W - Writable (data segments)
R - Readable (code segments)
A - Accessed: the processor automatically sets this bit whenever the descriptor is referenced; the bit is cleared by software

Data segment: E = 0, then ED, W, A
Code segment: E = 1, then C, R, A

D bit / B bit: Default / Big. Selects between 16-bit and 32-bit operation.


AVL bit: Available to the programmer. Used by the system designer.
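Putting the fields together, here is a sketch of the 8-byte descriptor layout as a C struct; bitfield ordering is compiler-dependent, so this assumes GCC on little-endian x86 and is illustrative rather than portable.

    #include <stdint.h>

    /* One 8-byte x86 segment descriptor, with the fields described above. */
    struct seg_desc {
        uint16_t limit_lo;        /* limit bits 0-15 */
        uint16_t base_lo;         /* base bits 0-15 */
        uint8_t  base_mid;        /* base bits 16-23 */
        uint8_t  type     : 4;    /* E, ED/C, R/W, A */
        uint8_t  s        : 1;    /* 1 = code/data, 0 = system */
        uint8_t  dpl      : 2;    /* descriptor privilege level (0-3) */
        uint8_t  p        : 1;    /* present bit */
        uint8_t  limit_hi : 4;    /* limit bits 16-19 */
        uint8_t  avl      : 1;    /* available to the programmer */
        uint8_t  rsvd     : 1;    /* reserved */
        uint8_t  db       : 1;    /* D/B: 0 = 16-bit, 1 = 32-bit */
        uint8_t  g        : 1;    /* granularity: 0 = 1 B, 1 = 4 KB units */
        uint8_t  base_hi;         /* base bits 24-31 */
    } __attribute__((packed));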

System Descriptors
S=0

Segment Translation
Generating the Linear Address

Base address: 00100000h
Segment limit: 000FFh
G = 0 (byte granularity)
End address: 001000FFh (base + limit)
P = 1 (present)
DPL = 00 (highest privilege level)
S = 1 (code or data segment)
Type = 0010, i.e. E = 0, ED/C = 0, R/W = 1, A = 0:
a data segment, expand-up, writable.
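A small sketch of the end-address computation under the two granularity settings; the function name is illustrative, and the G = 1 handling of the low 12 bits follows standard x86 behavior, stated here as an assumption:

    #include <stdint.h>

    /* End address of a segment. With G = 0 the 20-bit limit counts bytes;
       with G = 1 it counts 4 KB units, so the low 12 address bits are
       treated as all ones. For the example above: base 0x00100000,
       limit 0x000FF, G = 0 gives end address 0x001000FF. */
    static uint32_t seg_end(uint32_t base, uint32_t limit20, int g)
    {
        uint32_t limit = g ? ((limit20 << 12) | 0xFFF) : limit20;
        return base + limit;
    }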

Null selector: A selector that has an index value of zero and points to the GDT is called a null selector.
Any memory access through the null selector generates an exception.
Null descriptor: the first GDT entry is reserved.
It is not used to access memory.

4 KB Paging Scheme
32-bit linear address

P (present) bit:
P = 1 - the page is in RAM.
P = 0 - the page is on disk.
R/W bit:
R/W = 1 - read/write.
R/W = 0 - read only.
U/S bit:
U/S = 1 - user.
U/S = 0 - supervisor.
A (accessed) bit:
Set if a read or write was performed to the page selected by the PDE & PTE.
D (dirty) bit:
Set if a write has been performed to the page selected by the PTE.
AVL (available):
3 bits are available for the programmer to use for any purpose, e.g. counting the number of times the entry is accessed.
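Under the 4 KB scheme, the 32-bit linear address splits into a 10-bit page directory index, a 10-bit page table index, and a 12-bit offset; a minimal sketch of that split (the function name is illustrative):

    #include <stdint.h>
    #include <stdio.h>

    /* Decompose a 32-bit linear address under 4 KB paging. */
    static void split_4k(uint32_t lin)
    {
        uint32_t dir    = lin >> 22;            /* PDE index: bits 31-22 */
        uint32_t table  = (lin >> 12) & 0x3FF;  /* PTE index: bits 21-12 */
        uint32_t offset = lin & 0xFFF;          /* offset:    bits 11-0  */
        printf("PDE %u, PTE %u, offset 0x%03X\n", dir, table, offset);
    }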

Page Translation

4MB Paging Scheme

Privilege Levels

Protection is necessary for reliable multitasking.
Protection can be used to prevent tasks from interfering with each other.
For example, protection can keep one task from overwriting the instructions or data of another task.
Protection can be applied to segments and pages.
Two bits in a processor register define the privilege level of the program currently running (called the current privilege level or CPL).
The CPL is checked during address translation for segmentation and paging.
Selector - RPL (requested privilege level)
Descriptor - DPL (descriptor privilege level)
Four privilege levels are used to perform protection checks each time an address is generated:
level 0 - highest level
level 3 - lowest level

Programs execute with a particular level of privilege.
CPL = current privilege level.
The lower two bits of the CS register specify the CPL of the program.
The CPL is compared with the RPL and DPL during address generation to enforce protection.

A less privileged program may not access higher privileged segments.
Special gates between rings are provided to allow an outer ring to access an inner ring's resources in a predefined manner, as opposed to arbitrary usage.
Call gate:
A call gate has two main functions:
1. To define an entry point of a procedure.
2. To specify the privilege level required to enter a procedure.

SEGMENT-LEVEL PROTECTION
Each memory reference is checked to verify that it satisfies the protection checks.
All checks are made before the memory cycle is started; any violation prevents the cycle from starting and results in an exception.
Because checks are performed in parallel with address translation, there is no performance penalty.
There are five protection checks:
1. Type check - determines whether the current memory access is allowed for the segment's type.
2. Limit check - verifies that the access falls within the segment limit.
3. Addressable domain check - compares privilege levels: at CPL = 0 (highest privilege level), segments at any RPL and DPL may be accessed.
4. Procedure entry point check - performed through the use of a call gate.
Call gates are used to control the transfer of execution between procedures of different PLs.
5. Privileged instruction check - some instructions are privileged and may only be executed when the CPL = 0 (LGDT, LLDT, etc.).

Any violation of these protection checks results in an exception.

Page level protection

After the protection checks for segment address generation, the page-level protection checks are performed.
Two checks:
1. Type check (read & write)
2. Addressable domain check (via privilege levels)
The PDE and PTE contain two bits that are used to perform these two checks: the protection bits U/S and R/W.

U/S  R/W  Access at level 3   Access at levels 2, 1 or 0
 0    0   none                read/write
 0    1   none                read/write
 1    0   read only           read/write
 1    1   read/write          read/write
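A toy sketch of this check in C, following the table above (original 386 rules, where supervisor code may read and write any page; the function and parameter names are illustrative):

    /* Returns nonzero if the access is allowed by the U/S and R/W bits. */
    static int page_access_ok(int cpl, int us, int rw, int is_write)
    {
        if (cpl == 3) {                  /* user-level access */
            if (us == 0) return 0;       /* supervisor page: no user access */
            return is_write ? rw : 1;    /* user page: writes need R/W = 1 */
        }
        return 1;                        /* levels 0-2: full access */
    }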

Instruction and Data caches

Cache is a small high-speed memory. It stores data from some frequently used addresses (of main memory).
Cache hit: data found in cache. Results in data transfer at maximum speed.
Cache miss: data not found in cache. The processor loads data from memory and copies it into the cache. This results in extra delay, called the miss penalty.
Hit ratio = percentage of memory accesses satisfied by the cache.
Miss ratio = 1 - hit ratio.

Average memory access time =
Hit ratio * Tcache + (1 - Hit ratio) * (Tcache + TRAM)

RAM access time = 70 ns
Cache access time = 10 ns
Hit ratio = 0.85
Assume there is no external cache.
Tavg = 0.85 * 10 + (1 - 0.85) * (10 + 70)
     = 20.5 ns
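The same arithmetic as a runnable check (values taken from the example above):

    #include <stdio.h>

    /* Tavg = h*Tc + (1-h)*(Tc + Tm), one cache level. */
    int main(void)
    {
        double h = 0.85, tc = 10.0, tm = 70.0;      /* ns */
        double tavg = h * tc + (1.0 - h) * (tc + tm);
        printf("Tavg = %.1f ns\n", tavg);           /* prints 20.5 */
        return 0;
    }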

Cache Line: the cache is partitioned into lines (also called blocks). During data transfer, a whole line is read or written.
Each line has a tag that indicates the address in memory from which the line has been copied.

In the Pentium processor:
WB/WT# (writeback/writethrough) - input pin.
Allows a data cache line to be defined as writeback (1) or writethrough (0) on a line-by-line basis.
Writeback: writing results only to the cache is called writeback.
Writethrough: writing results to the cache and to main memory is called writethrough.

Types of Cache
1. Fully Associative
2. Direct Mapped
3. Set Associative
Sequential Access :
Start at the beginning and read through in order
Access time depends on location of data and previous location
Example: tape
Direct Access :
Individual blocks have unique address
Access is by jumping to vicinity then performing a sequential search
Access time depends on location of data within "block" and previous
location
Example: hard disk

Random access:
Each location has a unique address
Access time is independent of location or previous access
e.g. RAM
Associative access :
Data is retrieved based on a portion of its contents rather than its address
Access time is independent of location or previous access
e.g. cache

Performance
Transfer Rate: rate at which data can be moved.
For random-access memory, equal to 1/(cycle time).
For non-random-access memory, the following relationship holds:

TN = TA + N/R
where
TN = average time to read or write N bits
TA = average access time
N = number of bits
R = transfer rate, in bits per second (bps)

Fully Associative Cache

Allows any line in main memory to be stored at any location in the cache.
Main memory and cache are both divided into lines of equal size.
Advantages:
No contention
Easy to implement
Disadvantages:
Very expensive
Very wasteful of cache storage, since you must store the full primary memory address

No restriction on mapping from memory to cache.
It requires a large number of comparators to check all the addresses in parallel.
Associative search of tags is expensive.
Feasible for very small caches only (less than 4 KB).
Some special-purpose caches, such as the virtual memory Translation Lookaside Buffer (TLB), are associative caches.
Associative mapping works the best, but is complex to implement.

Direct-Mapped Cache
One-way set associative cache.
Memory is divided into cache pages; page size and cache size are equal.
Line 0 of any page maps to line 0 of the cache.
Directly maps a memory line into an equivalent cache line.
Direct mapping has the lowest performance, but is the easiest to implement.
Direct mapping is often used for instruction caches.
Less flexible.
Advantages:
Low cost; doesn't require an associative memory in hardware
Uses less cache space
Disadvantages:
Contention between main memory data with the same index bits.

Set-Associative Cache
Set associative is a compromise between the other two.
The bigger the way, the better the performance, but the more complex and expensive the cache.
Combination of the fully associative and direct-mapped caching schemes.
Divide the cache into equal sections called cache ways.
Not as expensive and complex as a fully associative approach.
Not as much contention as in a direct-mapping approach.
Page size is equal to the size of the cache way.
Each cache way is treated like a small direct-mapped cache.

Design of cache organization

Cache size: 4 KB
Line size: 32 bytes
Physical address: 32 bits

Fully Associative Cache
The 32-bit physical address is divided into two fields.
n = cache size / line size = number of lines
b = log2(line size) = bits for offset (with the line size in bytes)
remaining upper bits = tag address bits

Consider a fully associative mapping scheme with a 27-bit tag and a 5-bit offset:
011111010111011100011011001 11000
Compare all tag fields for the value 011111010111011100011011001.
If a match is found, return byte 11000 (24 decimal) of the line.

Direct Cache Addressing

n = cache size / line size = number of lines
b = log2(line size) = bits for offset
log2(number of lines) = bits for cache index
remaining upper bits = tag address bits

Direct mapping scheme with a 20-bit tag, a 7-bit index and a 5-bit offset:
01111101011101110001 1011001 11000
Compare the tag field of line 1011001 (89 decimal) for the value 01111101011101110001.
If it matches, return byte 11000 (24 decimal) of the line.

Set Associative Mapping

n = cache size / line size = number of lines
b = log2(line size) = bits for offset
log2(number of sets) = bits for cache index
remaining upper bits = tag address bits
w = number of lines per set
s = n / w = number of sets

Two-way set-associative mapping with a 21-bit tag, a 6-bit index and a 5-bit offset:
011111010111011100011 011001 11000
Compare the tag fields of both lines in set 011001 for the value 011111010111011100011.
If a match is found, return byte 11000 (24 decimal) of that line.
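A sketch of the field extraction for this two-way example (4 KB cache, 32-byte lines, 64 sets; the names are illustrative):

    #include <stdint.h>

    /* 5-bit offset, 6-bit set index, 21-bit tag. */
    struct addr_parts { uint32_t tag, index, offset; };

    static struct addr_parts split_addr(uint32_t pa)
    {
        struct addr_parts p;
        p.offset = pa & 0x1F;          /* bits 4-0   */
        p.index  = (pa >> 5) & 0x3F;   /* bits 10-5  */
        p.tag    = pa >> 11;           /* bits 31-11 */
        return p;
    }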

Instruction & Data Cache of Pentium

Cache size: 8 KB
Line size: 32 bytes
Physical address: 32 bits
Both caches are organized as 2-way set associative caches: 128 sets, 256 entries in total.
Each entry in a set has its own tag.

Data Cache of Pentium

Tags in the data cache are triple ported.
They can be accessed from 3 different places at the same time:
U pipeline
V pipeline
Bus snooping
Each entry in the data cache can be configured for write-through or write-back.
Parity bits are used to maintain data integrity.
Each tag and every byte in the data cache has its own parity bit.

Instruction Cache of Pentium

The instruction cache is write protected to prevent self-modifying code.
Tags in the instruction cache are also triple ported:
Two ports for split-line accesses
Third port for bus snooping
In the Pentium (since it is CISC), instructions are of variable length (1-15 bytes).
Multibyte instructions may straddle two sequential lines stored in the code cache.
The fetch would then need two sequential accesses, which degrades performance.
Solution: split-line access.

Split-line Access
It permits the upper half of one line and the lower half of the next to be fetched from the code cache in one clock cycle.
When a split line is read, the information is not correctly aligned.
The bytes need to be rotated so that the prefetch queue receives instructions in proper order.
Instruction boundaries within the cache line need to be defined.
There is one parity bit for every 8 bytes of data in the instruction cache.


Effect of Line Width on Cache Performance

[Figure: variation of cache performance as a function of line size, plotted for 64 KB, 128 KB and 256 KB caches.]

Effect of Associativity on Cache Performance

[Figure: miss rate (0 to 0.3) versus associativity, from direct-mapped through 2-way, 4-way, 8-way, 16-way, 32-way and 64-way. Performance improvement of caches with increased associativity.]

Typical Levels in a Hierarchical Memory

Level         Capacity   Access latency   Cost per GB
Regs          100s B     ns               $Millions
Cache 1       10s KB     a few ns         $100s Ks
Cache 2       MBs        10s ns           $10s Ks
(speed gap)
Main          100s MB    100s ns          $1000s
Secondary     10s GB     10s ms           $10s
Tertiary      TBs        min+             $1s

Names and key characteristics of levels in a memory hierarchy.

One level cache

Average memory access time =
Hit ratio * Tcache + (1 - Hit ratio) * (Tcache + TRAM)

H = hit rate
M = miss penalty (the total access time seen by the processor when a miss occurs)
C = the time to access information in the cache
With these symbols, the formula above reads: average access time = H * C + (1 - H) * M.

Two level cache

h1 = L1 cache hit rate.
h2 = combined (cumulative) hit rate of L1 and L2.
Average memory access time:
tav = h1*tL1 + (h2 - h1)*tL2 + (1 - h2)*tmain
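A runnable sketch of the two-level formula; the timing and hit-rate numbers are assumed for illustration, not from the text:

    #include <stdio.h>

    /* tav = h1*tL1 + (h2 - h1)*tL2 + (1 - h2)*tmain,
       with h1, h2 as cumulative hit rates (h2 >= h1). */
    int main(void)
    {
        double h1 = 0.90, h2 = 0.98;                 /* assumed hit rates */
        double tL1 = 2.0, tL2 = 10.0, tmain = 100.0; /* ns, assumed */
        double tav = h1*tL1 + (h2 - h1)*tL2 + (1.0 - h2)*tmain;
        printf("tav = %.1f ns\n", tav);              /* prints 4.6 */
        return 0;
    }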

Multiprocessor System
When multiple processors are used in a single system, there needs to be a mechanism whereby all processors agree on the contents of shared cache information.
For example, two or more processors may utilize data from the same memory location, X.
Each processor may change the value of X; which value of X is then the right one?
If each processor changes the value of the data item, we have different (incoherent) values of X's data in each cache.

Solution: Cache Coherency Mechanism

[Figure: a multiprocessor system with incoherent cache data.]

Types of Data
Clean Data: the data in the cache and the data in main memory are the same; the data in the cache is called clean data.
Dirty Data: the data is modified within the cache but not modified in main memory; the data in the cache is called dirty data.
Stale Data: the data is modified within main memory but not modified in the cache; the data in the cache is called stale data.
Out-of-date main memory data: the data is modified within the cache but not modified in main memory; the data in main memory is called out-of-date main memory data.

Cache Coherency
The Pentium's mechanism is called the MESI (Modified/Exclusive/Shared/Invalid) protocol.

The four states are defined as follows:

Modified:
The current line has been modified (does not match main memory) and is only available in a single cache.
Exclusive:
The current line has not been modified (matches main memory) and is only available in a single cache.
Writing to this line changes its state to modified.
Shared:
Copies of the current line may exist in more than one cache.
A write to this line causes a write-through to main memory and may invalidate the copies in the other caches.
Invalid:
The current line is empty.
A read from this line will generate a miss.
This protocol uses two bits stored with each line of data to keep track of the state of the cache line.
Only the shared and invalid states are used in the code cache.
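As a toy illustration of the two state bits, here is a generic-MESI transition for a local write; this is a simplified sketch (snoop-induced transitions and the Pentium's WB/WT# configuration are omitted), not the processor's exact transition table:

    /* Two state bits per line encode the four MESI states. */
    enum mesi { INVALID, SHARED, EXCLUSIVE, MODIFIED };

    /* Generic MESI: a local write ends in MODIFIED; from SHARED the other
       copies must first be invalidated, from INVALID the line must first
       be allocated. */
    static enum mesi on_local_write(enum mesi s)
    {
        switch (s) {
        case EXCLUSIVE: return MODIFIED;  /* silent upgrade, line is private */
        case SHARED:    return MODIFIED;  /* after invalidating other copies */
        case INVALID:   return MODIFIED;  /* after allocating the line */
        default:        return MODIFIED;  /* already modified */
        }
    }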

The MESI protocol requires the Pentium to monitor all accesses to main memory in a multiprocessor system. This is called bus snooping.
Bus Snooping: it is used to maintain consistent data in a multiprocessor system where each processor has a separate cache.
If Processor 3 writes its local copy of X (30) back to memory, the memory write cycle will be detected by the other 3 processors.
Each processor will then run an internal inquire cycle to determine whether its data cache contains the address of X.
Processors 1 and 2 then update their caches based on their individual MESI states.
The Pentium's address lines are used as inputs during an inquire cycle to accomplish bus snooping.

Cache consistency cycles

Inquire cycle

EADS# (external address strobe) - input pin.
This signal indicates that a valid external address has been driven onto the Pentium processor address pins, to be used for an inquire cycle.
HIT# (inquire cycle hit/miss) - output pin.
The hit indication is driven to reflect the outcome of an inquire cycle.
Asserted two clocks after EADS# if an inquire cycle hits a valid line in either the data or instruction cache.
HITM# (hit to a modified cache line) - output pin.
The hit-to-a-modified-line output is driven to reflect the outcome of an inquire cycle.
It is asserted after inquire cycles which resulted in a hit to a modified line in the data cache.
INV (invalidation) - input pin.
Determines the final cache line state (S or I) in case of an inquire cycle hit.
It is sampled together with the address for the inquire cycle in the clock in which EADS# is sampled active.
INV high: the cache line is invalidated.
INV low: the cache line is marked shared.
On a miss, INV has no effect.
On a hit to a modified line, the line is written back regardless of the state of INV.

Cache Coherency Protocol Implementations

Snooping:
used with low-end, bus-based multiprocessors
few processors
centralized memory
Directory-based:
used with higher-end multiprocessors
more processors
distributed memory
When we write, should we write to cache or memory?
Write-through cache: write to both cache and main memory.
Cache and memory are always consistent.
Write-back cache: write only to cache and set a dirty bit.
When the block gets replaced from the cache, it is written back to main memory.

Snoop: when a cache is watching the address lines for transactions, this is called a snoop.
This function allows the cache to see if any transactions are accessing memory it contains within itself.
Snarf: when a cache takes the information from the data lines, the cache is said to have snarfed the data.
This function allows the cache to be updated and maintain consistency.
Replacement Algorithms
Once the cache has been filled, when a new block is brought into the cache, one of the existing blocks must be replaced.
For direct mapping, there is only one possible line for any particular block, and no choice is possible.
For the associative and set-associative techniques, a replacement algorithm is needed.
To achieve high speed, such an algorithm must be implemented in hardware.

Least recently used (LRU): replace the block in the set that has been in the cache longest with no reference to it.
First-in-first-out (FIFO): replace the block in the set that has been in the cache longest.
FIFO is easily implemented as a round-robin or circular buffer technique (circular counter).
Least frequently used (LFU): replace the block in the set that has experienced the fewest references.
LFU could be implemented by associating a counter with each line.
A technique not based on usage (i.e., not LRU, LFU, FIFO, or some variant) is to pick a line at random from among the candidate lines.
Random policy: simpler, but at the expense of performance. Implemented with a Linear Feedback Shift Register (LFSR).

Think of FIFO as cars going through a tunnel: the first car to go into the tunnel will be the first one to come out the other side.
LRU cache: you throw away items that you have not used for a long time, and keep the ones that you use frequently.

LRU Algorithm
One or more bits are added to the cache entry to support the LRU algorithm.
For two lines: one LRU bit and two valid bits.
If any invalid line (out of the two) is found, it is replaced with the newly referenced data.
If all the lines are valid, the LRU line is replaced by the new one.
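A sketch of the two-way victim choice just described (the encoding of the LRU bit is an assumption):

    /* Pick the way to replace in a two-way set: prefer an invalid line;
       otherwise evict the least recently used way. Here lru == 0 is taken
       to mean way 0 was used least recently. */
    static int pick_victim(int valid0, int valid1, int lru)
    {
        if (!valid0) return 0;
        if (!valid1) return 1;
        return lru == 0 ? 0 : 1;   /* both valid: replace the LRU way */
    }
    /* On every hit or fill of way w, the LRU bit is pointed at the other way. */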

Four way set associative - LRU algorithm

FLUSH# (flush cycle) - input pin.

The cache flush input forces the Pentium processor to write back all modified lines in the data cache and invalidate its internal caches.
A Flush Acknowledge special cycle is generated by the Pentium processor, indicating completion of the write-back and invalidation.
The byte enables indicate the type of bus cycle: BE4 is low and all other BEs are high.

BE7 BE6 BE5 BE4 BE3 BE2 BE1 BE0
 1   1   1   0   1   1   1   1
Cache instructions:
INVD - invalidate cache.
Effectively erases all the information in the data cache (by marking it all invalid).

WBINVD - write back and invalidate cache.
A write-back special cycle is driven after the WBINVD instruction is executed.

BE7 BE6 BE5 BE4 BE3 BE2 BE1 BE0
 1   1   1   1   0   1   1   1

The INVD instruction should be used with care: it does not write back modified cache lines.
A flush cycle is driven after the INVD and WBINVD instructions are executed.

BE7 BE6 BE5 BE4 BE3 BE2 BE1 BE0
 1   1   1   1   1   1   0   1

For WBINVD, the write-back cycle is generated, followed by the flush cycle.

Importance of Hit Ratio

Effective memory access time is:
Ta = h*Tc + (1 - h)*Tm

Speedup due to the cache is:
Sc = Tm / Ta

Example:
Assume a main memory access time of 100 ns, a cache access time of 10 ns, and a hit ratio of 0.9.
Ta = 0.9(10 ns) + (1 - 0.9)(100 ns) = 19 ns
Sc = 100 ns / 19 ns = 5.26
Same as above, only the hit ratio is now 0.95 instead:
Ta = 0.95(10 ns) + (1 - 0.95)(100 ns) = 14.5 ns
Sc = 100 ns / 14.5 ns = 6.9
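The sensitivity of the speedup to the hit ratio is easy to tabulate; a small sketch reproducing the two cases above:

    #include <stdio.h>

    /* Ta = h*Tc + (1-h)*Tm and Sc = Tm/Ta, for several hit ratios. */
    int main(void)
    {
        double tc = 10.0, tm = 100.0;    /* ns */
        double hs[] = { 0.90, 0.95 };
        for (int i = 0; i < 2; i++) {
            double ta = hs[i]*tc + (1.0 - hs[i])*tm;
            printf("h = %.2f: Ta = %.1f ns, Sc = %.2f\n", hs[i], ta, tm/ta);
        }
        return 0;
    }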

Translation Lookaside Buffer

Translates virtual addresses into physical addresses.
Physical addresses are used to access the cache and main memory.
TLBs are caches themselves.
The data cache contains two TLBs.
The first is 4-way set associative with 64 entries (it is used to handle 4 KB pages).
The second TLB used by the data cache is 4-way set associative with 8 entries (it is used to handle 4 MB pages).
Both TLBs are parity protected and dual ported.
The instruction cache uses a single 4-way set associative TLB with 32 entries.
Both 4 KB and 4 MB page addresses are supported. Entries are replaced using a 3-bit LRU algorithm.
Instruction: INVLPG - invalidate TLB entries.

[Figure: translation with a TLB. The CPU issues a virtual address (VA); a TLB lookup that hits delivers the physical address (PA) in about 1/2 cycle, while a miss invokes the translation process (about 20 cycles); the PA then goes to the cache, and on a cache miss to main memory, which returns the data.]

Four-Way Set Associative Cache

2^8 = 256 sets, each with four ways (each with one block).

[Figure: the 32-bit address divides into a 22-bit tag, an 8-bit index that selects one of the 256 sets, and a byte offset. Each of the four ways stores a valid bit (V), a tag, and 32 bits of data; the four tags are compared in parallel, a hit is signalled if any comparison succeeds, and a 4-to-1 multiplexer selects the data from the matching way.]

Summary of Memory Hierarchy

Cache memory: provides the illusion of very high speed.
Main memory: reasonable cost, but slow & small.
Virtual memory: provides the illusion of very large size.

Data movement in a memory hierarchy: words move between registers and cache (transferred explicitly via load/store); lines move between cache and main memory (transferred automatically upon a cache miss); pages move between main memory and virtual (secondary) memory (transferred automatically upon a page fault).

Locality makes the illusions work.

Cache Unit in POWER PC

32 Kbytes, 8-way set associative.
Unified (instructions and data in the same cache).
Line size 64 bytes (512 entries and 64 sets).
Each line is divided into two 8-word sectors, each of which can be snooped, loaded, flushed or invalidated independently.
Cache coherency: MESI protocol (each sector has 2 state bits).
The LRU algorithm is used for cache line replacement.

Cache unit organization

MPC Block Diagram

Memory Management Unit

It supports up to 4 petabytes (2^52 bytes) of virtual memory and 4 Gbytes (2^32 bytes) of physical memory.
3 types of TLBs:
1. Unified TLB (UTLB) - 256 entries, two-way set associative, for 4 KB pages; contains instruction and data address translations. Supervisor software can invalidate UTLB entries selectively.
2. Block TLB (BTLB) - 4 entries, fully associative, for blocks (optionally 128 KB to 8 MB); maintains address translations for blocks of memory. The BAT array is maintained by system software.
3. Instruction TLB (ITLB) - 4 entries, fully associative, for up to 4 copies of the most recently used instruction address translations.
The hashed page table is a variable-sized data structure that defines the mapping between virtual page numbers and physical page numbers.
The 601 provides hardware table search capability through the hashed page table on UTLB misses.

Virtual memory is imaginary memory: it gives you the illusion of a memory arrangement that's not physically there.
A terabyte (TB) is 2^40 bytes.
A petabyte (PB) is 10^15 bytes of data, 1,000 terabytes (TB) or 1,000,000 gigabytes (GB).

Memory management hardware must support paging and/or segmentation.


OS must be able to manage the movement of pages and/or segments between
secondary memory and main memory.

LA = logical address.
LA0 to LA19 are translated by the MMU into physical address bits PA0 to PA19.
Lower-order bits LA20 to LA31 are directed to the on-chip cache.
The segment register is selected by LA0 to LA3.

The 601 supports the following four main types of address translation:
Page address translation: translates the page frame address for a 4-Kbyte page size.
Block address translation: translates the block number for blocks that range in size from 128 Kbyte to 8 Mbyte.
I/O controller interface address translation: used to generate I/O controller interface accesses on the external bus.
Direct address translation: when address translation is disabled, the physical address is identical to the logical address.

Address Translation

Memory Unit
Contains read and write queues that buffer operations between the external interface and the cache.

MULTITASKING
Multiple tasks appear to execute simultaneously.
Rapidly switching from task to task gives the impression that all tasks are running at the same time.

Task state segment (TSS):
A special memory structure.
It stores all the 32-bit registers, the 16-bit segment registers, and additional storage for the stack pointers and stack segment selectors for each protection level of the task.

Task register: 16-bit register.
LTR - load task register.
STR - store task register.
The LTR instruction is used to initially access a task during system initialization.
Later, call and jump instructions are used for task switching.
During call and jump instructions, TR is loaded.

Static RAM

Two inverters are cross-connected to form a latch.
The latch is connected to two bit lines by transistors T1 and T2.
These transistors act as switches that can be opened or closed under control of the word line.
Static RAMs can be accessed very quickly: access times are on the order of a few nanoseconds.
SRAMs are used in applications where speed is of critical concern.

Dynamic RAM

For the write operation, a voltage signal is applied to the bit line; a high voltage
represents 1, and a low voltage represents 0. A signal is then applied to the address
line, allowing a charge to be transferred to the capacitor.
For the read operation, when the address line is selected, the transistor turns
on and the charge stored on the capacitor is fed out onto a bit line and to a sense
amplifier. The sense amplifier compares the capacitor voltage to a reference value
and determines if the cell contains a logic 1 or a logic 0.

Nonvolatile Memory

ROM
PROM
EPROM

[Figure: read-only memory organization (supply voltage, word lines, bit lines), with the fixed contents shown on the right: 1010, 1001, 0010, 1101.]

A logic value 0 is stored in the cell if the transistor is connected to ground at point P; otherwise, a 1 is stored.
The bit line is connected through a resistor to the power supply.
To read the state of the cell, the word line is activated to close the transistor switch.
As a result, the voltage on the bit line drops to near zero if there is a connection between the transistor and ground.
If there is no connection to ground, the bit line remains at the high voltage level, indicating a 1.
A sense circuit at the end of the bit line generates the proper output value.
The state of the connection to ground in each cell is determined when the chip is manufactured.

Flash Memory

[Figure: EEPROM or Flash memory organization (source lines, word lines, bit lines). Each memory cell is built of a floating-gate MOS transistor: a control gate above a floating gate, with source and drain (n+) diffusions in a p substrate.]

Flash program and erase methods

Flash Memory Comparison

NOR (code executable in place, like memory): fast read and slow write.
NAND (data storage): fast write and lower cost.

NOR (Intel/Sharp; AMD/Fujitsu/Toshiba) - code storage
Performance - important: high-speed random access, byte programming; acceptable: slow programming, slow erasing.
Application - program storage: cellular phones; DVD and set-top box BIOS.

NAND (Samsung/Toshiba) - file storage
Performance - important: high-speed programming, high-speed erasing, high-speed serial read; acceptable: slow random access.
Application - small form factor: digital still camera; silicon audio, PDA; mass storage as a silicon disk drive.

Disk Memory Basics

[Figure: disk memory elements and key terms - platters mounted on a spindle, a recording area divided into tracks (track 0, 1, 2, ...) and sectors, read/write heads on arms moved by an actuator, and the direction of rotation.]

The set of all the tracks in the same relative position on the platters is referred to as a cylinder.
For example, all of the shaded tracks in the figure are part of one cylinder.

1-12 platters are mounted on a spindle that rotates at speeds of 3600 to well over 10,000 revolutions per minute.
The access time to the data in a desired sector on the disk consists of three components:
1. Seek time: on a movable-head system, the time it takes to position the head at the track is known as the seek time.
2. Rotational latency: the time it takes for the beginning of the sector to reach the head is known as the rotational delay, or rotational latency - the time for the disk to rotate until the beginning of the sector data arrives under the read/write head.
3. Data transfer time: the time for the sector to pass under the head, which reads the bits on the fly.

Access Time for a Disk

[Figure: the three components of disk access time - 1. head movement from the current position to the desired cylinder: seek time (0-10s of ms); 2. disk rotation until the desired sector arrives under the head: rotational latency (0-10s of ms); 3. disk rotation until the sector has passed under the head: data transfer time (< 1 ms). Disks that spin faster have a shorter average and worst-case access time.]

Disk access latency = seek time + rotational latency.

Disk capacity = surfaces x tracks/surface x sectors/track x bytes/sector.

Calculate the capacity of a two-platter disk unit with 18,000 cylinders, an average of 520 sectors per track, and a sector size of 512 B.
Two platters = 4 recording surfaces.
Maximum raw capacity of the disk is 4 x 18,000 x 520 x 512 B = 1.917 x 10^10 B.
About 10% of this capacity is lost to overhead: inter-sector gaps, sector numbers, and CRC coding.
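The capacity arithmetic as a quick runnable check (numbers from the example above):

    #include <stdio.h>

    /* Capacity = surfaces * tracks/surface * sectors/track * bytes/sector. */
    int main(void)
    {
        long long surfaces = 4, tracks = 18000, sectors = 520, bytes = 512;
        long long cap = surfaces * tracks * sectors * bytes;
        printf("%lld bytes = %.4g B\n", cap, (double)cap);  /* 1.917e+10 */
        return 0;
    }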

TRANSFER TIME: The transfer time to or from the disk depends on the rotation speed of the disk in the following fashion:

T = b / (r * N)

where
T = transfer time
b = number of bytes to be transferred
N = number of bytes on a track
r = rotation speed, in revolutions per second

Thus the total average access time can be expressed as

Ta = Ts + 1/(2r) + b/(rN)

where Ts is the average seek time, 1/(2r) is the average rotational latency (half a revolution), and b/(rN) is the transfer time.
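A sketch evaluating these formulas for one assumed drive (the RPM, track size, seek time, and transfer size are illustrative, not from the text):

    #include <stdio.h>

    /* Ta = Ts + 1/(2r) + b/(r*N). All times in seconds. */
    int main(void)
    {
        double Ts = 4e-3;          /* average seek time: 4 ms (assumed)     */
        double r  = 7200.0 / 60;   /* 7200 RPM = 120 revolutions per second */
        double N  = 500 * 512.0;   /* bytes per track (assumed 500 sectors) */
        double b  = 4096.0;        /* bytes to transfer                     */
        double Ta = Ts + 1.0 / (2 * r) + b / (r * N);
        printf("Ta = %.2f ms\n", Ta * 1e3);   /* about 8.30 ms */
        return 0;
    }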

The actual details of disk I/O operation depend on the computer system, the operating system, and the nature of the I/O channel and disk controller hardware.

Disk Arrays and RAID (Redundant Array of Independent Disks)

Multiple-disk database design: the need for high-capacity, high-throughput secondary (disk) memory.

Processor speed  RAM size  Disk I/O rate  Number of disks  Disk capacity  Number of disks
1 GIPS           1 GB      100 MB/s       1                100 GB         1
1 TIPS           1 TB      100 GB/s       1000             100 TB         100
1 PIPS           1 PB      100 TB/s       1 Million        100 PB         100 000
1 EIPS           1 EB      100 PB/s       1 Billion        100 EB         100 Million

Amdahl's rules of thumb for system balance: 1 RAM byte for each IPS; 1 I/O bit per second for each IPS; 100 disk bytes for each RAM byte.

I/O Organization: Accessing I/O devices, Input/output programming, Interrupts, Exception Handling, DMA, Buses, I/O interfaces - Serial port, Parallel port, PCI bus, SCSI bus, USB bus, FireWire and InfiniBand, I/O peripherals.
