
Centre for Computer Technology

ICT123 Computer Architecture


Week 09

Virtual Memory Systems and Cache Systems

Content at a Glance
Review of week 8
Introduction
Virtual Memory: Paging, Segmentation
Cache: Direct, Associative, Set Associative

March 20, 2012
Richard Salomon, Sudipto Mitra Copyright Box Hill Institute

Memory Hierarchy


SRAM v DRAM
Both volatile - power needed to preserve data
Dynamic cell - simpler to build, smaller; more dense; less expensive; needs refresh; suits larger memory units
Static cell - faster; used for cache


Types of ROM
ROM - written during manufacture; very expensive for small runs
PROM - programmable (once); needs special equipment to program
Read-mostly:
Erasable Programmable (EPROM) - erased by UV
Electrically Erasable (EEPROM) - takes much longer to write than read
Flash memory - erase whole memory electrically

Error Correction
Hard failure - permanent defect
Soft error - random, non-destructive; no permanent damage to memory
Detected using a CRC or a Hamming error-correcting code
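The Hamming approach mentioned above can be sketched in a few lines. This is a minimal, illustrative Hamming(7,4) encoder and single-bit corrector (the bit layout and example values are my own, not from the lecture): 3 parity bits protect 4 data bits, and the syndrome directly names the position of a single flipped bit.

```python
# Minimal Hamming(7,4) sketch: 3 parity bits protect 4 data bits,
# allowing any single-bit (soft) error to be located and corrected.

def hamming74_encode(d):
    """d: list of 4 data bits -> 7-bit codeword [p1,p2,d1,p3,d2,d3,d4]."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_correct(c):
    """c: 7-bit codeword -> (corrected codeword, error position 1..7, or 0)."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]   # parity check over positions 1,3,5,7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]   # parity check over positions 2,3,6,7
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]   # parity check over positions 4,5,6,7
    pos = s1 + 2 * s2 + 4 * s3       # syndrome = 1-based error position
    if pos:
        c[pos - 1] ^= 1              # flip the erroneous bit back
    return c, pos

code = hamming74_encode([1, 0, 1, 1])
corrupted = list(code)
corrupted[4] ^= 1                    # simulate a soft error in bit 5
fixed, pos = hamming74_correct(corrupted)
print(pos, fixed == code)            # 5 True
```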


Virtual Memory
Use of main memory and disk space to provide the illusion of an endless amount of physical memory - memory on the hard disk
Allows for effective multiprogramming and relieves the user of the tight constraints of main memory

A System with Physical Memory Only

[Figure: CPU issues physical addresses 0 to N-1 directly into memory]

Addresses generated by the CPU correspond directly to bytes in physical memory. Examples: most Cray machines, early PCs, some embedded systems.
(Virtual Memory, CS 105, Tour of the Black Holes of Computing!)

A System with Virtual Memory

[Figure: CPU issues virtual addresses 0 to N-1, which a page table translates to physical addresses 0 to P-1 in memory, with unmapped pages held on disk]

Address translation: hardware converts virtual addresses to physical addresses via an OS-managed lookup table (the page table). Examples: workstations, servers, modern PCs, etc.
(Virtual Memory, CS 105, Tour of the Black Holes of Computing!)
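The translation step can be sketched as code. This is a toy model (the 4 KB page size and the page-table contents are assumed for illustration): split the virtual address into a virtual page number and an offset, look the VPN up, and rebuild the physical address from the frame number.

```python
# Toy address-translation sketch, assuming 4 KB pages (12-bit offset).

PAGE_SIZE = 4096
OFFSET_BITS = 12

# Hypothetical page table: virtual page number -> physical frame number
page_table = {0: 5, 1: 2, 2: 7}

def translate(vaddr):
    vpn = vaddr >> OFFSET_BITS          # virtual page number
    offset = vaddr & (PAGE_SIZE - 1)    # byte within the page
    if vpn not in page_table:
        raise LookupError(f"page fault: VPN {vpn} not resident")
    return (page_table[vpn] << OFFSET_BITS) | offset

print(hex(translate(0x1ABC)))   # VPN 1 -> frame 2, so 0x1ABC -> 0x2abc
```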

Virtual Memory
Combination of physical memory and disk space to create a memory image for a running application
Divided into logical segments of variable size (segmentation) or fixed-length pages (paging)
Paged systems are easier to manage and are made transparent to the application (user)
Virtual memory management is an operating system issue

Paging
Split memory into equal-sized, small chunks - page frames, typically 4K each
Split programs (processes) into equal-sized small chunks - pages
Allocate a number of page frames to a process
Operating system maintains a list of free frames
A process does not require contiguous page frames
Use a page table to keep track
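The free-frame bookkeeping described above can be sketched as follows (the frame numbers and page counts are invented for illustration): the OS hands out whatever frames are free, so a process's frames need not be contiguous, and the page table records where each page landed.

```python
# Sketch of frame allocation: the OS keeps a free-frame list, and a
# process's pages go into whatever frames are free -- not necessarily
# contiguous. A per-process page table records the mapping.

free_frames = [3, 7, 1, 9, 4]     # hypothetical free frame numbers
page_table = {}                   # page number -> frame number

def allocate(process_pages):
    for page in range(process_pages):
        if not free_frames:
            raise MemoryError("no free frames")
        page_table[page] = free_frames.pop(0)

allocate(3)
print(page_table)    # {0: 3, 1: 7, 2: 1} -- non-contiguous frames
```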

Page Table Entry

[Disable Caching | Referenced | Dirty | Protection | Present/Absent | Page Frame #]

Present/Absent bit - is the page loaded in main memory?
Page Frame # - physical page frame where the virtual page is loaded
Protection - read, write, executable; if an access is not allowed, produce a segmentation fault
Dirty bit - has the page been modified? If yes, the page must be written back to disk when it is swapped out of physical memory; helps the OS decide whether or not to swap the page out
Referenced - set when the page is accessed
Disable caching - if set, do not use the cached copy; the cached copy may be invalid if the page is mapped to an I/O device that changes often (memory-mapped I/O)

(Memory Hierarchy, Quynh Dinh)

Allocation of Free Frames


Logical and Physical Addresses - Paging


Virtual Memory
Demand paging - do not require all pages of a process in memory; bring in pages as required
Page fault - the required page is not in memory; the operating system must swap in the required page; it may need to swap out a page to make space; select the page to throw out based on recent history

Thrashing
Too many processes in too little memory
Operating system spends all its time swapping
Little or no real work is done
Disk light is on (flickering) all the time
Solutions:
Good page replacement algorithms
Reduce the number of processes running
Fit more memory

Bonus of Demand Paging
We do not need all of a process in memory for it to run
We can swap in pages as required
So - we can now run processes that are bigger than the total memory available!
Main memory is called real memory
The user/programmer sees a much bigger memory - virtual memory

Translation Look-aside Buffer (TLB)
Every virtual memory reference causes two physical memory accesses:
Fetch the page table entry
Fetch the data
Use a special cache for the page table - the TLB
Contains the page table entries that have been most recently used
By the principle of locality, most references will be to locations in recently used pages
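The TLB idea can be sketched as a tiny cache of recent translations consulted before the page table (the 4-entry size, the page-table contents, and the LRU policy here are illustrative assumptions, not the hardware's actual organization):

```python
# Toy TLB sketch: a small fully-associative cache of recent
# (VPN -> frame) translations, checked before the page table.

from collections import OrderedDict

TLB_ENTRIES = 4
tlb = OrderedDict()                            # VPN -> frame, LRU order
page_table = {n: n + 100 for n in range(16)}   # hypothetical page table

def lookup(vpn):
    """Return (frame, 'hit'/'miss'); on a miss, fill the TLB, evicting LRU."""
    if vpn in tlb:
        tlb.move_to_end(vpn)          # mark as most recently used
        return tlb[vpn], "hit"
    frame = page_table[vpn]           # slow path: walk the page table
    if len(tlb) == TLB_ENTRIES:
        tlb.popitem(last=False)       # evict least recently used entry
    tlb[vpn] = frame
    return frame, "miss"

print(lookup(3))   # (103, 'miss') -- first touch walks the page table
print(lookup(3))   # (103, 'hit')  -- now served from the TLB
```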

TLB Operation


Segmentation
Segments are multiple address spaces of variable, dynamic size
Paging is not (usually) visible to the programmer
Segmentation is visible to the programmer
Usually different segments are allocated to program and data
There may be a number of program and data segments

Advantages of Segmentation
Simplifies handling of growing data structures
Allows programs to be altered and recompiled independently, without relinking and reloading
Lends itself to sharing among processes
Lends itself to protection
Some systems combine segmentation with paging

Pentium II - Hardware for Segmentation and Paging
Unsegmented unpaged - virtual address = physical address; low complexity; high performance
Unsegmented paged - memory viewed as a paged linear address space; protection and management via paging; e.g. Berkeley UNIX
Segmented unpaged - collection of local address spaces; protection down to the single-byte level; the translation table needed is on chip when the segment is in memory
Segmented paged - segmentation used to define logical memory partitions subject to access control; paging manages allocation of memory within the partitions; e.g. Unix System V

Pentium II Address Translation Mechanism


Pentium II Segmentation
Each virtual address is a 16-bit segment selector and a 32-bit offset
2 bits of the segment selector are the protection mechanism; 14 bits specify the segment
Unsegmented virtual memory: 2^32 = 4 Gbytes
Segmented: 2^46 = 64 terabytes
Can be larger - depends on which process is active
Half (8K segments of 4 Gbytes) is global; half is local and distinct for each process
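The figures above check out arithmetically, assuming the 14 usable segment-index bits multiply the 32-bit offset space:

```python
# Quick arithmetic check of the slide's figures: 14 segment bits
# on top of a 32-bit offset give 2^46 bytes of virtual space.
GB = 2**30
TB = 2**40
unsegmented = 2**32                  # 32-bit offset alone
segmented = 2**14 * 2**32            # 16K segments x 4 GB per segment
print(unsegmented // GB, "GB")       # 4 GB
print(segmented // TB, "TB")         # 64 TB
```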

Pentium II Protection
Protection bits give 4 levels of privilege
0 is most protected, 3 least
Use of the levels is software dependent
Usually level 3 for applications, level 1 for the O/S and level 0 for the kernel (level 2 not used)
Level 2 may be used for apps that have internal security, e.g. a database
Some instructions only work in level 0

Pentium II Paging
Segmentation may be disabled, in which case the linear address space is used directly
Two-level page table lookup
First, the page directory:
1024 entries max
Splits the 4G linear memory into 1024 page groups of 4 Mbytes
Each page table has 1024 entries corresponding to 4 Kbyte pages
Can use one page directory for all processes, one per process, or a mixture
Page directory for the current process is always in memory
Use a TLB holding 32 page table entries
Two page sizes available: 4K or 4M
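The two-level lookup can be sketched as follows, assuming the x86-style 10/10/12-bit split of a 32-bit linear address (the directory contents in this example are hypothetical):

```python
# Two-level page table lookup sketch: the top 10 bits index the page
# directory, the next 10 index a page table, and the low 12 bits are
# the offset within a 4 KB page.

# Hypothetical: directory entry 1 -> a page table whose entry 2 -> frame 3
page_directory = {1: {2: 0x3}}

def translate(linear):
    dir_i = (linear >> 22) & 0x3FF    # bits 31..22: page directory index
    tbl_i = (linear >> 12) & 0x3FF    # bits 21..12: page table index
    offset = linear & 0xFFF           # bits 11..0: byte within 4 KB page
    frame = page_directory[dir_i][tbl_i]
    return (frame << 12) | offset

# Linear address with directory index 1, table index 2, offset 0xABC:
addr = (1 << 22) | (2 << 12) | 0xABC
print(hex(translate(addr)))           # frame 3, offset 0xABC -> 0x3abc
```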

PowerPC Memory Management Hardware
32-bit: paging with simple segmentation
64-bit: paging with more powerful segmentation
Both do block address translation
Map 4 large blocks of instructions & 4 of memory to bypass paging, e.g. OS tables or graphics frame buffers
32-bit effective address:
12-bit byte selector - 4 Kbyte pages
16-bit page id - 64K pages per segment
4 bits indicate one of 16 segment registers
Segment registers are under OS control

PowerPC 32-bit Memory Management Formats


PowerPC 32-bit Address Translation


Current Memory Hierarchy

Processor (Control + Datapath, regs) -> L1 cache -> L2 cache -> Main Memory -> Secondary Memory

Level:        Regs     L1      L2     Main      Secondary
Speed (ns):   0.5      2       6      100       10,000,000
Size (MB):    0.0005   0.05    1-4    100-1000  100,000
Cost ($/MB):  -        $100    $30    $1        $0.05
Technology:   Regs     SRAM    SRAM   DRAM      Disk

(Lecture 13: Memory Hierarchy - Ways to Reduce Misses, DAP Spr.98 UCB)

So you want fast?
It is possible to build a computer which uses only static RAM
This would be very fast
This would need no cache (how can you cache cache?)
But it would cost a very large amount
Impractical!



Cache
Small amount of fast memory
Sits between normal main memory and the CPU
May be located on the CPU chip or module


Cache/Main Memory Structure


Cache operation overview
1. CPU requests the contents of a memory location
2. Check the cache for this data
3. If present, get it from the cache (fast)
4. If not present, read the required block from main memory into the cache
5. Then deliver it from the cache to the CPU
6. The cache includes tags to identify which block of main memory is in each cache slot
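The six steps above can be sketched as a miss-fill-deliver loop (block size, memory contents, and the dictionary-based cache here are illustrative, not a hardware model):

```python
# Sketch of the read sequence: check the cache first; on a miss, fill
# the whole block from main memory, then serve the CPU from the cache.

BLOCK = 4                                 # words per block (illustrative)
main_memory = {addr: addr * 10 for addr in range(64)}
cache = {}                                # tag (block number) -> words

def read(addr):
    tag, offset = divmod(addr, BLOCK)
    if tag not in cache:                  # miss: fetch the whole block
        base = tag * BLOCK
        cache[tag] = [main_memory[base + i] for i in range(BLOCK)]
    return cache[tag][offset]             # deliver from the cache

print(read(9))    # miss fills block 2 (addresses 8-11), returns 90
print(read(10))   # hit in the same block, returns 100
```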

Cache Read Operation - Flowchart


Cache Design Issues
Size
Mapping Function
Replacement Algorithm
Write Policy
Block Size
Number of Caches


Size does matter
Cost - more cache is expensive
Speed - more cache is faster (up to a point), but checking the cache for data takes time


Typical Cache Organization


Processor        Type                           Year  L1 cache(a)     L2 cache        L3 cache
IBM 360/85       Mainframe                      1968  16 to 32 KB     -               -
PDP-11/70        Minicomputer                   1975  1 KB            -               -
VAX 11/780       Minicomputer                   1978  16 KB           -               -
IBM 3033         Mainframe                      1978  64 KB           -               -
IBM 3090         Mainframe                      1985  128 to 256 KB   -               -
Intel 80486      PC                             1989  8 KB            -               -
Pentium          PC                             1993  8 KB/8 KB       256 to 512 KB   -
PowerPC 601      PC                             1993  32 KB           -               -
PowerPC 620      PC                             1996  32 KB/32 KB     -               -
PowerPC G4       PC/server                      1999  32 KB/32 KB     256 KB to 1 MB  2 MB
IBM S/390 G4     Mainframe                      1997  32 KB           256 KB          2 MB
IBM S/390 G6     Mainframe                      1999  256 KB          8 MB            -
Pentium 4        PC/server                      2000  8 KB/8 KB       256 KB          -
IBM SP           High-end server/supercomputer  2000  64 KB/32 KB     8 MB            -
CRAY MTA(b)      Supercomputer                  2000  8 KB            2 MB            -
Itanium          PC/server                      2001  16 KB/16 KB     96 KB           4 MB
SGI Origin 2001  High-end server                2001  32 KB/32 KB     4 MB            -
Itanium 2        PC/server                      2002  32 KB           256 KB          6 MB
IBM POWER5       High-end server                2003  64 KB           1.9 MB          36 MB
CRAY XD-1        Supercomputer                  2004  64 KB/64 KB     1 MB            -

(a) Two values separated by a slash refer to instruction and data caches
(b) Both caches are instruction only; no data caches

Mapping Function Example
Cache of 64 Kbytes
Cache block of 4 bytes, i.e. the cache is 16K (2^14) lines of 4 bytes
16 Mbytes of main memory
24-bit address (2^24 = 16M)


Direct Mapping
Each block of main memory maps to only one cache line, i.e. if a block is in the cache, it must be in one specific place
The address is in two parts
The least significant w bits identify a unique word
The most significant s bits specify one memory block
The MSBs are split into a cache line field r and a tag of s-r bits (most significant)

Direct Mapping Address Structure

| Tag s-r (8 bits) | Line or Slot r (14 bits) | Word w (2 bits) |

24-bit address
2-bit word identifier (4-byte block)
22-bit block identifier
8-bit tag (= 22-14)
14-bit slot or line
No two blocks that map to the same line have the same Tag field
Check the contents of the cache by finding the line and checking the Tag
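The 8/14/2 split above is just bit shifting and masking; a sketch (the example address is arbitrary):

```python
# Decompose a 24-bit address per the slide's direct-mapped layout:
# 8-bit tag | 14-bit line | 2-bit word.

TAG_BITS, LINE_BITS, WORD_BITS = 8, 14, 2

def split(addr):
    word = addr & ((1 << WORD_BITS) - 1)
    line = (addr >> WORD_BITS) & ((1 << LINE_BITS) - 1)
    tag = addr >> (WORD_BITS + LINE_BITS)
    return tag, line, word

# Example: address 0x16339C -> tag 0x16, line 0xCE7, word 0
print([hex(x) for x in split(0x16339C)])   # ['0x16', '0xce7', '0x0']
```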

Direct Mapping Cache Line Table

Cache line    Main memory blocks held
0             0, m, 2m, 3m, ..., 2^s - m
1             1, m+1, 2m+1, ..., 2^s - m + 1
...           ...
m-1           m-1, 2m-1, 3m-1, ..., 2^s - 1

4-Block Direct Mapped Cache

[Figure: 16 main-memory blocks (0-15) mapped onto a 4-entry cache; the memory block address splits into a tag and an index, and the index determines the block's position in the cache]

cache index = (memory block address) mod (# cache blocks)
If the number of cache blocks is a power of 2, the cache index is just the lower n bits of the memory block address [n = log2(# cache blocks)]

Direct Mapping Example


Direct Mapping pros and cons
Simple
Inexpensive
Fixed location for a given block
Problem: if a program repeatedly accesses 2 blocks that map to the same line, cache misses are very high
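That pathological case is easy to demonstrate with a toy direct-mapped cache (4 lines and the block numbers are arbitrary): two blocks sharing a line evict each other on every access.

```python
# Tiny direct-mapped cache showing conflict misses: blocks 1 and 5
# both map to line 1, so alternating accesses always miss.

NUM_LINES = 4
lines = [None] * NUM_LINES     # each line holds the resident block number

def access(block):
    """Return 'hit' or 'miss' for a memory block number."""
    line = block % NUM_LINES
    if lines[line] == block:
        return "hit"
    lines[line] = block        # evict whatever was there
    return "miss"

print([access(b) for b in (1, 5, 1, 5, 1, 5)])   # all 'miss'
```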


Another Extreme: Fully Associative
Omit the cache index; place an item in any block!
Compare all cache tags in parallel

[Figure: fully associative cache (8-word block) - the 27-bit cache tag is compared against every line's tag in parallel, and the byte offset selects the word (B0-B31) within the matching block]

(Lecture 13: Memory Hierarchy - Ways to Reduce Misses, DAP Spr.98 UCB)

Fully Associative Cache
Must search all tags in the cache, as an item can be in any cache block
The search for the tag must be done by hardware in parallel (other searches are too slow)
But the necessary parallel comparator hardware is very expensive
Therefore, fully associative placement is practical only for a very small cache
(Lecture 13: Memory Hierarchy - Ways to Reduce Misses, DAP Spr.98 UCB)

Compromise: N-way Set Associative Cache
N-way set associative: N cache blocks for each cache index
Like having N direct mapped caches operating in parallel
Select the one that gets a hit
Example: 2-way set associative cache
The cache index selects a set of 2 blocks from the cache
The 2 tags in the set are compared in parallel
Data is selected based on the tag result (which matched the address)
(Lecture 13: Memory Hierarchy - Ways to Reduce Misses, DAP Spr.98 UCB)

Example: 2-way Set Associative Cache

[Figure: the address splits into tag, index and offset; the index selects one set, the two blocks' valid bits and tags are compared against the address tag in parallel, and a mux steers the matching block's data out as the hit result]

(Lecture 13: Memory Hierarchy - Ways to Reduce Misses, DAP Spr.98 UCB)
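A 2-way set-associative lookup can be sketched as follows (4 sets, LRU within each set, and the block numbers are illustrative choices). Note how blocks 1 and 5, which would conflict in a direct-mapped cache of the same size, now coexist in one set:

```python
# Sketch of a 2-way set-associative cache with LRU within each set.

NUM_SETS, WAYS = 4, 2
sets = [[] for _ in range(NUM_SETS)]   # each set: tags, LRU-first order

def access(block):
    index, tag = block % NUM_SETS, block // NUM_SETS
    s = sets[index]
    if tag in s:
        s.remove(tag)
        s.append(tag)                  # move to MRU position
        return "hit"
    if len(s) == WAYS:
        s.pop(0)                       # evict the LRU way
    s.append(tag)
    return "miss"

# Blocks 1 and 5 share set 1 but occupy different ways:
print([access(b) for b in (1, 5, 1, 5)])   # ['miss', 'miss', 'hit', 'hit']
```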

Set Associative Cache Contd.
Direct mapped and fully associative can be seen as just variations of the set associative block placement strategy
Direct mapped = 1-way set associative cache
Fully associative = n-way set associative for a cache with exactly n blocks
(Lecture 13: Memory Hierarchy - Ways to Reduce Misses, DAP Spr.98 UCB)

Replacement Algorithms (Direct Mapping)
No choice
Each block only maps to one line
Replace that line


Two Way Set Associative Mapping Example


Replacement Algorithms (Associative & Set Associative)
Hardware implemented algorithm (for speed)
Least Recently Used (LRU) - e.g. in a 2-way set associative cache, which of the 2 blocks is LRU?
First In First Out (FIFO) - replace the block that has been in the cache longest
Least Frequently Used (LFU) - replace the block which has had the fewest hits
Random

Write Policy
Must not overwrite a cache block unless main memory is up to date
Multiple CPUs may have individual caches
I/O may address main memory directly


Write through
All writes go to main memory as well as the cache
Multiple CPUs can monitor main memory traffic to keep their local (per-CPU) caches up to date
Lots of traffic
Slows down writes

Write back
Updates are initially made in the cache only
An update bit for the cache slot is set when an update occurs
If a block is to be replaced, write it to main memory only if the update bit is set
Other caches can get out of sync
I/O must access main memory through the cache
N.B. 15% of memory references are writes
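The write-back policy can be sketched as follows (the block numbers, values and dictionary model are illustrative): writes touch only the cache and set the update ("dirty") bit, and main memory is updated only when a dirty block is evicted.

```python
# Write-back sketch: writes stay in the cache with a dirty bit set;
# main memory is written only when a dirty block is replaced.

main_memory = {0: 0, 1: 0}
cache = {}                          # block -> {"data": ..., "dirty": bool}

def write(block, value):
    cache[block] = {"data": value, "dirty": True}

def evict(block):
    entry = cache.pop(block)
    if entry["dirty"]:              # write back only if modified
        main_memory[block] = entry["data"]

write(0, 42)
print(main_memory[0])   # 0  -- main memory is stale until eviction
evict(0)
print(main_memory[0])   # 42 -- dirty block written back
```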

Pentium 4 Cache (1)
80386 - no on-chip cache
80486 - 8K, using 16-byte lines and a four-way set associative organization
Pentium (all versions) - two on-chip L1 caches: data & instructions
Pentium III - L3 cache added off chip

Pentium 4 Cache (2)
L1 caches - 8 Kbytes, 64-byte lines, four-way set associative
L2 cache - feeding both L1 caches; 256K, 128-byte lines, 8-way set associative
L3 cache on chip

Pentium 4 Block Diagram


PowerPC Cache Organization (1)
601 - single 32 KB, 8-way set associative
603 - 16 KB (2 x 8 KB), two-way set associative
604 - 32 KB
620 - 64 KB


PowerPC Cache Organization (2)
G3 & G4:
L1 cache - 64 KB, 8-way set associative
L2 cache - 256K, 512K or 1M, two-way set associative
G5:
32 KB instruction cache
64 KB data cache

PowerPC G5 Block Diagram


Summary
Virtual memory combines physical memory and disk space to create a memory image for a running application
Pages are the equal-sized small chunks into which programs (processes) are split
A cache is a very fast, but expensive, memory in close proximity to the CPU
In direct mapping, each block of main memory maps to only one cache line
In associative mapping, a search of all tags in the cache is necessary, as the required item can be in any cache block

Reference
Stallings, William, 2003, Computer Organization & Architecture: Designing for Performance, 6th edn, Pearson Education, Inc. ISBN 0-13-049307-4 [chapters 4, 5 & 6]
Virtual Memory, CS 105, Tour of the Black Holes of Computing!
Lecture 13: Memory Hierarchy - Ways to Reduce Misses, DAP Spr.98 UCB
Memory Hierarchy, Quynh Dinh

Internet Sources
Manufacturer sites:
Intel
IBM/Motorola
Search on "cache"

