
Cache and Memory Hierarchy Design

• Based on the book Cache and Memory Hierarchy Design: A Performance-Directed Approach by Steven A. Przybylski, Morgan Kaufmann Publishers, Inc., 1990
• How to build performant caches
• A combined simulation-based and analytical approach
• One- and two-level caches
• The MicroVAX and the 25 MHz R2000 are the state of the art
The Program Traces
• Acquired from the MicroVAX and the R2000
• The MicroVAX CPU microcode was modified to capture the memory reference traces
• For the R2000, the basic blocks were instrumented to record the traces
• Caches were simulated with both trace sets; the geometric mean is presented as the result
Traces Continued
• The figures depict the number of unique addresses referenced as a function of total memory references
• This gives insight only into cache warming; no temporal affinity is described at all
Program Execution Time
• The program execution time in cycles is:

  N_execute + N_ifetch + N_ifetch × m(C) × n_MMread
            + N_load   + N_load   × m(C) × n_MMread
            + N_store  × n_L1write

  where m(C) is the miss ratio of cache C, n_MMread the main-memory read time, and n_L1write the L1 write time
• N_execute = 0!
• Cache miss types:
  – Non-stationary: first access to a block
  – Intrinsic: the cache is not large enough
  – Extrinsic: other processes etc.
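The execution-time model above can be sketched as a short function. The parameter names and the example numbers are mine, chosen only to illustrate the formula; they are not taken from the book.

```python
def execution_cycles(n_ifetch, n_load, n_store,
                     miss_ratio, n_mm_read, n_l1_write):
    """Cycles for a single-level cache, with N_execute = 0 as on the slide.

    Each instruction fetch and each load costs one cycle plus, on a miss
    (probability miss_ratio), a main-memory read; every store pays the
    L1 write time.
    """
    ifetch_cycles = n_ifetch + n_ifetch * miss_ratio * n_mm_read
    load_cycles = n_load + n_load * miss_ratio * n_mm_read
    store_cycles = n_store * n_l1_write
    return ifetch_cycles + load_cycles + store_cycles

# Hypothetical workload: 1M ifetches, 300k loads, 150k stores,
# 5% miss ratio, 10-cycle memory read, 2-cycle L1 write.
print(execution_cycles(1_000_000, 300_000, 150_000, 0.05, 10, 2))  # → 2250000.0
```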
Cache Implementation

The Real World
• Trace lengths, physical organisation etc. dictate the minimum possible cycle time
• For example, an 8K × 1-bit SRAM is faster than a 1K × 8-bit SRAM
• Different levels of integration
The Simulated System
• Only a single level of cache!
• Write buffers and the write policy are kept as unobtrusive as possible
• Word size is 32 bits
• Virtually addressed
• The TLB is not taken into account
The Two-level Cache
• Much more complicated
• 130 parameters
• 300-400 statistics
• Caches at different levels interact in non-trivial ways
• Asynchronous main memory
Speed-Size Tradeoff
• A larger cache obviously decreases the miss ratio, but building larger and larger caches is not free
• The hardware implementor has a choice of several cache sizes with their respective cycle times; which one is the best?
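The choice can be sketched by ranking candidate designs on total execution time (cycles × cycle time) rather than on miss ratio alone. The miss ratios, cycle times and miss penalty below are invented for illustration, not taken from the book.

```python
def total_time_ns(refs, miss_ratio, miss_penalty_cycles, cycle_time_ns):
    # One cycle per reference, plus the miss penalty on a miss,
    # all scaled by the implementation's cycle time.
    cycles = refs * (1 + miss_ratio * miss_penalty_cycles)
    return cycles * cycle_time_ns

# Hypothetical candidates: bigger caches miss less but cycle slower.
candidates = {
    "32kB":  dict(miss_ratio=0.040, cycle_time_ns=40),
    "64kB":  dict(miss_ratio=0.030, cycle_time_ns=42),
    "128kB": dict(miss_ratio=0.025, cycle_time_ns=46),
}
refs, penalty = 1_000_000, 12
for name, c in candidates.items():
    print(name, total_time_ns(refs, c["miss_ratio"], penalty, c["cycle_time_ns"]))
```

With these made-up numbers the middle size wins: the 128kB cache's lower miss ratio no longer pays for its longer cycle, which is exactly the shape of tradeoff the slide describes.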
Speed-Size Tradeoff continued
• An anomaly appears when synchronizing the asynchronous main memory to the synchronous CPU bus
So What Is The Optimum?
• Lines of equal performance in the cycle time - cache size design space
• Most practical caches tend to be between 32kB and 128kB
Speed - Set Size Tradeoff
• A set size of 2 gives a clear benefit, but beyond that there is little gain
• Associativity prevents pathological situations
• The cache is virtually addressed - the processes lie mostly in the same parts of the address space!
  – For large cache sizes, most misses are extrinsic
Speed - Set Size Tradeoff cont'd
• The significance of set size to the execution time
Speed - Set Size Tradeoff cont'd
• Break-even implementations: how much can the cycle time be increased when moving to a set size of 2 or 8 while still achieving the same performance?
• Associativity requires multiplexing - less of a problem in an integrated solution
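The break-even calculation can be sketched by equating execution times of the two designs: (1 + m_dm × p) × t_dm = (1 + m_sa × p) × t_sa, then solving for the set-associative cycle time t_sa. All numbers below are hypothetical, chosen only to show the arithmetic.

```python
def break_even_cycle_time(t_direct, miss_direct, miss_assoc, penalty):
    # Equal execution time per reference:
    #   (1 + m_dm * p) * t_dm == (1 + m_sa * p) * t_sa
    # so the set-associative design may cycle this slowly at most.
    return t_direct * (1 + miss_direct * penalty) / (1 + miss_assoc * penalty)

# Hypothetical: direct-mapped at 40 ns with a 4.0% miss ratio vs.
# 2-way with 3.2%, both with a 12-cycle miss penalty.
t_max = break_even_cycle_time(t_direct=40, miss_direct=0.040,
                              miss_assoc=0.032, penalty=12)
print(round(t_max, 2))  # any slower than this and direct-mapped wins
```

Here the associative design buys only a few percent of cycle-time headroom, which is why the slides find that the extra multiplexing delay often cancels the miss-ratio gain.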
Block Size
• 64kB/64kB split I/D cache
• A 32-word block size is optimum for loads, 64 words for ifetches
• Beyond that, the overall performance suffers because of the increasing cost of a cache miss
Block Size continued
Block Size - Cycle Penalty
• What if the larger block size affects the cycle time?
  – More traces, wider SRAM chips
• A 1% degradation would already change some decision points
• A 5% degradation effectively halves the optimum block size
Fetch Size
• The optimal fetch size is just slightly larger than the optimal block size
• With smaller blocks, the writes issued at a fetch (now spanning several blocks) become obtrusive
• In practice, fetch size = block size, or in a rare case, fetch size = 2 × block size
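Why the optimal fetch size is finite can be sketched with a simple average-access-time model: a larger fetch lowers the miss ratio but raises the per-miss transfer cost. The miss ratios below are made up, and the penalty model (a fixed access latency plus one cycle per word transferred) is my assumption, not the book's.

```python
def avg_access_cycles(miss_ratio, fetch_words, latency=6, cycles_per_word=1):
    # Miss penalty grows linearly with the number of words fetched.
    penalty = latency + fetch_words * cycles_per_word
    return 1 + miss_ratio * penalty

# Hypothetical miss ratios: doubling the fetch helps, with diminishing returns.
miss = {2: 0.060, 4: 0.045, 8: 0.031, 16: 0.027, 32: 0.025}
for words, m in miss.items():
    print(words, round(avg_access_cycles(m, words), 3))
```

With these numbers the minimum lands at 8 words: past that point the extra transfer cycles outweigh the shrinking miss-ratio gain, matching guideline 3's "fetch sizes of 4, 8 or occasionally 16 words".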
Putting It All Together: A Cache Comparison Chart
Another View
Guidelines
• 1: Cache size dominates the other organizational parameters in determining performance, just as it does in determining miss rates.
• 2: For a single-level cache hierarchy, optimal cache sizes are likely to be in the 32kB to 128kB range, depending on the relationship between cache size and cycle time. Also, after any cycle time penalty is accounted for, set-associative caches seldom perform better than their direct-mapped counterparts.
• 3: Focus on the fetch size rather than the block size. Typically, fetch sizes of 4, 8 or occasionally 16 words are best, depending on the memory characteristics. Simple fetch strategies with the right fetch size make very good use of the available memory bandwidth.
Guidelines continued
• 4: To clarify design tradeoffs, translate prospective changes in the associativity and block size into equivalent changes in the cycle time or cache size.
• 5: To increase the performance beyond that possible with a single-level cache, or to decrease the size and cycle time of the optimal first-level cache, investigate multi-level cache hierarchies.
• 6: The final cache in a multi-level cache hierarchy is likely to be somewhat bigger and more set-associative than the optimal single-level cache for the same memory system.
• 7: First-level caches smaller than 8kB, with a short cycle time, are viable only if a second-level cache can be built that has a significantly lower miss rate and an access time that is no more than two or three times the CPU cycle time.
• 8: Design the memory hierarchy in tandem with the CPU.
Guidelines
" 9: Mistrust broad, sweeping, guidelines.
