Introduction (cont.)
Cache definition
- Memory that is part of the processor chip, built with the same technology
- Speed: the same order of magnitude as accessing registers
- Relatively small and expensive
- Acts like a hash function: holds part of the address space
Introduction (cont.)
Cache memories: main idea
- When the processor needs an instruction or data, it first looks for it in the cache
- If that fails, it brings the data from main memory into the cache and uses it from there
- The address space is partitioned into blocks; the cache holds lines, and each line holds one block
- A block may not exist in the cache -> cache miss (lookup flow sketched below)
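To make the lookup flow concrete, here is a minimal sketch of a direct-mapped cache lookup in C; the sizes and names (NUM_LINES, BLOCK_SIZE, cache_lookup) are illustrative assumptions, not taken from the project code:

```c
/* Minimal sketch of the lookup flow above for a direct-mapped
 * cache; all names and sizes here are illustrative. */
#include <stdbool.h>
#include <stdint.h>

#define NUM_LINES  256  /* cache lines                       */
#define BLOCK_SIZE 64   /* each line holds one 64-byte block */

struct line {
    bool     valid;
    uint32_t tag;
    uint8_t  data[BLOCK_SIZE];
};

static struct line cache[NUM_LINES];

/* Returns true on a hit; on a miss the caller brings the block
 * from main memory into cache[index] and uses it from there. */
bool cache_lookup(uint32_t addr)
{
    uint32_t block = addr / BLOCK_SIZE;  /* address space partitioned into blocks */
    uint32_t index = block % NUM_LINES;  /* line that may hold this block         */
    uint32_t tag   = block / NUM_LINES;

    return cache[index].valid && cache[index].tag == tag;
}
```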
Introduction (cont.)
Cache aim
- Fast access time
- Fast search mechanism
- High hit ratio
- Highly effective replacement mechanism:
  - High adaptability: fast replacement of no-longer-needed lines
  - Long-sighted: estimating whether a block will be used in the future
Project Objective
- Develop an LRFU caching mechanism
- Implement a cache-entrance filtering technique
- Compare and analyze against LRU
- Research various configurations of LRFU in order to achieve the maximum hit rate
Project Requirements
- Develop for the SimpleScalar platform to simulate processor caches
- Run the developed caching and filtering mechanisms on accepted benchmarks
- C language
- No hardware-component equivalence needed; software implementation only
LRU
Disadvantage
- Short-sighted: considers only the recency of references

LFU
Advantages
- Long-sighted: considers the full reference history
- Smarter

Disadvantages
- Cache pollution
- Requires many cycles
- More memory needed
Goal
A replacement algorithm that allows a flexible trade-off between recency and frequency
Development Stages
1. Studying the background
2. Learning the SimpleScalar sim-cache platform
3. Developing the LRFU caching algorithm for SimpleScalar
4. Developing the filtering policy
5. Benchmarking (smart environment)
6. Analyzing various LRFU configurations and comparing them with the LRU algorithm
Principles
The LRFU policy associates a value with each block, quantifying the likelihood that the block will be referenced in the near future. Each past reference to the block adds a contribution to this value, and the size of that contribution is determined by a weighing function F.
[Figure: a block referenced at times t1, t2, t3; at the current time tc, each reference contributes F(tc - ti), e.g. F(tc - t3) for the most recent one]
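As a sketch of this definition in C (the function names, the use of double for times, and the per-reference array are illustrative; a real cache never stores all reference times, as the two-counter result below shows):

```c
#include <math.h>

/* Weighing function F(x) = (1/2)^(lambda * x), lambda in [0, 1]. */
static double F(double x, double lambda)
{
    return pow(0.5, lambda * x);
}

/* CRF value C(b) computed directly from the definition: every
 * past reference at time t[i] contributes F(tc - t[i]) at the
 * current time tc. */
double crf_from_definition(const double t[], int n_refs,
                           double tc, double lambda)
{
    double c = 0.0;
    for (int i = 0; i < n_refs; i++)
        c += F(tc - t[i], lambda);
    return c;
}
```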
Principles (cont.)
Weighing function F(x) = (1/2)^(λx)
- Monotonically decreasing
- Subsumes LRU and LFU:
  - When λ = 0 (i.e., F(x) = 1), it becomes LFU
  - When λ = 1 (i.e., F(x) = (1/2)^x), it becomes LRU
  - When 0 < λ < 1, it is between LFU and LRU (illustrated below)

[Figure: F(x) plotted against x = current time - reference time; F(x) = 1 is the LFU extreme]
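The two extremes can be checked numerically with the sketch above (the reference times here are made up for illustration):

```c
#include <stdio.h>

/* Reuses F() and crf_from_definition() from the previous sketch. */
int main(void)
{
    double refs[] = { 0.0, 1.0, 2.0, 9.0 };  /* hypothetical reference times */
    double tc = 10.0;

    /* lambda = 0: F(x) = 1, so C is simply the reference count
     * (4 here) -- pure LFU behaviour.                           */
    printf("lambda=0: C = %f\n", crf_from_definition(refs, 4, tc, 0.0));

    /* lambda = 1: F(x) = (1/2)^x, so the most recent reference
     * dominates the sum of all strictly older ones -- LRU.      */
    printf("lambda=1: C = %f\n", crf_from_definition(refs, 4, tc, 1.0));
    return 0;
}
```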
Principles (cont.)
Update of C(block) over time
- Only two counters for each block are needed to calculate C(block)
- Proof sketch: since F(x) = (1/2)^(λx), F(x + δ) = F(x) × F(δ); so if block b was last referenced at time t1 with value C_t1(b), its value at a later reference time t2, with δ = (t2 - t1), is
  C_t2(b) = F(0) + C_t1(b) × F(δ)
  i.e., only C_t1(b) and the time elapsed since t1 must be stored
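A sketch of this two-counter scheme in C, reusing F() from the earlier sketch (the struct and function names are illustrative):

```c
/* Per-block state: only the CRF at the last reference and the
 * time of that reference are kept, per the proof above. */
struct crf_state {
    double last_crf;   /* C(b) at the previous access */
    double last_time;  /* time of the previous access */
};

/* The stored CRF decays multiplicatively over time, because
 * F(x + d) = F(x) * F(d). */
double crf_now(const struct crf_state *s, double tc, double lambda)
{
    return s->last_crf * F(tc - s->last_time, lambda);
}

/* On a new reference at time tc: decay the old value, then add
 * the new reference's contribution F(0) = 1. */
void crf_reference(struct crf_state *s, double tc, double lambda)
{
    s->last_crf  = crf_now(s, tc, lambda) + F(0.0, lambda);
    s->last_time = tc;
}
```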
[Diagram: on a miss, the filter decides whether to insert the fetched block into the cache; blocks not admitted, and data removed from the cache by LRFU, go to the victim cache]
Hardware Budget

Counters
- Each block in the cache requires two bounded counters:
  - the previous C(t)
  - the time that has passed since the previous access

Victim cache
- Its size will be determined by empirical analysis
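In C, the per-line bookkeeping implied by this budget might look as follows; the field widths and the victim-cache size are illustrative placeholders, since the slides leave both to empirical tuning:

```c
#include <stdint.h>

/* Two bounded counters per cache line, as budgeted above;
 * 16-bit widths are an assumption, not a project decision. */
struct lrfu_line_meta {
    uint16_t prev_crf;  /* previous C(t), in fixed point         */
    uint16_t age;       /* time passed since the previous access */
};

#define VICTIM_CACHE_LINES 16  /* hypothetical; to be tuned empirically */
```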
Algorithms
Filtering
We implemented a very simple filtering algorithm, whose single task is to reduce the number of changes made to the cache.
After a cache miss, the fetched block is inserted into the cache with a configurable probability p, 0 < p < 1. If the block is not inserted into the cache, it is automatically inserted into the victim cache. A sketch of the decision follows.
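A minimal sketch of that probabilistic decision (the function name and the use of rand() are illustrative assumptions; SimpleScalar's own facilities could be used instead):

```c
#include <stdbool.h>
#include <stdlib.h>

static double insert_prob = 0.5;  /* configurable p, 0 < p < 1 */

/* true  -> insert the fetched block into the main cache;
 * false -> insert it into the victim cache instead.      */
bool filter_admits(void)
{
    return (double)rand() / (double)RAND_MAX < insert_prob;
}
```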
Replacement
After a cache miss, C(t) is calculated for each block in the set, and the block with the smallest C(t) is selected for replacement (sketched below).
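A sketch of that victim selection for one set, reusing crf_now() and struct crf_state from the earlier sketch (the associativity and set layout are illustrative):

```c
#define ASSOC 4  /* illustrative set associativity */

/* Returns the way whose block has the smallest C(t) in the set. */
int select_victim(const struct crf_state set[ASSOC],
                  double tc, double lambda)
{
    int    victim = 0;
    double min_c  = crf_now(&set[0], tc, lambda);

    for (int way = 1; way < ASSOC; way++) {
        double c = crf_now(&set[way], tc, lambda);
        if (c < min_c) {
            min_c  = c;
            victim = way;
        }
    }
    return victim;
}
```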
Results
[Chart: hit rate]

Results (cont.)
[Chart: hit rate]
Special Problems
- Software simulation of hardware
- Utilizing the existing data structures of SimpleScalar
Conclusions
- We implemented a different cache replacement mechanism and obtained exciting results
- A hardware implementation of the mechanism is hard, but possible
- The implementation achieved its goals:
  - It subsumes both the LRU and LFU algorithms
  - It yields better performance than both (up to 30%!)
Future Research
- Implementation of better filtering techniques
- A dynamic version of the LRFU algorithm, adjusting λ periodically depending on the evolution of the workload
- Research into the hardware needed for LRFU