
VLSI Project

Least Recently Frequently Used Caching Algorithm with Filtering Policies


Alexander Zlotnik Marcel Apfelbaum

Supervised by: Michael Behar, Winter 2005/2006



Introduction (cont.)
Cache definition
- A memory chip that is part of the processor
- Same technology; speed on the same order of magnitude as register access
- Relatively small and expensive
- Acts like a hash function: holds part of the address space


Introduction (cont.)
Cache memories: the main idea
- When the processor needs an instruction or data, it first looks in the cache. If that fails, it brings the data from main memory into the cache and uses it from there.
- The address space is partitioned into blocks; the cache holds lines, and each line holds one block.
- A block may not be present in the cache -> a cache miss.

If we miss the cache
- The entire block is fetched into a line buffer and then put into the cache.
- Before the new block is placed in the cache, another block may need to be evicted to make room for it.
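A minimal C sketch of this lookup-and-fill flow, with illustrative types and a placeholder victim choice (this is not SimpleScalar's actual code):

```c
#include <stdbool.h>

/* One cache line; a real line would also carry the block's data. */
typedef struct {
    unsigned long tag;
    bool          valid;
} line_t;

/* Look up `tag` in one set of `assoc` ways.  Returns true on a hit;
   on a miss, evicts a victim (way 0 here, as a placeholder for the
   real replacement policy) and fills the line. */
bool cache_access(line_t set[], int assoc, unsigned long tag)
{
    for (int w = 0; w < assoc; w++)
        if (set[w].valid && set[w].tag == tag)
            return true;                /* hit */

    int victim = 0;                     /* replacement policy picks this */
    set[victim].tag   = tag;            /* block fetched into the line   */
    set[victim].valid = true;
    return false;                       /* miss serviced */
}
```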

Introduction (cont.)
Cache aims
- Fast access time
- Fast search mechanism
- High hit ratio
- A highly effective replacement mechanism
- High adaptability: fast replacement of unneeded lines
- Long-sightedness: estimating whether a block will be used in the future

Project Objective
- Develop an LRFU caching mechanism
- Implement a cache-entrance filtering technique
- Compare and analyze it against LRU
- Research various configurations of LRFU in order to achieve the maximum hit rate


Project Requirements
- Develop for the SimpleScalar platform to simulate processor caches
- Run the developed caching & filtering mechanisms on accepted benchmarks
- C language
- No hardware-component equivalent needed; software implementation only

Background and Theory


Cache replacement options:
- FIFO, LRU, Random, Pseudo-LRU, LFU

Currently used algorithms:
- LRU (2-way: one bit per set suffices to mark the latest-accessed way)
- Pseudo-LRU (4 ways and more; fully associative)

Pseudo-LRU (4-way example)
- Bit 0 selects between the pairs (0,1) and (2,3)
- Bit 1 tracks which of ways 0 and 1 was accessed more recently
- Bit 2 tracks which of ways 2 and 3 was accessed more recently

[Figure: the three bits arranged as a binary tree, Bit 0 above Bit 1 and Bit 2]

A C sketch of this scheme follows below.
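A minimal C sketch of the 4-way tree Pseudo-LRU, assuming one common bit convention (each bit points toward the less-recently-used side); the slide does not pin down the exact encoding, so treat this as illustrative:

```c
#include <stdint.h>

/* Tree Pseudo-LRU state for one 4-way set: three bits in one byte.
   Bit 0 selects between pairs {0,1} and {2,3}; bit 1 covers {0,1};
   bit 2 covers {2,3}.  Each bit points at the LRU side. */
typedef uint8_t plru4_t;

/* Mark way w (0..3) as most recently used: flip the bits on its
   path so they point away from it. */
static inline void plru4_touch(plru4_t *s, int w)
{
    if (w < 2) {
        *s |= 1;                                  /* LRU side: pair {2,3} */
        *s = (plru4_t)((*s & ~2) | ((w ^ 1) << 1));       /* other way of {0,1} */
    } else {
        *s &= (plru4_t)~1;                        /* LRU side: pair {0,1} */
        *s = (plru4_t)((*s & ~4) | (((w - 2) ^ 1) << 2)); /* other way of {2,3} */
    }
}

/* Follow the bits to the pseudo-least-recently-used way (0..3). */
static inline int plru4_victim(plru4_t s)
{
    return (s & 1) ? 2 + ((s >> 2) & 1)           /* within pair {2,3} */
                   : ((s >> 1) & 1);              /* within pair {0,1} */
}
```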

Background and Theory (cont)


LRU
- Advantages: high adaptability; 1-cycle algorithm; low memory usage
- Disadvantage: short-sighted

LFU
- Advantages: long-sighted; smarter
- Disadvantages: cache pollution; requires many cycles; more memory needed

Background and Theory (cont)


Observation
Both recency and frequency affect the likelihood of future references

Goal
A replacement algorithm that allows a flexible trade-off between recency and frequency

The idea: LRFU (Least Recently/Frequently Used)


- Subsumes both the LRU and LFU algorithms
- Overcomes the cycle cost of LFU by filtering cache entrances
- Yields better performance than both

Development Stages
1. Studying the background
2. Learning the SimpleScalar sim-cache platform
3. Developing the LRFU caching algorithm for SimpleScalar
4. Developing the filtering policy
5. Benchmarking (smart environment)
6. Analyzing various LRFU configurations and comparing them with the LRU algorithm


Principles
The LRFU policy associates a value with each block; this value quantifies the likelihood that the block will be referenced in the near future. Each past reference to the block adds a contribution to this value, and that contribution is determined by a weighing function F.
[Timeline: the block is referenced at times t1, t2, t3; the current time is tc]

C_tc(b) = F(tc - t1) + F(tc - t2) + F(tc - t3)
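A minimal C sketch of this computation, assuming the weighing function F(x) = (1/2)^(λx) defined on the next slide; the reference-time array and function names are illustrative. This naive O(n) sum is exactly what the two-counter scheme proved later avoids:

```c
#include <math.h>

/* Weighing function F(x) = (1/2)^(lambda * x).  lambda = 0 makes every
   reference count 1 (LFU-like); lambda = 1 halves the weight per time
   unit (LRU-like). */
static double F(double x, double lambda)
{
    return pow(0.5, lambda * x);
}

/* Naive C(block): sum F over the block's whole reference history. */
static double crf(const double ref_times[], int n, double now, double lambda)
{
    double c = 0.0;
    for (int i = 0; i < n; i++)
        c += F(now - ref_times[i], lambda);
    return c;
}
```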

Principles (cont)
Weighing function: F(x) = (1/2)^(λx)
- Monotonically decreasing
- Subsumes LRU and LFU:
  - When λ = 0 (i.e. F(x) = 1), it becomes LFU
  - When λ = 1 (i.e. F(x) = (1/2)^x), it becomes LRU
  - When 0 < λ < 1, it falls between LFU and LRU

[Plot: F(x) against x = current time - reference time, spanning the LRU/LFU spectrum from F(x) = 1 (LFU extreme) down to F(x) = (1/2)^x (LRU extreme)]

Principles (cont)
Update of C(block) over time
- Only two counters per block are needed to calculate C(block)

Proof: let δ = t2 - t1 be the time between two consecutive updates, and let δ1, δ2, δ3 be the ages of the block's references at time t1. Then:

C_t2(b) = F(δ1 + δ) + F(δ2 + δ) + F(δ3 + δ)
        = (1/2)^(λ(δ1+δ)) + (1/2)^(λ(δ2+δ)) + (1/2)^(λ(δ3+δ))
        = ((1/2)^(λδ1) + (1/2)^(λδ2) + (1/2)^(λδ3)) · (1/2)^(λδ)
        = C_t1(b) × F(δ)

So the value at any later time follows from the last stored value and the elapsed time alone.
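The identity above turns directly into an O(1) update. A sketch, reusing the same F; the struct and helper names are ours, not the project's actual identifiers:

```c
#include <math.h>

static double F(double x, double lambda) { return pow(0.5, lambda * x); }

/* Per-block LRFU state: exactly the two counters the proof calls for. */
typedef struct {
    double        crf;   /* C(t) as of the last reference */
    unsigned long last;  /* time of the last reference    */
} lrfu_t;

/* On a reference at time `now`: decay the stored value by F(delta),
   per C_t2(b) = C_t1(b) x F(delta), then add F(0) = 1 for the new
   reference. */
static void lrfu_reference(lrfu_t *b, unsigned long now, double lambda)
{
    b->crf  = b->crf * F((double)(now - b->last), lambda) + 1.0;
    b->last = now;
}

/* C at time `now` without touching the counters -- this is what the
   replacement step compares across a set. */
static double lrfu_value(const lrfu_t *b, unsigned long now, double lambda)
{
    return b->crf * F((double)(now - b->last), lambda);
}
```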

Design and Implementation


Filtering

[Flowchart: on a data access, if the address is already in the cache we are done; on a miss, the victim cache is checked and the filter decides whether the block is inserted into the cache or filtered out into the victim cache. Data removed from the cache by LRFU is inserted into the victim cache.]

Design and Implementation (cont)


Data structure

LRFU keeps two BOUNDED counters for each block.


Hardware budget
Counters
- Each block in the cache requires two bounded counters:
  - the previous C(t)
  - the time that has passed since the previous access

Victim cache
- Its size will be chosen based on empirical analysis

(A hedged sketch of one possible counter layout follows.)
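One possible fixed-width layout for the two counters; the 16-bit widths and saturating arithmetic are our assumptions, since the slides only require the counters to be bounded:

```c
#include <stdint.h>

/* Bounded per-block counters, closer to what hardware could hold. */
typedef struct {
    uint16_t crf_fp;  /* previous C(t), fixed point, saturating */
    uint16_t age;     /* time since previous access, saturating */
} lrfu_hw_t;

/* Saturating add keeps a counter bounded instead of wrapping. */
static inline uint16_t sat_add16(uint16_t a, uint16_t b)
{
    uint32_t s = (uint32_t)a + b;
    return s > 0xFFFFu ? 0xFFFFu : (uint16_t)s;
}
```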


Algorithms
Filtering
We implemented a very simple filtering algorithm whose sole task is to reduce the number of changes made to the cache.
After a cache miss, the fetched block is inserted into the cache with probability p, where 0 < p < 1 is configurable. If the block is not inserted into the cache, it is automatically inserted into the victim cache (see the sketch below).
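A minimal sketch of this filter; `insert_into_cache` and `insert_into_victim_cache` are hypothetical names standing in for the project's actual insert paths:

```c
#include <stdlib.h>

/* Hypothetical insert paths; the real routines live in the simulator. */
extern void insert_into_cache(unsigned long tag);
extern void insert_into_victim_cache(unsigned long tag);

/* Entrance filter: with probability p the missed block enters the
   cache, otherwise it goes straight to the victim cache. */
static void filter_insert(unsigned long tag, double p)
{
    if ((double)rand() / RAND_MAX < p)
        insert_into_cache(tag);          /* passed the filter */
    else
        insert_into_victim_cache(tag);   /* filtered out */
}
```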

Replacement
After a cache miss, C(t) is calculated for each block in the set, and the block with the smallest C(t) is selected for replacement (see the sketch below).
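A sketch of the victim scan, reusing the `lrfu_t` / `lrfu_value()` helpers from the slide-13 sketch:

```c
/* Pick the way with the smallest C(t) in one set; lrfu_t and
   lrfu_value() are defined in the earlier sketch. */
static int lrfu_pick_victim(const lrfu_t set[], int assoc,
                            unsigned long now, double lambda)
{
    int    best_way = 0;
    double best_c   = lrfu_value(&set[0], now, lambda);

    for (int w = 1; w < assoc; w++) {
        double c = lrfu_value(&set[w], now, lambda);
        if (c < best_c) {
            best_c   = c;
            best_way = w;
        }
    }
    return best_way;                     /* evict this way */
}
```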


Results

[Plot: hit rate versus cache size (# of blocks)]

Results (cont)

[Plot: hit rate]

Special Problems
- Software simulation of hardware: utilizing the existing data structures of SimpleScalar
- Finding the perfect C(t)
- Applying mathematical theory in practice


Conclusions
- We implemented a different cache replacement mechanism and obtained exciting results
- A hardware implementation of the mechanism is hard, but possible
- The implementation achieved its goals:
  - It subsumes both the LRU and LFU algorithms
  - It yields better performance than both (up to 30%!)


Future Research
- Implementation of better filtering techniques
- A dynamic version of the LRFU algorithm, adjusted periodically as the workload evolves
- Research into the hardware needed for LRFU

