
Hashing and Natural Parallelism

Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Linked Lists
Ways to make highly-concurrent list-based sets:
  Fine-grained locks
  Optimistic synchronization
  Lazy synchronization
  Lock-free synchronization

What's missing?
Art of Multiprocessor Programming 2

Linear-Time Set Methods


Problem is
  add(), remove(), contains() take time linear in set size

We want
  Constant-time methods (at least, on average)

Hashing
Hash function
  h: items → integers

Uniformly distributed
  Different items most likely have different hash values

Java hashCode() method
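To make the mapping from items to buckets concrete, here is a minimal sketch of turning a Java hashCode() into a bucket index. The class name BucketIndex and the helper bucketOf are hypothetical names used only for illustration; Math.floorMod is chosen as an assumption so that negative hash codes still give a valid index.

```java
final class BucketIndex {
    // Maps an item's hashCode() to a bucket index in [0, capacity).
    // Math.floorMod keeps the result non-negative even when hashCode() is negative.
    static int bucketOf(Object item, int capacity) {
        return Math.floorMod(item.hashCode(), capacity);
    }

    public static void main(String[] args) {
        Object[] items = {16, 9, 7, -5, "hello"};
        for (Object item : items) {
            System.out.println(item + " -> bucket " + bucketOf(item, 4));
        }
    }
}
```

For Integer items the hash code is the value itself, so 16, 9 and 7 land in buckets 0, 1 and 3 of a 4-bucket table.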

Open vs. Closed Addressing


Closed addressing (chained hashing)
  Each item has a fixed bucket in the table
  Each bucket contains several items

Open addressing
  Each item could end up in different buckets in the table
  Each bucket contains at most one item

Sequential Closed Hash Map
[Figure: table with 4 buckets (0 to 3), h(k) = k mod 4; 2 items, with 16 in bucket 0]

Add an Item
[Figure: the table now holds 16, 9 and 7; 3 items]

Add Another: Collision
[Figure: 4 is chained after 16 in bucket 0; 4 items]

More Collisions
[Figure: 15 is chained after 7 in bucket 3; 5 items]

Problem: buckets getting too long

Resizing
Grow the array: 8 buckets (0 to 7) instead of 4
Adjust hash function: h(k) = k mod 8
[Figure sequence: the 5 items are rehashed one at a time. 16 stays in bucket 0 and 9 in bucket 1; 4 moves from bucket 0 to bucket 4 (h(4) = 4 mod 8); 7 and 15 both land in bucket 7 (h(15) = 7 mod 8).]
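The resizing walk-through can be sketched as a small sequential chained set. This is an illustrative sketch, not the book's code: the class name SequentialChainedSet, the int-only keys, and the policy "resize once there are more items than buckets" are all assumptions made so that the example reproduces the 4-bucket to 8-bucket step from the slides.

```java
import java.util.ArrayList;
import java.util.List;

final class SequentialChainedSet {
    private List<Integer>[] table; // one chain per bucket
    private int size = 0;

    @SuppressWarnings("unchecked")
    SequentialChainedSet(int capacity) {
        table = (List<Integer>[]) new List[capacity];
        for (int i = 0; i < capacity; i++) table[i] = new ArrayList<>();
    }

    // h(k) = k mod capacity, kept non-negative.
    int bucketIndex(int k) { return Math.floorMod(k, table.length); }

    int capacity() { return table.length; }

    boolean contains(int k) { return table[bucketIndex(k)].contains(k); }

    boolean add(int k) {
        List<Integer> bucket = table[bucketIndex(k)];
        if (bucket.contains(k)) return false;    // no duplicates
        bucket.add(k);
        size++;
        if (size > table.length) resize();       // assumed policy: average chain length > 1
        return true;
    }

    // Grow the array, adjust the hash function, and rehash every item.
    @SuppressWarnings("unchecked")
    private void resize() {
        List<Integer>[] old = table;
        table = (List<Integer>[]) new List[2 * old.length];
        for (int i = 0; i < table.length; i++) table[i] = new ArrayList<>();
        for (List<Integer> bucket : old)
            for (int k : bucket)
                table[bucketIndex(k)].add(k);
    }
}
```

Adding 16, 9, 7, 4 and 15 to a set created with 4 buckets triggers one resize on the fifth add, after which 4 sits in bucket 4 and 15 in bucket 7.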

Hash Sets
Implement a Set object
  Collection of items, no duplicates
  add(), remove(), contains() methods
  You know the drill

Simple Hash Set


public class SimpleHashSet {
  protected LockFreeList[] table;

  public SimpleHashSet(int capacity) {
    table = new LockFreeList[capacity];
    for (int i = 0; i < capacity; i++)
      table[i] = new LockFreeList();
  }
  // methods are shown on the following slides
}

Fields
  table: array of lock-free lists

Constructor
  capacity: initial size
  new LockFreeList[capacity]: allocate memory
  for loop: initialization of each bucket

Add Method
public boolean add(Object key) {
  // floorMod keeps the index non-negative when hashCode() is negative
  int hash = Math.floorMod(key.hashCode(), table.length);
  return table[hash].add(key);
}

  hash: use object hash code to pick a bucket
  table[hash].add(key): call bucket's add() method

No Brainer?
We just saw a
  Simple
  Lock-free
  Concurrent
hash-based set implementation

What's not to like?
We don't know how to resize

Is Resizing Necessary?
Constant-time method calls require
  Constant-length buckets
  Table size proportional to set size
  As set grows, must be able to resize

Set Method Mix


Typical load
  90% contains()
  9% add()
  1% remove()

Growing is important
Shrinking not so much

When to Resize?
Many reasonable policies. Here's one.
Pick a threshold on number of items in a bucket
Global threshold
  When buckets exceed this value
Bucket threshold
  When any bucket exceeds this value

Coarse-Grained Locking
Good parts
  Simple
  Hard to mess up

Bad parts
  Sequential bottleneck

Fine-grained Locking
[Figure: lock array next to a bucket array holding the items 4, 8, 17 and 11]
Each lock associated with one bucket

Resize This
Acquire locks in ascending order
Make sure table reference didn't change between resize decision and lock acquisition

Resize This
[Figure sequence: a new table with 8 buckets (0 to 7) is allocated next to the old one and the items are rehashed into it]
Allocate new super-sized table

Striped Locks: each lock now associated with two buckets
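The striping idea and the resize steps above can be sketched as follows. This is a simplified illustration under stated assumptions rather than the book's implementation: the class name StripedSetSketch, int keys, ArrayList buckets, and the growth policy (more than 4 items per bucket on average) are all hypothetical choices. The one structural fact it relies on is that the table only doubles, so the number of locks always divides the number of buckets and lock i covers every bucket b with b mod L = i.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

final class StripedSetSketch {
    private static final int MAX_AVG = 4;         // assumed resize threshold
    private final ReentrantLock[] locks;          // fixed number of stripes, never resized
    private volatile List<Integer>[] table;       // grows by doubling
    private final AtomicInteger size = new AtomicInteger(0);

    @SuppressWarnings("unchecked")
    StripedSetSketch(int capacity) {
        locks = new ReentrantLock[capacity];
        table = (List<Integer>[]) new List[capacity];
        for (int i = 0; i < capacity; i++) {
            locks[i] = new ReentrantLock();
            table[i] = new ArrayList<>();
        }
    }

    // Lock i covers every bucket b with b mod locks.length == i.
    private ReentrantLock lockFor(int k) {
        return locks[Math.floorMod(k, locks.length)];
    }

    boolean contains(int k) {
        ReentrantLock lock = lockFor(k);
        lock.lock();
        try {
            List<Integer>[] t = table;
            return t[Math.floorMod(k, t.length)].contains(k);
        } finally {
            lock.unlock();
        }
    }

    boolean add(int k) {
        int growFrom = 0;                          // capacity seen at the resize decision; 0 = none
        ReentrantLock lock = lockFor(k);
        lock.lock();
        try {
            List<Integer>[] t = table;
            List<Integer> bucket = t[Math.floorMod(k, t.length)];
            if (bucket.contains(k)) return false;
            bucket.add(k);
            if (size.incrementAndGet() > MAX_AVG * t.length) growFrom = t.length;
        } finally {
            lock.unlock();
        }
        if (growFrom != 0) resize(growFrom);
        return true;
    }

    @SuppressWarnings("unchecked")
    private void resize(int oldCapacity) {
        for (ReentrantLock l : locks) l.lock();    // acquire locks in ascending order
        try {
            if (table.length != oldCapacity) return; // table changed since the decision
            List<Integer>[] bigger = (List<Integer>[]) new List[2 * oldCapacity];
            for (int i = 0; i < bigger.length; i++) bigger[i] = new ArrayList<>();
            for (List<Integer> bucket : table)
                for (int k : bucket)
                    bigger[Math.floorMod(k, bigger.length)].add(k);
            table = bigger;
        } finally {
            for (ReentrantLock l : locks) l.unlock();
        }
    }

    int capacity() { return table.length; }
    int stripes() { return locks.length; }
}
```

Starting with 2 buckets and adding 20 items leaves 2 locks guarding 8 buckets, so each lock ends up covering four buckets.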

Resize This
[Figure: the resized 8-bucket table guarded by the original lock array]

Observations
We grow the table, but not locks
  Resizing lock array is tricky

We use sequential lists
  Not LockFreeList lists
  If we're locking anyway, why pay?

Fine-Grained Locks
We can resize the table
But not the locks
Debatable whether method calls are constant-time in presence of contention

Insight
The contains() method
  Does not modify any fields
  Why should concurrent contains() calls conflict?

Read/Write Locks
public interface ReadWriteLock {
  Lock readLock();
  Lock writeLock();
}

  readLock(): returns associated read lock
  writeLock(): returns associated write lock

Lock Safety Properties


No thread may acquire the write lock
  while any thread holds the write lock or the read lock.

No thread may acquire the read lock
  while any thread holds the write lock.

Concurrent read locks OK

Read/Write Lock
Satisfies safety properties
  If readers > 0 then writer == false
  If writer == true then readers == 0

Liveness?
  Lots of readers
  Writers locked out?

FIFO R/W Lock


As soon as a writer requests a lock
  No more readers accepted
  Current readers drain from lock
  Writer gets in
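As an illustration of how a read/write lock is used, the sketch below guards a set so that contains() calls share the read lock while add() takes the write lock. It uses the JDK's java.util.concurrent.locks.ReentrantReadWriteLock as a stand-in for the interface on the slides; constructing it with fairness enabled gives an approximately arrival-ordered policy, which is in the spirit of the FIFO lock but is not the lock these slides build. The class name ReadWriteGuardedSet and the HashSet backing store are assumptions for the example.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

final class ReadWriteGuardedSet {
    // 'true' requests the fair (approximately arrival-order) policy.
    private final ReadWriteLock rw = new ReentrantReadWriteLock(true);
    private final Set<Integer> items = new HashSet<>();

    // Readers may run concurrently with each other.
    boolean contains(int k) {
        rw.readLock().lock();
        try {
            return items.contains(k);
        } finally {
            rw.readLock().unlock();
        }
    }

    // A writer excludes both readers and other writers.
    boolean add(int k) {
        rw.writeLock().lock();
        try {
            return items.add(k);
        } finally {
            rw.writeLock().unlock();
        }
    }
}
```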

The Story So Far


Resizing is the hard part
Fine-grained locks
  Striped locks cover a range (not resized)
Read/Write locks
  FIFO property tricky

Optimistic Synchronization
Let the contains() method
  Scan without locking

If it finds the key
  OK to return true
  Actually requires a proof

What if it doesn't find the key?

Optimistic Synchronization
If it doesn't find the key
  May be victim of resizing

Must try again
  Getting a read lock this time

Makes sense if
  Keys are present
  Resizes are rare

Stop The World Resizing


Resizing stops all concurrent operations
What about an incremental resize?
Must avoid locking the table
A lock-free table + incremental resizing?

Lock-Free Resizing Problem
[Figure sequence: a 4-bucket table holding 4, 8, 9, 7 and 15; once 12 is added the table must be extended to 8 buckets (0 to 7), and 12 has to be taken out of its old bucket and put into a new one]
Need to extend table
To remove and then add even a single item, a single-location CAS is not enough

We need a new idea

Don't move the items
Move the buckets instead
Keep all items in a single lock-free list
Buckets become shortcut pointers into the list
[Figure: buckets 0 to 3 pointing into one list of items]

Recursive Split Ordering
[Figure sequence: the list 0 4 2 6 1 5 3 7. Bucket 0 points to the head; bucket 1 splits the list at 1/2; buckets 2 and 3 split it again at 1/4 and 3/4.]
List entries sorted in order that allows recursive splitting. How?

[Figure sequence: with 2 buckets the list splits on the least significant bit (LSB 0, LSB 1); with 4 buckets it splits on the two LSBs, in the order 00, 10, 01, 11.]
LSB = least significant bit

Split-Order
If the table size is $2^i$,
  Bucket b contains keys k with $k \equiv b \pmod{2^i}$
  Bucket index consists of key's i LSBs

When Table Splits
Some keys stay
  $b = k \bmod 2^{i+1}$
Some move
  $b + 2^i = k \bmod 2^{i+1}$
Determined by the (i+1)st bit, counting backwards
Key must be accessible from both
  Keys that will move must come later
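A short sketch of the bucket arithmetic may help here. It assumes non-negative integer keys and a table of size 2^i; the class name SplitOrderIndex and the helper bucketOf are hypothetical. Masking off the i least significant bits gives the bucket, and masking i+1 bits shows whether a key stays in bucket b or moves to bucket b + 2^i after a split.

```java
final class SplitOrderIndex {
    // Bucket index for a table of size 2^i: the key's i least significant bits.
    static int bucketOf(int key, int i) {
        return key & ((1 << i) - 1);
    }

    public static void main(String[] args) {
        int i = 2; // table size 4, about to split to size 8
        for (int key : new int[] {4, 9, 7, 15}) {
            int before = bucketOf(key, i);
            int after = bucketOf(key, i + 1);
            String fate = (after == before) ? "stays" : "moves";
            System.out.println(key + ": bucket " + before + " -> " + after + " (" + fate + ")");
        }
    }
}
```

For example, key 9 stays in bucket 1, while key 7 moves from bucket 3 to bucket 7 = 3 + 4.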

A Bit of Magic
Real keys:    0    4    2    6    1    5    3    7
              000  100  010  110  001  101  011  111
Split-order:  0    1    2    3    4    5    6    7
              000  001  010  011  100  101  110  111

Real key 1 is in location 4 (counting from 0)
Just reverse the order of the key bits
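The bit-reversal trick can be checked with a few lines of code. The sketch below assumes 3-bit keys as on the slide and uses the JDK's Integer.reverse; the class and method names (SplitOrderDemo, reverseBits, splitOrder) are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;

final class SplitOrderDemo {
    // Reverses the low 'bits' bits of key (bits must be between 1 and 32).
    static int reverseBits(int key, int bits) {
        return Integer.reverse(key) >>> (32 - bits);
    }

    // Returns the keys 0 .. 2^bits - 1 sorted by their bit-reversed value.
    static Integer[] splitOrder(int bits) {
        Integer[] keys = new Integer[1 << bits];
        for (int k = 0; k < keys.length; k++) keys[k] = k;
        Arrays.sort(keys, Comparator.comparingInt(k -> reverseBits(k, bits)));
        return keys;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOrder(3)));
    }
}
```

Sorting the keys 0 to 7 by reversed bits yields the list order 0 4 2 6 1 5 3 7 shown on the slides, and reverseBits(1, 3) is 4, the location of real key 1.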

Split Ordered Hashing
Order according to reversed bits
[Figure: the list 0 4 2 6 1 5 3 7, labeled with the reversed-bit keys 000 001 010 011 100 101 110 111; buckets 0, 1 and 2 point into it]

Parent Always Provides a Short Cut
[Figure: a search enters the list 0 4 2 6 1 5 3 7 through a parent bucket's pointer]

Sentinel Nodes
[Figure: buckets 0 to 3 pointing directly at items in the list 16 4 9 7 15]
Problem: how to remove a node pointed to by 2 sources using CAS

[Figure: each bucket points to its own sentinel node in the list]
Solution: use a sentinel node for each bucket

Sentinel vs Regular Keys


Want sentinel key for i ordered
  before all keys that hash to bucket i
  after all keys that hash to bucket (i-1)

Splitting a Bucket
We can now split a bucket
In a lock-free manner
Using two CAS() calls ...
  One to add the sentinel to the list
  The other to point from the bucket to the sentinel

Initialization of Buckets
[Figure: buckets 0 and 1 are initialized and point to their sentinels in a list holding 16, 4 and 15; buckets 2 and 3 are added to the table]
Need to initialize bucket 3 to split bucket 1
Sentinel 3 is in the list but not connected to its bucket yet
Now bucket 3 points to its sentinel: the bucket has been split

Adding 10
[Figure: buckets 0 to 3 over a list holding 16, 9 and 7; bucket 2 has no sentinel yet]
10 = 2 mod 4
Must initialize bucket 2
Then can add 10

Recursive Initialization
[Figure: buckets 0 to 3 over a list holding 8 and 12; only bucket 0 is initialized]
To add 7 to the list
  7 = 3 mod 4: must initialize bucket 3
  3 = 1 mod 2: must initialize bucket 1
Could be log n depth
But EXPECTED depth is constant

Lock-Free List
int makeRegularKey(int key) {
  return reverse(key | 0x80000000);
}
int makeSentinelKey(int key) {
  return reverse(key);
}

Regular key: set high-order bit to 1 and reverse
Sentinel key: simply reverse (high-order bit is 0)
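The following sketch checks that these two key functions produce the ordering the sentinels need. It rests on two assumptions not stated on the slides: that reverse is the 32-bit Integer.reverse, and that the list compares keys as unsigned values (done here with Integer.compareUnsigned). The class name SplitOrderKeys and the compare helper are hypothetical.

```java
final class SplitOrderKeys {
    // Regular key: set the high-order bit, then reverse, so the lowest bit of the result is 1.
    static int makeRegularKey(int key) {
        return Integer.reverse(key | 0x80000000);
    }

    // Sentinel key: reverse the bucket index; its high-order bit is 0, so the lowest result bit is 0.
    static int makeSentinelKey(int bucket) {
        return Integer.reverse(bucket);
    }

    // Assumed list order: keys compared as unsigned 32-bit values.
    static int compare(int a, int b) {
        return Integer.compareUnsigned(a, b);
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(makeSentinelKey(1)));
        System.out.println(Integer.toHexString(makeRegularKey(9)));
        System.out.println(Integer.toHexString(makeSentinelKey(3)));
    }
}
```

With a table of size 4, item 9 belongs to bucket 1, and its regular key falls after the sentinel for bucket 1 and before the sentinel for bucket 3, which is the next sentinel in split order. Note that setting the high-order bit discards the top bit of the hash code, so two hash codes differing only in that bit receive the same regular key.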

Main List
Lock-Free List from earlier class
With some minor variations

Lock-Free List
public class LockFreeList {
  public boolean add(Object object, int key) {...}
  public boolean remove(int k) {...}
  public boolean contains(int k) {...}
  public LockFreeList(LockFreeList parent, int key) {...}
}

  add(): change from the earlier class, add takes a key argument
  LockFreeList(parent, key): inserts sentinel with key if not already present
  LockFreeList(parent, key): returns new list starting with sentinel (shares with parent)

Split-Ordered Set: Fields


public class SOSet {
  protected LockFreeList[] table;
  protected AtomicInteger tableSize;
  protected AtomicInteger setSize;

  public SOSet(int capacity) {
    table = new LockFreeList[capacity];
    table[0] = new LockFreeList();
    tableSize = new AtomicInteger(2);
    setSize = new AtomicInteger(0);
  }

Fields
  table: for simplicity, treat the table as one big array
    In practice, want something that grows dynamically
  tableSize: how much of the table array are we actually using?
  setSize: track set size so we know when to resize
  Initially use 1 bucket and size is zero (the last version of this slide sets tableSize = new AtomicInteger(1))

Add() Method
public boolean add(Object object) {
  int hash = object.hashCode();
  // floorMod keeps the bucket index non-negative for negative hash codes
  int bucket = Math.floorMod(hash, tableSize.get());
  int key = makeRegularKey(hash);
  LockFreeList list = getBucketList(bucket);
  if (!list.add(object, key))
    return false;
  resizeCheck();
  return true;
}

  bucket: pick a bucket
  key: non-sentinel split-ordered key
  getBucketList(): get pointer to bucket's sentinel, initializing if necessary
  list.add(): call bucket's add() method with reversed key
  return false: no change? We're done.
  resizeCheck(): time to resize?

Resize
Divide set size by total number of buckets
If quotient exceeds threshold
  Double tableSize field
  Up to fixed limit
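The resize policy above can be sketched with two atomic counters. The slides do not show the body of resizeCheck(), so everything here is an assumption: the class name ResizePolicySketch, the constants THRESHOLD and MAX_BUCKETS, and the choice to count the new item inside resizeCheck() itself.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class ResizePolicySketch {
    static final int THRESHOLD = 4;          // assumed: max average items per bucket
    static final int MAX_BUCKETS = 1 << 10;  // assumed: fixed limit on tableSize

    final AtomicInteger tableSize = new AtomicInteger(1);
    final AtomicInteger setSize = new AtomicInteger(0);

    // Called after a successful add(): count the item, then double tableSize if needed.
    void resizeCheck() {
        int items = setSize.incrementAndGet();
        int buckets = tableSize.get();
        if (items / buckets > THRESHOLD && 2 * buckets <= MAX_BUCKETS) {
            // compareAndSet doubles the table at most once even if several threads race here.
            tableSize.compareAndSet(buckets, 2 * buckets);
        }
    }
}
```

Because resizing only changes the tableSize field, no items move; new buckets are initialized lazily the first time they are used.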

Initialize Buckets
Buckets originally null
If you find one, initialize it
Go to bucket's parent
  Earlier nearby bucket
  Recursively initialize if necessary
Constant expected work

Recall: Recursive Initialization
To add 7 to the list
  7 = 3 mod 4: must initialize bucket 3
  3 = 1 mod 2: must initialize bucket 1
Could be log n depth
But EXPECTED depth is constant

Initialize Bucket
void initializeBucket(int bucket) {
  int parent = getParent(bucket);
  if (table[parent] == null)
    initializeBucket(parent);
  int key = makeSentinelKey(bucket);
  LockFreeList list = new LockFreeList(table[parent], key);
  table[bucket] = list;  // store the result so the bucket is no longer null
}

  getParent(), initializeBucket(parent): find parent, recursively initialize if needed
  makeSentinelKey(): prepare key for new sentinel
  new LockFreeList(table[parent], key): insert sentinel if not present, and get back reference to rest of list
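The slides call getParent() without showing it. One plausible reading, consistent with the example in which bucket 3's parent is bucket 1 and bucket 1's parent is bucket 0, is that the parent is the bucket index with its most significant set bit cleared. The sketch below implements that reading; the class name BucketParent and the depth helper are hypothetical.

```java
final class BucketParent {
    // Assumed definition: clear the highest set bit of the bucket index.
    // Bucket 0 has no parent and maps to itself.
    static int getParent(int bucket) {
        return bucket - Integer.highestOneBit(bucket);
    }

    // Number of parent steps from 'bucket' back to bucket 0,
    // an upper bound on the recursion depth of lazy initialization.
    static int depth(int bucket) {
        int steps = 0;
        while (bucket != 0) {
            bucket = getParent(bucket);
            steps++;
        }
        return steps;
    }

    public static void main(String[] args) {
        for (int b = 1; b < 8; b++) {
            System.out.println("bucket " + b + " -> parent " + getParent(b));
        }
    }
}
```

Under this definition the chain from any bucket to bucket 0 has at most one step per set bit in the index, which matches the worst-case depth of log n noted earlier.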

Correctness
Linearizable concurrent set implementation
Theorem: O(1) expected time
  No more than O(1) items expected between two dummy nodes on average
  Lazy initialization causes at most O(1) expected recursion depth in initializeBucket()

Closed (Chained) Hashing


Advantages:
  With N buckets, M items, uniform h
  Retains good performance as table density (M/N) increases, so less resizing

Disadvantages:
  Dynamic memory allocation
  Bad cache behavior (no locality)

Cache behavior very important on a multicore!

Linear Probing*
[Figure: array of buckets with items z and x; x sits a few slots to the right of h(x), and the value H = 7 is noted at h(x)]
Contains(x): search linearly from h(x) until last location H noted in bucket.
*Attributed to Amdahl

Linear Probing
[Figure: the same array, now more densely filled with items z; the H values noted in the buckets are updated]
Add(x): add in first empty bucket and update its H.

Linear Probing
Open address means M <= N
Expected items in bucket same as chaining
Expected distance till open slot: $\frac{1}{2}\left(1 + \left(\frac{1}{1 - M/N}\right)^{2}\right)$

M/N = 0.5: search 2.5 buckets
M/N = 0.9: search 50 buckets
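The two figures quoted above can be reproduced by reading the slide's formula as one half of (1 + (1/(1 - M/N))^2). That reading is an assumption recovered from the garbled slide text, and the class name ProbeEstimate and method expectedDistance are hypothetical.

```java
final class ProbeEstimate {
    // Expected distance to an open slot for table density d = M/N, with 0 <= d < 1.
    static double expectedDistance(double density) {
        double t = 1.0 / (1.0 - density);
        return 0.5 * (1.0 + t * t);
    }

    public static void main(String[] args) {
        for (double d : new double[] {0.5, 0.9}) {
            System.out.println("M/N = " + d + ": about " + expectedDistance(d) + " buckets");
        }
    }
}
```

At density 0.5 this gives 2.5 buckets, and at density 0.9 it gives roughly 50.5, which the slide rounds to 50.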

Linear Probing
Advantages:
  Good locality, so fewer cache misses

Disadvantages:
  As M/N increases, more cache misses
    Searching 10s of unrelated buckets
    Clustering of keys into neighboring buckets
  As computation proceeds, contamination by deleted items, so more cache misses

Cuckoo Hashing
[Figure: two tables of buckets; x hashes to h1(x) in the first table and to h2(x) in the second; y occupies one of x's buckets and can be moved to h2(y)]
Add(x): if h1(x) and h2(x) are full, evict y and move it to h2(y) != h2(x). Then place x in its place.
But cycles can form

Cuckoo Hashing
Advantages:
  Contains(): deterministic 2 buckets
  No clustering or contamination

Disadvantages:
  2 tables
  hi(x) are complex
  As M/N increases, relocation cycles
  Above M/N = 0.5, Add() does not work!

Hopscotch Hashing
Single array, simple hash function
Idea: define neighborhood of original bucket
In neighborhood, items found quickly
Use sequences of displacements to move items into their neighborhood

Hopscotch Hashing*
[Figure: array of buckets; x lies within the hop-range of h(x); the bucket at h(x) carries the hop-info bitmap 1 0 1 0, with H = 4]
Contains(x): search in at most H buckets (the hop-range) based on hop-info bitmap. In practice pick H to be 32.
*http://groups.google.com/group/hopscotch-hashing
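A sequential sketch of the bitmap-guided lookup may make the hop-info idea clearer. It is a simplification under stated assumptions, not the published algorithm: int keys, h(x) = x mod capacity, a hop range of H = 4, and an add that only succeeds when a free slot already lies inside the hop range (a full implementation would displace items or resize instead). The names HopscotchLookupSketch and addNearby are hypothetical.

```java
final class HopscotchLookupSketch {
    static final int H = 4;     // hop range (the slides suggest 32 in practice)
    final Integer[] slots;      // null means empty
    final int[] hopInfo;        // bit i of hopInfo[b] set: slot b+i holds an item whose home bucket is b

    HopscotchLookupSketch(int capacity) {
        slots = new Integer[capacity];
        hopInfo = new int[capacity];
    }

    int home(int key) {
        return Math.floorMod(key, slots.length);
    }

    // Looks only at the slots flagged in the home bucket's hop-info bitmap.
    boolean contains(int key) {
        int b = home(key);
        int bits = hopInfo[b];
        for (int i = 0; i < H; i++) {
            if ((bits & (1 << i)) != 0) {
                Integer item = slots[(b + i) % slots.length];
                if (item != null && item == key) return true;
            }
        }
        return false;
    }

    // Simplified add: uses the first free slot within the hop range, without displacement.
    boolean addNearby(int key) {
        if (contains(key)) return false;
        int b = home(key);
        for (int i = 0; i < H; i++) {
            int s = (b + i) % slots.length;
            if (slots[s] == null) {
                slots[s] = key;
                hopInfo[b] |= 1 << i;
                return true;
            }
        }
        return false; // neighborhood full: displacement or resize would be needed here
    }
}
```

The lookup touches at most H consecutive slots starting at the home bucket, which is what gives the scheme its cache locality.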

Hopscotch Hashing
[Figure sequence: items u, w, v, z, r and s near h(x); the empty slot is moved toward h(x) step by step and the hop-info bitmaps are updated, until x can be placed]
Add(x): probe linearly to find open slot. Move the empty slot via sequence of displacements into the hop-range of h(x).

Hopscotch Hashing
Contains(): in concurrent version, operation is wait-free (just look in neighborhood)
Add(): expected distance till open slot same as in linear probing
Chances of Resize() because neighborhood is full diminish as H log n, one word hopinfo bitmap, or use smaller H and default to Linear Probing

Hopscotch Hashing
Advantages:
  Good locality and cache behavior
  Good performance as table density (M/N) increases, so less resizing
  Pay price in Add(), not in frequent Contains()
  Easy to parallelize

Recall: Concurrent Chained Hashing
[Figure: array of buckets guarded by striped locks]
Striped locks
Lock for Add() and unsuccessful Contains()

Concurrent Simple Hopscotch
[Figure: hopscotch array with h(x) marked]
Striped locking as in chained hashing
Contains() is wait-free

Concurrent Simple Hopscotch
[Figure: items u, z, v, x and r near h(x); the bucket carries hop-info 1 0 0 1 and a timestamp ts]
Add(x): lock bucket, mark empty slot using CAS, add x erasing mark

[Figure: an item is displaced; the timestamp of the bucket being displaced goes from ts to ts+1]
Add(x): lock bucket, mark empty slot using CAS, lock bucket and update timestamp of bucket being displaced before erasing old value

Concurrent Simple Hopscotch
[Figure: a lookup reports "x not found" while items u, z, v, s and r are being displaced]
Wait-free Contains(x): read ts and hop-info, go to the marked buckets; if x is not there, compare ts; if it differs, repeat; after k attempts, search all H buckets

Is performance dominated by cache behavior?


Run algorithms on state-of-the-art multicores and uniprocessors:
  Sun 64-way Niagara II, and
  Intel 3GHz Xeon
Benchmarks pre-allocated memory to eliminate effects of memory management

[Figure: Sequential SPARC Throughput; 90% contain, 5% insert, 5% remove; with memory pre-allocated. y-axis: ops/ms (0 to 5000); x-axis: table density (0.1 to 0.9). Series: Hopscotch_D, Hopscotch_ND, LinearProbing, Chained, Cuckoo.]

[Figure: DTBL Cache-Miss Counts, Sequential SPARC, Unsuccessful Lookup. y-axis: miss/ops (0 to 8); x-axis: table density (0.1 to 0.9). Series: Hopscotch_D, Hopscotch_ND, LinearProbing, Chained, Cuckoo.]

[Figure: Sequential SPARC High-Density Throughput; 90% contain, 5% insert, 5% remove. y-axis: ops/ms (0 to 4000); x-axis: table density (0.9 to 0.99). Series: Hopscotch_D, Hopscotch_ND, LinearProbing, Chained.]

[Figure: Sequential CoreDuo Throughput; 90% contain, 5% insert, 5% remove. y-axis: ops/ms (0 to 14000); x-axis: table density (0.1 to 0.9). Series: Hopscotch_D, Hopscotch_ND, LinearProbing, Chained, Cuckoo. Annotation: Cuckoo stops here.]

[Figure: DTBL Cache-Miss Counts, Sequential CoreDuo, Unsuccessful Lookup. y-axis: miss/ops (0 to 6); x-axis: table density (0.1 to 0.9). Series: Hopscotch_D, Hopscotch_ND, LinearProbing, Chained, Cuckoo.]

Concurrent SPARC Throughput
[Figure: 90% density; 70% contain, 15% insert, 15% remove. y-axis: ops/ms (0 to 160000); x-axis: CPUs (1 to 64). Series: Hopscotch_D, Chained_PRE, Chained_MTM. Legend notes: with memory pre-allocated; with allocation.]

Concurrent SPARC Throughput
[Figure: 90% density; Cache-Miss per Unsuccessful Lookup. y-axis: miss/ops (0 to 3); x-axis: CPUs (1 to 64). Series: Hopscotch_D, Chained_PRE, Chained_MTM.]

Summary
Chained hash with striped locking is simple and effective in many cases
Hopscotch with striped locking: great cache behavior
If incremental resizing is needed, go for split-ordered

This work is licensed under a Creative Commons Attribution-ShareAlike 2.5 License.

You are free:
  to Share: to copy, distribute and transmit the work
  to Remix: to adapt the work
Under the following conditions:
  Attribution. You must attribute the work to The Art of Multiprocessor Programming (but not in any way that suggests that the authors endorse you or your use of the work).
  Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under the same, similar or a compatible license.
For any reuse or distribution, you must make clear to others the license terms of this work. The best way to do this is with a link to http://creativecommons.org/licenses/by-sa/3.0/.
Any of the above conditions can be waived if you get permission from the copyright holder.
Nothing in this license impairs or restricts the author's moral rights.
