Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit
Linked Lists
Ways to make highly-concurrent list-based sets:
  Fine-grained locks
  Optimistic synchronization
  Lazy synchronization
  Lock-free synchronization
What's missing?
Art of Multiprocessor Programming 2
We want
Constant-time methods (at least, on average)
Hashing
Hash function
h: items → integers
Uniformly distributed
Different items most likely have different hash values
Open Addressed
  Each item could end up in different buckets in the table
  Each bucket contains at most one item
[Figure sequence: a table of buckets holding the items 16, 4, 9, 7, 15 and 10.
 Add an Item: the new item goes to the bucket chosen by its hash value.
 More Collisions: further items hash to buckets that are already occupied.
 Resizing: the table grows from 4 buckets to 8 (0 to 7) and the items are
 redistributed with h(k) = k mod 8, for example h(4) = 4 mod 8 and h(15) = 7 mod 8.]
Hash Sets
Implement a Set object
  Collection of items, no duplicates
  add(), remove(), contains() methods
  You know the drill
Fields and Constructor
  public class SimpleHashSet {
    protected LockFreeList[] table;

    public SimpleHashSet(int capacity) {       // initial size
      table = new LockFreeList[capacity];      // allocate memory
      for (int i = 0; i < capacity; i++)       // initialization
        table[i] = new LockFreeList();
    }
    ...
  }
Add Method
  public boolean add(Object key) {
    // mask the sign bit: hashCode() may be negative, and % would then give a negative index
    int hash = (key.hashCode() & 0x7fffffff) % table.length;
    return table[hash].add(key);   // call the bucket's add() method
  }
No Brainer?
We just saw a simple, lock-free, concurrent hash-based set implementation
Is Resizing Necessary?
Constant-time method calls require
  Constant-length buckets
  Table size proportional to set size
  As set grows, must be able to resize
When to Resize?
Many reasonable policies. Here's one.
Pick a threshold on the number of items in a bucket
Global threshold
  When buckets exceed this value
Bucket threshold
  When any bucket exceeds this value
Coarse-Grained Locking
Good parts
  Simple
  Hard to mess up
Bad parts
  Sequential bottleneck
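A minimal sketch of what coarse-grained locking could look like, assuming plain ArrayList buckets and a single ReentrantLock. The class name CoarseHashSet, the helper newTable and the threshold MAX_AVG_BUCKET are illustrative assumptions, not taken from the slides.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative only: one lock guards every method call, including resizing.
class CoarseHashSet<T> {
  private static final int MAX_AVG_BUCKET = 4;   // assumed resize threshold
  private final Lock lock = new ReentrantLock();
  private List<T>[] table;
  private int setSize = 0;

  CoarseHashSet(int capacity) { table = newTable(capacity); }

  @SuppressWarnings("unchecked")
  private static <E> List<E>[] newTable(int capacity) {
    List<E>[] t = (List<E>[]) new List[capacity];
    for (int i = 0; i < capacity; i++) t[i] = new ArrayList<>();
    return t;
  }

  // Mask the sign bit so the index is never negative.
  private int bucketOf(T x, int length) { return (x.hashCode() & 0x7fffffff) % length; }

  public boolean contains(T x) {
    lock.lock();
    try { return table[bucketOf(x, table.length)].contains(x); }
    finally { lock.unlock(); }
  }

  public boolean add(T x) {
    lock.lock();
    try {
      List<T> bucket = table[bucketOf(x, table.length)];
      if (bucket.contains(x)) return false;
      bucket.add(x);
      if (++setSize / table.length > MAX_AVG_BUCKET) resize();
      return true;
    } finally { lock.unlock(); }
  }

  // Caller holds the lock, so no other thread can observe a half-built table.
  private void resize() {
    List<T>[] old = table;
    table = newTable(2 * old.length);
    for (List<T> bucket : old)
      for (T x : bucket) table[bucketOf(x, table.length)].add(x);
  }
}
```

Every call serializes on the same lock, which is the sequential bottleneck listed under the bad parts.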
Fine-grained Locking
[Figure: a table of buckets guarded by an array of locks.]
Resize This
  Make sure table reference didn't change between resize decision and lock acquisition
  [Figure sequence: the items (4, 7, 8, 9, 11, 17) are moved into a table with buckets 0 to 7 while the lock array keeps its size.]
Observations
We grow the table, but not locks
Resizing lock array is tricky
Fine-Grained Locks
  We can resize the table
  But not the locks
  Debatable whether method calls are constant-time in presence of contention
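The idea of growing the table but not the locks can be sketched as follows. This is a hedged illustration with ArrayList buckets; StripedHashSet, newTable, MAX_AVG_BUCKET and seenCapacity are assumed names. Lock i covers every bucket whose index is congruent to i modulo the number of locks, which stays true as long as the table length is a multiple of the lock count.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

class StripedHashSet<T> {
  private static final int MAX_AVG_BUCKET = 4;         // assumed resize threshold
  private final ReentrantLock[] locks;                 // fixed size: never resized
  private final AtomicInteger setSize = new AtomicInteger(0);
  private volatile List<T>[] table;                    // grows by doubling

  StripedHashSet(int capacity) {
    table = newTable(capacity);
    locks = new ReentrantLock[capacity];
    for (int i = 0; i < capacity; i++) locks[i] = new ReentrantLock();
  }

  @SuppressWarnings("unchecked")
  private static <E> List<E>[] newTable(int capacity) {
    List<E>[] t = (List<E>[]) new List[capacity];
    for (int i = 0; i < capacity; i++) t[i] = new ArrayList<>();
    return t;
  }

  private static int hash(Object x) { return x.hashCode() & 0x7fffffff; }

  public boolean contains(T x) {
    ReentrantLock lock = locks[hash(x) % locks.length];
    lock.lock();
    try { return table[hash(x) % table.length].contains(x); }
    finally { lock.unlock(); }
  }

  public boolean add(T x) {
    int seenCapacity = 0;
    boolean tooFull = false;
    ReentrantLock lock = locks[hash(x) % locks.length];
    lock.lock();
    try {
      List<T>[] t = table;
      List<T> bucket = t[hash(x) % t.length];
      if (bucket.contains(x)) return false;
      bucket.add(x);
      seenCapacity = t.length;                         // table seen at resize decision
      tooFull = setSize.incrementAndGet() / seenCapacity > MAX_AVG_BUCKET;
    } finally { lock.unlock(); }
    if (tooFull) resize(seenCapacity);
    return true;
  }

  private void resize(int seenCapacity) {
    for (ReentrantLock l : locks) l.lock();            // acquire all locks, in order
    try {
      if (table.length != seenCapacity) return;        // table changed since the decision
      List<T>[] old = table;
      List<T>[] bigger = newTable(2 * old.length);
      for (List<T> bucket : old)
        for (T x : bucket) bigger[hash(x) % bigger.length].add(x);
      table = bigger;
    } finally {
      for (ReentrantLock l : locks) l.unlock();
    }
  }
}
```

After a few resizes each lock guards many buckets, which is why constant-time behavior under contention is debatable.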
Insight
The contains() method
  Does not modify any fields
  Why should concurrent contains() calls conflict?
Read/Write Locks
  public interface ReadWriteLock {
    Lock readLock();    // returns associated read lock
    Lock writeLock();   // returns associated write lock
  }
Read/Write Lock
Satisfies safety properties
  If readers > 0 then writer == false
  If writer == true then readers == 0
Liveness?
  Lots of readers
  Writers locked out?
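As a hedged illustration of how a read/write lock might be used, the sketch below wraps a plain HashSet with java.util.concurrent.locks.ReentrantReadWriteLock. The class name ReadWriteSet is an assumption. Passing true to the constructor requests the JDK's fair mode, one possible answer to the liveness question about writers being locked out.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class ReadWriteSet<T> {
  private final Set<T> items = new HashSet<>();
  // true = fair mode: waiting threads are served roughly in arrival order
  private final ReadWriteLock rw = new ReentrantReadWriteLock(true);

  // Readers share the read lock, so concurrent contains() calls do not conflict.
  public boolean contains(T x) {
    rw.readLock().lock();
    try { return items.contains(x); }
    finally { rw.readLock().unlock(); }
  }

  // Writers hold the write lock exclusively: no readers and no other writer.
  public boolean add(T x) {
    rw.writeLock().lock();
    try { return items.add(x); }
    finally { rw.writeLock().unlock(); }
  }

  public boolean remove(T x) {
    rw.writeLock().lock();
    try { return items.remove(x); }
    finally { rw.writeLock().unlock(); }
  }
}
```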
Read/Write locks
FIFO property tricky
Optimistic Synchronization
  Let the contains() method
    Scan without locking
  If it doesn't find the key
    May be victim of resizing
  Makes sense if
    Keys are present
    Resizes are rare
To remove and then add even a single item, a single-location CAS is not enough
Move the buckets instead
  Keep all items in a single lock-free list
  Buckets become shortcut pointers into the list
[Figure sequence: buckets 0 to 3 point into a single list of items. The list is split recursively, with split points labeled 1/2 and 3/4, according to the least significant bits (LSB) of the keys.]
Split-Order
If the table size is 2^i,
  Bucket b contains keys k
  k = b (mod 2^i)
When the table doubles to size 2^(i+1), some keys of bucket b move to bucket b + 2^i
  b + 2^i = k (mod 2^(i+1))
  Counting backwards
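A small worked example of the split rule, assuming non-negative integer keys. The class SplitDemo and the method movers are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class SplitDemo {
  // Given the keys of bucket b in a table of size 2^i, return those that
  // move to bucket b + 2^i once the table has doubled to size 2^(i+1).
  static List<Integer> movers(List<Integer> bucketKeys, int b, int i) {
    int newSize = 1 << (i + 1);
    int target = b + (1 << i);
    List<Integer> moved = new ArrayList<>();
    for (int k : bucketKeys)
      if (k % newSize == target) moved.add(k);
    return moved;
  }
}
```

With table size 4 (i = 2), bucket 1 holds 1, 5, 9 and 13. After doubling to 8, the keys 5 and 13 move to bucket 5, while 1 and 9 stay in bucket 1.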
A Bit of Magic
  Real keys, in list order:   0    4    2    6    1    5    3    7
  Real keys, in binary:       000  100  010  110  001  101  011  111
  Split-order keys:           000  001  010  011  100  101  110  111
  Reversing the bits of each real key gives its split-order key, and the list is sorted by split-order key.
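The ordering can be reproduced with a short sketch that sorts the 3-bit keys 0 to 7 by their bit-reversed value. SplitOrderDemo and reverse3 are illustrative names. Integer.reverse reverses all 32 bits, so the result is shifted back down to 3 bits.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class SplitOrderDemo {
  // Reverse the low 3 bits of k: 001 -> 100, 011 -> 110, and so on.
  static int reverse3(int k) { return Integer.reverse(k) >>> 29; }

  // Keys 0..7 sorted by split-order (bit-reversed) value.
  static List<Integer> splitOrder() {
    List<Integer> keys = new ArrayList<>();
    for (int k = 0; k < 8; k++) keys.add(k);
    keys.sort(Comparator.comparingInt(SplitOrderDemo::reverse3));
    return keys;
  }
}
```

Calling splitOrder() yields 0, 4, 2, 6, 1, 5, 3, 7. The first half of the list is bucket 0 of a two-bucket table (even keys) and the second half is bucket 1 (odd keys).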
Sentinel Nodes
[Figure: each bucket (0 to 3) points to its own sentinel node in the list; the sentinels sit among the items 16, 4, 9, 7 and 15.]
Splitting a Bucket
We can now split a bucket
  In a lock-free manner
  Using two CAS() calls ...
    One to add the sentinel to the list
    The other to point from the bucket to the sentinel
Initialization of Buckets
[Figure sequence: a table with buckets 0 to 3 pointing into the list of items.]
  Need to initialize bucket 3 to split bucket 1
  Sentinel 3 is in the list but not connected to its bucket yet
  Now bucket 3 points to the sentinel: the bucket has been split
Adding 10
  10 = 2 mod 4
  [Figure: 10 is added to the list through bucket 2.]
Recursive Initialization
  To add 7 to the list
  7 = 3 mod 4
  3 = 1 mod 2
  [Figure: bucket 3 is initialized through its parent, bucket 1.]
Lock-Free List
  int makeRegularKey(int key) {
    return reverse(key | 0x80000000);   // regular key: set high-order bit to 1, then reverse
  }
  int makeSentinelKey(int key) {
    return reverse(key);                // sentinel key: simply reverse (high-order bit is 0)
  }
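A hedged sketch of how these keys order themselves, assuming reverse() behaves like Integer.reverse and that list keys are compared as unsigned 32-bit values. That comparison rule is an assumption of this sketch (reversed keys often have the top bit set, so a signed comparison would misorder them). SplitOrderKeys and precedes are illustrative names.

```java
class SplitOrderKeys {
  // Regular key: the forced high-order 1 becomes the low-order bit after reversal.
  static int makeRegularKey(int key) { return Integer.reverse(key | 0x80000000); }

  // Sentinel key for a (non-negative) bucket index: low-order bit stays 0.
  static int makeSentinelKey(int key) { return Integer.reverse(key); }

  // Assumed list order: unsigned comparison of split-order keys.
  static boolean precedes(int a, int b) { return Integer.compareUnsigned(a, b) < 0; }
}
```

For example, the sentinel of bucket 1 (0x80000000) precedes the regular key for 1 (0x80000001), which precedes the sentinel of bucket 3 (0xC0000000), which precedes the regular key for 3 (0xC0000001). A sentinel therefore always sits just before the items of its bucket.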
Main List
Lock-Free List from earlier class
With some minor variations
Lock-Free List
  public class LockFreeList {
    // change: add() takes a key argument
    public boolean add(Object object, int key) {...}
    public boolean remove(int k) {...}
    public boolean contains(int k) {...}
    // inserts sentinel with key if not already present;
    // returns new list starting with sentinel (shares with parent)
    public LockFreeList(LockFreeList parent, int key) {...}
  }
Fields
  public class SOSet {
    // for simplicity treat table as a big array;
    // in practice, want something that grows dynamically
    protected LockFreeList[] table;
    // how much of the table array are we actually using?
    protected AtomicInteger tableSize;
    // track set size so we know when to resize
    protected AtomicInteger setSize;

    public SOSet(int capacity) {
      table = new LockFreeList[capacity];
      table[0] = new LockFreeList();
      // initially use 1 bucket and size is zero
      tableSize = new AtomicInteger(1);
      setSize = new AtomicInteger(0);
    }
    ...
  }
Add() Method
  public boolean add(Object object) {
    int hash = object.hashCode() & 0x7fffffff;    // clear the sign bit so the bucket index is never negative
    int bucket = hash % tableSize.get();          // pick a bucket
    int key = makeRegularKey(hash);               // non-sentinel split-ordered key
    LockFreeList list = getBucketList(bucket);    // get pointer to bucket's sentinel, initializing if necessary
    if (!list.add(object, key))                   // call bucket's add() method with reversed key
      return false;                               // no change? We're done.
    resizeCheck();                                // time to resize?
    return true;
  }
Resize
Divide set size by total number of buckets
If quotient exceeds threshold
  Double tableSize field
  Up to fixed limit
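The resize rule above could be sketched like this. The class ResizePolicy, the method itemAdded and the constants THRESHOLD and MAX_BUCKETS are assumptions for illustration. The slides only name the tableSize and setSize fields and do not show the body of resizeCheck().

```java
import java.util.concurrent.atomic.AtomicInteger;

class ResizePolicy {
  static final double THRESHOLD = 4.0;       // assumed average bucket load
  static final int MAX_BUCKETS = 1 << 16;    // assumed fixed limit
  final AtomicInteger tableSize = new AtomicInteger(1);
  final AtomicInteger setSize = new AtomicInteger(0);

  // To be called after each successful add.
  void itemAdded() {
    int items = setSize.incrementAndGet();
    int buckets = tableSize.get();
    if ((double) items / buckets > THRESHOLD && 2 * buckets <= MAX_BUCKETS) {
      // A failed CAS means another thread already doubled the table, which is fine.
      tableSize.compareAndSet(buckets, 2 * buckets);
    }
  }
}
```

Doubling tableSize moves no items. New buckets are initialized lazily the first time they are used.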
Initialize Buckets
Buckets originally null
If you find one, initialize it
Go to bucket's parent
  Earlier nearby bucket
  Recursively initialize if necessary
Example: 7 = 3 mod 4, and bucket 3's parent is bucket 1, since 3 = 1 mod 2
Initialize Bucket
  void initializeBucket(int bucket) {
    int parent = getParent(bucket);            // find parent,
    if (table[parent] == null)                 // recursively initialize if needed
      initializeBucket(parent);
    int key = makeSentinelKey(bucket);         // prepare key for new sentinel
    // insert sentinel if not present, and get back reference to rest of list
    table[bucket] = new LockFreeList(table[parent], key);
  }
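The slides do not show getParent(). One plausible definition, offered here only as an assumption, clears the highest set bit of the bucket index. This matches the example where bucket 3's parent is bucket 1. BucketParent is an illustrative class name.

```java
class BucketParent {
  // Assumed definition: the parent is the bucket this one was split from,
  // obtained by clearing the most significant set bit. Requires bucket > 0.
  static int getParent(int bucket) {
    return bucket - Integer.highestOneBit(bucket);
  }
}
```

Under this definition the chain of parents from bucket 7 is 7, 3, 1, 0, so the recursion in initializeBucket() can be no deeper than the number of bits in the bucket index.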
Correctness
Linearizable concurrent set implementation
Theorem: O(1) expected time
  No more than O(1) items expected between two dummy nodes on average
  Lazy initialization causes at most O(1) expected recursion depth in initializeBucket()
Linear Probing
  [Figure sequence: an array of slots numbered 1 to 20, mostly occupied by items z; x is stored in a free slot at or after position h(x). One figure is annotated H = 7.]
Linear Probing
Open address means M <= N
Expected items in bucket same as Chaining
Expected distance till open slot: 1/2 (1 + (1 / (1 - M/N))^2)
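To get a feel for the numbers, the sketch below evaluates the classical linear-probing estimate 1/2 (1 + (1 / (1 - M/N))^2) for a given table density M/N. Treating this as the expression intended on the slide is an assumption. ProbeDistance and expectedDistance are illustrative names.

```java
class ProbeDistance {
  // Expected number of slots examined before an open one is found,
  // for table density M/N strictly between 0 and 1.
  static double expectedDistance(double density) {
    double r = 1.0 / (1.0 - density);
    return 0.5 * (1.0 + r * r);
  }
}
```

At density 0.5 the estimate is 2.5 slots, and at density 0.9 it is about 50.5, which shows how quickly probing degrades as the table fills.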
Linear Probing
  Advantages:
    Good locality: fewer cache misses
  Disadvantages:
    As M/N increases, more cache misses
Cuckoo Hashing
  [Figure sequence: two tables of slots, one per hash function. Adding x evicts an occupant y, which is moved to its alternative slot h2(y); the labels h1(x), h2(x) and h2(y) mark the candidate slots.]
Cuckoo Hashing
  Advantages:
    Contains(): deterministic, 2 buckets
    No clustering or contamination
  Disadvantages:
    2 tables
    hi(x) are complex
    As M/N increases, relocation cycles
    Above M/N = 0.5, Add() does not work!
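A sequential sketch of the cuckoo scheme, to make the relocation cycles concrete. Everything here is an illustrative assumption: the class CuckooSet, the two multiplicative hash functions, the displacement bound LIMIT, and the choice to double both tables when a chain of displacements runs too long. It is not thread-safe.

```java
class CuckooSet {
  private static final int LIMIT = 32;   // assumed bound on a displacement chain
  private Integer[][] table;             // table[0] and table[1]
  private int capacity;

  CuckooSet(int capacity) {
    this.capacity = capacity;
    this.table = new Integer[2][capacity];
  }

  // Two illustrative hash functions, one per table.
  private int h(int i, int x) {
    int v = x * (i == 0 ? 0x9E3779B1 : 0x85EBCA6B);
    v ^= v >>> 16;
    return (v & 0x7fffffff) % capacity;
  }

  // Deterministic: an item can only be in one of two slots.
  boolean contains(int x) {
    Integer key = x;
    return key.equals(table[0][h(0, x)]) || key.equals(table[1][h(1, x)]);
  }

  boolean add(int x) {
    if (contains(x)) return false;
    Integer cur = x;
    for (int round = 0; round < LIMIT; round++) {
      int i = round % 2;                 // alternate between the two tables
      int slot = h(i, cur);
      Integer evicted = table[i][slot];
      table[i][slot] = cur;              // place cur, possibly evicting the occupant
      if (evicted == null) return true;
      cur = evicted;                     // the evicted item must now find a home
    }
    growAndReinsert(cur);                // likely relocation cycle: give up and grow
    return true;
  }

  // Double both tables and re-add every item, including the homeless one.
  private void growAndReinsert(int homeless) {
    Integer[][] old = table;
    capacity *= 2;
    table = new Integer[2][capacity];
    for (Integer[] t : old)
      for (Integer y : t)
        if (y != null) add(y);
    add(homeless);
  }
}
```

The longer the displacement chains become, the more often the fallback to growing the table is taken, which is the cost behind the density limit noted above.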
Hopscotch Hashing
Single array, simple hash function
Idea: define neighborhood of original bucket
In neighborhood items found quickly
Use sequences of displacements to move items into their neighborhood
Hopscotch Hashing
  [Figure sequence: a single array of slots; bucket h(x) carries a hop-information bitmap covering a hop-range of H = 4 slots.]
  Move the empty slot via sequence of displacements into the hop-range of h(x).
Hopscotch Hashing
Contains(): in the concurrent version the operation is wait-free (just look in neighborhood)
Add(): expected distance till open slot same as in linear probing
Chances of Resize() because neighborhood is full diminish as H log n, one word hopinfo bitmap, or use smaller H and default to Linear Probing
Hopscotch Hashing
Advantages:
  Good locality and cache behavior
  Good performance as table density (M/N) increases: less resizing
  Pay price in Add() not in frequent Contains()
  Easy to parallelize
Striped Locks
  [Figure sequence: the array of slots is guarded by striped locks; each bucket carries a hop-information bitmap and a timestamp (ts), which becomes ts+1 when an item of that bucket is displaced.]
  ... slot using CAS, lock bucket and update timestamp of bucket being displaced before erasing old value
  Wait-free Contains(x): read ts and hopinfo, go to marked buckets; if no x, compare ts; if different, repeat; after k attempts search all H buckets
[Plots: throughput (ops/ms) and cache misses per operation (miss/ops) as a function of table density, from 0.1 to 0.9 and from 0.9 to 0.99, for Hopscotch_D, Hopscotch_ND, LinearProbing, Chained and Cuckoo. Further plots show the same metrics against the number of CPUs, from 1 to 64.]
Summary
Chained hash with striped locking is simple and effective in many cases
Hopscotch with striped locking: great cache behavior
If incremental resizing is needed, go for split-ordered