Hash Tables
www.azulsystems.com
Constant-Time Key-Value Mapping
Fast arbitrary function
Extendable, defined at runtime
Used for symbol tables, DB caching, network access, url caching, web content, etc
Crucial for Large Business Applications (> 1MLOC)
Used in very heavily multi-threaded apps (> 1000 threads)
Java's java.util.Hashtable
  One lock around every operation, so effectively single threaded; scaling bottleneck
java.util.HashMap
  Faster but NOT multi-thread safe
java.util.concurrent.ConcurrentHashMap
  Striped internal locks; 16-way the default
Azul, IBM, Sun sell machines with > 100 cpus
Azul has customers using all cpus in the same app
Becomes a scaling bottleneck!
| 3 2003 Azul Systems, Inc.
Lock-free table compared with the java.util.concurrent (j.u.c) ConcurrentHashMap:
Slightly faster than j.u.c for 99% reads, < 32 cpus
Faster with more cpus (2x faster)
  Even with 4096-way striping
  10x faster with default striping
3x faster for 95% reads (30x vs default)
8x faster for 75% reads (100x vs default)
Scales well up to 768 cpus, 75% reads
  Approaches hardware bandwidth limits
Agenda
Motivation
Uninteresting Hash Table Details
State-Based Reasoning
Resize
Performance
Q&A
Hashtable: A collection of Key/Value Pairs
Works with any collection
  Scaling, locking, bottlenecks of the collection management are the responsibility of that collection
Must be fast or O(1) effects kill you
Must be cache-aware
I'll present a sample Java solution
  But other solutions can work, make sense
Uninteresting Details
Key & Value on same cache line
Hash memoized
No allocation on get() or put()
Auto-Resize
idx = hash = key.hashCode();
while( true ) {                 // reprobing loop
  idx &= (size-1);              // limit idx to table size
  k = get_key(idx);             // start cache miss early
  h = get_hash(idx);            // memoized hash
  if( k == key || (h == hash && key.equals(k)) )
    return get_val(idx);        // return matching value
  if( k == null ) return null;
  idx++;                        // reprobe
}
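A minimal Java sketch of the same probe loop, runnable on its own. The class name ProbeDemo, the three parallel arrays, and the plain put() are assumptions made for illustration; the slides do not show the table layout. This version is single-threaded, has no CAS and no resize, and must never be filled completely or a miss would loop forever.

```java
/** Hypothetical single-threaded illustration of the reprobing get() loop. */
final class ProbeDemo {
    private final Object[] keys;
    private final Object[] vals;
    private final int[] hashes;              // memoized hash for each slot

    ProbeDemo(int size) {
        if (Integer.bitCount(size) != 1)
            throw new IllegalArgumentException("size must be a power of 2");
        keys = new Object[size];
        vals = new Object[size];
        hashes = new int[size];
    }

    /** Plain insert used only to populate the demo; not thread-safe. */
    void put(Object key, Object val) {
        int hash = key.hashCode();
        int idx = hash;
        while (true) {
            idx &= keys.length - 1;          // limit idx to table size
            Object k = keys[idx];
            if (k == null) {                 // empty slot: take it
                keys[idx] = key;
                hashes[idx] = hash;
                vals[idx] = val;
                return;
            }
            if (k == key || (hashes[idx] == hash && key.equals(k))) {
                vals[idx] = val;             // existing key: overwrite
                return;
            }
            idx++;                           // reprobe
        }
    }

    Object get(Object key) {
        int hash = key.hashCode();
        int idx = hash;
        while (true) {                       // reprobing loop
            idx &= keys.length - 1;          // limit idx to table size
            Object k = keys[idx];
            int h = hashes[idx];             // memoized hash
            if (k == key || (h == hash && key.equals(k)))
                return vals[idx];            // matching value
            if (k == null) return null;      // empty slot: miss
            idx++;                           // reprobe
        }
    }

    public static void main(String[] args) {
        ProbeDemo t = new ProbeDemo(16);
        t.put(1, "one");
        t.put(17, "seventeen");              // 17 & 15 == 1, so this one reprobes
        System.out.println(t.get(1) + " " + t.get(17) + " " + t.get(33));
    }
}
```

The cheap int comparison against the memoized hash runs before key.equals(), so most non-matching slots are rejected without touching the key object.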
State-Based Reasoning
Define all {Key,Value} states and transitions
Don't Care about memory ordering:
  get() can read Key, Value in any order
  put() can change Key, Value in any order
  put() must use CAS to change Key or Value
  But not double-CAS
(sometimes stronger guarantees are wanted)
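In Java, one way to get an independent single-word CAS per slot is java.util.concurrent.atomic.AtomicReferenceArray. The sketch below is an assumption about how the slots might be held, not the layout used in the talk; CasSlots, casKey and casVal are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

/** Hypothetical holder for Key and Value slots, each changed by its own CAS. */
final class CasSlots {
    final AtomicReferenceArray<Object> keys;
    final AtomicReferenceArray<Object> vals;

    CasSlots(int size) {
        keys = new AtomicReferenceArray<>(size);   // every slot starts as null (empty)
        vals = new AtomicReferenceArray<>(size);
    }

    /** CAS on one Key slot; compares by reference, not by equals(). */
    boolean casKey(int idx, Object expect, Object update) {
        return keys.compareAndSet(idx, expect, update);
    }

    /** CAS on one Value slot, independent of the Key slot (no double-CAS). */
    boolean casVal(int idx, Object expect, Object update) {
        return vals.compareAndSet(idx, expect, update);
    }

    public static void main(String[] args) {
        CasSlots s = new CasSlots(8);
        System.out.println(s.casKey(0, null, "K"));    // slot was empty
        System.out.println(s.casKey(0, null, "J"));    // slot already holds K
        System.out.println(s.casVal(0, null, "V1"));   // value slot was empty
    }
}
```

Because Key and Value are changed by separate CAS operations, another thread can observe the slot between the two steps; the state definitions are what make those intermediate views legal.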
Valid States
A Key slot is:
  null: empty
  K: some Key; can never change again
A Value slot is:
  null: empty
  T: tombstone
  V: some Value
State Machine
[Figure: state machine diagram of {Key,Value} states, including {K,V}]
idx = hash = key.hashCode();
while( true ) {                 // Key-Claim stanza
  idx &= (size-1);
  k = get_key(idx);             // State: {k,?}
  if( k == null &&              // {null,?} -> {key,?}
      CAS_key(idx,null,key) )
    break;                      // State: {key,?}
  h = get_hash(idx);            // get memoized hash
  if( k == key || (h == hash && key.equals(k)) )
    break;                      // State: {key,?}
  idx++;                        // reprobe
}

// State: {key,?}
oldval = get_val(idx);          // State: {key,oldval}
// Transition: {key,oldval} -> {key,newval}
if( CAS_val(idx,oldval,newval) ) {
  // Transition worked
  ...                           // Adjust size
} else {
  // Transition failed; oldval has changed
  // We can act as if our put() worked but
  // was immediately stomped over
}
return oldval;
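The two stanzas can be exercised in a small runnable Java sketch. Everything here is an assumption for illustration: the class NbTableSketch, the AtomicReferenceArray slots, the fixed size, and the omission of resize, tombstones and the memoized hash. One deliberate difference from the pseudocode: after losing the Key CAS the sketch reloads the Key before comparing, so two threads inserting the same Key end up sharing one slot.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

/** Hypothetical fixed-size table: no resize, no tombstones, no memoized hash. */
final class NbTableSketch {
    private final AtomicReferenceArray<Object> keys;
    private final AtomicReferenceArray<Object> vals;
    private final int mask;

    NbTableSketch(int sizePowerOf2) {
        keys = new AtomicReferenceArray<>(sizePowerOf2);
        vals = new AtomicReferenceArray<>(sizePowerOf2);
        mask = sizePowerOf2 - 1;
    }

    /** Returns the Value seen before this put(), or null. Assumes the table never fills. */
    Object put(Object key, Object newval) {
        int idx = key.hashCode();
        while (true) {                                    // Key-claim stanza
            idx &= mask;
            Object k = keys.get(idx);                     // State: {k,?}
            if (k == null) {
                if (keys.compareAndSet(idx, null, key)) break;   // {null,?} -> {key,?}
                k = keys.get(idx);                        // lost the race: reload the now-final Key
            }
            if (k == key || key.equals(k)) break;         // State: {key,?}
            idx++;                                        // reprobe
        }
        Object oldval = vals.get(idx);                    // State: {key,oldval}
        // One attempt at {key,oldval} -> {key,newval}. On failure another put()
        // got there first and we act as if ours was immediately overwritten.
        vals.compareAndSet(idx, oldval, newval);
        return oldval;
    }

    Object get(Object key) {
        int idx = key.hashCode();
        while (true) {
            idx &= mask;
            Object k = keys.get(idx);
            if (k == null) return null;                   // empty slot: miss
            if (k == key || key.equals(k)) return vals.get(idx);
            idx++;                                        // reprobe
        }
    }

    /** Counts slots holding an equal Key; used to check that no Key is duplicated. */
    int countKey(Object key) {
        int n = 0;
        for (int i = 0; i <= mask; i++)
            if (key.equals(keys.get(i))) n++;
        return n;
    }

    /** Several threads hit one shared Key plus private Integer Keys; true if all checks pass. */
    static boolean runDemo(int nThreads, int perThread) {
        int size = Integer.highestOneBit(nThreads * perThread + 1) << 2;  // comfortably oversized
        NbTableSketch t = new NbTableSketch(size);
        Thread[] ts = new Thread[nThreads];
        for (int i = 0; i < nThreads; i++) {
            final int id = i;
            ts[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    t.put("shared", id);
                    t.put(id * perThread + j, id);        // Key private to this thread
                }
            });
            ts[i].start();
        }
        try {
            for (Thread th : ts) th.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        for (int i = 0; i < nThreads; i++)
            for (int j = 0; j < perThread; j++)
                if (!Integer.valueOf(i).equals(t.get(i * perThread + j))) return false;
        return t.countKey("shared") == 1 && t.get("shared") != null;
    }

    public static void main(String[] args) {
        System.out.println(runDemo(8, 500));
    }
}
```

A failed Value CAS is not retried, which matches the reasoning in the pseudocode: some other put() won, and this one is treated as having been overwritten at once.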
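Once a Key is set, it never changes
  No chance of returning Value for wrong Key
  Means Keys leak; table fills up with dead Keys
  Fix in a few slides...
No ordering guarantees are provided
  Bring Your Own Ordering/Synchronization
A reader may see a partially inserted pair, {null,V}
  Means reader got an empty key and so missed
  But possibly prefetched wrong Value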
There is no machine-wide coherent State!
Nobody guaranteed to read the same State
  No need for it either
Consider degenerate case of a single Key
Same guarantees as:
  single shared global variable
  many readers & writers, no synchronization
  i.e., darned little
When a stronger guarantee is wanted:
Similar to declaring that shared global 'volatile'
Things written into a Value before put()
  Are guaranteed to be seen after a get()
Requires st/st fence before CAS'ing Value
  free on Sparc, X86
Requires ld/ld fence after loading Value
  free on Azul
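In Java the same effect is available without explicit fences, because AtomicReference reads and CAS operations have volatile semantics. The sketch below is an illustration under that assumption; PublishDemo, Payload and run are hypothetical names, and a single AtomicReference stands in for one Value slot.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical demo: plain writes made before a CAS are visible after the matching read. */
final class PublishDemo {
    static final class Payload {
        int a;                               // plain, non-volatile fields
        int b;
    }

    static int run() {
        AtomicReference<Payload> slot = new AtomicReference<>();   // stands in for one Value slot
        int[] seen = new int[1];
        Thread reader = new Thread(() -> {
            Payload p;
            while ((p = slot.get()) == null)  // volatile read, the "get()" side
                Thread.yield();
            seen[0] = p.a + p.b;              // must observe both plain writes
        });
        reader.start();

        Payload p = new Payload();
        p.a = 40;                             // things written into the Value before put()
        p.b = 2;
        slot.compareAndSet(null, p);          // volatile CAS, the "put()" side

        try {
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The reader can only leave its loop after seeing the reference installed by the CAS, so under the Java memory model it also sees the field writes that preceded that CAS.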
Need to resize if table gets full
  Or just re-probing too often
Resize copies live K/V pairs
  Doubles as cleanup of dead Keys
  Resize (cleanse) after any delete
  Throttled, once per GC cycle is plenty often
Alas, need fencing, 'happens before'
Hard bit for concurrent resize & put():
Resizing
Expand State Machine
Side-effect: mid-resize is a valid State
Means resize is:
  Concurrent: readers can help, or just read&go
  Parallel: all can help
  Incremental: partial copy is OK
Pay an extra indirection while resize in progress
  So want to finish the job eventually
Stacked partial resizes OK, expected
put() or other mod must use new table
Must check for new table every time
Copying K/V pairs is independent of get/put
Copy has many heuristics to choose from:
  background thread(s), etc
X means: not in old table, check new
A Key slot is: null, K, or X
  X: 'use new table', not any valid Key
  Transitions: null -> K OR null -> X
A Value slot is: null, T, V, or X
  X: 'use new table', not any valid Value
  Transitions: null -> {T,V}* -> X
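How a reader might follow X into a newer table can be sketched in Java as below. This is an assumption-laden illustration, not the talk's code: ForwardingGet, Table, next and place are hypothetical names, tombstones and primed values are left out, and place() exists only to set up single-threaded examples.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

/** Hypothetical sketch of a get() that follows the X sentinel into a newer table. */
final class ForwardingGet {
    static final Object X = new Object();          // 'use new table' sentinel

    static final class Table {
        final AtomicReferenceArray<Object> keys;
        final AtomicReferenceArray<Object> vals;
        volatile Table next;                       // newer table once a resize has started

        Table(int sizePowerOf2) {
            keys = new AtomicReferenceArray<>(sizePowerOf2);
            vals = new AtomicReferenceArray<>(sizePowerOf2);
        }
    }

    static Object get(Table t, Object key) {
        for (; t != null; t = t.next) {            // stacked resizes: keep walking forward
            int mask = t.keys.length() - 1;
            int idx = key.hashCode() & mask;
            while (true) {
                Object k = t.keys.get(idx);
                if (k == null) return null;        // empty slot: clear miss
                if (k == X) break;                 // {X,null}: not here, check newer table
                if (k == key || key.equals(k)) {
                    Object v = t.vals.get(idx);
                    if (v != X) return v;          // Value still lives in this table
                    break;                         // {K,X}: Value moved, check newer table
                }
                idx = (idx + 1) & mask;            // reprobe
            }
        }
        return null;
    }

    /** Setup helper: writes a pair directly, without CAS; single-threaded use only. */
    static void place(Table t, Object key, Object val) {
        int mask = t.keys.length() - 1;
        int idx = key.hashCode() & mask;
        while (t.keys.get(idx) != null && !key.equals(t.keys.get(idx)))
            idx = (idx + 1) & mask;
        t.keys.set(idx, key);
        t.vals.set(idx, val);
    }
}
```

The extra hop through next is the "extra indirection while resize in progress" cost: it is only paid for slots that have already been marked X.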
[Figure: old-table state machine. Labels: {X,null}, 'check newer table', 'kill', {K,X}, 'copy {K,V} into newer table', 'change', {K,V}]
[Figure: resize in progress. Old table: {K,V1} (alive, but old), {K,X}. New table: {K,V2}]
Prime'd values in new table copied from old
Non-prime in new table is recent put()
  'happens after' any prime'd value
Prime allows 2-phase commit
Engineering: wrapper class (Java), steal bit (C)
A Value slot is: null, T, V, X
  T', V': primed versions of T & V
  Old things copied into the new table
  2-phase commit
  Transitions: null -> {T',V'}* -> {T,V}* -> X
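The "wrapper class (Java)" option for primes might look like the sketch below. PrimeSketch, Prime, copyIn and commit are hypothetical names, and a single AtomicReference stands in for one Value slot of the new table; the real table code is not shown in the slides.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical sketch of primed Values and the two-phase commit they allow. */
final class PrimeSketch {
    /** Wrapper marking a Value as copied from the old table (V' or T'). */
    static final class Prime {
        final Object value;

        Prime(Object value) {
            this.value = value;
        }
    }

    /** Phase 1: install a primed copy, but only into an empty slot. */
    static boolean copyIn(AtomicReference<Object> newSlot, Object oldValue) {
        return newSlot.compareAndSet(null, new Prime(oldValue));
    }

    /** Phase 2: strip the prime, unless a put() has already replaced it. */
    static void commit(AtomicReference<Object> newSlot) {
        Object cur = newSlot.get();
        if (cur instanceof Prime)
            newSlot.compareAndSet(cur, ((Prime) cur).value);
    }

    public static void main(String[] args) {
        AtomicReference<Object> slot = new AtomicReference<>();
        copyIn(slot, "V1");
        System.out.println(slot.get() instanceof Prime);   // primed copy is in place
        commit(slot);
        System.out.println(slot.get());                    // plain Value after commit
    }
}
```

A non-primed Value found in the slot means a put() already wrote there, so copyIn() fails and the newer Value is left alone.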
[Figure: state machine with primes. Labels: {null,null} Empty; 'insert'; {K,null}; 'copy in from older table'; {K,T'/V'}; {null,T/T'/V/V'/X} Partially inserted K/V pair; {X,null}; 'check newer table'; 'kill'; {K,X}; 'copy {K,V} into newer table']
[Figure: copying one slot. Old table: {K,V1}, 'read V1', {K,X}. New table: 'read V'x', {K,V'x}, {K,V'1} (partial copy). Two steps are marked 'Fence'.]
Old Value could also be V' or T' (if nested resize in progress)
Skip copy if new Value is not prime'd
  Means recent put() overwrote any old Value
If CAS into new fails
  Means either put() or other copy in progress
  So this copy can quit
Any thread can see any state at any time
  And CAS to the next state
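A single-slot copy that loosely follows these rules can be sketched in Java as below. This is an illustration under assumptions, not the algorithm from the talk: CopySlotSketch, Prime and copySlot are hypothetical names, each AtomicReference stands in for one Value slot, and on a failed CAS the sketch re-examines the slot and retries rather than quitting, which is a more conservative choice than the slides describe.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical sketch: move one Value from an old-table slot to the new-table slot. */
final class CopySlotSketch {
    static final Object X = new Object();            // 'use new table' sentinel

    static final class Prime {                       // primed Value: copied from old table
        final Object value;

        Prime(Object value) {
            this.value = value;
        }
    }

    static void copySlot(AtomicReference<Object> oldVal, AtomicReference<Object> newVal) {
        while (true) {
            Object v = oldVal.get();
            if (v == X) break;                        // old side already finished
            Object cur = newVal.get();
            boolean recentPut = cur != null && !(cur instanceof Prime);
            if (!recentPut) {                         // skip the copy if a put() got there first
                boolean upToDate = cur instanceof Prime && ((Prime) cur).value == v;
                if (!upToDate && !newVal.compareAndSet(cur, new Prime(v)))
                    continue;                         // raced with a put() or another copy
            }
            if (oldVal.compareAndSet(v, X)) break;    // old slot now says 'use new table'
            // old Value changed underneath us: loop and copy the newer one
        }
        Object cur = newVal.get();                    // second phase: strip the prime
        if (cur instanceof Prime)
            newVal.compareAndSet(cur, ((Prime) cur).value);
    }

    public static void main(String[] args) {
        AtomicReference<Object> oldVal = new AtomicReference<>("V1");
        AtomicReference<Object> newVal = new AtomicReference<>();
        copySlot(oldVal, newVal);
        System.out.println((oldVal.get() == X) + " " + newVal.get());
    }
}
```

Every step is a CAS from one observed state to the next, so a thread that finds the work half done by someone else simply continues from the state it sees.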
Microbenchmark
Measure insert/lookup/remove of Strings
Tight loop: no work beyond HashTable itself and test harness (mostly RNG)
'Guaranteed not to exceed' numbers
All fences; full ConcurrentHashMap semantics
Variables:
  99% get, 1% put (typical cache) vs 75 / 25
  Dual Athlon, Niagara, Azul Vega1, Vega2
  Threads from 1 to 800
  NonBlocking vs 4096-way ConcurrentHashMap
  1K entry table vs 1M entry table
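A harness in this spirit could be sketched as below. It is not the benchmark from the talk: MixBench and run are hypothetical names, only the ConcurrentHashMap side is shown, the mix is get/put with no remove, and there is no warm-up, so the numbers it prints are only a rough indication.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical harness: a get/put mix against a pre-filled ConcurrentHashMap. */
final class MixBench {
    /** Returns {gets, puts, elapsedNanos}. */
    static long[] run(int nThreads, int opsPerThread, int readPercent, int tableSize) {
        // Third constructor argument is the concurrencyLevel hint (4096-way in the comparison).
        ConcurrentHashMap<String, String> map = new ConcurrentHashMap<>(tableSize, 0.75f, 4096);
        String[] keys = new String[tableSize];
        for (int i = 0; i < tableSize; i++) {
            keys[i] = "k" + i;
            map.put(keys[i], keys[i]);
        }
        LongAdder gets = new LongAdder();
        LongAdder puts = new LongAdder();
        Thread[] ts = new Thread[nThreads];
        for (int i = 0; i < nThreads; i++) {
            ts[i] = new Thread(() -> {
                ThreadLocalRandom rng = ThreadLocalRandom.current();
                long g = 0, p = 0;                        // count locally, publish once
                for (int j = 0; j < opsPerThread; j++) {
                    String k = keys[rng.nextInt(tableSize)];
                    if (rng.nextInt(100) < readPercent) {
                        map.get(k);
                        g++;
                    } else {
                        map.put(k, k);
                        p++;
                    }
                }
                gets.add(g);
                puts.add(p);
            });
        }
        long start = System.nanoTime();
        for (Thread t : ts) t.start();
        try {
            for (Thread t : ts) t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new long[] { gets.sum(), puts.sum(), System.nanoTime() - start };
    }

    public static void main(String[] args) {
        long[] r = run(4, 1_000_000, 99, 1024);           // 99% get, 1% put, 1K entries
        double mops = (r[0] + r[1]) / (r[2] / 1e9) / 1e6;
        System.out.printf("%d gets, %d puts, %.1f M-ops/sec%n", r[0], r[1], mops);
    }
}
```

Swapping the map for another implementation with the same get/put calls, and varying nThreads, readPercent and tableSize, covers the variables listed above.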
[Figure: M-ops/sec (0 to 30) vs. Threads (1 to 8); panels: 1K Table, 1M Table; series: NB, CHM]
[Figure: M-ops/sec (0 to 80) vs. Threads (up to 64); panels: 1K Table, 1M Table; series: NB-99, CHM-99, NB-75, CHM-75]
[Figure: M-ops/sec (0 to 500) vs. Threads (0 to 400); panels: 1K Table, 1M Table; series: NB-99, CHM-99, NB-75, CHM-75]
[Figure: M-ops/sec (0 to 1200) vs. Threads (0 to 800); panels: 1K Table, 1M Table; series: NB-99, CHM-99, NB-75, CHM-75]
Summary
A faster lock-free HashTable
  Faster for more CPUs
  Much faster for higher table modification rate
State-Based Reasoning:
  No ordering, no JMM, no fencing
  Any thread can see any state at any time
  Must assume values change at each step
State graphs really helped coding & debugging
Resulting code is small & fast
Summary
Obvious future work:
  Tools to check states
  Tools to write code
Seems applicable to other data structures as well
  Concurrent append j.u.Vector
  Scalable near-FIFO work queues
Code & Video available at:
http://blogs.azulsystems.com/cliff/
Thank You