
2007 JavaOne Conference

A Lock-Free Hash Table

Dr. Cliff Click


Chief JVM Architect & Distinguished Engineer
Azul Systems
blogs.azulsystems.com/cliff
May 8, 2007

Hash Tables
www.azulsystems.com

Constant-Time Key-Value Mapping
Fast arbitrary function
Extendable, defined at runtime
Used for symbol tables, DB caching, network access, url caching, web content, etc
Crucial for Large Business Applications
  > 1MLOC
Used in Very heavily multi-threaded apps
  > 1000 threads

2 2003 Azul Systems, Inc.

Popular Java Implementations


Java's Hashtable
  Single lock, so effectively single-threaded; scaling bottleneck
HashMap
  Faster but NOT multi-thread safe
java.util.concurrent.ConcurrentHashMap
  Striped internal locks; 16-way the default
Azul, IBM, Sun sell machines >100 cpus
Azul has customers using all cpus in same app
Becomes a scaling bottleneck!

A Lock-Free Hash Table


No locks, even during table resize
  No spin-locks
  No blocking while holding locks
  All CAS spin-loops bounded
  Make progress even if other threads die....
Requires atomic update instruction:
  CAS (Compare-And-Swap),
  LL/SC (Load-Linked/Store-Conditional, PPC only)
  or similar
Uses sun.misc.Unsafe for CAS

A Faster Hash Table


Slightly faster than j.u.c for 99% reads < 32 cpus
Faster with more cpus (2x faster)
  Even with 4096-way striping
  10x faster with default striping
3x Faster for 95% reads (30x vs default)
8x Faster for 75% reads (100x vs default)
Scales well up to 768 cpus, 75% reads
  Approaches hardware bandwidth limits

Agenda
Motivation
Uninteresting Hash Table Details
State-Based Reasoning
Resize
Performance
Q&A

Some Uninteresting Details


Hashtable: A collection of Key/Value Pairs
Works with any collection
Scaling, locking, bottlenecks of the collection management: responsibility of that collection
Must be fast or O(1) effects kill you
Must be cache-aware
I'll present a sample Java solution
  But other solutions can work, make sense

Uninteresting Details
Closed Power-of-2 Hash Table
  Reprobe on collision
  Stride-1 reprobe: better cache behavior
Key & Value on same cache line
Hash memoized
  Should be same cache line as K + V
  But hard to do in pure Java
No allocation on get() or put()
Auto-Resize

Example get() code


    idx = hash = key.hashCode();
    while( true ) {              // reprobing loop
      idx &= (size-1);           // limit idx to table size
      k = get_key(idx);          // start cache miss early
      h = get_hash(idx);         // memoized hash
      if( k == key || (h == hash && key.equals(k)) )
        return get_val(idx);     // return matching value
      if( k == null ) return null;
      idx++;                     // reprobe
    }
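
The get() loop above can be made runnable with a small single-threaded sketch. Everything here beyond the loop itself is an assumption for illustration: the class name ProbeTable, the three parallel arrays (the talk wants Key, Value and hash on one cache line, which plain Java arrays do not give you), and the demo-only put(). The table never resizes, so it must not be filled completely.

```java
// Hypothetical single-threaded sketch; names and layout are not from the talk.
final class ProbeTable {
    private final Object[] keys;
    private final Object[] vals;
    private final int[] hashes;              // memoized hash per slot

    ProbeTable(int log2Size) {
        int size = 1 << log2Size;            // power-of-2 size so AND can replace MOD
        keys = new Object[size];
        vals = new Object[size];
        hashes = new int[size];
    }

    // Demo-only insert with no concurrency control; assumes a free slot exists.
    void put(Object key, Object val) {
        int hash = key.hashCode();
        int idx = hash;
        while (true) {
            idx &= keys.length - 1;
            Object k = keys[idx];
            if (k == null) {                 // claim the empty slot
                keys[idx] = key;
                hashes[idx] = hash;
                vals[idx] = val;
                return;
            }
            if (k == key || (hashes[idx] == hash && key.equals(k))) {
                vals[idx] = val;             // key already present: overwrite
                return;
            }
            idx++;                           // stride-1 reprobe
        }
    }

    Object get(Object key) {
        int hash = key.hashCode();
        int idx = hash;
        while (true) {                       // reprobing loop
            idx &= keys.length - 1;          // limit idx to table size
            Object k = keys[idx];
            int h = hashes[idx];             // memoized hash avoids most equals() calls
            if (k == key || (h == hash && key.equals(k))) return vals[idx];
            if (k == null) return null;      // empty slot ends the probe: miss
            idx++;                           // reprobe
        }
    }
}
```

With a 16-slot table, the Integer keys 1 and 17 collide on slot 1, so the second one lands in slot 2 and get(17) finds it after one reprobe.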

Uninteresting Details
Could use prime table + MOD
  Better hash spread, fewer reprobes
  But MOD is 30x slower than AND
Could use open table
  put() requires allocation
  Follow 'next' pointer instead of reprobe
  Each 'next' is a cache miss
  Lousy hash -> linked-list traversal
Could put Key/Value/Hash on same cache line
Other variants possible, interesting
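
Why a power-of-2 table can use AND where a prime table needs MOD: for a power-of-2 size, masking with size-1 selects the same slot as a non-negative modulo. The sketch below only checks that equivalence; it does not measure the speed difference quoted on the slide. IndexMask and its method names are made up for illustration.

```java
// Hypothetical helper; shows index computation only, not the table itself.
final class IndexMask {
    // Valid only when size is a power of 2.
    static int byAnd(int hash, int size) {
        return hash & (size - 1);
    }

    // Works for any positive size; floorMod keeps the result non-negative
    // even for negative hash codes, unlike the % operator.
    static int byMod(int hash, int size) {
        return Math.floorMod(hash, size);
    }
}
```

A prime-sized table would have to use the byMod form, since the mask trick does not apply there.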


Ordering and Correctness


How to show table mods correct?
  put, putIfAbsent, change, delete, etc.
Prove via: fencing, memory model, load/store ordering, happens-before?
Instead prove* via state machine
  Define all possible {Key,Value} states
  Define Transitions, State Machine
  Show all states legal
*Warning: hand-wavy proof follows

State-Based Reasoning
Define all {Key,Value} states and transitions
Don't Care about memory ordering:
  get() can read Key, Value in any order
  put() can change Key, Value in any order
  put() must use CAS to change Key or Value
  But not double-CAS
No fencing required for correctness!
  (sometimes stronger guarantees are wanted and will need fencing)
Proof is simple!

Valid States
A Key slot is:
  null: empty
  K: some Key; can never change again
A Value slot is:
  null: empty
  T: tombstone
  V: some Value
A state is a {Key,Value} pair
A transition is a successful CAS

State Machine
States:
  {null,null}  Empty
  {K,null}     Partially inserted K/V pair
  {K,V}        Standard K/V pair
  {K,T}        Deleted key
  {null,T/V}   Partially inserted K/V pair; reader-only state
Transitions:
  insert  {null,null} -> {K,null} -> {K,V}
  change  {K,V} -> {K,V} (new Value)
  delete  {K,V} -> {K,T}
  insert  {K,T} -> {K,V}

Example put(key,newval) code:


    idx = hash = key.hashCode();
    while( true ) {                        // Key-Claim stanza
      idx &= (size-1);
      k = get_key(idx);                    // State: {k,?}
      if( k == null &&                     // {null,?} -> {key,?}
          CAS_key(idx,null,key) )
        break;                             // State: {key,?}
      h = get_hash(idx);                   // get memoized hash
      if( k == key || (h == hash && key.equals(k)) )
        break;                             // State: {key,?}
      idx++;                               // reprobe
    }

Example put(key,newval) code


    // State: {key,?}
    oldval = get_val(idx);                 // State: {key,oldval}
    // Transition: {key,oldval} -> {key,newval}
    if( CAS_val(idx,oldval,newval) ) {
      // Transition worked
      ...                                  // Adjust size
    } else {
      // Transition failed; oldval has changed
      // We can act as if our put() worked but
      // was immediately stomped over
    }
    return oldval;
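
The two put() stanzas can be combined into one runnable sketch. Several things here are assumptions rather than the talk's code: java.util.concurrent.atomic.AtomicReferenceArray stands in for the sun.misc.Unsafe CAS the talk uses (and brings volatile ordering the talk says is not needed for correctness), the memoized hash is left out, the key slot is re-read after a lost key CAS so a racing insert of the same key reuses that slot, and there is no delete or resize. The class name CasTable is made up, and the table must not be filled.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

// Hypothetical fixed-size sketch of the key-claim and value-CAS stanzas.
final class CasTable {
    private final AtomicReferenceArray<Object> keys;
    private final AtomicReferenceArray<Object> vals;
    private final int mask;

    CasTable(int log2Size) {
        int size = 1 << log2Size;
        keys = new AtomicReferenceArray<>(size);
        vals = new AtomicReferenceArray<>(size);
        mask = size - 1;
    }

    // Returns the value seen just before the update attempt (null if none).
    Object put(Object key, Object newval) {
        int idx = key.hashCode();
        while (true) {                                  // Key-claim stanza
            idx &= mask;
            Object k = keys.get(idx);                   // State: {k,?}
            if (k == null) {
                if (keys.compareAndSet(idx, null, key)) // {null,?} -> {key,?}
                    break;
                k = keys.get(idx);                      // lost the race: see which key won
            }
            if (k == key || key.equals(k)) break;       // State: {key,?}
            idx++;                                      // reprobe
        }
        Object oldval = vals.get(idx);                  // State: {key,oldval}
        // {key,oldval} -> {key,newval}; on failure we act as if our put()
        // worked but was immediately overwritten by another thread.
        vals.compareAndSet(idx, oldval, newval);
        return oldval;
    }

    Object get(Object key) {
        int idx = key.hashCode();
        while (true) {
            idx &= mask;
            Object k = keys.get(idx);
            if (k == null) return null;                 // empty slot: miss
            if (k == key || key.equals(k)) return vals.get(idx);
            idx++;
        }
    }
}
```

Once a key slot is claimed it is never written again, which is what lets get() return a value without re-checking the key.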

Some Things to Notice


Once a Key is set, it never changes
  No chance of returning Value for wrong Key
  Means Keys leak; table fills up with dead Keys
  Fix in a few slides...
No ordering guarantees provided!
  Bring Your Own Ordering/Synchronization
Weird {null,V} state meaningful but uninteresting
  Means reader got an empty key and so missed
  But possibly prefetched wrong Value

Some Things to Notice


There is no machine-wide coherent State!
Nobody guaranteed to read the same State
  Except on the same CPU with no other writers
No need for it either
Consider degenerate case of a single Key
Same guarantees as:
  single shared global variable
  many readers & writers, no synchronization
  i.e., darned little

A Slightly Stronger Guarantee


Probably want happens-before on Values
  java.util.concurrent provides this
Similar to declaring that shared global 'volatile'
Things written into a Value before put()
  Are guaranteed to be seen after a get()
Requires st/st fence before CAS'ing Value
  free on Sparc, X86
Requires ld/ld fence after loading Value
  free on Azul
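
The "things written before put() are seen after get()" guarantee can be illustrated with a single slot. This is a sketch under assumptions: java.util.concurrent.atomic.AtomicReference stands in for one Value slot, and its volatile compareAndSet/get supply at least the st/st and ld/ld ordering named above instead of explicit fences. Publish, Box and demo() are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical one-slot demo of happens-before on a published Value.
final class Publish {
    static final class Box { int payload; }      // plain, non-volatile field

    static int demo() {
        AtomicReference<Box> slot = new AtomicReference<>();   // one Value slot
        int[] seen = new int[1];
        Thread reader = new Thread(() -> {
            Box seenBox;
            while ((seenBox = slot.get()) == null) {            // "get()": volatile load
                Thread.onSpinWait();
            }
            seen[0] = seenBox.payload;           // ordered after the load above
        });
        reader.start();

        Box box = new Box();
        box.payload = 42;                        // written before the "put()"
        slot.compareAndSet(null, box);           // "put()": volatile CAS publishes box

        try {
            reader.join();                       // join makes seen[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }
}
```

Without that ordering the reader could legally observe the Box reference and still read a stale payload, which is the "bring your own synchronization" case from the earlier slide.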


Resizing The Table


Need to resize if table gets full
  Or just re-probing too often
Resize copies live K/V pairs
  Doubles as cleanup of dead Keys
  Resize (cleanse) after any delete
  Throttled, once per GC cycle is plenty often
Alas, need fencing, 'happens before'
Hard bit for concurrent resize & put():
  Must not drop the last update to old table

Resizing
Expand State Machine
Side-effect: mid-resize is a valid State
Means resize is:
  Concurrent: readers can help, or just read&go
  Parallel: all can help
  Incremental: partial copy is OK
Pay an extra indirection while resize in progress
  So want to finish the job eventually
Stacked partial resizes OK, expected

get/put during Resize


get() works on the old table
  Unless see a sentinel
put() or other mod must use new table
  Must check for new table every time
  Late writes to old table 'happens before' resize
Copying K/V pairs is independent of get/put
Copy has many heuristics to choose from:
  All touching threads, only writers, unrelated background thread(s), etc
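
A rough sketch of "get() works on the old table unless it sees a sentinel". The names ChainedTable, newer, place() and probe() are made up, AtomicReferenceArray again stands in for the talk's Unsafe-based access, and tombstones and memoized hashes are left out. The sketch treats an empty key slot as a plain miss, as the earlier get() code does, and assumes the sentinel only ever appears after the newer table has been installed.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

// Hypothetical read path across an old table and its replacement.
final class ChainedTable {
    static final Object X = new Object();        // sentinel: "use new table"
    final AtomicReferenceArray<Object> keys;
    final AtomicReferenceArray<Object> vals;
    volatile ChainedTable newer;                 // set once a resize starts

    ChainedTable(int log2Size) {
        keys = new AtomicReferenceArray<>(1 << log2Size);
        vals = new AtomicReferenceArray<>(1 << log2Size);
    }

    // Demo-only, single-threaded helper to pre-populate a slot.
    void place(Object key, Object val) {
        int idx = key.hashCode();
        while (true) {
            idx &= keys.length() - 1;
            Object k = keys.get(idx);
            if (k == null || key.equals(k)) {
                keys.set(idx, key);
                vals.set(idx, val);
                return;
            }
            idx++;
        }
    }

    // Probes this table only; returns X when the caller must look in the newer table.
    Object probe(Object key) {
        int idx = key.hashCode();
        while (true) {
            idx &= keys.length() - 1;
            Object k = keys.get(idx);
            if (k == null) return null;          // empty slot: miss
            if (k == X) return X;                // killed slot: check newer table
            if (k == key || key.equals(k)) return vals.get(idx);   // value may itself be X
            idx++;
        }
    }

    static Object get(ChainedTable t, Object key) {
        Object v = t.probe(key);
        while (v == X) {                         // extra indirection while resize in progress
            t = t.newer;
            v = t.probe(key);
        }
        return v;
    }
}
```

The loop in get() also covers stacked partial resizes: each table that answers with the sentinel simply forwards to the next one.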

New State: 'use new table' Sentinel


X: sentinel used during table-copy
  Means: not in old table, check new
A Key slot is:
  null, K
  X: 'use new table', not any valid Key
  null -> K OR null -> X
A Value slot is:
  null, T, V
  X: 'use new table', not any valid Value
  null -> {T,V}* -> X

State Machine old table


States:
  {null,null}    Empty
  {K,null}
  {K,V}          Standard K/V pair
  {K,T}          Deleted key
  {null,T/V/X}   Partially inserted K/V pair
  {X,null}       check newer table
  {K,X}          copy {K,V} into newer table
  States {X,T/V/X} not possible
Transitions: insert, change, delete as before, plus kill:
  kill  {null,null} -> {X,null}
  kill  {K,T} -> {K,X}


State Machine: Copy One Pair


empty:
  {null,null} -> {X,null}
dead or partially inserted:
  {K,T/null} -> {K,X}
alive, but old:
  old table: {K,V1} -> {K,X}
  new table: {K,V2} -> {K,V2}

Copying Old To New


New States V', T': primed versions of V,T
  Prime'd values in new table copied from old
  Non-prime in new table is recent put()
    'happens after' any prime'd value
  Prime allows 2-phase commit
  Engineering: wrapper class (Java), steal bit (C)
Must be sure to copy late-arriving old-table write
Attempt to copy atomically
  May fail & copy does not make progress
  But old, new tables not damaged

New States: Prime'd


A Key slot is:
  null, K, X
A Value slot is:
  null, T, V, X
  T',V': primed versions of T & V
    Old things copied into the new table
    2-phase commit
  null -> {T',V'}* -> {T,V}* -> X
State Machine again...

State Machine new table


States:
  {null,null}            Empty
  {K,null}
  {K,T'/V'}              copy in from older table
  {K,V}                  Standard K/V pair
  {K,T}                  Deleted key
  {null,T/T'/V/V'/X}     Partially inserted K/V pair
  {X,null}               check newer table
  {K,X}                  copy {K,V} into newer table
  States {X,T/T'/V/V'/X} not possible
Transitions: insert, delete, kill


State Machine: Copy One Pair


old table: read V1 from {K,V1}
new table: read V'x from {K,V'x}
new table: {K,V'x} -> {K,V'1}    partial copy: K,V' in new table
  Fence
old table: {K,V1} -> {K,X}       X in old table
  Fence
new table: {K,V'1} -> {K,V1}     copy complete
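
The copy sequence for one pair can be walked through on a single pair of Value slots. This is a sketch under assumptions, not the talk's code: two AtomicReference objects stand in for the old-table and new-table Value slots of the same Key, the Prime wrapper class follows the "wrapper class (Java)" hint, the volatile CAS operations take the place of the explicit fences, and the method simply gives up when a CAS loses a race rather than retrying or helping.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical walk-through of the three transitions for one K/V pair.
final class CopyOnePair {
    static final Object X = new Object();             // "use new table" sentinel

    static final class Prime {                        // V': a value copied from the old table
        final Object v;
        Prime(Object v) { this.v = v; }
    }

    // Returns false if a CAS lost a race and this copy attempt quit early.
    static boolean copy(AtomicReference<Object> oldVal, AtomicReference<Object> newVal) {
        Object v1 = oldVal.get();                     // read V1 from old table
        if (v1 == X) return true;                     // someone already finished this slot
        Object cur = newVal.get();                    // read V'x from new table
        if (cur != null && !(cur instanceof Prime)) {
            // Non-prime value in new table: a recent put() already overwrote
            // anything we could copy, so skip the copy and only kill the old slot.
            return oldVal.compareAndSet(v1, X);
        }
        Prime p = new Prime(v1);
        if (!newVal.compareAndSet(cur, p)) return false;   // {K,V'x} -> {K,V'1}: partial copy
        if (!oldVal.compareAndSet(v1, X)) return false;    // {K,V1}  -> {K,X}
        newVal.compareAndSet(p, v1);                       // {K,V'1} -> {K,V1}: copy complete
        return true;
    }
}
```

If the last CAS fails, a put() has already replaced the primed value with a newer one, which is the outcome wanted anyway. Handling a late write to the old table between the first two steps needs a retry that this sketch leaves out.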

Some Things to Notice


Old value could be V or T
  or V' or T' (if nested resize in progress)
Skip copy if new Value is not prime'd
  Means recent put() overwrote any old Value
If CAS into new fails
  Means either put() or other copy in progress
  So this copy can quit
Any thread can see any state at any time
  And CAS to the next state


Microbenchmark
Measure insert/lookup/remove of Strings
Tight loop: no work beyond HashTable itself and test harness (mostly RNG)
"Guaranteed not to exceed" numbers
All fences; full ConcurrentHashMap semantics
Variables:
  99% get, 1% put (typical cache) vs 75 / 25
  Dual Athlon, Niagara, Azul Vega1, Vega2
  Threads from 1 to 800
  NonBlocking vs 4096-way ConcurrentHashMap
  1K entry table vs 1M entry table

AMD 2.4GHz 2 (ht) cpus


[Charts: 1K Table and 1M Table. Y axis: M-ops/sec (up to 30); X axis: Threads (1 to 8). Series: NB-99, CHM-99, NB-75, CHM-75.]

Niagara 8x4 cpus


[Charts: 1K Table and 1M Table. Y axis: M-ops/sec (0 to 80); X axis: Threads (up to 64). Series: NB-99, CHM-99, NB-75, CHM-75.]

Azul Vega1 384 cpus


[Charts: 1K Table and 1M Table. Y axis: M-ops/sec (0 to 500); X axis: Threads (0 to 400). Series: NB-99, CHM-99, NB-75, CHM-75.]

Azul Vega2 768 cpus


[Charts: 1K Table and 1M Table. Y axis: M-ops/sec (0 to 1200); X axis: Threads (0 to 800). Series: NB-99, CHM-99, NB-75, CHM-75.]

Summary
A faster lock-free HashTable
Faster for more CPUs
Much faster for higher table modification rate
State-Based Reasoning:
  No ordering, no JMM, no fencing
  Any thread can see any state at any time
  Must assume values change at each step
State graphs really helped coding & debugging
Resulting code is small & fast

Summary
Obvious future work:
  Tools to check states
  Tools to write code
Seems applicable to other data structures as well
  Concurrent append j.u.Vector
  Scalable near-FIFO work queues
Code & Video available at:
  http://blogs.azulsystems.com/cliff/

#1 Platform for Business Critical Java

WWW.AZULSYSTEMS.COM
Thank You
