
History-Independent Cuckoo Hashing

Moni Naor, Gil Segev, Udi Wieder

Weizmann Institute, Israel

Microsoft Research Silicon Valley

Election Day

Elections for class president
  Each student whispers in Mr. Drew's ear
  Mr. Drew writes down the votes

[Figure: Mr. Drew's notebook, listing the votes in the order they were cast: Carol, Alice, Alice, Bob]

Problem: Mr. Drew's notebook leaks sensitive information
  First student voted for Carol
  Second student voted for Alice
  May compromise the privacy of the elections

Election Day

What about more involved applications?
  Write-in candidates
  Votes which are subsets or rankings

A simple solution:
  Lexicographically sorted list of candidates
  Unary counters

[Figure: the notebook as a sorted tally with unary counters: Alice 11, Bob 1, Carol 1]
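The sorted tally can be sketched in a few lines of Python. This is only an illustration of the idea on the slide; the class name `Tally` and the helper `tally_of` are hypothetical and not part of the talk.

```python
from bisect import bisect_left


class Tally:
    """Vote tally kept as a lexicographically sorted list of [candidate, count]."""

    def __init__(self):
        self.rows = []  # invariant: sorted by candidate name

    def vote(self, candidate):
        names = [name for name, _ in self.rows]
        i = bisect_left(names, candidate)
        if i < len(self.rows) and self.rows[i][0] == candidate:
            self.rows[i][1] += 1                 # known candidate: bump the counter
        else:
            self.rows.insert(i, [candidate, 1])  # write-in: insert at its sorted position

    def memory(self):
        # Unary counters as on the slide: a count of 2 is written "11".
        return [(name, "1" * count) for name, count in self.rows]


def tally_of(votes):
    """Hypothetical helper: feed the votes in the given order, return the final layout."""
    t = Tally()
    for v in votes:
        t.vote(v)
    return t.memory()


# Two different voting orders, same final notebook.
print(tally_of(["Carol", "Alice", "Alice", "Bob"]))
print(tally_of(["Alice", "Bob", "Carol", "Alice"]))
```

Both calls print the same list, so the final notebook says nothing about who voted first.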

Learning From History

The two levels of a data structure


  Legitimate interface
  Memory representation

History independence
The memory representation should not reveal information that cannot be obtained using the legitimate interface

A simple example: sorted list
  Canonical memory representation
  Not really efficient...

[Figure: a sorted list holding Alice, Bob, Carol]

Typical Applications

Incremental cryptography [BGG94, Mic97]
Voting [MKSW06, MNS07]
Set comparison & reconciliation [MNS08]
Computational geometry [BGV08]
...

Our Contribution
An HI dictionary that simultaneously achieves the following:

Efficiency:
  Lookup time O(1) worst case
  Update time O(1) expected amortized
  Memory utilization 50% (25% with deletions)
Strongest notion of history independence
Simple and fast

Independence

Micciancio (1997): oblivious trees
  Motivated by incremental cryptography
  Only considered the shape of the trees and not their memory representation

Naor and Teague (2001)
  Memory representation
  Weak & strong history independence

Independence

Naor and Teague (2001) following Micciancio (1997)

Weak history independence
  Memory revealed at the end of an activity period
  Any two sequences of operations S1 and S2 that lead to the same content induce the same distribution on the memory representation

Strong history independence
  Memory revealed several times during an activity period
  Any two sets of breakpoints along S1 and S2 with the same content at each breakpoint induce the same distributions on the memory representation at all these points
  Completely randomizing memory after each ...
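For a deterministic structure the "same distribution" condition reduces to "same layout", which is easy to test. The sketch below compares two toy set implementations on two operation sequences that end with the same content. The class names `AppendList` and `SortedList` and the helper `layout_after` are hypothetical and exist only for this illustration.

```python
from bisect import bisect_left


class AppendList:
    """Set stored in arrival order: the layout depends on the history."""

    def __init__(self):
        self.cells = []

    def insert(self, x):
        if x not in self.cells:
            self.cells.append(x)

    def delete(self, x):
        if x in self.cells:
            self.cells.remove(x)


class SortedList:
    """Set stored as a sorted list: one canonical layout per content."""

    def __init__(self):
        self.cells = []

    def insert(self, x):
        i = bisect_left(self.cells, x)
        if i == len(self.cells) or self.cells[i] != x:
            self.cells.insert(i, x)

    def delete(self, x):
        i = bisect_left(self.cells, x)
        if i < len(self.cells) and self.cells[i] == x:
            del self.cells[i]


def layout_after(structure_cls, ops):
    """Apply (operation, item) pairs to a fresh structure and return its memory layout."""
    s = structure_cls()
    for op, x in ops:
        getattr(s, op)(x)
    return list(s.cells)


# Two histories with the same final content {Alice, Bob, Carol}.
s1 = [("insert", "Carol"), ("insert", "Alice"), ("insert", "Bob")]
s2 = [("insert", "Bob"), ("insert", "Dave"), ("insert", "Alice"),
      ("delete", "Dave"), ("insert", "Carol")]

print(layout_after(AppendList, s1), layout_after(AppendList, s2))  # layouts differ
print(layout_after(SortedList, s1), layout_after(SortedList, s2))  # layouts agree
```

The sorted list passes this check at every point where the contents agree, not only at the end, which is the stronger of the two requirements.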

Independence

We consider strong history independence

Canonical representation (up to initial randomness) implies SHI
  Other direction shown to hold for reversible data structures [HHMPR05]

Weak & strong are not equivalent
  WHI for reversible data structures is possible without a canonical representation
  Provable efficiency gaps [BP06] (in restricted models)

SHI Dictionaries
Scheme                  Memory utilization   Update time     Lookup time
Naor & Teague 01        99%                  O(1) expected   O(1) worst case
Blelloch & Golovin 07   99%                  O(1) expected   O(1) expected
Blelloch & Golovin 07   < 9%                 O(1) expected   O(1) worst case
This work               < 25% (< 50%)        O(1) expected   O(1) worst case

[The "Deletions" and "Practical?" columns held marks that did not survive extraction; two of the "Practical?" cells carry the note "(mem. util. < 50%)"]

Our Approach

Cuckoo hashing [PR01]: A simple & practical scheme with worst case constant lookup time

Force a canonical representation on cuckoo hashing
Avoid rehashing by using a small stash
No significant loss in efficiency

What happens when hash functions fail?
  Rehashing is problematic in SHI data structures
    All hash functions need to be sampled in advance (theoretical problem)
    When an item is deleted, may need to roll back on previous functions
  We use a secondary storage to reduce the ...

Cuckoo Hashing

Tables T1 and T2 with hash functions h1 and h2
Store x in one of T1[h1(x)] and T2[h2(x)]
Insert(x): Greedily insert in T1 or T2
  If both are occupied then store x in T1
  Repeat in other table with the previous occupant

[Figure: tables T1 and T2 with items V, W, X, Y, Z, showing a successful insertion]
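A minimal sketch of plain (not history-independent) cuckoo hashing as described above. The class name `CuckooTable`, the `max_kicks` cutoff, and the 0-based cell indices are assumptions made for this example; the slides do not specify when to give up.

```python
class CuckooTable:
    """Two tables of r cells each; item x lives in T1[h1(x)] or T2[h2(x)]."""

    def __init__(self, r, h1, h2, max_kicks=None):
        self.h = (h1, h2)                    # hash functions into range(r)
        self.t = ([None] * r, [None] * r)    # t[0] is T1, t[1] is T2
        # Hypothetical cutoff: after this many evictions we declare failure.
        self.max_kicks = 4 * r if max_kicks is None else max_kicks

    def lookup(self, x):
        # Worst case constant time: exactly two probes.
        return self.t[0][self.h[0](x)] == x or self.t[1][self.h[1](x)] == x

    def insert(self, x):
        """Return True on success, False if a rehash would be required."""
        if self.lookup(x):
            return True
        for side in (0, 1):                  # greedy: use a free cell if there is one
            pos = self.h[side](x)
            if self.t[side][pos] is None:
                self.t[side][pos] = x
                return True
        side = 0                             # both occupied: store x in T1 ...
        for _ in range(self.max_kicks):
            pos = self.h[side](x)
            x, self.t[side][pos] = self.t[side][pos], x   # ... and evict the occupant
            if x is None:
                return True
            side = 1 - side                  # repeat in the other table
        # The item left in hand has no cell; a real implementation would rehash here.
        return False


r = 4
table = CuckooTable(r, lambda x: x % r, lambda x: (x // r) % r)
for key in (1, 4, 5, 21):
    table.insert(key)
print(table.t)   # ([4, 21, None, None], [1, 5, None, None])
```

Inserting 21 finds both of its cells occupied, takes the T1 cell from 1, and 1 then moves to its free T2 cell. The resulting layout depends on the insertion order, which is exactly what a canonical representation has to remove.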

Cuckoo Hashing

[Figure: tables T1 and T2 with items U, V, X, Y, Z, showing a failed insertion. Failure: rehash required]

The Cuckoo Graph


Set S ⊆ U containing n keys
h1, h2 : U → {1,...,r}
Bipartite graph with sets of size r
Edge (h1(x), h2(x)) for every x ∈ S

S is successfully stored ⇔ every connected component has at most one cycle

Main theorem: If r ≥ (1 + ε)n and h1, h2 are log(n)-wise independent, then failure probability is Θ(1/n)
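The storability condition can be checked with a union-find pass over the cuckoo graph: a connected component has at most one cycle exactly when it has no more edges (items) than vertices (cells). The function name `storable` is hypothetical, distinct keys are assumed, and the hash functions map into 0..r-1 rather than 1..r for convenience.

```python
from collections import Counter


def storable(keys, h1, h2, r):
    """True iff every component of the cuckoo graph has at most one cycle."""
    # Vertices 0..r-1 are the cells of T1, vertices r..2r-1 are the cells of T2.
    parent = list(range(2 * r))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    for x in keys:                          # one edge (h1(x), h2(x)) per key
        a, b = find(h1(x)), find(r + h2(x))
        if a != b:
            parent[a] = b

    cells = Counter(find(v) for v in range(2 * r))   # vertices per component
    items = Counter(find(h1(x)) for x in keys)       # edges per component
    return all(items[c] <= cells[c] for c in items)


r = 4
h1 = lambda x: x % r
h2 = lambda x: (x // r) % r
print(storable([1, 4, 5, 21], h1, h2, r))           # True: 4 items on 4 cells
print(storable([1, 5, 9, 13, 17, 21], h1, h2, r))   # False: 6 items on 5 cells
```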

Representation

Assume that S can be stored using h1 and h2
We force a canonical representation on the cuckoo graph
  Suffices to consider a single connected component

Assume that S forms a tree in the cuckoo graph (the typical case)
  One location must be empty
  The choice of the empty location uniquely determines the location of all elements
  Rule: h1(minimal element) is empty

[Figure: a tree component of the cuckoo graph with elements a, b, c, d, e]
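One way to see why the empty location fixes everything: root the tree at the empty cell, and every other cell must hold the item on the edge leading back towards the root. The sketch below applies the rule from the slide to a single tree component. The function name `canonical_tree_layout` and the `("T1", i)` cell labels are assumptions for this example, and the input is assumed to form exactly one tree.

```python
from collections import defaultdict, deque


def canonical_tree_layout(keys, h1, h2):
    """Canonical placement for keys whose cuckoo-graph component is a tree.

    Returns a dict mapping each cell, e.g. ("T1", 3), to the key stored there
    (None for the single empty cell).
    """
    adj = defaultdict(list)              # cell -> [(neighbouring cell, key on that edge)]
    for x in keys:
        a, b = ("T1", h1(x)), ("T2", h2(x))
        adj[a].append((b, x))
        adj[b].append((a, x))

    root = ("T1", h1(min(keys)))         # rule: h1(minimal element) is empty
    layout = {root: None}
    queue = deque([root])
    while queue:                         # breadth-first search away from the empty cell
        u = queue.popleft()
        for v, x in adj[u]:
            if v not in layout:
                layout[v] = x            # each key sits at its endpoint farther from the root
                queue.append(v)
    return layout


h1 = lambda x: x % 4
h2 = lambda x: (x // 4) % 4
print(canonical_tree_layout([1, 4, 5], h1, h2) == canonical_tree_layout([5, 4, 1], h1, h2))  # True
```

The layout depends only on the set of keys, not on the order in which they are listed.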

Representation

Assume that S has one cycle
  Two ways to assign elements in the cycle
  Each choice uniquely determines the location of all elements
  Rule: minimal element in cycle lies in T1

[Figure: a component of the cuckoo graph with one cycle, elements a, b, c, d, e]
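A sketch of the cycle rule for one component, assuming the input keys form a single component with exactly one cycle. Items hanging off the cycle are placed first by repeatedly peeling cells with a single remaining item. The cycle is then walked starting from its minimal item in T1. The function name `canonical_cycle_layout` and the cell labels are hypothetical.

```python
from collections import defaultdict


def canonical_cycle_layout(keys, h1, h2):
    """Canonical placement for keys whose component has exactly one cycle."""
    ends = {x: (("T1", h1(x)), ("T2", h2(x))) for x in keys}
    incident = defaultdict(set)          # cell -> keys not yet placed that may use it
    for x, (a, b) in ends.items():
        incident[a].add(x)
        incident[b].add(x)

    layout = {}
    # Step 1: peel the trees attached to the cycle. A cell with one candidate
    # key must hold that key, since every cell of the component is occupied.
    leaves = [v for v in incident if len(incident[v]) == 1]
    while leaves:
        v = leaves.pop()
        if len(incident[v]) != 1:
            continue
        (x,) = incident[v]
        layout[v] = x
        for u in ends[x]:
            incident[u].discard(x)
            if len(incident[u]) == 1:
                leaves.append(u)

    # Step 2: the remaining keys lie on the cycle. Rule: the minimal one goes in T1.
    cycle_keys = {x for v in incident for x in incident[v]}
    x = min(cycle_keys)
    v = ends[x][0]                       # the T1 cell of the minimal cycle key
    while v not in layout:
        layout[v] = x
        other = ends[x][1] if v == ends[x][0] else ends[x][0]
        (x,) = incident[other] - {x}     # the other cycle key touching that cell
        v = other
    return layout


h1 = lambda x: x % 4
h2 = lambda x: (x // 4) % 4
print(canonical_cycle_layout([1, 4, 5, 21], h1, h2))
```

For the keys 1, 4, 5, 21 the cycle consists of 5 and 21, so 5 is placed in T1 and 21 in T2, whatever order the keys were inserted in.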

Representation

Updates efficiently maintain the canonical representation

Insertions:
  New leaf: check if new element is smaller than current min
  New cycle:
    Same component
    Merging two components
  All cases straightforward

Deletions:
  Find the new min, split component, ...
  Requires connecting all elements in the component with a sorted cyclic list
    Memory utilization drops to 25%
  All cases straightforward

Update time < size of component, which is O(1) in expectation

Rehashing

What if S cannot be stored using h1 and h2?
  Happens with probability Θ(1/n)
  Rare, but very bad worst case performance

Can we simply pick new functions?
  Canonical memory implies we need to sample all hash functions in advance (theoretical problem)
  Whenever an item is deleted, need to check whether we must roll back to previous hash functions
  A bad item which is repeatedly inserted and deleted would cause a rehash every operation!

Using a Stash

Whenever an insert fails, put a bad item in a secondary data structure
  Bad item: smallest item that belongs to a cycle
  Secondary data structure must be SHI in itself

Theorem [KMW08]: Pr[|stash| > s] < n^(-s)

In practice keeping the stash as a sorted list is probably the best solution
  Effectively the query time is constant with (very) high probability

In theory the stash could be any SHI with constant lookup time
  A deterministic hashing scheme, where the ...
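A sketch of the practical variant mentioned above: the stash is a sorted list, which is canonical and therefore SHI, and a lookup probes the two tables and then the stash. The names `SortedStash` and `lookup` are hypothetical, and the choice of which item to stash is left to the caller.

```python
from bisect import bisect_left


class SortedStash:
    """Secondary structure kept as a sorted list, so its layout is canonical."""

    def __init__(self):
        self.items = []

    def add(self, x):
        i = bisect_left(self.items, x)
        if i == len(self.items) or self.items[i] != x:
            self.items.insert(i, x)

    def remove(self, x):
        i = bisect_left(self.items, x)
        if i < len(self.items) and self.items[i] == x:
            del self.items[i]

    def __contains__(self, x):
        i = bisect_left(self.items, x)
        return i < len(self.items) and self.items[i] == x


def lookup(x, t1, t2, h1, h2, stash):
    """Two table probes plus a search in the (almost always tiny) stash."""
    return t1[h1(x)] == x or t2[h2(x)] == x or x in stash


# Items 1, 2, 3 all hash to cell 0 in both tables, so one of them cannot be stored.
# The smallest item on a cycle (here 1) goes to the stash; 2 and 3 then form a single
# cycle and are placed with the minimal one in T1.
h = lambda x: 0
t1, t2, stash = [2], [3], SortedStash()
stash.add(1)
print([lookup(x, t1, t2, h, h, stash) for x in (1, 2, 3, 4)])  # [True, True, True, False]
```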

Conclusions and Problems

Cuckoo hashing is a robust and flexible hashing scheme
  Easily molded into a history-independent data structure

We don't know how to do this for CH with more than 2 hash functions and/or more than 1 element per bucket
  Better memory utilization, better performance, but...
  Expected size of connected component is not constant

Full performance analysis
