History-Independent Cuckoo Hashing

Moni Naor, Gil Segev, Udi Wieder
Weizmann Institute, Israel
Microsoft Research Silicon Valley
Election Day

Elections for class president
    Each student whispers in Mr. Drew's ear
    Mr. Drew writes down the votes

[Figure: Mr. Drew's notebook, in the order written: Carol, Alice, Alice, Bob]

Problem: Mr. Drew's notebook leaks sensitive information
    First student voted for Carol
    Second student voted for Alice
    May compromise the privacy of the elections

Election Day

What about more involved applications?
    Write-in candidates
    Votes which are subsets or rankings
    ...

[Figure: votes cast as Carol, Alice, Alice, Bob; the notebook reads Alice 11, Bob 1, Carol 1]

A simple solution:
    Lexicographically sorted list of candidates
    Unary counters

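A minimal sketch of the simple solution, assuming votes arrive one name at a time. The class name `Tally` and its `rows` field are made up for this illustration and do not come from the talk. The notebook is a lexicographically sorted list of candidates, each with a unary counter, so its contents depend only on the totals and not on the order of the votes.

```python
from bisect import bisect_left


class Tally:
    """Notebook kept as a sorted list of [candidate, unary counter] rows."""

    def __init__(self):
        self.rows = []  # e.g. [["Alice", "11"], ["Bob", "1"]]

    def vote(self, name):
        names = [row[0] for row in self.rows]
        i = bisect_left(names, name)           # sorted position of the candidate
        if i < len(self.rows) and self.rows[i][0] == name:
            self.rows[i][1] += "1"             # one more stroke on the unary counter
        else:
            self.rows.insert(i, [name, "1"])   # new (possibly write-in) candidate


# Two different voting orders with the same totals.
first = Tally()
for v in ["Carol", "Alice", "Alice", "Bob"]:
    first.vote(v)

second = Tally()
for v in ["Bob", "Alice", "Carol", "Alice"]:
    second.vote(v)

print(first.rows == second.rows)
```

Both notebooks end up with the same rows, so reading the notebook reveals nothing about who voted first.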
Learning From History

The two levels of a data structure
    Legitimate interface
    Memory representation
History independence
    The memory representation should not reveal information that cannot be obtained using the legitimate interface
A simple example: sorted list
    Canonical memory representation
    Not really efficient...

[Figure: sorted list holding Alice, Bob, Carol]

Typical Applications

Incremental cryptography [BGG94, Mic97]
Voting [MKSW06, MNS07]
Set comparison & reconciliation [MNS08]
Computational geometry [BGV08]
...

Our Contribution

A HI dictionary that simultaneously achieves the following:
    Efficiency:
        Lookup time: O(1) worst case
        Update time: O(1) expected amortized
        Memory utilization: 50% (25% with deletions)
    Strongest notion of history independence
    Simple and fast

History Independence

Micciancio (1997): oblivious trees
    Motivated by incremental cryptography
    Only considered the shape of the trees and not their memory representation
Naor and Teague (2001)
    Memory representation
    Weak & strong history independence

History Independence

Naor and Teague (2001) following Micciancio (1997)

Weak history independence
    Memory revealed at the end of an activity period
    Any two sequences of operations S1 and S2 that lead to the same content induce the same distribution on the memory representation
Strong history independence
    Memory revealed several times during an activity period
    Any two sets of breakpoints along S1 and S2 with the same content at each breakpoint induce the same distributions on the memory representation at all these points
    Completely randomizing memory after each ...

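One way to write the two definitions symbolically. The notation is introduced here for illustration and is not taken from the slides: $\mathrm{mem}(S)$ is the random memory representation after running the operation sequence $S$, and $S[1..i]$ is the prefix of $S$ of length $i$.

```latex
% Weak history independence: for all sequences S_1, S_2 that lead to the
% same content, and for every memory representation m,
\Pr[\mathrm{mem}(S_1) = m] = \Pr[\mathrm{mem}(S_2) = m].

% Strong history independence: for all breakpoints i_1 < \dots < i_k in S_1
% and j_1 < \dots < j_k in S_2 such that S_1[1..i_t] and S_2[1..j_t] lead to
% the same content for every t, and for all m_1, \dots, m_k,
\Pr\bigl[\forall t:\ \mathrm{mem}(S_1[1..i_t]) = m_t\bigr]
  = \Pr\bigl[\forall t:\ \mathrm{mem}(S_2[1..j_t]) = m_t\bigr].
```

The weak notion is the special case of a single breakpoint at the end of each sequence.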
History Independence

We consider strong history independence
    Canonical representation (up to initial randomness) implies SHI
    Other direction shown to hold for reversible data structures [HHMPR05]
Weak & strong are not equivalent
    WHI for reversible data structures is possible without a canonical representation
    Provable efficiency gaps [BP06] (in restricted models)

SHI Dictionaries

                         Memory utilization   Update time     Lookup time
Naor & Teague 01         99%                  O(1) expected   O(1) worst case
Blelloch & Golovin 07    99%                  O(1) expected   O(1) expected
Blelloch & Golovin 07    < 9%                 O(1) expected   O(1) worst case
This work                < 25% (< 50%)        O(1) expected   O(1) worst case

[The "Deletions" and "Practical?" columns used per-row marks that are not recoverable; two rows of "Practical?" carry the note "(mem. util. < 50%)".]

Our Approach

Cuckoo hashing [PR01]: a simple & practical scheme with worst case constant lookup time
Force a canonical representation on cuckoo hashing
    No significant loss in efficiency
Avoid rehashing by using a small stash

What happens when hash functions fail?
    Rehashing is problematic in SHI data structures
        All hash functions need to be sampled in advance (theoretical problem)
        When an item is deleted, may need to roll back to previous functions
    We use a secondary storage to reduce the ...

Cuckoo Hashing

Tables T1 and T2 with hash functions h1 and h2
Store x in one of T1[h1(x)] and T2[h2(x)]
Insert(x):
    Greedily insert in T1 or T2
    If both are occupied then store x in T1
    Repeat in other table with the previous occupant

[Figure: tables T1 and T2 holding V, W, X, Y, Z before and after the evictions. Successful insertion]

Cuckoo Hashing

Tables T1 and T2 with hash functions h1 and h2
Store x in one of T1[h1(x)] and T2[h2(x)]
Insert(x):
    Greedily insert in T1 or T2
    If both are occupied then store x in T1
    Repeat in other table with the previous occupant

[Figure: tables T1 and T2 holding U, V, X, Y, Z. Failure: rehash required]

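The insertion procedure above can be sketched as follows. This is plain cuckoo hashing, not yet the history-independent variant. The class name `CuckooTable`, the eviction bound `max_kicks`, and the toy hash functions are assumptions made for this example. A `False` return corresponds to the "rehash required" outcome.

```python
class CuckooTable:
    """Two tables of r slots each; None marks an empty slot, so None is not a valid key."""

    def __init__(self, r, h1, h2, max_kicks=32):
        self.tables = ([None] * r, [None] * r)   # T1 and T2
        self.hashes = (h1, h2)
        self.max_kicks = max_kicks               # assumed bound on the eviction chain

    def lookup(self, x):
        # x can only live in T1[h1(x)] or T2[h2(x)]: two probes, worst case.
        return any(t[h(x)] == x for t, h in zip(self.tables, self.hashes))

    def insert(self, x):
        if self.lookup(x):
            return True
        # Greedy step: take a free slot in T1 or T2 if there is one.
        for t, h in zip(self.tables, self.hashes):
            if t[h(x)] is None:
                t[h(x)] = x
                return True
        # Both occupied: store x in T1, then move each evicted occupant
        # to its slot in the other table.
        side = 0
        for _ in range(self.max_kicks):
            slot = self.hashes[side](x)
            x, self.tables[side][slot] = self.tables[side][slot], x
            if x is None:
                return True
            side = 1 - side
        return False                             # failure: rehash required


# Toy hash functions into {0, ..., 3} (illustrative only).
h1 = lambda x: x % 4
h2 = lambda x: (x // 4) % 4

good = CuckooTable(4, h1, h2)
good_results = [good.insert(x) for x in (0, 4, 20)]   # 20 evicts 0, which moves to T2

bad = CuckooTable(4, h1, h2)
bad_results = [bad.insert(x) for x in (0, 16, 32)]    # three keys share the same two slots
```

In the second run the keys 0, 16 and 32 all hash to the same pair of slots, so the eviction chain never finds an empty slot and the last insertion fails.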
The Cuckoo Graph

Set $S \subseteq U$ containing $n$ keys
$h_1, h_2 : U \to \{1, \dots, r\}$
Bipartite graph with sets of size $r$
    Edge $(h_1(x), h_2(x))$ for every $x \in S$
$S$ is successfully stored $\iff$ every connected component has at most one cycle
Main theorem: If $r \ge (1 + \epsilon)n$ and $h_1, h_2$ are $\log(n)$-wise independent, then the failure probability is $\Theta(1/n)$

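The storability condition can be checked directly: a connected component has at most one cycle exactly when it has no more edges than vertices. The sketch below tracks both counts with union-find. The function name `storable` and the toy hash functions are assumptions for this example, and slots are numbered from 0 rather than from 1.

```python
def storable(keys, h1, h2, r):
    """True iff every component of the cuckoo graph has at most one cycle."""
    parent = list(range(2 * r))   # vertices 0..r-1 are T1 slots, r..2r-1 are T2 slots
    size = [1] * (2 * r)          # vertices per component (valid at roots)
    edges = [0] * (2 * r)         # edges per component (valid at roots)

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    for x in keys:
        a, b = find(h1(x)), find(r + h2(x))
        if a != b:                # the edge joins two components
            parent[b] = a
            size[a] += size[b]
            edges[a] += edges[b]
        edges[a] += 1
        if edges[a] > size[a]:    # more edges than vertices: a second cycle
            return False
    return True


h1 = lambda x: x % 4
h2 = lambda x: (x // 4) % 4

one_cycle = storable([0, 4, 20], h1, h2, 4)    # 4 and 20 form a cycle of length two
two_cycles = storable([0, 16, 32], h1, h2, 4)  # three parallel edges
```

Here `one_cycle` is `True` and `two_cycles` is `False`.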
Canonical Representation

Assume that S can be stored using h1 and h2
We force a canonical representation on the cuckoo graph
    Suffices to consider a single connected component

Assume that S forms a tree in the cuckoo graph (typical case)
    One location must be empty
    The choice of the empty location uniquely determines the location of all elements
    Rule: h1(minimal element) is empty

[Figure: tree component with elements a, b, c, d, e]

Canonical Representation

Assume that S can be stored using h1 and h2
We force a canonical representation on the cuckoo graph
    Suffices to consider a single connected component

Assume that S has one cycle
    Two ways to assign elements in the cycle
    Each choice uniquely determines the location of all elements
    Rule: minimal element in the cycle lies in T1

[Figure: component with one cycle and elements a, b, c, d, e]

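The two rules can be sketched as a function that computes the canonical placement of a whole key set from scratch. This only shows that the layout depends on the set alone. It is not the incremental update procedure of the talk, and the name `canonical_layout` and the toy hash functions are assumptions for this example.

```python
from collections import defaultdict, deque


def canonical_layout(keys, h1, h2):
    """Return {key: (side, slot)}, with side 0 for T1 and side 1 for T2."""
    ends = {x: ((0, h1(x)), (1, h2(x))) for x in keys}
    adj = defaultdict(set)                      # vertex -> keys (edges) touching it
    for x, (u, v) in ends.items():
        adj[u].add(x)
        adj[v].add(x)

    def other(x, u):                            # endpoint of edge x that is not u
        a, b = ends[x]
        return b if u == a else a

    place, seen = {}, set()
    for start in sorted(adj):
        if start in seen:
            continue
        verts, comp, queue = {start}, set(), deque([start])
        while queue:                            # collect one connected component
            u = queue.popleft()
            for x in adj[u]:
                comp.add(x)
                w = other(x, u)
                if w not in verts:
                    verts.add(w)
                    queue.append(w)
        seen |= verts
        if len(comp) > len(verts):
            raise ValueError("more than one cycle: S cannot be stored")
        if len(comp) < len(verts):              # tree: h1(minimal element) stays empty
            roots = [(0, h1(min(comp)))]
        else:                                   # one cycle: peel leaves to isolate it
            cyc = set(comp)
            deg = {u: len(adj[u]) for u in verts}
            leaves = deque(u for u in verts if deg[u] == 1)
            while leaves:
                u = leaves.popleft()
                for x in adj[u] & cyc:
                    cyc.discard(x)
                    w = other(x, u)
                    deg[w] -= 1
                    if deg[w] == 1:
                        leaves.append(w)
            m = min(cyc)                        # minimal cycle element lies in T1
            u0 = (0, h1(m))
            place[m], prev, u = u0, m, (1, h2(m))
            while u != u0:                      # the rest of the cycle is forced
                (x,) = (adj[u] & cyc) - {prev}
                place[x], prev, u = u, x, other(x, u)
            roots = [place[x] for x in cyc]
        queue = deque(roots)
        while queue:                            # every other key sits at its far endpoint
            u = queue.popleft()
            for x in adj[u]:
                if x not in place:
                    place[x] = other(x, u)
                    queue.append(place[x])
    return place


h1 = lambda x: x % 4
h2 = lambda x: (x // 4) % 4

layout = canonical_layout([0, 4, 20], h1, h2)       # component with one cycle
same = canonical_layout([20, 0, 4], h1, h2)         # same set, different order
tree = canonical_layout([0, 1], h1, h2)             # tree component
```

Because the output is a function of the key set only, any insertion order that ends in the same set has to end in the same memory layout.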
Canonical Representation

Updates efficiently maintain the canonical representation
Insertions:
    New leaf: check if the new element is smaller than the current min
    New cycle:
        Same component
        Merging two components
    All cases straightforward
Deletions:
    Find the new min, split component, ...
    Requires connecting all elements in the component with a sorted cyclic list
    Memory utilization drops to 25%
    All cases straightforward

Update time < size of component = O(1) expected

Rehashing

What if S cannot be stored using h1 and h2?
    Happens with probability $\Theta(1/n)$
    Rare, but very bad worst case performance
Can we simply pick new functions?
    Canonical memory implies we need to sample all hash functions in advance (theoretical problem)
    Whenever an item is deleted, need to check whether we must roll back to previous hash functions
    A bad item which is repeatedly inserted and deleted would cause a rehash every operation!

Using a Stash

Whenever an insert fails, put a bad item in a secondary data structure
    Bad item: smallest item that belongs to a cycle
    Secondary data structure must be SHI in itself
Theorem [KMW08]: $\Pr[|\text{stash}| > s] < n^{-s}$
In practice keeping the stash as a sorted list is probably the best solution
    Effectively the query time is constant with (very) high probability
In theory the stash could be any SHI with constant lookup time
    A deterministic hashing scheme, where the ...

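A minimal sketch of the sorted-list stash suggested above. The class name `SortedStash` is an assumption for this example. The list is always sorted, so its contents are canonical and the stash is strongly history independent on its own.

```python
from bisect import bisect_left, insort


class SortedStash:
    """Stash kept as a sorted list; the list depends only on the set it holds."""

    def __init__(self):
        self.items = []

    def lookup(self, x):
        i = bisect_left(self.items, x)
        return i < len(self.items) and self.items[i] == x

    def insert(self, x):
        if not self.lookup(x):
            insort(self.items, x)       # keeps the list sorted

    def delete(self, x):
        i = bisect_left(self.items, x)
        if i < len(self.items) and self.items[i] == x:
            del self.items[i]


# Two different histories that end with the same stash content {3, 7}.
s1 = SortedStash()
for x in (7, 3):
    s1.insert(x)

s2 = SortedStash()
for x in (3, 9, 7):
    s2.insert(x)
s2.delete(9)
```

Both `s1.items` and `s2.items` end up as `[3, 7]`. Since the stash is expected to hold very few items, the linear cost of shifting list elements on update should rarely matter.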
Conclusions and Problems

Cuckoo hashing is a robust and flexible hashing scheme
    Easily molded into a history independent data structure
We don't know how to do this for cuckoo hashing with more than 2 hash functions and/or more than 1 element per bucket
    Better memory utilization, better performance, but...
    Expected size of connected component is not constant
Full performance analysis
