
Computer Science CSC263H March 17, 2011

St. George Campus University of Toronto

Solutions for Homework Assignment #3

Answer to Question 1.
a. Initially, the expected length of the chain in an arbitrary slot T[i] is n/m (this follows from the assumption that keys have a uniform distribution and the properties of the hash function h).
Since the keys in T[0] are redistributed into T1, we need to determine where they go. Let k be any key
in T[0]. Since k ∈ T[0], k ≡ 0 (mod m). Thus, either k ≡ 0 (mod 2m) or k ≡ m (mod 2m). To see this,
note that since k ≡ 0 (mod m), we have k = cm for some integer c ≥ 0. If c is even, i.e., c = 2c′ for some integer
c′ ≥ 0, then k = 2c′m and so k ≡ 0 (mod 2m). If c is odd, i.e., c = 2c′ + 1 for some integer c′ ≥ 0, then
k = 2c′m + m and so k ≡ m (mod 2m). We conclude that each key k in T[0] is redistributed to either
T1[0] or T1[m].
To determine how many keys end up in T1[0] and how many in T1[m], we again use the assumption that
keys have a uniform distribution over 0 to 2^256 − 1: of the keys equivalent to 0 (mod m), we expect about
half to be equivalent to 0 (mod 2m) and about half to be equivalent to m (mod 2m).
Therefore, for 0 ≤ i ≤ m,

    E[length of chain in T1[i]] = n/2m   if i = 0 or i = m,
                                = n/m    if 0 < i < m.
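This half-and-half split can also be checked empirically. Below is a small sketch (ours, not part of the solution; the values of m and n are arbitrary) that draws uniform keys, keeps those hashing to slot 0 under h(k) = k mod m, and counts where they land under k mod 2m:

```python
import random

# Sketch (not part of the solution): keys that hash to T[0] under h(k) = k mod m
# should split roughly evenly between T1[0] and T1[m] under h'(k) = k mod 2m.
m, n = 64, 200_000                                    # arbitrary table size / key count
keys = [random.randrange(2**256) for _ in range(n)]   # uniformly distributed keys

slot0 = [k for k in keys if k % m == 0]               # the keys stored in chain T[0]
to_0 = sum(1 for k in slot0 if k % (2 * m) == 0)      # redistributed to T1[0]
to_m = sum(1 for k in slot0 if k % (2 * m) == m)      # redistributed to T1[m]

assert to_0 + to_m == len(slot0)        # every key in T[0] goes to one of the two slots
print(len(slot0), to_0, to_m)           # expect about n/m keys, split roughly n/2m and n/2m
```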
b. Let k be a key and 1 ≤ s ≤ m. There are two possible cases. If h(k) ≥ s, then, by the definition of hs,
hs(k) = k mod m, and so hs(k) ≤ m − 1.
Now suppose h(k) < s. Then, by the definition of hs, we have hs(k) = k mod 2m. Since h(k) < s,
by the definition of h, k mod m < s. So k = cm + r for some integers c ≥ 0 and 0 ≤ r < s. There are
two possible subcases.
If c is even, i.e., c = 2c′ for some integer c′ ≥ 0, then k = 2c′m + r. Since hs(k) = k mod 2m,
hs(k) = r < s.
If c is odd, i.e., c = 2c′ + 1 for some integer c′ ≥ 0, then k = 2c′m + m + r. Since hs(k) = k mod 2m,
hs(k) = m + r < m + s.
Thus, in all the above cases, hs(k) < m + s.
To show hs(k) ≥ 0, we note that the mod function yields values no less than 0.
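For concreteness, here is a small sketch (ours) of h, h′, and hs as used above, with a brute-force check of the bound 0 ≤ hs(k) < m + s:

```python
# Sketch (ours) of the hash functions from the question.
def h(k, m):
    return k % m                 # old function: slots 0 .. m-1

def h_prime(k, m):
    return k % (2 * m)           # new function: slots 0 .. 2m-1

def h_s(k, m, s):
    # split-in-progress function: chains 0 .. s-1 have already been split
    return h_prime(k, m) if h(k, m) < s else h(k, m)

# brute-force check of the bound proved above: 0 <= h_s(k) < m + s
m = 8
assert all(0 <= h_s(k, m, s) < m + s
           for s in range(1, m + 1)
           for k in range(100 * m))
```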
c. Suppose we are searching for the key k, which is not present in the hash table.
By our assumption, the value of h(k) is uniformly distributed over the range 0 through m − 1. Likewise,
the value of h′(k) is uniformly distributed over the range 0 through 2m − 1. However, the value of hs(k)
is not uniformly distributed.
With probability s/m, hs uses h′ to compute the correct slot for k in the hash table, and with probability
(m − s)/m, hs uses h to compute the slot. When hs uses h′ to compute the proper slot, the expected length of
the chain encountered is n/2m. When hs uses h, the expected length of the chain encountered is n/m.
So,
    E[number of comparisons for Search(k)]
        = Pr[uses h′] · E[length of “split-slot” chain] + Pr[uses h] · E[length of “non-split” chain]
        = (s/m) · (n/2m) + ((m − s)/m) · (n/m)
        = sn/2m^2 + n/m − sn/m^2
        = (n/m)(1 − s/2m)
Notice that if s = m, this simplifies to n/2m, which is what we expect the average chain length would
be if we used a hash table whose size is twice the size of T.
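The algebra above can also be sanity-checked numerically. The following sketch (ours, with arbitrary n and m) verifies that the expanded and factored forms agree for every s, and that s = m gives n/2m:

```python
from fractions import Fraction

# Sketch (ours): check that (s/m)(n/2m) + ((m-s)/m)(n/m) equals (n/m)(1 - s/2m)
# for every s, and that the s = m case equals n/2m.
n, m = 1000, 64                                       # arbitrary values
for s in range(0, m + 1):
    cost = Fraction(s, m) * Fraction(n, 2 * m) + Fraction(m - s, m) * Fraction(n, m)
    closed_form = Fraction(n, m) * (1 - Fraction(s, 2 * m))
    assert cost == closed_form
assert Fraction(n, m) * (1 - Fraction(m, 2 * m)) == Fraction(n, 2 * m)   # s = m case
```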
Answer to Question 2.
a. We use a hash table T of size m (more about m later).

For each j, 1 ≤ j ≤ n: Insert A[j] in table T

IntersectionSize := 0
For each j, 1 ≤ j ≤ n:
    Search B[j] in table T
    If B[j] ∈ T then IntersectionSize := IntersectionSize + 1
Output IntersectionSize
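A minimal executable sketch of this algorithm (ours); Python's built-in hash-based set plays the role of the chained hash table T:

```python
# Sketch of the algorithm above; Python's set plays the role of the hash table T.
def intersection_size(A, B):
    T = set()
    for a in A:              # Insert A[j] in table T, for 1 <= j <= n
        T.add(a)
    count = 0
    for b in B:              # Search B[j] in table T, for 1 <= j <= n
        if b in T:
            count += 1
    return count

print(intersection_size([2, 4, 6, 8], [4, 8, 16, 32]))   # prints 2
```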

b. Assume:

1. SUHA (Simple Uniform Hashing Assumption): Distinct elements of A are equally likely to hash into
any of the m slots of T .

2. m (the size of T ) is “proportional” to n, more precisely m is Θ(n) (actually m ∈ Ω(n) would suffice).

3. Computing the hash function on each key takes Θ(1) time.

Each “Insert A[j] in table T ” takes Θ(1) time. So inserting all of A into T takes Θ(n) time.
By Assumption 1., after entering all the n elements into T of size m, the expected length of each chain
is n/m.
So the expected running time for each “Search B[j] in table T” is Θ(n/m).
By Assumption 2, Θ(n/m) = Θ(1).
So the expected running time for each “Search B[j] in table T ” is Θ(1), and the expected time to do
so for all j, 1 ≤ j ≤ n is Θ(n).
So the overall expected time is Θ(n).

Answer to Question 3.
a. Gamberino’s claim is correct.

1. Cost of Finds.
With WU (weighted union), the depth of each tree is O(log n).
So each Find takes O(log n) time.
So m Finds take O(m log n) time.

2. Cost of Unions.
Each Union takes O(1) time, so n − 1 Unions take O(n) time.

So any sequence of n − 1 Unions and m ≥ n Finds takes O(m log n) time.
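For reference, here is a minimal sketch (ours) of union-find with weighted union and no path compression, matching the cost model above: Union is applied to two roots and is O(1), while Find walks parent pointers and costs the depth of its argument.

```python
# Sketch (ours): union-find with weighted union (union by size), no path compression.
class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))   # each element starts as its own root
        self.size = [1] * n

    def find(self, x):
        # walk up to the root: O(depth) = O(log n) with weighted union
        while self.parent[x] != x:
            x = self.parent[x]
        return x

    def union(self, rx, ry):
        # rx and ry are assumed to be roots, so this is O(1)
        if self.size[rx] < self.size[ry]:
            rx, ry = ry, rx            # hang the smaller tree under the larger
        self.parent[ry] = rx
        self.size[rx] += self.size[ry]
```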


b. Merluzzo’s claim is correct.
General Idea:
There is a sequence σ of n − 1 Unions followed by m ≥ n Finds that takes Ω(m log n) time.
The sequence σ starts with a sequence of n − 1 Unions that builds a tree with depth Ω(log n).
Then it has m ≥ n Finds, each one for the deepest leaf of that tree, so each one takes Ω(log n) time.

More detailed argument:

1. Let k = ⌊log2 n⌋. Note that k = Ω(log n).


The sequence σ starts with 2^k − 1 Unions that build an Sk tree with 2^k nodes and with depth
k = Ω(log n), as follows:
Do Unions of distinct pairs of elements: this gives S1 trees, with 2 nodes each, and with depth 1.
Do Unions of pairs of S1 trees: this gives S2 trees, with 2^2 = 4 nodes each, and with depth 2.
Do Unions of pairs of S2 trees: this gives S3 trees, with 2^3 = 8 nodes each, and with depth 3.
...
Do a Union of a pair of Sk−1 trees: this gives an Sk tree with 2^k nodes, and with depth k.

2. Then the sequence σ has m ≥ n Finds, each one for the leaf at depth k of the Sk tree.
Each such Find takes Ω(k) = Ω(log n) time, so the m ≥ n Finds take Ω(m log n) time in total.
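Reusing the UnionFind sketch from part a., the sequence σ can be spelled out as follows (a sketch, ours; it tracks a deepest node explicitly so that the Finds can all be issued on it):

```python
# Sketch (ours) of the bad sequence σ: 2^k - 1 Unions building an Sk tree of
# depth k = floor(log2 n), then m Finds on a node at depth k.
# UnionFind is the class sketched in part a. above.
def bad_sequence(n, m):
    k = n.bit_length() - 1                   # k = floor(log2 n)
    uf = UnionFind(n)
    # (root, deepest node) for each current tree; initially 2^k singletons
    trees = [(x, x) for x in range(2 ** k)]
    while len(trees) > 1:                    # one pass per level: S1, S2, ..., Sk
        merged = []
        for (r1, d1), (r2, d2) in zip(trees[0::2], trees[1::2]):
            uf.union(r1, r2)                 # equal sizes: r2's tree hangs under r1
            merged.append((r1, d2))          # d2 is now one level deeper
        trees = merged
    root, deep = trees[0]                    # deep sits at depth k = Ω(log n)
    for _ in range(m):                       # each Find follows k parent pointers
        assert uf.find(deep) == root

bad_sequence(16, 32)   # builds an S4 tree, then performs 32 Finds of cost 4 each
```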
