# Hashing

Concept of Hashing

A hash table, or hash map, is a data structure that associates keys (names) with values (attributes).

Example: a small phone book stored as a hash table.

Dictionaries

A dictionary is a collection of (key, value) pairs. Each pair has a unique key.

Just an Idea

A hash table is a collection of pairs together with a lookup function (the hash function).

Hashing

Key-value pairs are stored in a fixed-size table called a hash table. A hash table is partitioned into many buckets. Each bucket has many slots, and each slot holds one record. A hash function f(x) transforms the identifier (key) into an address in the hash table.

[Figure: a hash table drawn as b buckets, numbered 0 to b-1, each with s slots, numbered 0 to s-1.]
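The bucket-and-slot layout can be sketched in C as follows. This is only an illustration: the sizes `B` and `S`, the type names, and the choice of `key % B` for f are assumptions, not part of the slides, and the sketch does not check for duplicate keys.

```c
#include <assert.h>

#define B 4   /* number of buckets (hypothetical value) */
#define S 2   /* slots per bucket (hypothetical value) */

/* One record: a key-value pair occupying a single slot. */
typedef struct {
    int key;
    int value;
} record_t;

/* One bucket: S slots, of which the first `used` are occupied. */
typedef struct {
    record_t slots[S];
    int used;
} bucket_t;

/* Hash function f(x): maps a non-negative key to a bucket address.
   The division form key % B is an assumption made for this sketch. */
int f(int key)
{
    assert(key >= 0);
    return key % B;
}

/* Stores (key, value) in the next free slot of its home bucket.
   Returns the slot index used, or -1 if the bucket has no free slot. */
int put(bucket_t table[B], int key, int value)
{
    bucket_t *bucket = &table[f(key)];
    if (bucket->used == S)
        return -1;
    bucket->slots[bucket->used].key = key;
    bucket->slots[bucket->used].value = value;
    return bucket->used++;
}
```

With these sizes, keys 1 and 5 share bucket 1 and both fit; a third key with the same home bucket, such as 9, finds the bucket full.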

Ideal Hashing

Ideal hashing uses an array table[0:b-1]. Each position of this array is a bucket, and a bucket can normally hold only one dictionary pair. A hash function f converts each key k into an index in the range [0, b-1]. Every dictionary pair (key, element) is stored in its home bucket table[f(key)].

Ideal Hashing Example

The pairs are (22,a), (33,c), (3,d), (72,e), (85,f). The hash table is ht[0:7], so b = 8 (where b is the number of positions in the hash table). The hash function f is key % b = key % 8.

Where are the pairs stored?

[0] (72,e)   [1] (33,c)   [2] empty   [3] (3,d)   [4] empty   [5] (85,f)   [6] (22,a)   [7] empty
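The placement in this example can be checked with a small C sketch. The names `bucket_t`, `home_bucket`, and `ideal_put` are hypothetical, and the `used` flag is just one assumed way to mark an empty bucket.

```c
#include <assert.h>

#define B 8   /* number of buckets: ht[0:7] */

/* A bucket holding at most one dictionary pair. */
typedef struct {
    int  key;
    char element;
    int  used;     /* 0 = bucket empty, 1 = bucket holds a pair */
} bucket_t;

/* Division hash from the example: f(key) = key % b. */
int home_bucket(int key)
{
    assert(key >= 0);      /* the sketch assumes non-negative keys */
    return key % B;
}

/* Ideal hashing: store the pair in ht[f(key)].
   Returns the index of the bucket that received the pair. */
int ideal_put(bucket_t ht[B], int key, char element)
{
    int i = home_bucket(key);
    assert(!ht[i].used);   /* ideal case: the home bucket is free */
    ht[i].key = key;
    ht[i].element = element;
    ht[i].used = 1;
    return i;
}
```

Calling `ideal_put` for the five pairs returns buckets 6, 1, 3, 0, and 5 for keys 22, 33, 3, 72, and 85.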

What Can Go Wrong? Collision

[0] (72,e)   [1] (33,c)   [2] empty   [3] (3,d)   [4] empty   [5] (85,f)   [6] (22,a)   [7] empty

Where does (25,g) go? The home bucket for (25,g) is bucket 25 % 8 = 1, which is already occupied by (33,c). This situation is called a collision. Keys that have the same home bucket are called synonyms; 25 and 33 are synonyms with respect to the hash function in use.

What Can Go Wrong? Overflow

[0] (72,e)   [1] (33,c)   [2] empty   [3] (3,d)   [4] empty   [5] (85,f)   [6] (22,a)   [7] empty

A collision occurs when the home bucket for a new pair is occupied by a pair with a different key. An overflow occurs when there is no space in the home bucket for the new pair. When a bucket can hold only one pair, collisions and overflows occur together, so a method to handle overflows is needed.
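A minimal C sketch of an insert that reports this situation instead of resolving it is given below. The status names and the `try_insert` function are hypothetical; each bucket holds one pair, so a collision is reported as an overflow.

```c
#include <assert.h>

#define B 8   /* number of buckets, one slot each */

/* Outcome of an insertion attempt (names are illustrative). */
typedef enum { ST_INSERTED, ST_DUPLICATE, ST_OVERFLOW } status_t;

typedef struct {
    int  key;
    char element;
    int  used;     /* 0 = empty, 1 = occupied */
} bucket_t;

/* Tries to store (key, element) in its home bucket key % B. */
status_t try_insert(bucket_t ht[B], int key, char element)
{
    assert(key >= 0);
    bucket_t *home = &ht[key % B];
    if (!home->used) {
        home->key = key;
        home->element = element;
        home->used = 1;
        return ST_INSERTED;
    }
    if (home->key == key)
        return ST_DUPLICATE;   /* same key already stored */
    /* Occupied by a different key: a collision. With a single slot
       per bucket there is no space left, so it is also an overflow. */
    return ST_OVERFLOW;
}
```

After the five example pairs are inserted, `try_insert(ht, 25, 'g')` returns `ST_OVERFLOW`, because bucket 1 already holds the synonym 33.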

Some Issues

- Choice of hash function, to keep collisions rare (a collision: two different pairs have the same home bucket).
- Size (number of buckets) of the hash table.
- Overflow handling method (an overflow: there is no space in the bucket for the new pair).

Choice of Hash Function

Requirements:

- easy to compute
- minimal number of collisions

A good hash function distributes the key values uniformly throughout the range.

Some hash functions

Division: Choose a number m (a prime number) larger than the number n of keys in K. The hash function H is defined by H(k) = k mod m.

E.g., k = 3205, 7148, 2345 and the number of addresses is 100 (0-99). Let m = 97. Then H(3205) = 3205 mod 97 = 4, H(7148) = 7148 mod 97 = 67, and H(2345) = 2345 mod 97 = 17.
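As a quick check of the division method, here is a minimal C sketch. The function name `division_hash` is hypothetical; the caller is assumed to supply a suitable prime m.

```c
#include <assert.h>

/* Division method: H(k) = k mod m.
   The caller is expected to pass a prime m; this sketch only
   checks that m is non-zero. */
unsigned division_hash(unsigned k, unsigned m)
{
    assert(m > 0);
    return k % m;
}
```

With m = 97, the keys 3205, 7148, and 2345 hash to 4, 67, and 17.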

Folding: Partition the key k into several parts k1, k2, ..., kr and add the parts together to obtain the hash address: H(k) = k1 + k2 + ... + kr.

E.g., for x = 12320324111220, partition x into 123, 203, 241, 112, 20, then return the address H(x) = 123 + 203 + 241 + 112 + 20 = 699.
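Folding can be sketched in C by treating the key as a string of decimal digits and cutting it into fixed-length parts from the left. The function name `fold_hash` and the string representation are assumptions made for illustration; the part length is assumed small enough that a part fits in an `unsigned long`.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Folding: split the digit string into parts of part_len digits,
   starting from the left, and add the parts together.
   The last part may be shorter than part_len. */
unsigned long fold_hash(const char *digits, size_t part_len)
{
    assert(digits != NULL && part_len > 0);
    unsigned long sum = 0;      /* running total of completed parts */
    unsigned long part = 0;     /* value of the part being read */
    size_t in_part = 0;         /* digits read into the current part */
    for (const char *p = digits; *p != '\0'; ++p) {
        assert(isdigit((unsigned char)*p));
        part = part * 10 + (unsigned long)(*p - '0');
        if (++in_part == part_len) {
            sum += part;
            part = 0;
            in_part = 0;
        }
    }
    return sum + part;          /* add the final, possibly shorter part */
}
```

For the key in the example, `fold_hash("12320324111220", 3)` gives 699. If the sum can exceed the table size, it would still have to be reduced to a valid address, for instance with the division method.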

Mid Square: The key k is squared. Then the hash function H is defined by H(k) = l, where l is obtained by deleting digits from both ends of k^2. The same digit positions must be used for every key.

E.g., k = 3205, 7148, 2345. For k = 3205, k^2 = 10 272 025 and H(k) = 72.
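A minimal C sketch of the mid-square method follows. The function name and the parameters `drop_right` (digits deleted from the right end of the square) and `keep` (digits retained) are hypothetical; the example value 72 corresponds to dropping three digits on the right and keeping two.

```c
#include <assert.h>

/* Returns 10^n as an unsigned long long (n must be at most 19). */
static unsigned long long pow10_ull(unsigned n)
{
    unsigned long long p = 1;
    while (n-- > 0)
        p *= 10;
    return p;
}

/* Mid-square: square k, delete drop_right digits from the right end,
   and keep the next `keep` digits; everything further left is dropped. */
unsigned mid_square_hash(unsigned k, unsigned drop_right, unsigned keep)
{
    assert(keep >= 1 && keep <= 9);
    assert(drop_right + keep <= 19);
    unsigned long long squared = (unsigned long long)k * k;
    return (unsigned)((squared / pow10_ull(drop_right)) % pow10_ull(keep));
}
```

`mid_square_hash(3205, 3, 2)` yields 72, matching the example. With the same positions, 7148 (square 51 093 904) gives 93 and 2345 (square 5 499 025) gives 99.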

Overflow Handling

An overflow occurs when the home bucket for a new pair (key, element) is full. We may handle overflows by:

- Searching the hash table in some systematic fashion for a bucket that is not full: linear probing (linear open addressing), quadratic probing, rehashing.
- Eliminating overflows by permitting each bucket to keep a list of all pairs for which it is the home bucket: array linear list, chain.

Linear probing (linear open addressing)

Open addressing stores all elements directly in the hash table, so a collision has to be resolved by finding another slot inside the table itself. Linear probing resolves a collision by placing the data into the next open slot in the table.

Linear Probing – Get And Insert

divisor = b (number of buckets) = 17. Home bucket = key % 17.

Insert pairs whose keys are 6, 12, 34, 29, 28, 11, 23, 7, 0, 33, 30, 45.

Resulting table (all other buckets are empty):

[0] 34   [1] 0   [2] 45   [6] 6   [7] 23   [8] 7   [11] 28   [12] 12   [13] 29   [14] 11   [15] 30   [16] 33

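Linear Probing (program 8.3)

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* element, hash() and TABLE_SIZE are defined elsewhere in the program.
       A bucket whose key is the empty string is treated as free. */
    void linear_insert(element item, element ht[])
    {
        int i, hash_value;
        i = hash_value = hash(item.key);
        while (strlen(ht[i].key)) {            /* bucket is occupied */
            if (!strcmp(ht[i].key, item.key)) {
                fprintf(stderr, "Duplicate entry\n");
                exit(1);
            }
            i = (i + 1) % TABLE_SIZE;          /* next bucket, wrapping around */
            if (i == hash_value) {             /* back at the home bucket */
                fprintf(stderr, "The table is full\n");
                exit(1);
            }
        }
        ht[i] = item;
    }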
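The routine above depends on definitions from the rest of its program. For experimenting with the 17-bucket example, a self-contained variant with integer keys might look like the sketch below. The names `lp_init`, `lp_insert`, `lp_search`, and the `EMPTY` marker are hypothetical, and unlike program 8.3 this version returns a status instead of exiting.

```c
#include <assert.h>

#define TABLE_SIZE 17
#define EMPTY (-1)   /* assumed marker for a free bucket; keys are >= 0 */

/* Marks every bucket as free. */
void lp_init(int ht[TABLE_SIZE])
{
    for (int i = 0; i < TABLE_SIZE; ++i)
        ht[i] = EMPTY;
}

/* Inserts key by linear probing from its home bucket key % TABLE_SIZE.
   Returns the bucket that holds the key (also when the key was already
   present), or -1 if the table is full. */
int lp_insert(int ht[TABLE_SIZE], int key)
{
    assert(key >= 0);
    int home = key % TABLE_SIZE;
    int i = home;
    do {
        if (ht[i] == EMPTY) {
            ht[i] = key;
            return i;
        }
        if (ht[i] == key)
            return i;
        i = (i + 1) % TABLE_SIZE;   /* next bucket, wrapping around */
    } while (i != home);
    return -1;
}

/* Get: returns the bucket holding key, or -1 if the key is absent.
   The search stops at the first free bucket or after a full cycle. */
int lp_search(const int ht[TABLE_SIZE], int key)
{
    assert(key >= 0);
    int home = key % TABLE_SIZE;
    int i = home;
    do {
        if (ht[i] == EMPTY)
            return -1;
        if (ht[i] == key)
            return i;
        i = (i + 1) % TABLE_SIZE;
    } while (i != home);
    return -1;
}
```

Inserting the example keys in the order 6, 12, 34, 29, 28, 11, 23, 7, 0, 33, 30, 45 places 45 in bucket 2: its home bucket 11 and every bucket after it up to 16 are taken, and so are buckets 0 and 1 after wrapping around.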

Problem of Linear Probing

Identifiers tend to cluster together, which increases the search time.

Quadratic Probing

Quadratic probing uses a quadratic function of i as the increment: the i-th bucket examined is (H(x) + i^2) % b, for i = 0, 1, 2, ...

That is, for H(K) = h, the buckets are examined in the order h, h+1, h+4, ..., h+i^2, ..., each taken modulo b.
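The probe sequence can be sketched in C as follows. The names `quad_probe` and `qp_insert`, the `EMPTY` marker, and the limit of b probes are assumptions of this sketch rather than part of the slides.

```c
#include <assert.h>

#define EMPTY (-1)   /* assumed marker for a free bucket; keys are >= 0 */

/* i-th bucket examined for home bucket h: (h + i*i) % b. */
unsigned quad_probe(unsigned h, unsigned i, unsigned b)
{
    assert(b > 0);
    return (unsigned)(((unsigned long long)h +
                       (unsigned long long)i * i) % b);
}

/* Inserts key into a table of b buckets using quadratic probing.
   Returns the bucket used, or -1 if none of the first b probes
   finds a free bucket. */
int qp_insert(int ht[], unsigned b, int key)
{
    assert(b > 0 && key >= 0);
    unsigned h = (unsigned)key % b;
    for (unsigned i = 0; i < b; ++i) {
        unsigned pos = quad_probe(h, i, b);
        if (ht[pos] == EMPTY || ht[pos] == key) {
            ht[pos] = key;
            return (int)pos;
        }
    }
    return -1;
}
```

With b = 17, the keys 6, 23, and 40 all have home bucket 6 and end up in buckets 6, 7, and 10. Note that the quadratic sequence does not visit every bucket, so an insert can fail even though free buckets remain.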

Rehashing

Rehashing: try H1, H2, …, Hm in sequence if a collision occurs. Here each Hi is a hash function.

Double hashing is one of the best methods for dealing with collisions. If the slot is full, a second hash function is calculated and combined with the first hash function:

H(k, i) = (H1(k) + i H2(k)) % m

where i = 0, 1, 2, ... is the probe number and m is the table size.
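The slides do not say which functions to use for H1 and H2. The C sketch below assumes the common textbook pair H1(k) = k % m and H2(k) = 1 + (k % (m - 1)); both choices and all function names are assumptions for illustration.

```c
#include <assert.h>

/* First hash function (assumed): H1(k) = k % m. */
unsigned h1(unsigned k, unsigned m)
{
    return k % m;
}

/* Second hash function (assumed): H2(k) = 1 + (k % (m - 1)).
   It is never 0, so every probe moves to a different bucket. */
unsigned h2(unsigned k, unsigned m)
{
    return 1 + k % (m - 1);
}

/* Double hashing: H(k, i) = (H1(k) + i * H2(k)) % m. */
unsigned dh_probe(unsigned k, unsigned i, unsigned m)
{
    assert(m > 1);
    return (unsigned)(((unsigned long long)h1(k, m) +
                       (unsigned long long)i * h2(k, m)) % m);
}
```

With m = 17, keys 6 and 23 share home bucket 6 but diverge afterwards: key 6 probes 6, 13, 3, while the second probe for key 23 is bucket 14. When m is prime, the step H2(k) has no common factor with m, so the sequence eventually visits every bucket.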

Data Structure for Chaining

The idea of chaining is to combine the linked list and the hash table to solve the overflow problem.

Hashing with Chains

A hash table can handle overflows using chaining. Each bucket keeps a chain of all pairs for which it is the home bucket. The chain may or may not be sorted by key.

Hash Table with Sorted Chains

Put in pairs whose keys are 6, 12, 34, 29, 28, 11, 23, 7, 0, 33, 30, 45. Home bucket = key % 17.
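A chained table with sorted chains can be sketched in C as below. The type and function names are hypothetical, only keys are stored to keep the sketch short, and the caller is assumed to start from an array of NULL pointers.

```c
#include <assert.h>
#include <stdlib.h>

#define DIVISOR 17   /* number of buckets; home bucket = key % 17 */

/* One node of a bucket's chain. */
typedef struct node {
    int key;
    struct node *next;
} node_t;

/* Inserts key into the chain of its home bucket, keeping the chain
   sorted in ascending key order. Returns 1 on success, 0 if the key
   is already present or memory allocation fails. */
int chain_insert(node_t *ht[DIVISOR], int key)
{
    assert(key >= 0);
    node_t **link = &ht[key % DIVISOR];
    while (*link != NULL && (*link)->key < key)
        link = &(*link)->next;          /* skip smaller keys */
    if (*link != NULL && (*link)->key == key)
        return 0;                       /* duplicate key */
    node_t *n = malloc(sizeof *n);
    if (n == NULL)
        return 0;
    n->key = key;
    n->next = *link;                    /* splice in before larger keys */
    *link = n;
    return 1;
}

/* Returns the node holding key, or NULL if absent. Because the chain
   is sorted, the search can stop at the first larger key. */
const node_t *chain_search(node_t *const ht[DIVISOR], int key)
{
    assert(key >= 0);
    const node_t *p = ht[key % DIVISOR];
    while (p != NULL && p->key < key)
        p = p->next;
    return (p != NULL && p->key == key) ? p : NULL;
}

/* Frees every chain and resets the buckets to NULL. */
void chain_free(node_t *ht[DIVISOR])
{
    for (int i = 0; i < DIVISOR; ++i) {
        while (ht[i] != NULL) {
            node_t *next = ht[i]->next;
            free(ht[i]);
            ht[i] = next;
        }
    }
}
```

For the keys in the example, the non-empty chains come out as [0]: 0, 34; [6]: 6, 23; [7]: 7; [11]: 11, 28, 45; [12]: 12, 29; [13]: 30; [16]: 33. Keeping each chain sorted lets an unsuccessful search stop early, at the cost of a slightly more involved insert.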