What is Hashing?
Hashing is a method of storing data in an array so that storing, searching for, inserting and deleting data is efficient
(O(1) on average). For this, every record needs a unique key.
The basic idea is not to search for the correct position of a record with comparisons but to compute the position within
the array. The function that returns the position is called the hash function and the array is called a hash table.
A hash table is a data structure that uses a random-access data structure, such as an array, and a mapping function, called
a hash function, to allow searches in average constant time, O(1).
Hash function
A hash function is a mapping between a set of input values and a set of integers, known as hash values. It is usually
denoted by H.
Hash of Key
Suppose 'h' is a hash function and 'K' is a key; then h(K) is called the hash of key. The hash of key is the index at which a
record with the key value K must be kept.
Direct Method
In direct hashing the key is the address without any algorithmic manipulation.
Direct hashing is limited, but it can be very powerful because it guarantees that there are no synonyms (keys that hash to
the same address) and therefore no collisions.
Modulo-division Method
137456 % 19 + 1 = 11
214562 % 19 + 1 = 15
140145 % 19 + 1 = 2
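The three calculations above follow the pattern address = key % list size + 1. A minimal Python sketch of that pattern is shown here; the function name and the default list size of 19 are illustrative choices taken from the worked numbers, not a prescribed implementation.

```python
def modulo_division_hash(key: int, list_size: int = 19) -> int:
    """Return an address in the range 1..list_size (key % list_size + 1)."""
    return key % list_size + 1


# Reproduce the three worked examples from the text.
for key in (137456, 214562, 140145):
    print(key, "->", modulo_division_hash(key))
```

A prime list size such as 19 is commonly chosen because it tends to spread keys more evenly than a non-prime size.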
Digit-extraction Method
Using digit extraction selected digits are extracted from the key and used as the address.
Example:
Using a six-digit employee number to hash to a three-digit address (000-999), we could select the first, third, and fourth
digits (from the left) and use them as the address.
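The selection of the first, third, and fourth digits can be sketched in Python as follows. The function name, the parameter names, and the sample employee number 379452 are assumptions made for illustration.

```python
def digit_extraction_hash(key: int, positions=(1, 3, 4), width: int = 6) -> int:
    """Build an address from selected digit positions (1-based, from the left)."""
    digits = str(key).zfill(width)  # pad short keys so positions stay aligned
    return int("".join(digits[p - 1] for p in positions))


# Hypothetical employee number: digits 3, 9 and 4 are selected.
print(digit_extraction_hash(379452))
```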
Folding Method
Fold Shift
In fold shift the key value is divided into parts whose size matches the size of the required address. Then the left and
right parts are shifted and added with the middle part.
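A small Python sketch of fold shift is given here. The nine-digit sample key 123456789 and the three-digit address size are assumptions, as is the step of discarding any carry beyond the address size, which the text does not spell out.

```python
def fold_shift_hash(key: int, address_digits: int = 3) -> int:
    """Split the key into address-sized parts and add them together."""
    digits = str(key)
    parts = [digits[i:i + address_digits]
             for i in range(0, len(digits), address_digits)]
    total = sum(int(part) for part in parts)
    # Assumption: keep only the low-order digits so the result fits the address.
    return total % 10 ** address_digits


# 123 + 456 + 789 = 1368; dropping the leading 1 leaves 368.
print(fold_shift_hash(123456789))
```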
Fold boundary
In fold boundary the left and right numbers are folded on a fixed boundary between them and the center number. The
two outside values are thus reversed.
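Fold boundary can be sketched in the same spirit. The example assumes a nine-digit key that splits into exactly three three-digit parts, and it discards the carry so the result fits a three-digit address; both choices are assumptions, not details given in the text.

```python
def fold_boundary_hash(key: int, address_digits: int = 3) -> int:
    """Reverse the two outside parts, then add all parts together."""
    digits = str(key)
    parts = [digits[i:i + address_digits]
             for i in range(0, len(digits), address_digits)]
    if len(parts) > 1:
        parts[0] = parts[0][::-1]    # fold the left part over its boundary
        parts[-1] = parts[-1][::-1]  # fold the right part over its boundary
    total = sum(int(part) for part in parts)
    # Assumption: keep only the low-order digits so the result fits the address.
    return total % 10 ** address_digits


# 321 + 456 + 987 = 1764; dropping the leading 1 leaves 764.
print(fold_boundary_hash(123456789))
```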
Midsquare Method
In midsquare hashing the key is squared and the address is selected from the middle of the square number.
The limitation is the size of the key.
Example:
9452² = 89340304: address is 3403
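The midsquare calculation for the key 9452, whose square is 89340304, can be checked with a short Python sketch. The function name and the rule for locating the middle (centered, leaning left when the split is uneven) are assumptions.

```python
def midsquare_hash(key: int, address_digits: int = 4) -> int:
    """Square the key and take address_digits digits from the middle."""
    squared = str(key * key)
    start = (len(squared) - address_digits) // 2
    return int(squared[start:start + address_digits])


# 9452 * 9452 = 89340304, and the middle four digits are 3403.
print(midsquare_hash(9452))
```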
Rotation Method
The rotation method is generally not used by itself but is instead combined with other hashing methods.
It is most useful when keys are assigned serially.
Pseudo-random Hashing
Example:
In hash tables, there is always a possibility that two data elements will hash to the same integer value. When this
happens, a collision takes place, i.e. two data elements try to occupy the same place in the hash table array. There are
methods to deal with such situations, such as Open Addressing and Chaining.
There are three Open addressing methods, which vary in probe sequence to find the next vacant cell. These are Linear
probing, Quadratic probing and Double hashing.
Linear probing resolves a hash collision by sequentially searching the hash table, beginning at the location returned by
the hash function.
In this case, the hash table is implemented using an array. The program stores the first element that generates a specific
array index at that index. For example, if the hash function generates 79, then array index 79 is used to store the
element. When the hash function generates the index 79 again, the program begins a sequential search starting at location
79, looking for the next available spot. The second element whose key was transformed by the hash function into 79 will be
stored at location 80, the third at 81, and so on. Of course, if 80 and 81 are already occupied, the elements will be
stored farther away from the location generated by the hash function.
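The behaviour described for index 79 can be sketched in Python. The function name, the table size of 100, and the wrap-around to the start of the array when the end is reached are assumptions; the text does not say what happens at the end of the table.

```python
def linear_probe_insert(table: list, home_index: int, item) -> int:
    """Store item in the first free slot at or after home_index; return the slot."""
    size = len(table)
    for offset in range(size):
        index = (home_index + offset) % size  # assumption: wrap around at the end
        if table[index] is None:
            table[index] = item
            return index
    raise RuntimeError("hash table is full")


table = [None] * 100
# Three elements whose keys all hash to 79 end up in slots 79, 80 and 81.
for item in ("first", "second", "third"):
    print(item, "stored at", linear_probe_insert(table, 79, item))
```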
Quadratic Probing :
Quadratic probing is a different way of rehashing. In quadratic probing we are still looking for an empty
location. However, instead of incrementing the offset by 1 every time, as in linear probing, we increment the offset by
1, 3, 5, 7, ... so that the total offset is a perfect square. We explore a sequence of locations until an empty one is found, as follows:
index, index + 1, index + 4, index + 9, index + 16, ...
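The probe sequence above can be generated by adding successive odd numbers to the offset, which yields the squares 0, 1, 4, 9, 16. The following Python sketch illustrates this; the function name, the sample table size of 101, and the modulo wrap-around are assumptions.

```python
def quadratic_probe_sequence(home_index: int, table_size: int, max_probes: int):
    """Yield index, index + 1, index + 4, index + 9, ... (modulo the table size)."""
    offset = 0
    step = 1
    for _ in range(max_probes):
        yield (home_index + offset) % table_size
        offset += step  # offsets grow by 1, 3, 5, 7, ...
        step += 2


# Starting from index 10, the first five probes are 10, 11, 14, 19, 26.
print(list(quadratic_probe_sequence(10, 101, 5)))
```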
Rehashing :
Chaining :
In open addressing, collisions are resolved by looking for an open cell in the hash table. A different approach is to
create a linked list at each index in the hash table. A data item's key is hashed to the index in the usual way, and the
item is inserted into the linked list at that index. Other items that hash to the same index are simply added to the
linked list at that index. There is no need to search for empty cells in the primary hash table array. This is known as
the Chaining method.
Let's consider the following example:
The Chaining method resolves collisions with an adjacency-list representation. Whenever a collision takes place, we just
add the new item to the list attached to the header where the collision occurred.
In our example, a collision has occurred at header node 2, so we just add 9 and 58 to it as an adjacency list. If any
further collision occurs at 2, we add the item to the existing list.
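A minimal Python sketch of chaining follows. The table size of 7 and the hash function key % 7 are assumptions, chosen only because they send both 9 and 58 to index 2 as in the example above; the original table size and hash function are not stated. Python lists stand in for the linked lists.

```python
def make_chained_table(size: int = 7) -> list:
    """Create a table with one empty chain per index."""
    return [[] for _ in range(size)]


def chain_insert(table: list, key: int) -> int:
    """Append key to the chain at its hashed index; return that index."""
    index = key % len(table)  # assumed hash function, for illustration only
    table[index].append(key)
    return index


def chain_search(table: list, key: int) -> bool:
    """Look for key only in the chain it hashes to."""
    return key in table[key % len(table)]


table = make_chained_table()
for key in (9, 58, 23, 10):  # 9, 58 and 23 all collide at index 2
    chain_insert(table, key)
print(table[2])
```

Searching only has to scan the one chain selected by the hash, so lookups stay fast as long as the chains remain short.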
Example :