Você está na página 1de 31

CSC508

DATA STRUCTURES
Zulaile Mabni

TOPIC 6 Hashing
1

CHAPTER OBJECTIVES

Learn about hashing


Learn about hash methods Mid-square Folding Division Learn about collision Open addressing Chaining

Malik, Data Structures Using Java

WHY HASHING?
Need a data structure in which finds/searches are very fast Insert and Delete process should be fast too Objects have unique keys

A key may be a single property/attribute value Or may be created from multiple properties/values

HASH TABLES VS. OTHER DATA STRUCTURES

Maximize efficiency: implement the operations Insert(), Delete() and Search()/Find() efficiently. Arrays: not space efficient (assumes we leave empty space for keys not currently in the structure)

Linked List

space efficient
Insert(), Delete() and Search()/Find() not too efficient

Hash Tables:

Better than the above in terms of space and efficiency

HASH TABLES
Very

useful data structure

Good for storing and retrieving key-value pairs Not good for iterating through a list of items

Example

applications:

Storing objects according to ID numbers


When the ID numbers are widely spread out When you dont need to access items in ID order

HASH INDEX/VALUE
A

hash value or hash index is used to index the hash table (array) A hash function takes a key and returns a hash value/index

The hash index is a integer (to index an array)

The

key is specific value associated with a specific object being stored in the hash table

It is important that the key remain constant for the lifetime of the object

Hash Tables Conceptual View


table
7 6 5 4 3 buckets b1
obj1 key=15

hash value/index

b2 b3 b4

Obj3 key=4

Obj2 key=36

2 1 0

Obj4 key=2 Obj5 key=1

HASH TABLES

Hash Tables solve these problems by using a much smaller array and mapping keys with a hash function. Let universe of keys U and an array of size m. A hash function h is a function from U to 0m, that is:

U
k1

h:U
k2

0m

k 3 k4 k6

(universe of keys)

0 1 2 3 4 5 6 7

h (k2)=2 h (k1)=h (k3)=3 h (k6)=5 h (k4)=7

HASH FUNCTION

You want a hash function/algorithm that is:


Fast Easy to compute Minimize the number of collisions Creates a good distribution of hash values so that the items (based on their keys) are distributed evenly through the array Integer key values String key values Multipart key values
Multipart fields, and/or Multiple fields

Hash functions can use as input


CHOOSING A HASH FUNCTION


The performance of the hash table depends on having a hash function that evenly distributes the keys: uniform hashing is the ideal target Choosing a good hash function requires taking into account the kind of data that will be used.

E.g., Choosing the first letter of a last name will likely cause lots of collisions depending on the nationality of the population.

Most programming languages (including java) have hash functions built in.

COMMONLY USED HASH METHODS


Mid-Square

Hash method, h, computed by squaring the identifier Using appropriate number of bits from the middle of the square to obtain the bucket address Middle bits of a square usually depend on all the characters, it is expected that different keys will yield different hash addresses with high probability, even if some of the characters are the same

11

Malik, Data Structures Using Java

COMMONLY USED HASH METHODS

Folding Key X is partitioned into parts such that all the parts, except possibly the last parts, are of equal length Parts then added, in convenient way, to obtain hash address

Malik, Data Structures Using Java

Division (Modular arithmetic) key mod m m is the array size; in general, it should be prime number Key X is converted into an integer iX This integer divided by size of hash table to get remainder, giving address of X in HT

12

THE MOD FUNCTION


Stands for modulo When you divide x by y, you get a result and a remainder Mod is the remainder
8 mod 5 = 3 9 mod 5 = 4 10 mod 5 = 0 15 mod 5 = 0

Thus for key-value mod M, multiples of M give the same result, 0

But multiples of other numbers do not give the same result

COMMONLY USED HASH METHODS

Multiplication method

Floor ((key*someFraction mod 1)*arraySize) Where some fraction is typically 0.618

Java Hash Map method


Create a hash by performing a series of shifts, adds, and xors on the key index = hash mod arraySize

COMMONLY USED HASH METHODS


Suppose that each key is a string. The following Java method uses the division method to compute the address of the key:
Malik, Data Structures Using Java

int hashmethod(String insertKey) { int sum = 0; for(int j = 0; j <= insertKey.length(); j++) sum = sum + (int)(insertKey.charAt(j)); return (sum % HTSize); }//end hashmethod

15

HASH FUNCTIONS & INSERT()


Usage summary: int hashValue = hashFunction (int key);

Or hashValue = hashFunction (String key); Or hashValue = hashFunction (itemType item);

Insert method: public void insert (int key, itemType item) { hashValue = hashFunction (key); table[hashValue] = item; }

HASH TABLES: INSERT EXAMPLE


For example, if we hash keys 01000 into a hash table with 5 entries and use h(key) = key mod 5 , we get the following sequence of events: Insert 2
key data

Insert 21
key data

Insert 34
key data

Insert 54

0 1
2 3 2

0 1 21
2 3 2

0 1 21
2 3 2

There is a collision at array entry #4

4 34

???

COLLISION RESOLUTION
Algorithms to handle collisions Two categories of collision resolution techniques

Malik, Data Structures Using Java

Open addressing (closed hashing) Chaining (open hashing)

18

DEALING WITH COLLISIONS

A problem arises when we have two keys that hash in the same array entry this is called a collision. There are two ways to resolve collision:

Hashing with Chaining (a.k.a. Separate Chaining): every hash table entry contains a pointer to a linked list of keys that hash in the same entry
Hashing with Open Addressing: every hash table entry contains only one key. If a new key hashes to a table entry which is filled, systematically examine other table entries until you find one empty entry to place the new key

HASHING WITH CHAINING

The problem is that keys 34 and 54 hash in the same entry (4). We solve this collision by placing all keys that hash in the same hash table entry in a chain (linked list) or bucket (array) pointed by this entry:

Insert 54 0 1 2 3

other key key data

Insert 101 0 1 2 3

21 2 54
CHAIN

101 2 54

21

34

34

OPEN ADDRESSING

Collisions are resolved by systematically examining other table indexes, i0 , i1 , i2 , until an empty slot is located. The key is first mapped to an array cell using the hash function (e.g. key % array-size) If there is a collision find an available array cell There are different algorithms to find (to probe for) the next array cell Linear probing Quadratic probing Random probing Double Hashing

OPEN ADDRESSING: LINEAR PROBING


Suppose that an item with key X is to be inserted in HT Use hash function to compute index h(X) of item in HT Suppose h(X) = t. If HT[t] is empty, store item into array slot. Suppose HT[t] already occupied by another item; collision occurs Linear probing: starting at location t, search array sequentially to find next available array slot: (t + 1) % HTSize, (t + 2) % HTSize,,(t + j) % HTSize Be sure to wrap around the end of the array! Stop when you have tried all possible array indices If the array is full, you need to throw an exception or, better yet, resize the array

22

Malik, Data Structures Using Java

COLLISION RESOLUTION: OPEN ADDRESSING


Pseudocode implementing linear probing:
hIndex = hashmethod(insertKey); found = false; while(HT[hIndex] != emptyKey && !found) if(HT[hIndex].key == key) found = true; else hIndex = (hIndex + 1) % HTSize; if(found) System.out.println(Duplicate items not allowed); else HT[hIndex] = newItem;
Malik, Data Structures Using Java

23

HASH TABLES OPEN ADDRESSING (LINEAR PROBING)


h(key) = key mod 8

table 7 6 5
obj1 key=15 Obj3 key=4 Obj2 key=28

hash value/index

Index=5 Index=4

4 3 2 1 0

Obj5 key=1

Obj4 key=2

RANDOM PROBING
Uses

a random number generator to find the next available slot ith slot in the probe sequence is: (h(X) + ri) % HTSize where ri is the ith value in a random permutation of the numbers 1 to HTSize 1
Suppose HTSize = 101, for h(X) = 26, and r1 = 2, r2 = 5, r3 = 8. The probe sequence of X has the elements 26, 28,31,34

All

Malik, Data Structures Using Java

insertions and searches use the same sequence of random numbers

25

QUADRATIC PROBING
In Quadratic probing, starting at position t, check the array locations ( t + 1) % HTSize, (t + 2) % HTSize,, (t + i) % HTSize. We do not know if it probes all the positions in the table When HTSize is prime, quadratic probing probes about half the table before repeating the probe sequence

26

Malik, Data Structures Using Java

DOUBLE HASHING
Apply a second hash function after the first The second hash function, like the first, is dependent on the key Secondary hash function must

Be different than the first And, obviously, not generate a zero arrayIndex = (arrayIndex + stepSize) % arraySize; Where stepSize = constant (key % constant) And constant is a prime less than the array size

Good algorithm:

LOAD FACTOR
Understanding the expected load factor will help you determine the efficiency of your hash table implementation and hash functions Load factor = number of items in hash table / array size For Open Addressing:

If < 0.5, wasting space If > 0.8, overflows significant If < 1.0, wasting space If > 2.0, then search time to find a specific item may factor in significantly to the [relative] performance

For Chaining:

HASH TABLES IN JAVA


Java supports a number of hash table classes Hashtable, HashMap, LinkedHashMap, HashSet, See Sun Java API Documentation http://java.sun.com/j2se/1.4.1/docs/api/ As a programmer, you dont see the collision detection, chaining, etc. You can set The initial table size The load factor (Default is .75) hashCode() hash function

HASHING ANIMATION

http://www.cs.auckland.ac.nz/software/AlgAnim/hash_table s.html

REFERENCES

Malik D.S., Nair P.S., Data Structures Using Java, Course Technology, 2003.
Malik, Data Structures Using Java

Weiss Mark Allen, Data Structures & Algorithm Analysis in C++, Pearson Education International Inc, 2003.

31

Você também pode gostar