FILES
UNIT IV
Learning Objectives
Hashing
Indexing Techniques
File Organization
Hashing
Hashing
A technique for performing insertion, deletion and
search in constant average time
Hash Function
Transforms a key into a cell/bucket address
Considerations
Common hash functions
Mod
Mid Square
Folding
Digit Analysis
Collision Resolution
Separate Chaining
Rehashing
Open Addressing
In case of a collision, alternate cells are tried until an
empty cell is found.
Linear Probing
If the table is large enough to hold all the keys,
a free cell will always be found,
though the time required may be large
Drawback
Blocks of occupied cells may form: PRIMARY
CLUSTERING
i.e. a key that hashes into a cluster requires several
attempts to resolve the collision, and then enlarges the cluster
Linear Probing
Say,
The keys to be inserted are 12, 30, 11, 32, 34, 54, 50
The hash function is mod 10
This divisor is chosen just for illustration and is not a good
choice
• since at most 10 distinct cell addresses can be generated,
collisions will be frequent.
• The divisor should preferably be a prime number.
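Inserting the keys in order (home cell = key % 10):
Add 12 at cell 12 % 10 = 2
Add 30 at cell 30 % 10 = 0
Add 11 at cell 11 % 10 = 1
Try to add 32 at cell 32 % 10 = 2; not available; try next: add 32 at cell 3
Add 34 at cell 34 % 10 = 4
Try to add 54 at cell 54 % 10 = 4; not available; try next: add 54 at cell 5
Try to add 50 at cell 50 % 10 = 0; cells 0 to 5 not available; add 50 at cell 6
Resulting table: 0: 30, 1: 11, 2: 12, 3: 32, 4: 34, 5: 54, 6: 50; cells 7 to 9 empty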
© Bharati Vidyapeeth’s Institute of Computer Applications and Management, New Delhi-63.
Quadratic Probing
Similar treatment can be given to collisions
in the case of quadratic probing.
Here,
instead of choosing the next cell that lies after the ideal
cell i (or a cell given by a linear function of i), the cells
i + 1², i + 2², i + 3², ... (modulo the table size) are tried in turn.
Separate Chaining
Maintains a list of all the keys that hash to the same value
To insert:
Calculate the hash function
Access the corresponding list
Add a link to the list
i.e. in case of a collision, a new link is simply added to that list
The new key might be added at either end of the list
Better suited to large records; handles collisions and overflow
efficiently.
Not as efficient when the record size is small or the domain of key
values is limited to a small number of entries
Insert sequence: 22, 42, 30, 43, 10
0: 30 → 10
1: (empty)
2: 22 → 42
3: 43
Rehashing
When the table gets too full,
the number of collisions increases,
degrading performance for both insertion and search.
Rehashing builds a new, larger table and reinserts every key into it using a new hash function.
Rehashing: Illustration
Consider the hash table given in the figure (the table built in the linear probing example):
0: 30, 1: 11, 2: 12, 3: 32, 4: 34, 5: 54, 6: 50; cells 7 to 9 empty
Rehashing
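• New table size 23 (a prime more than twice the old size of 10)
• The hash function is mod 23
New table after reinserting the keys (linear probing on collision):
cell 4: 50, 7: 30, 8: 54, 9: 32, 11: 11, 12: 12, 13: 34; all other cells empty
(34 % 23 = 11 collides with 11, and cell 12 holds 12, so 34 goes to cell 13)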
Indexing Techniques
Indexing Techniques
Cylinder Surface Indexing
If a data file occupies c cylinders, the cylinder index (CI) has c entries
Hashed Indexing
Maintains a hash table of key values along with the corresponding
record addresses
Tree Indexing
Indexing using balanced trees of order m
Tree indexing
Consider a B-Tree of order m=200
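Say $N \le 2 \times 10^6$
Using $N \ge 2\lceil m/2 \rceil^{\,l-1} - 1$, where $l$ is the number of levels,
i.e. $2 \times 10^6 \ge 2\lceil 200/2 \rceil^{\,l-1} - 1$
We get
$10^6 \ge 100^{\,l-1}$ (the $-1$ adds only $\tfrac{1}{2}$ after dividing by 2, which cannot change a bound on an integer power of 100)
$10^6 \ge 10^{2(l-1)}$, so $6 \ge 2(l-1)$
$l \le 4$
Thus $2 \times 10^6$ keys can be searched in a maximum of 4
passes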
Tree Indexing
A higher value of m results in even fewer
passes.
FILE ORGANIZATION
File Organization
Inverted Files
Cellular Files
Random File Organization
Records are stored at random locations
Direct Addressing
Available disk space is divided into nodes large enough to hold
a record
Directory Lookup
Hashed File Organization
Uses the same principle as hashed indexes
Inverted Files
The link information is kept in the index itself rather than in the records
Inverted Files
Records: 100 A, 101 D, 110 E, 200 B, 220 C, 340 F
Analyst: C, E
Programmer: A, B, D
Gender Index
Male: A, E
Female: B, C, D
Inverted Files
Cellular Partitions
The storage medium is divided into cells
A cell could be
• A disk pack; or
• A cylinder
Cellular Partition
If a cell is a cylinder, all the records placed in
one cell can be accessed without moving the
read/write head
What we Studied
• Hashing
• Indexing Techniques
• File Organization
Review Questions
1. What are the criteria behind the design of a hash function?
2. What are the various ways to store the Graphs in Memory?
3. Discuss the applications of hash tables. Write a short note on symbol
tables.
4. Compare Sequential and random file organization.
5. What are the advantages of using inverted files?
6. Would you use Quadratic Probing for resolving collisions in
hashed index files? State reasons.
7. Write a short note on the structure of a direct file.
8. Compare sequential files, indexed sequential files
and random access files.
9. Write a short note on Open Address Hashing and Separate
Chaining
10. Discuss random file organization and the various techniques used
for randomization.
11. Explain the various techniques for overflow/collision resolution in
hashing.