FILE INDEXING
Submitted to the Department of Information Technology
by
ISHAN MOURYA, RISHIK RAMENA, AMIT JALAN
December, 2016
CERTIFICATE
This is to certify that the work presented in this report entitled File Indexing,
submitted by Ishan Mourya, Rishik Ramena and Amit Jalan, having the
examination roll numbers 510814043, 510814041 and 510814034 respectively
has been carried out under my supervision for the partial fulfilment of the degree of
Dual Degree in Information Technology during the session 2014-19 in the
Department of Information Technology, Indian Institute of Engineering Science and
Technology, Shibpur.
Date: 06/12/2016
Acknowledgements
This project would not have been successful without the kind support of the
Organisation, the Department and its faculty members.
Our sincere thanks go to our project guide and mentor, Prof. Surajit Kumar Roy,
for his guidance and supervision.
Last but not least, we would like to express our gratitude to some of our
batchmates and friends, who helped us in collecting valuable information
regarding the project and were a constant source of motivation and encouragement.
Date: 06/12/2016
ISHAN MOURYA
Department of Information Technology,
Indian Institute of Engineering Science
and Technology, Shibpur
RISHIK RAMENA
Department of Information Technology,
Indian Institute of Engineering Science
and Technology, Shibpur
AMIT JALAN
Department of Information Technology,
Indian Institute of Engineering Science
and Technology, Shibpur
Abstract
The main objective in designing a database is fast access to any data in the
database and quick insertion, deletion and search operations on the data. When a
database is huge, however, such operations take a considerable amount of time.
To reduce the time spent in these operations, indexes are used to locate the
required data quickly. An index is similar to the index page of a book or to the
book catalogue of a library.
When records are stored in primary memory, accessing them is easy and quick.
Generally, however, there are too many records to fit in primary memory, so they
have to be stored in secondary memory, which greatly increases the time needed
to access them.
Indexing techniques can be used for the quick and efficient retrieval of data from
files. Here we compare how efficient the different techniques are, i.e. their rate
of accessing records from data files.
We mainly compare the access times of files with indexed structures against those
of non-indexed files. We also compare two search mechanisms, namely binary search
and sequential search.
Contents
1. INTRODUCTION
2. RELATED WORKS
8. CONCLUSIONS
1. INTRODUCTION
"File organization" refers to the logical relationships among the various records that
constitute the file, particularly with respect to the means of identification and access
to any specific record. "File structure" refers to the format of the label and data
blocks and of any logical record control information.
Indexes are used to locate data quickly without having to search every row of a
database table each time the table is accessed. Indexes can be created using one
or more columns of a database table, providing the basis for both rapid random
lookups and efficient access to ordered records.
Record management is of significant importance in an organization. It is
concerned with keeping records safely and providing them as required. Indexing is
an instrument of record management that makes it possible to find records easily
and quickly. Filing without indexing is meaningless for a large database. In this
respect, the importance of indexing can be explained as follows:
Easy location: Indexing points to the required record or file and
facilitates easy location.
Saves time and effort: Indexing gives a ready reference to the
records and saves the time and effort of the office.
Efficiency: Indexing helps to find records easily and quickly,
which enhances the efficiency of the office.
2. RELATED WORKS
Several database management systems use different indexing methods to provide
faster access to data, for example Oracle RDBMS, IBM DB2 and Microsoft SQL
Server. They all allow fast access to data, offer various functionalities and are
secure enough, but they need to be purchased, require initial setup and need a
database expert to handle the database management system.
4. PROBLEM DEFINITION
In real-life scenarios, databases may be huge. Implementing a sequential search to
access a record in such a database may therefore be very time-consuming.
Generally, databases are so large that they can't be loaded into main memory
entirely and are therefore stored in secondary memory. Accessing data from
secondary memory is much slower than accessing it from main memory, so we need
some kind of indexing to make access faster. If the index is also very large, as in
the case of huge databases, we may use multi-level indexing or B Tree or B+ Tree
based indexing.
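The idea of multi-level indexing can be illustrated with a minimal two-level sketch in Python. It is not taken from the project code: the function names, the block size and the roll numbers are assumptions chosen for illustration. The inner level holds sorted (key, pointer) entries split into blocks, and the outer level holds only the first key of each block, so a lookup touches the small outer level and then a single block.

```python
from bisect import bisect_right

def build_two_level_index(keys, block_size):
    """Return (outer, blocks). blocks are sorted runs of (key, pointer)
    pairs; outer holds the first key of each block."""
    # The pointer is the position of the record in the (unsorted) main file.
    entries = sorted((key, pointer) for pointer, key in enumerate(keys))
    blocks = [entries[i:i + block_size]
              for i in range(0, len(entries), block_size)]
    outer = [block[0][0] for block in blocks]
    return outer, blocks

def lookup(outer, blocks, key):
    """Return the pointer stored for key, or None if the key is absent."""
    # Step 1: find the last block whose first key is <= key.
    block_no = bisect_right(outer, key) - 1
    if block_no < 0:
        return None
    # Step 2: search only inside that one block.
    for entry_key, pointer in blocks[block_no]:
        if entry_key == key:
            return pointer
    return None

# Hypothetical roll numbers, listed in the order they sit in the main file.
roll_numbers = [43, 41, 34, 50, 12]
outer, blocks = build_two_level_index(roll_numbers, block_size=2)
pointer_for_43 = lookup(outer, blocks, 43)   # position of 43 in roll_numbers
```

B Trees and B+ Trees generalise this layout to more levels and keep it balanced under insertions and deletions, which this sketch does not attempt.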
5. PROPOSED APPROACH
Here we compare the efficiency of different approaches to accessing data from
files, ranging from the sequential non-indexed method to B+ Trees. Although the
results might not show a difference between these mechanisms for small database
files, they make a huge difference for large files. We therefore take a range of
file sizes when comparing the access times of these mechanisms.
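The difference between the two search mechanisms can be sketched by counting key comparisons. The Python below is an illustrative sketch, not the project's own code; the record layout (a "roll_no" field and a "name" field) and the generated data are assumptions.

```python
def sequential_search(records, key):
    """Scan records in storage order; return (position, comparisons)."""
    comparisons = 0
    for position, record in enumerate(records):
        comparisons += 1
        if record["roll_no"] == key:
            return position, comparisons
    return None, comparisons

def binary_search(sorted_keys, key):
    """Search a sorted list of keys; return (position, comparisons)."""
    low, high = 0, len(sorted_keys) - 1
    comparisons = 0
    while low <= high:
        mid = (low + high) // 2
        comparisons += 1          # one probe of the middle key per iteration
        if sorted_keys[mid] == key:
            return mid, comparisons
        if sorted_keys[mid] < key:
            low = mid + 1
        else:
            high = mid - 1
    return None, comparisons

# Hypothetical data: 1,024 student records with roll numbers 1..1024.
records = [{"roll_no": n, "name": f"student-{n}"} for n in range(1, 1025)]
keys = [record["roll_no"] for record in records]

seq_position, seq_comparisons = sequential_search(records, 1024)
bin_position, bin_comparisons = binary_search(keys, 1024)
```

For 1,024 records the sequential scan needs up to 1,024 comparisons to reach the last record, whereas the binary search never needs more than 11 probes. The gap widens as the file grows, which is why the comparison is run over a range of file sizes.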
6. WORKING
The application is designed in a client-server architecture.
The server is an Apache web server hosted locally using XAMPP.
The client-side user interface is an HTML file (UI.html), which contains the
database insertion and search forms.
The main database files are JSON files which contain the records
of students.
The index file is also a JSON file; it contains only the key field (roll
no.) and the pointer (array index) of the corresponding record in the main file.
The server-side backend pages are search.php and Student.php.
search.php processes the search query from the UI (input roll no.).
It is here that the different retrieval algorithms are used.
Student.php processes the new student entry from the UI (input
details of the student) and stores it in the main data file. It is here that the
index file is created and sorted each time a new entry is made in the
database.
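The relationship between the main data file and the index file can be sketched as follows. The backend described above is written in PHP; this is a Python illustration only, and the field names ("roll_no", "name", "pointer"), the function names and the sample students are assumptions rather than the project's actual schema.

```python
import json

def build_index(records):
    """Create index entries holding only the key and the array position
    of each record, sorted by roll number."""
    entries = [{"roll_no": record["roll_no"], "pointer": position}
               for position, record in enumerate(records)]
    entries.sort(key=lambda entry: entry["roll_no"])
    return entries

def add_student(records, student):
    """Append the student to the main data and rebuild the sorted index,
    mirroring the 'created and sorted on each new entry' behaviour."""
    records.append(student)
    return build_index(records)

def find_student(records, index, roll_no):
    """Binary-search the index, then follow the pointer into the main data."""
    low, high = 0, len(index) - 1
    while low <= high:
        mid = (low + high) // 2
        entry = index[mid]
        if entry["roll_no"] == roll_no:
            return records[entry["pointer"]]
        if entry["roll_no"] < roll_no:
            low = mid + 1
        else:
            high = mid - 1
    return None

# Hypothetical students, inserted in arrival order (not sorted by key).
records = []
index = []
for roll_no, name in [(43, "A"), (41, "B"), (34, "C")]:
    index = add_student(records, {"roll_no": roll_no, "name": name})

index_file_text = json.dumps(index)   # the text that would go into the index file
found = find_student(records, index, 41)
```

The main data stays in arrival order, so only the much smaller index has to be kept sorted for binary search to work.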
7. EXPERIMENTAL RESULTS
Table: Time taken to search for a record in the database using different database
retrieval methods, for the Small Sized, Medium Sized and Large Sized Databases.
In the case of the small database ( records), the average time required to access a
record is greater when using the file index than when searching the main database
directly. This is because two files (the main database file and the index file) are
accessed when indexing is used.
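One way such a measurement could be set up is sketched below in Python. It is an assumption-laden illustration, not the procedure used for the reported results: the file names, record layout, record count and number of repeats are all hypothetical, and the timings it produces depend on the machine. The indexed path opens both the index file and the main file, which is the overhead described above.

```python
import json
import os
import tempfile
import time

def search_direct(data_path, roll_no):
    """Read the main file and scan it sequentially."""
    with open(data_path) as handle:
        records = json.load(handle)
    for record in records:
        if record["roll_no"] == roll_no:
            return record
    return None

def search_indexed(data_path, index_path, roll_no):
    """Read the index file, binary-search it, then read the main file."""
    with open(index_path) as handle:
        index = json.load(handle)
    low, high = 0, len(index) - 1
    while low <= high:
        mid = (low + high) // 2
        if index[mid]["roll_no"] == roll_no:
            with open(data_path) as handle:   # second file access
                return json.load(handle)[index[mid]["pointer"]]
        if index[mid]["roll_no"] < roll_no:
            low = mid + 1
        else:
            high = mid - 1
    return None

def average_seconds(func, *args, repeats=20):
    """Average wall-clock time of func(*args) over several runs."""
    start = time.perf_counter()
    for _ in range(repeats):
        func(*args)
    return (time.perf_counter() - start) / repeats

def make_files(folder, count):
    """Write a hypothetical main file (unsorted) and its sorted index file."""
    records = [{"roll_no": n, "name": f"student-{n}"}
               for n in range(count, 0, -1)]
    index = sorted(({"roll_no": r["roll_no"], "pointer": p}
                    for p, r in enumerate(records)),
                   key=lambda entry: entry["roll_no"])
    data_path = os.path.join(folder, "data.json")
    index_path = os.path.join(folder, "index.json")
    with open(data_path, "w") as handle:
        json.dump(records, handle)
    with open(index_path, "w") as handle:
        json.dump(index, handle)
    return data_path, index_path

with tempfile.TemporaryDirectory() as folder:
    data_path, index_path = make_files(folder, count=10)
    direct_hit = search_direct(data_path, 5)
    indexed_hit = search_indexed(data_path, index_path, 5)
    direct_time = average_seconds(search_direct, data_path, 5)
    indexed_time = average_seconds(search_indexed, data_path, index_path, 5)
```

Changing the count argument gives the small, medium and large cases. Note that in this sketch the whole JSON main file is still parsed on the indexed path, so it reproduces the two-file overhead but not necessarily the gains a record-addressable file format would show.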
8. CONCLUSIONS
As shown in the above discussion, indexing is far more efficient than direct data
access from files. Although most database systems today use multilevel indexing
and B Tree structures for file access, these are just another form of indexing.
They use multilevel indexes, which form a tree structure with tree and data
pointers in each node.