File Indexing

A Project Report on
FILE INDEXING
Submitted to the Department of Information Technology
For the partial fulfilment of the degree of Dual Degree in

Information Technology
by
ISHAN MOURYA, RISHIK RAMENA, AMIT JALAN
Roll number : HX-20, HY-47, HX-12

Registration number : 510814043, 510814041, 510814034 of 2014-19
Dual Degree, 3rd year
Under the supervision of

PROF. SURAJIT KUMAR ROY
Department of Information Technology

INDIAN INSTITUTE OF ENGINEERING SCIENCE AND
TECHNOLOGY, SHIBPUR
December, 2016
CERTIFICATE
This is to certify that the work presented in this report entitled File Indexing,
submitted by Ishan Mourya, Rishik Ramena and Amit Jalan, having the
examination roll numbers 510814043, 510814041 and 510814034 respectively
has been carried out under my supervision for the partial fulfilment of the degree of
Dual Degree in Information Technology during the session 2014-19 in the
Department of Information Technology, Indian Institute of Engineering Science and
Technology, Shibpur.
Date: 06/12/2016
PROF. SURAJIT KUMAR ROY

Assistant Professor
Indian Institute of Engineering Science
and Technology, Shibpur

DR. AMIT KUMAR DAS
Dean(Academics)
DR. ARINDAM BISWAS

Head of the Department
Indian Institute of Engineering Science and Technology, Shibpur
Acknowledgements
This project would not have been successful without the kind support of the
Organisation, the Department and its faculty members.
Our sincere thanks go to our project guide and mentor, Prof. Surajit Kumar Roy,
for his guidance and supervision.
Last but not least, we would like to express our gratitude to our batchmates
and friends, who helped us collect valuable information about the project and
were a constant source of motivation and encouragement.
Date: 06/12/2016

ISHAN MOURYA
Department of Information Technology,

RISHIK RAMENA

AMIT JALAN
Abstract
The main objective in designing a database is fast access to any data in it and
quick insertion, deletion and search operations on that data. When a database is
huge, however, these operations take a considerable amount of time. To reduce
this time, indexes are used to locate the required data quickly. An index is
similar to the index pages of a book or the book catalogue of a library.
When records are stored in primary memory, accessing them is easy and quick.
In general, however, there are too many records to fit in primary memory, so
they have to be stored in secondary memory, which greatly increases the time
needed to access them.
Broadly, indexing can be classified into two types:

Single-level indexing
Multi-level indexing
These techniques can be used for quick and efficient retrieval of data from files.
Here we compare how efficient these techniques are, i.e. how quickly they access
records in data files.
We mainly compare the access times of files with index structures against those
of non-indexed files. We also compare two search mechanisms, namely binary
search and sequential search.
Contents
1. INTRODUCTION
2. RELATED WORKS
3. PRELIMINARIES AND DEFINITIONS
3.1 Including Figures
3.2 Including Definitions, Theorems, etc.
4. PROBLEM DEFINITION
5. PROPOSED APPROACH
5.1 Including Algorithms
6. WORKING
7. EXPERIMENTAL RESULTS
8. CONCLUSIONS
1. INTRODUCTION
"File organization" refers to the logical relationships among the various records that
constitute the file, particularly with respect to the means of identification and access
to any specific record. "File structure" refers to the format of the label and data
blocks and of any logical record control information.
Indexes are used to quickly locate data without having to search every row in a
database table every time a database table is accessed. Indices can be created
using one or more columns of a database table, providing the basis for both rapid
random lookups and efficient access of ordered records.
Record management is of significant importance in an organization. It is
concerned with keeping records safely and providing them as required. Indexing
is an instrument of record management that makes it possible to find records
easily and quickly. Filing without indexing is of little use for a large
database. In this respect, the importance of indexing can be explained as
follows:
Easy location: Indexing points to the required records or file and
facilitates easy location.
Saves time and effort: Indexing gives a ready reference to the
records and saves the time and effort of the office.
Efficiency: Indexing helps to find records easily and quickly,
which enhances the efficiency of the office.
2. RELATED WORKS
There are several database management systems that use different indexing
methods to provide faster access to data, such as Oracle RDBMS, IBM DB2,
Microsoft SQL Server and many more. They all allow fast access to data, offer
various functionalities and are secure, but they need to be purchased, require
initial setup, and need a database expert to handle the database management
system.
We are trying to create a database management system that requires neither
initial setup nor a database expert to handle the data. It provides basic
insert and search operations and uses the Sequential Data Retrieval Method on
the main data file, the Primary Index Data Retrieval Method and the B+ Tree
Data Retrieval Method to access the data.
It is easy to use and has a very simple UI. It is suitable for storing
medium-sized databases.
3. PRELIMINARIES AND DEFINITIONS
3.1 Including Figures
Figure 3.1.1: Primary Indexing

Figure 3.1.2: Multi Level Indexing
3.2 Including Definitions, Theorems, etc.
Definition 3.2.1 : File Indexing

An indexed file is a computer file with an index that allows easy random access to
any record given its file key. The key must be such that it uniquely identifies a record
(primary index). If more than one index is present the other ones are called alternate
indexes.
Definition 3.2.2 : Primary indexing

A primary index is an index on a set of fields that includes the unique primary key
of the file and is guaranteed not to contain duplicates. If the index is instead
built on a non-key field by which the data file is ordered, it is called a
Clustering index.
Definition 3.2.3 : Ordered indexes

An ordered index is an index created on a set of fields by which the main data file
is ordered. Such a field is called the ordering key field if it is also a key field
of the main data file.
Definition 3.2.4 : B+ Tree
A B+ tree is an n-ary tree with a variable but often large number of children per node.
A B+ tree consists of a root, internal nodes and leaves. The root may be either a leaf
or a node with two or more children.
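To make the definition concrete, the following is a minimal sketch of a lookup in a small hand-built B+ tree. It is written in Python for brevity, whereas the report's own implementation is in PHP. The class names `Leaf` and `Internal`, the function `bplus_search` and the sample roll numbers and names are all hypothetical. The sketch assumes that a separator key k sends keys smaller than k to the left child and keys greater than or equal to k to the right child, and it covers searching only, not insertion or node splitting.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Leaf:
    keys: List[int]    # sorted search keys stored in this leaf
    values: List[str]  # values[i] belongs to keys[i]


@dataclass
class Internal:
    keys: List[int]                          # sorted separator keys
    children: List[Union["Internal", Leaf]]  # len(children) == len(keys) + 1


def bplus_search(node, key: int) -> Optional[str]:
    # Walk down from the root: at each internal node, pick the child
    # whose key range contains the search key.
    while isinstance(node, Internal):
        node = node.children[bisect_right(node.keys, key)]
    # In a B+ tree only the leaves hold the actual entries.
    for k, v in zip(node.keys, node.values):
        if k == key:
            return v
    return None


# Hand-built example tree: keys < 20 go left, keys >= 20 go right.
root = Internal(
    keys=[20],
    children=[
        Leaf(keys=[5, 12], values=["Asha", "Bimal"]),
        Leaf(keys=[20, 47], values=["Chitra", "Dev"]),
    ],
)
```

Each step down the tree discards all but one subtree, so the number of nodes visited grows with the height of the tree rather than with the number of records.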
Definition 3.2.5 : JSON

It stands for JavaScript Object Notation. It is a lightweight data-interchange format. It
is easy for humans to read and write and also easy for machines to parse and
generate.
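As an illustration of how JSON could hold both the student records and a primary index on them, here is a small Python sketch (the report's implementation is in PHP). The field names `roll`, `name` and `pos`, the sample records and the variable names are assumptions made for this example, not the report's actual file layout.

```python
import json

# Hypothetical main data file: a JSON array of student records,
# stored in insertion order (not sorted by roll number).
main_file_text = (
    '[{"roll": 47, "name": "Dev"},'
    ' {"roll": 12, "name": "Bimal"},'
    ' {"roll": 20, "name": "Chitra"}]'
)
records = json.loads(main_file_text)

# A primary index keeps only the key and the position of the record in
# the main array, sorted by key so that it can be binary-searched.
index = sorted(
    ({"roll": rec["roll"], "pos": pos} for pos, rec in enumerate(records)),
    key=lambda entry: entry["roll"],
)

# The index can be serialised back to JSON text for storage.
index_file_text = json.dumps(index)
```

The index entries are much smaller than the full records, which is what makes searching the index cheaper than searching the main file.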
4. PROBLEM DEFINITION
In real-life scenarios, databases may be huge. Implementing a sequential search
to access a record in such a database may therefore be very time-consuming.
Generally, databases are so large that they cannot be loaded into main memory
entirely and are therefore stored in secondary memory. Accessing data in
secondary memory is much slower than in main memory, so we need some kind of
indexing to make access faster. If the index is also very large, as in the case
of huge databases, we may use multi-level indexing or B Tree or B+ Tree based
indexing.
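The idea of multi-level indexing can be sketched as follows: when the index itself is large, it is split into blocks, and a second, much smaller index is built over the first key of each block. This is a Python illustration under stated assumptions, not the report's PHP code. The sample index entries, the block size `BLOCK_SIZE` and the function name `two_level_lookup` are hypothetical.

```python
from bisect import bisect_right

# Hypothetical first-level index: sorted (roll number, record position) pairs.
inner_index = [(3, 7), (8, 2), (12, 0), (20, 5), (31, 1), (47, 4), (52, 6), (60, 3)]
BLOCK_SIZE = 3  # assumed number of index entries per block

# Split the first-level index into fixed-size blocks.
blocks = [inner_index[i:i + BLOCK_SIZE] for i in range(0, len(inner_index), BLOCK_SIZE)]

# The second-level (outer) index stores only the first key of each block.
outer_keys = [block[0][0] for block in blocks]


def two_level_lookup(roll):
    """Return the record position for roll, or None if it is not indexed."""
    # Step 1: use the small outer index to choose a single block.
    block_no = bisect_right(outer_keys, roll) - 1
    if block_no < 0:
        return None  # roll is smaller than every indexed key
    # Step 2: search only inside that block.
    for key, pos in blocks[block_no]:
        if key == roll:
            return pos
    return None
```

Only the outer index and one block need to be examined per lookup, which matters when each block corresponds to a separate read from secondary memory.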
5. PROPOSED APPROACH
Here we compare the efficiency of different approaches to accessing data from
files, ranging from the sequential non-indexed method to B+ Trees. The results
may not show a difference between these mechanisms for small database files,
but for large files the difference is huge. We therefore use a range of file
sizes to compare the access times of these mechanisms.
5.1 Including Algorithms
Algorithm 1 : Sequential Database Search
___________________________________________________________________
Input : Roll no. of a Student (search key field)

Output : Name of the Student (if found) and the time taken to access it.
Steps of the Algorithm :
1 : Get the contents of the main data file.

2 : Store the records of the file into an object array (JSON array).
3 : Get the length of the array.
4 : Start a timer (using microtime()).
5 : Set flag = 0.
6 : Loop through the array sequentially and search for the input roll no.
7 : If found set flag = 1.
8 : If flag == 0
Print no student found.
9 : Else
Print name of student.
Stop timer.
Calculate time taken for accessing = timer_stop - timer_start.
Display the time taken.
10: End
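The steps of Algorithm 1 can be sketched in Python as shown below; the report's own implementation is in PHP and uses microtime(), for which `time.perf_counter()` stands in here. The function name `sequential_search`, the field names `roll` and `name` and the sample records are assumptions. Unlike the steps as listed, this sketch stops the loop at the first match and stops the timer whether or not a record is found.

```python
import json
import time


def sequential_search(records, roll):
    """Return (name or None, elapsed seconds) using a linear scan."""
    start = time.perf_counter()
    found_name = None
    for rec in records:
        if rec["roll"] == roll:
            found_name = rec["name"]
            break  # stop at the first match
    elapsed = time.perf_counter() - start
    return found_name, elapsed


# Hypothetical contents of the main data file.
records = json.loads('[{"roll": 47, "name": "Dev"}, {"roll": 12, "name": "Bimal"}]')

name, seconds = sequential_search(records, 12)
print(name if name is not None else "No student found", f"({seconds:.6f} s)")
```

In the worst case the scan visits every record, so the access time grows in proportion to the size of the file.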
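Algorithm 2 : Binary Search on Sorted Database
___________________________________________________________________
Input : Roll no. of a Student (search key field)

Output : Name of the Student (if found) and the time taken to access it.
Steps of the Algorithm :
1 : Get the contents of the main data file.
2 : Store the records of the file into an object array (JSON array).
3 : Get the length of the array.
4 : Start a timer (using microtime()).
5 : Set flag = 0.
6 : Use binary search to search for the given roll no. in the object array.
7 : If found set flag = 1.
8 : If flag == 0
Print no student found.
9 : Else
Print name of student.
Stop timer.
Calculate time taken for accessing = timer_stop - timer_start.
Display the time taken.
10: End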
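The binary search step of Algorithm 2 can be sketched in Python as follows (the report's implementation is in PHP). The function name `binary_search`, the field names and the sample data are assumptions. The sketch requires the records to be sorted by roll number beforehand.

```python
def binary_search(records, roll):
    """Return the matching record from a list sorted by 'roll', or None."""
    low, high = 0, len(records) - 1
    while low <= high:
        mid = (low + high) // 2
        current = records[mid]["roll"]
        if current == roll:
            return records[mid]
        if current < roll:
            low = mid + 1   # the target can only be in the upper half
        else:
            high = mid - 1  # the target can only be in the lower half
    return None


# Hypothetical main data file contents, already sorted by roll number.
sorted_records = [
    {"roll": 12, "name": "Bimal"},
    {"roll": 20, "name": "Chitra"},
    {"roll": 47, "name": "Dev"},
]
```

Each comparison halves the remaining search range, so the number of comparisons grows with the logarithm of the number of records.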
Algorithm 3 : Searching Database through the Index File
___________________________________________________________________
Input : Roll no. of a Student (search key field)

Output : Name of the Student (if found) and the time taken to access it.
Steps of the Algorithm :
1 : Get the contents of the index file.
2 : Store the records (key-value pairs) of the file into an object array (JSON array).
3 : Get the length of the array.
4 : Start a timer (using microtime()).
5 : Set flag = 0.
6 : Use binary search in the index file array to search for the given roll no.
7 : If found set flag = 1 and store the index key value in a variable.
8 : If flag == 0
Print no student found.
9 : Else
Get the contents of the main data file into an array (JSON).
Get the name of the student from the array using the stored key (from index).
Print name of student.
Stop timer.
Calculate time taken for accessing = timer_stop - timer_start.
Display the time taken.
10: End
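A minimal Python sketch of the lookup in Algorithm 3 is given below (the report's implementation is in PHP). The standard-library function `bisect_left` performs the binary search on the index keys. The variable names, the function name `search_via_index` and the sample data are assumptions, and both "files" are represented as in-memory lists rather than being read from disk.

```python
from bisect import bisect_left

# Hypothetical main data file: records in insertion order (unsorted).
records = [
    {"roll": 47, "name": "Dev"},
    {"roll": 12, "name": "Bimal"},
    {"roll": 20, "name": "Chitra"},
]

# Hypothetical index file: (roll number, position in records), sorted by roll number.
index = [(12, 1), (20, 2), (47, 0)]
index_keys = [key for key, _ in index]


def search_via_index(roll):
    """Return the student's name by way of the index, or None if absent."""
    # Binary search on the sorted keys of the index.
    i = bisect_left(index_keys, roll)
    if i == len(index_keys) or index_keys[i] != roll:
        return None
    # Follow the stored pointer into the main data array.
    pos = index[i][1]
    return records[pos]["name"]
```

The main data file does not have to be sorted here: only the index is kept in key order, and the stored position leads directly to the record.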
6. WORKING
The application is designed with a client-server architecture.
The server is an Apache web server hosted locally using XAMPP.
The client-side user interface is an HTML file (UI.html) which has the
database insertion and search forms.
The main database files are JSON files which contain the records
of students.
The index file is also a JSON file, which contains only the key field (roll
no.) and the pointer (array index) of the corresponding record in the main file.
The server-side backend pages are search.php and Student.php.
search.php processes the search query from the UI (input roll no.).
It is here that the different retrieval algorithms are used.
Student.php processes the new student entry from the UI (input
details of the student) and stores it in the main data file. It is here that the
index file is created and sorted each time a new entry is made in the
database.
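The insertion path described above, in which a record is appended to the main data file and the index file is then rebuilt and sorted, could look roughly like the following Python sketch. The actual Student.php is not shown in the report, so the function name `add_student`, the file names `students.json` and `index.json` and the field names are all assumptions.

```python
import json
import os
import tempfile


def add_student(main_path, index_path, roll, name):
    """Append a record to the main file and rebuild the sorted index file."""
    # Load the existing records, or start with an empty list.
    records = []
    if os.path.exists(main_path):
        with open(main_path) as f:
            records = json.load(f)

    # Append the new record at the end of the main data file.
    records.append({"roll": roll, "name": name})
    with open(main_path, "w") as f:
        json.dump(records, f)

    # Recreate the index: key field plus array position, sorted by key.
    index = sorted(
        ({"roll": rec["roll"], "pos": pos} for pos, rec in enumerate(records)),
        key=lambda entry: entry["roll"],
    )
    with open(index_path, "w") as f:
        json.dump(index, f)
    return index


# Usage with hypothetical file names in a temporary directory.
workdir = tempfile.mkdtemp()
main_path = os.path.join(workdir, "students.json")
index_path = os.path.join(workdir, "index.json")
add_student(main_path, index_path, 47, "Dev")
latest_index = add_student(main_path, index_path, 12, "Bimal")
```

Rebuilding and sorting the whole index on every insertion is simple but costs more as the file grows; that cost is paid at insertion time so that searches can use binary search.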
7. EXPERIMENTAL RESULTS
Time Taken in Searching for a record in the Database using different Database
Retrieval methods.
Database size | Sequential Search on Unsorted Data | Binary Search on Sorted Data File | Search through Index Table
Small Sized Database | | |
Medium Sized Database | | |
Large Sized Database | | |
In the case of a small database ( records), the average time required to access a
record is greater when using the file index than when searching the main database
directly. This is because two files (the main database file and the index file)
are accessed when indexing is used.
In the case of a large database ( records), the average time required to access a
record is less when using the file index than when searching the main database
directly. This is because the search on the index file is a binary search and
the index file is smaller than the main database file.
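A comparison of this kind could be set up along the lines of the Python sketch below. It is not the report's PHP measurement code and does not reproduce its figures: the helper names, the record count `n` and the repeat count are assumptions, and everything is held in memory, so the cost of reading two files that affects the small-database case is not modelled.

```python
import time
from bisect import bisect_left


def time_call(func, *args, repeats=20):
    """Return (result, average seconds per call) over several repeats."""
    start = time.perf_counter()
    for _ in range(repeats):
        result = func(*args)
    return result, (time.perf_counter() - start) / repeats


def sequential(records, roll):
    # Linear scan over the main data array.
    for rec in records:
        if rec["roll"] == roll:
            return rec["name"]
    return None


def via_index(records, index_keys, index_pos, roll):
    # Binary search on the sorted index keys, then follow the pointer.
    i = bisect_left(index_keys, roll)
    if i < len(index_keys) and index_keys[i] == roll:
        return records[index_pos[i]]["name"]
    return None


n = 10_000  # assumed number of records
# Main data stored in descending roll order, i.e. not sorted for searching.
records = [{"roll": r, "name": f"student-{r}"} for r in range(n, 0, -1)]
pairs = sorted((rec["roll"], pos) for pos, rec in enumerate(records))
index_keys = [key for key, _ in pairs]
index_pos = [pos for _, pos in pairs]

target = 1  # last record in the main array: worst case for the scan
seq_name, seq_time = time_call(sequential, records, target)
idx_name, idx_time = time_call(via_index, records, index_keys, index_pos, target)
print(f"sequential: {seq_time:.8f} s, via index: {idx_time:.8f} s")
```

Averaging over several repeats reduces the effect of timer noise, which is significant when a single lookup takes only microseconds.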
8. CONCLUSIONS
As the above discussion shows, for large databases indexed access is far more
efficient than direct data access from files. Although most database systems
today use multilevel indexing and B Tree structures for file access, these are
just another form of indexing: they use multilevel indexes which form a tree
structure with tree and data pointers in each of its nodes.
A DBMS is an intermediate layer between programs and the data. Programs
access the DBMS, which then accesses the data. There are different types of DBMS,
ranging from small systems that run on personal computers to huge systems that run
on mainframes. With the development of better indexing algorithms, database
systems become more efficient day by day, returning query results more quickly.
Some commercially available database management systems are dBase, FoxPro, IMS,
Oracle, MySQL, SQL Server and DB2.
