
Database Indexes

TOPIC 07
Introduction
 Indexes are one of the most important and useful tools for
achieving high performance in a relational database
 Many database administrators consider indexes to be the single
most critical tool for improving database performance

 An index is a data structure that contains a copy of some of
the data from one or more existing database tables
 Like an index at the back of a textbook, a database index provides
an organizational framework that the DBMS can use to quickly
locate the information that it needs
 This can vastly improve the speed with which SQL queries can be
answered
Indexes – An Intuitive Overview
 Consider the randomly ordered table of names shown below…
 If we start at the top and examine one row at a time, how many
rows will we need to examine before we find a randomly
selected name?
 What is the average search time if this process is repeated
many times?
 What is the maximum search time?

Row Position   Last Name
1              Schluter
2              Mishra
3              Salehian
4              Vu
5              Wah
6              Al Rabeeah
7              Wong
8              Garcia
9              Johnson
10             Flores
11             Fung
12             Spievak
13             Doshi
14             Scruton
15             Winter
16             Beena
17             Ngo
18             Ly
19             Gani
20             Pham
21             Hu
22             Israr
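
The questions about the unsorted table can be worked through with a short sketch. It assumes every name is equally likely to be chosen and that examining one row costs one unit of time; the function name `rows_examined` is illustrative, not part of any DBMS.

```python
# Last names in the unsorted order of the table (row positions 1..22).
NAMES = [
    "Schluter", "Mishra", "Salehian", "Vu", "Wah", "Al Rabeeah", "Wong",
    "Garcia", "Johnson", "Flores", "Fung", "Spievak", "Doshi", "Scruton",
    "Winter", "Beena", "Ngo", "Ly", "Gani", "Pham", "Hu", "Israr",
]


def rows_examined(names, target):
    """Scan from the top; return how many rows are read up to and including the match."""
    for position, name in enumerate(names, start=1):
        if name == target:
            return position
    raise KeyError(target)


# Search once for every name, as if each were equally likely to be selected.
costs = [rows_examined(NAMES, name) for name in NAMES]
average_cost = sum(costs) / len(costs)  # equals (n + 1) / 2 for n rows
maximum_cost = max(costs)               # equals n: the name is in the last row

print(average_cost, maximum_cost)
```

Under these assumptions the average is (22 + 1) / 2 = 11.5 rows and the maximum is 22 rows, so the cost of a top-to-bottom scan grows in proportion to the size of the table.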
Indexes – An Intuitive Overview
 Consider the alphabetically ordered table of names shown below…
 If our objective is to minimize the number of rows that need to be
examined in order to find a randomly chosen name, what search
process could we use?
 What is the average search time if this process is repeated
many times?
 What is the maximum search time?
 Search time improves when the names are arranged in
alphabetical order
 Binary search: look at the middle (median) row first. If the name
being sought sorts after it (for example, a name beginning with S),
continue in the lower half of the table instead of starting at the top.

Row Position   Last Name
6              Al Rabeeah
16             Beena
13             Doshi
10             Flores
11             Fung
19             Gani
8              Garcia
21             Hu
22             Israr
9              Johnson
18             Ly
2              Mishra
17             Ngo
20             Pham
3              Salehian
1              Schluter
14             Scruton
12             Spievak
4              Vu
5              Wah
15             Winter
7              Wong
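
A minimal sketch of binary search over the sorted names, again assuming each name is equally likely to be chosen and counting one unit of cost per row examined. The function name `binary_search_cost` is illustrative.

```python
# The 22 last names in alphabetical order, as in the sorted table.
SORTED_NAMES = [
    "Al Rabeeah", "Beena", "Doshi", "Flores", "Fung", "Gani", "Garcia",
    "Hu", "Israr", "Johnson", "Ly", "Mishra", "Ngo", "Pham", "Salehian",
    "Schluter", "Scruton", "Spievak", "Vu", "Wah", "Winter", "Wong",
]


def binary_search_cost(names, target):
    """Return the number of rows examined by a binary search for target."""
    low, high = 0, len(names) - 1
    examined = 0
    while low <= high:
        middle = (low + high) // 2
        examined += 1
        if names[middle] == target:
            return examined
        if names[middle] < target:
            low = middle + 1   # target sorts later: keep only the rows below the middle
        else:
            high = middle - 1  # target sorts earlier: keep only the rows above the middle
    raise KeyError(target)


costs = [binary_search_cost(SORTED_NAMES, name) for name in SORTED_NAMES]
average_cost = sum(costs) / len(costs)
maximum_cost = max(costs)

print(round(average_cost, 2), maximum_cost)
```

Each step discards half of the remaining rows, so no search over 22 names examines more than 5 rows, compared with up to 22 for a top-to-bottom scan.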
Index Concepts
 Indexes are created on one or more columns in a table
 For example:
1. An index is created on a PK column
2. The index will contain the PK value for each row in the table, along with each
row’s ordinal position (row number/pointer) within the table
3. When a query involving the PK is run, the DBMS will find the PK value within
the index. The DBMS will then know the position of the row within the table
4. The DBMS can then quickly locate the row in the table that is associated with
the PK value
 Without an index, the DBMS has to perform a table scan in
order to locate the desired row(s)
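
The four steps above can be sketched as follows. This is a simplified model, not how any particular DBMS stores an index: the table is a Python list, the index is a sorted list of (PK value, row position) pairs, and the `studentId` values are made up for illustration.

```python
from bisect import bisect_left

# The table: rows are stored in arrival order, not in PK order.
# Each row is (studentId, lastName); the studentId values are hypothetical.
TABLE = [
    (1042, "Schluter"),
    (1007, "Mishra"),
    (1093, "Salehian"),
    (1015, "Vu"),
    (1061, "Wah"),
]

# The index: one (PK value, row position) entry per row, kept sorted by PK.
# Row positions here are zero-based list offsets.
INDEX = sorted((pk, position) for position, (pk, _) in enumerate(TABLE))
INDEX_KEYS = [pk for pk, _ in INDEX]


def table_scan(pk):
    """Without an index: read rows one at a time until the PK matches."""
    for row in TABLE:
        if row[0] == pk:
            return row
    return None


def index_lookup(pk):
    """With an index: find the PK in the index, then jump straight to the row."""
    slot = bisect_left(INDEX_KEYS, pk)  # binary search within the sorted index
    if slot < len(INDEX_KEYS) and INDEX_KEYS[slot] == pk:
        _, position = INDEX[slot]
        return TABLE[position]
    return None


print(index_lookup(1093))
```

Both functions return the same row; the difference is that `index_lookup` examines only a few index entries, while `table_scan` may read every row in the table.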
Index Concepts
 An index can be created on most, but not all, columns
 Whether an index can be created on a column depends on the
column’s data type
 Columns with large object (LOB) data types cannot be indexed without
employing additional mechanisms (such as a hash algorithm). In Microsoft
SQL Server, these LOB data types include:
 text
 ntext
 image
 varchar(max)
 nvarchar(max)
 varbinary(max)
Index Concepts
 Creating an index increases the amount of storage space
required by the database
 This occurs because an index contains a copy of some of the data
in a table
 To estimate the storage space requirements of an index, use the
following formula:
 Number of rows in table x Average number of bytes required per row for the
indexed column(s)
 For example:
 We want to create an index on two columns, lastName and deptId, where:
 Values stored in lastName require an average of 16 bytes/row
 Values stored in deptId require an average of 2 bytes/row
 The table contains 98,000 rows
 The index will require about 98,000 x (16 + 2) = 1,764,000 bytes of
storage space
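
The estimate above can be written as a small helper. The function name `estimate_index_bytes` is illustrative, and the result covers only the copied column data; a real index also stores row locators and page overhead, so it is better read as a lower bound.

```python
def estimate_index_bytes(row_count, avg_bytes_per_column):
    """Rows in the table times the average bytes per row for the indexed columns."""
    return row_count * sum(avg_bytes_per_column)


# lastName averages 16 bytes/row, deptId averages 2 bytes/row, 98,000 rows.
estimate = estimate_index_bytes(98_000, [16, 2])
print(estimate)  # 1764000
```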
B-Tree Index
 The most common type of index uses a B-tree (balanced tree)
structure, which is similar to a binary search tree except that each
node can have more than two children
 B-trees use pointers and several layers of nodes in order to
quickly locate desired data

 Root node (less granular)
 Intermediate nodes
 Leaf nodes (more granular)

 When the DBMS processes a query which includes an
indexed column, it starts at the root node of the B-tree and
navigates downward until it finds the desired leaf
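
A minimal sketch of the root-to-leaf navigation, using the 22 last names and row positions from the overview tables. The tree is built by hand with three levels and every row position stored in a leaf; that layout, the `Node` class, and the `search` function are assumptions made for illustration. Insertion and rebalancing are not shown.

```python
from bisect import bisect_left, bisect_right


class Node:
    """One tree node: leaves carry row positions, inner nodes carry children."""

    def __init__(self, keys, children=None, rows=None):
        self.keys = keys          # sorted key values stored in this node
        self.children = children  # child nodes, or None for a leaf
        self.rows = rows          # row positions parallel to keys (leaves only)


def leaf(entries):
    """Build a leaf node from (last name, row position) pairs."""
    return Node([name for name, _ in entries], rows=[row for _, row in entries])


LEAVES = [
    leaf([("Al Rabeeah", 6), ("Beena", 16), ("Doshi", 13)]),
    leaf([("Flores", 10), ("Fung", 11), ("Gani", 19)]),
    leaf([("Garcia", 8), ("Hu", 21), ("Israr", 22)]),
    leaf([("Johnson", 9), ("Ly", 18), ("Mishra", 2)]),
    leaf([("Ngo", 17), ("Pham", 20), ("Salehian", 3)]),
    leaf([("Schluter", 1), ("Scruton", 14), ("Spievak", 12)]),
    leaf([("Vu", 4), ("Wah", 5), ("Winter", 15), ("Wong", 7)]),
]

# Each separator key is the smallest key reachable through the child to its right.
ROOT = Node(
    ["Johnson"],
    children=[
        Node(["Flores", "Garcia"], children=LEAVES[0:3]),
        Node(["Ngo", "Schluter", "Vu"], children=LEAVES[3:7]),
    ],
)


def search(root, key):
    """Walk from the root to a leaf; return (row position or None, levels visited)."""
    node, levels = root, 1
    while node.children is not None:
        # Count the separators <= key to pick the child whose range contains key.
        node = node.children[bisect_right(node.keys, key)]
        levels += 1
    slot = bisect_left(node.keys, key)
    if slot < len(node.keys) and node.keys[slot] == key:
        return node.rows[slot], levels
    return None, levels


print(search(ROOT, "Salehian"))
```

Every lookup in this sketch visits exactly three nodes (root, intermediate, leaf), whichever name is requested, because all leaves sit at the same depth.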
B-Tree Index
Clustered Indexes
 In a clustered index, the actual data rows that comprise the
table are stored at the leaf level of the index
 The indexed values are stored in a sorted order (either
ascending or descending)
 This means that there can be only one clustered index per table
 PK columns are good candidates for clustered indexes
 Effective for queries that join on the indexed column(s)
Nonclustered Indexes
 In a nonclustered index, the leaf nodes contain the values from
the indexed column(s), along with a row locator which points to
the location of the actual data row
 The actual data row might be stored in a leaf node of a clustered index or
in a heap
 A heap is just an ordinary table that does not use a clustered index
 Lookups through a nonclustered index are generally slower than
lookups through a clustered index because the DBMS must follow a
row locator to retrieve the actual data row
 Unlike clustered indexes, a table can have more than one
nonclustered index
 The leaf nodes of a nonclustered index can optionally contain
values from non-indexed columns
 Using this approach, the DBMS may be able to answer a query from the
index alone, without ever needing to look at the actual data row itself!
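
A minimal sketch of a nonclustered index whose leaf entries also carry a non-indexed column. A Python dict stands in for the index structure, last names are assumed to be unique, and the table contents, the `phone` column, and the function names are hypothetical.

```python
# Heap: data rows in no particular order. All values are made up for illustration.
HEAP = [
    {"lastName": "Schluter", "deptId": 12, "phone": "555-0101"},
    {"lastName": "Mishra", "deptId": 7, "phone": "555-0102"},
    {"lastName": "Salehian", "deptId": 12, "phone": "555-0103"},
]

# Nonclustered index on lastName. Each entry holds a row locator (the heap
# offset) plus a copy of deptId, a non-indexed column included in the leaf.
INDEX = {row["lastName"]: (locator, row["deptId"]) for locator, row in enumerate(HEAP)}

STATS = {"heap_reads": 0}


def fetch_row(locator):
    """Follow a row locator into the heap, counting the extra read."""
    STATS["heap_reads"] += 1
    return HEAP[locator]


def dept_for(last_name):
    """deptId is stored in the index entry, so no data row is read."""
    _, dept_id = INDEX[last_name]
    return dept_id


def phone_for(last_name):
    """phone exists only in the data row, so the locator must be followed."""
    locator, _ = INDEX[last_name]
    return fetch_row(locator)["phone"]


print(dept_for("Mishra"), STATS["heap_reads"])
print(phone_for("Mishra"), STATS["heap_reads"])
```

The first query is answered without touching the heap; the second needs one additional read of the data row, which is the extra step that makes nonclustered lookups slower.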
Other Types of Indexes
 Bitmap Index
 In a bitmap index, a table is created with the values of one
attribute along the horizontal axis and the values of another
attribute along the vertical axis
 A bit value (1 or 0) in each cell within the table indicates whether
the value of one attribute is associated with a value of the other
attribute
 Bitmap indexes work best when one or both of the attributes has
only a small number of unique values
 When used properly, a bitmap index can require as little as 25% of
the disk space of a tree-based index and can be up to 10 times faster
Bitmap Index
             Student Grade
Student Id   A  B  C  D  F
101          0  1  0  0  0
102          0  1  0  0  0
103          1  0  0  0  0
104          0  0  1  0  0
105          0  1  0  0  0
106          0  0  0  1  0
107          0  1  0  0  0
108          0  1  0  0  0
109          1  0  0  0  0
110          0  1  0  0  0
111          0  0  1  0  0
112          0  1  0  0  0
113          1  0  0  0  0
114          0  1  0  0  0
115          0  0  0  1  0
116          1  0  0  0  0
117          0  0  0  0  1
118          0  1  0  0  0
119          0  1  0  0  0
120          0  0  1  0  0
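
The bitmap table can be modelled with one integer per grade, where bit i is set when the i-th student holds that grade. This is a sketch of the idea only; the function names are illustrative and real bitmap indexes typically add compression.

```python
# Student Id -> grade, transcribed from the bitmap table.
GRADES = {
    101: "B", 102: "B", 103: "A", 104: "C", 105: "B",
    106: "D", 107: "B", 108: "B", 109: "A", 110: "B",
    111: "C", 112: "B", 113: "A", 114: "B", 115: "D",
    116: "A", 117: "F", 118: "B", 119: "B", 120: "C",
}
STUDENT_IDS = sorted(GRADES)


def build_bitmaps(grades, student_ids):
    """Return one bitmap (a Python int) per distinct grade."""
    bitmaps = {}
    for bit, student_id in enumerate(student_ids):
        grade = grades[student_id]
        bitmaps[grade] = bitmaps.get(grade, 0) | (1 << bit)
    return bitmaps


def students_in(bitmap, student_ids):
    """Translate the set bits of a bitmap back into student ids."""
    return [sid for bit, sid in enumerate(student_ids) if (bitmap >> bit) & 1]


BITMAPS = build_bitmaps(GRADES, STUDENT_IDS)

# "Grade A or grade B" becomes a single bitwise OR of two bitmaps.
a_or_b = BITMAPS["A"] | BITMAPS["B"]
print(bin(a_or_b).count("1"))            # number of students with an A or a B
print(students_in(BITMAPS["F"], STUDENT_IDS))
```

Because there are only five distinct grades, the whole index is five small integers, and combining conditions is a bitwise operation rather than a tree traversal.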
Other Types of Indexes
 Hashed Index
 In a hashed index, a hashing algorithm is used in order to convert
an input value into a location within an index (such as a B-tree
index), which in turn contains or points to the actual data row
 Hashed indexes are useful in several situations, including:
 In a parallel processing or distributed database environment
 When a need exists to index complex objects (such as images)
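
A minimal sketch of the hashing idea, assuming a fixed number of buckets and using `zlib.crc32` as a stand-in hash function; a real DBMS chooses its own hash function and bucket layout. The bucket count and function names are arbitrary choices for the example.

```python
import zlib

BUCKET_COUNT = 8  # arbitrary size chosen for this sketch


def bucket_for(key):
    """Hash the key and map the result to a bucket number."""
    return zlib.crc32(key.encode("utf-8")) % BUCKET_COUNT


def build_hash_index(keys):
    """Place one (key, row position) entry per row into its bucket."""
    buckets = [[] for _ in range(BUCKET_COUNT)]
    for position, key in enumerate(keys, start=1):
        buckets[bucket_for(key)].append((key, position))
    return buckets


def lookup(buckets, key):
    """Hash the key, then search only the one bucket it maps to."""
    for stored_key, position in buckets[bucket_for(key)]:
        if stored_key == key:
            return position
    return None


LAST_NAMES = ["Schluter", "Mishra", "Salehian", "Vu", "Wah", "Al Rabeeah"]
HASH_INDEX = build_hash_index(LAST_NAMES)
print(lookup(HASH_INDEX, "Salehian"))
```

The lookup never compares the key against rows in other buckets. The trade-off is that hashing destroys ordering, so a structure like this helps equality lookups but not range or sorted queries.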
Hashed Index
Index Considerations
 Since an index can consume a lot of storage space, indexes
should only be created on columns that are involved in
common queries
 This means that a database designer must have knowledge of the
queries that the DBMS will commonly process in order to design an
indexing strategy

 Indexes should be used sparingly on tables that are
updated frequently
 Whenever an INSERT, UPDATE, or DELETE operation affects an
indexed column, the index for the column must be updated as well
 Maintaining an index takes time, and an index which is modified often can
actually slow the overall performance of the database
Indexing Guidelines
 If a table is heavily updated, index as few columns as possible
 Do not over-index heavily updated tables

 If a table is updated rarely, use as many indexed columns as
necessary to achieve maximum query performance

 Clustered indexes are best used on columns that do not allow null
values and whose values are unique
 PK columns are therefore good targets for clustered indexes

 The performance benefits of an index are related to the uniqueness
of the values in the indexed column
 Index performance is poor when an indexed column contains a large proportion
of duplicate values
 Index performance is best when an indexed column contains unique values
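
One way to put a number on "uniqueness" is the fraction of distinct values in a column, often called selectivity. The sketch below uses the student ids and grades from the bitmap table; the function name `selectivity` is illustrative.

```python
def selectivity(values):
    """Fraction of distinct values in a column; 1.0 means every value is unique."""
    values = list(values)
    if not values:
        raise ValueError("column is empty")
    return len(set(values)) / len(values)


# Columns taken from the bitmap index example: 20 students, 5 possible grades.
student_ids = list(range(101, 121))
grades = list("BBACBDBBABCBABDAFBBC")

print(selectivity(student_ids))  # 1.0: every id is unique
print(selectivity(grades))       # 0.25: many rows share each grade
```

By this measure the student id column is an ideal target for a tree-based index, while the grade column, with many duplicates per value, would narrow a search far less.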
