U5 Query Processing

Md. Saiful Islam
Assistant Professor
Department of Computer Science and Engg.
University of Rajshahi
1.2 What is Query Optimization?
Given a query, there are generally a variety of methods for computing
the answer.
Improving the strategy for processing a query is called query
optimization.
The first action the system must take on a query is to translate the query
into its internal form.
In the process of generating the internal form of the query, the parser
checks the syntax of the user's query, verifies that the relation names
are valid, etc.
If the query was expressed in terms of a view, the parser replaces all
references to the view name with the relational algebra expression that
computes the view.
Once the internal form has been generated, the optimization process begins.

1.3 Equivalence of Expression:
To illustrate the optimization techniques, consider the following relations.

Teacher
TID  Name        HouseNo  Street      City
1    Md. Bakar   12       North       Dhaka
2    Md. Sharif  34       Iqbal Road  Dhaka
3    Md. Hanif   56       Ring Road   Savar
4    Md. Khalil  10       Road 1      Tongi

Teaches
TID  CourseNo
1    DCA3303
1    DCA3307
1    DCA3308
2    DCA3304
3    DCA3305
3    DCA3309
4    DCA3306
4    DCA3310
2    DCA3311

Course
CourseNo  Title  Chour
DCA3303   CAD    3
DCA3307   AI     3
DCA3308   NN     4
DCA3304   MS     4
DCA3305   DBMS   3
DCA3309   DSD    3
DCA3306   OS     2
DCA3310   CI     2
DCA3311   PP     3
1.3.1 Selection Operation:
Query: Find the teacher name, course number and title for all teachers who
live in the city Dhaka.
Evaluating the query by first joining Teacher, Teaches and Course, and only
then selecting on City, constructs a large relation.
But we are concerned with only a few tuples of that relation,
i.e., those for teachers living in the city Dhaka.
Another disadvantage is that the intermediate result must be kept in
main memory.
If main memory is not large enough, the result must be
stored on disk (which requires a large amount of disk space).
We could process the query more efficiently if there
were a way to reduce the size of the intermediate
result.
Since we are interested only in teachers who live in Dhaka,
we need not consider the other tuples of the Teacher
relation.
If we select the Dhaka tuples from Teacher before joining, the size of the
intermediate result is reduced.
So the rule is:
Perform selection operations as early as possible.
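The effect of the rule can be checked on the sample relations. The following is a minimal sketch, not the optimizer's actual method: it keeps only the attributes needed here, leaves out the join with Course for brevity, and uses a hypothetical helper `join_on_tid` that performs a simple nested-loop join.

```python
# Sample relations from the slides, reduced to the attributes needed here.
teacher = [
    (1, "Md. Bakar", "Dhaka"),
    (2, "Md. Sharif", "Dhaka"),
    (3, "Md. Hanif", "Savar"),
    (4, "Md. Khalil", "Tongi"),
]  # (TID, Name, City)
teaches = [
    (1, "DCA3303"), (1, "DCA3307"), (1, "DCA3308"), (2, "DCA3304"),
    (3, "DCA3305"), (3, "DCA3309"), (4, "DCA3306"), (4, "DCA3310"),
    (2, "DCA3311"),
]  # (TID, CourseNo)


def join_on_tid(teachers, assignments):
    """Nested-loop natural join on TID (hypothetical helper for illustration)."""
    return [
        (tid, name, city, course_no)
        for (tid, name, city) in teachers
        for (t, course_no) in assignments
        if tid == t
    ]


# Plan 1: join first, select afterwards.
late_intermediate = join_on_tid(teacher, teaches)
late_result = [row for row in late_intermediate if row[2] == "Dhaka"]

# Plan 2: select first, join afterwards.
dhaka_teachers = [row for row in teacher if row[2] == "Dhaka"]
early_intermediate = join_on_tid(dhaka_teachers, teaches)

# Both plans give the same answer, but the intermediate results differ in size.
print(len(late_intermediate), len(early_intermediate))  # 9 5
```

Both plans return the same five tuples; only the size of the intermediate relation changes, which is the point of performing the selection early.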
2.2 Equivalence of Expression: Projection Operation
The projection operation also reduces the size of
relations.
Consider once again the query: Find the teacher name, course
number and title for all teachers who live in the city Dhaka.
When we join the Teacher tuples for Dhaka with Teaches, we obtain a
relation whose scheme is (TID, Name,
HouseNo, Street, City, CourseNo).
We can eliminate several attributes from the
scheme, i.e., the only attributes we must retain are
those that:
Appear in the result of the query, or
Are needed to process subsequent operations.
So by eliminating the unneeded attributes, we
reduce the number of columns of the intermediate
result.
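As an illustration, the sketch below joins the two Dhaka tuples of Teacher with a subset of Teaches and then projects away the attributes that are no longer needed. The helper `project` is a hypothetical name, and the choice of retained attributes (Name and CourseNo) is an assumption based on the query: both appear in the result, and CourseNo is still needed to join with Course.

```python
teacher = [
    {"TID": 1, "Name": "Md. Bakar", "HouseNo": 12, "Street": "North", "City": "Dhaka"},
    {"TID": 2, "Name": "Md. Sharif", "HouseNo": 34, "Street": "Iqbal Road", "City": "Dhaka"},
]  # the two Dhaka tuples of Teacher
teaches = [
    {"TID": 1, "CourseNo": "DCA3303"},
    {"TID": 2, "CourseNo": "DCA3304"},
]  # a subset of Teaches, for brevity


def project(rows, attributes):
    """Keep only the named attributes of each tuple (duplicates are not removed)."""
    return [{a: row[a] for a in attributes} for row in rows]


# Join on TID: scheme (TID, Name, HouseNo, Street, City, CourseNo).
joined = [{**t, **s} for t in teacher for s in teaches if t["TID"] == s["TID"]]

# Retain only what the result and the later join with Course need.
slim = project(joined, ["Name", "CourseNo"])

print(len(joined[0]), len(slim[0]))  # 6 2
```

Each intermediate tuple shrinks from six attributes to two, so more tuples fit in a block of main memory or disk.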
2.3 Estimation of Query Processing Cost:
The DBMS stores the following statistics for each
relation r.
1. $n_r$, the number of tuples in the relation r.
2. $s_r$, the size of a record (tuple) of relation r in bytes.
3. $V(A, r)$, the number of distinct values that appear in the relation r
for attribute A.
Statistics 1 and 2 are used to estimate the size of a
Cartesian product.
The Cartesian product $r \times s$ contains $n_r n_s$ tuples.
Each tuple of $r \times s$ occupies $s_r + s_s$ bytes.
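The Cartesian product estimate is simple arithmetic, sketched below. The function name `cartesian_product_size` and the record sizes used in the example (60 and 12 bytes) are assumptions for illustration; only the tuple counts 4 and 9 come from the sample Teacher and Teaches relations.

```python
def cartesian_product_size(n_r, s_r, n_s, s_s):
    """Return (tuple count, bytes per tuple, total bytes) for r x s."""
    tuples = n_r * n_s        # every tuple of r pairs with every tuple of s
    tuple_bytes = s_r + s_s   # a result tuple concatenates one tuple from each
    return tuples, tuple_bytes, tuples * tuple_bytes


# Teacher has 4 tuples and Teaches has 9; record sizes are assumed values.
print(cartesian_product_size(4, 60, 9, 12))  # (36, 72, 2592)
```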

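The third statistic is used to estimate how many
tuples satisfy a selection predicate of the form
<attribute-name> = <value>
Assuming each value of A appears with equal probability, about
$n_r / V(A, r)$ tuples of r satisfy such a predicate on A.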
To estimate the size of a natural join, suppose $r_1$ is a relation on
scheme $R_1$ and $r_2$ is a relation on scheme $R_2$.
Case 1: if $R_1 \cap R_2 = \emptyset$, then $r_1 \bowtie r_2$ is the same
as $r_1 \times r_2$,
and we can use our estimation technique for Cartesian
products.
Case 2: if $R_1 \cap R_2$ is a key for $R_1$, then we know that a
tuple of $r_2$ will join with at most one tuple from $r_1$.
Therefore the number of tuples in $r_1 \bowtie r_2$ is not greater than
the number of tuples in $r_2$.
Case 3: if $R_1 \cap R_2$ is a key for neither $R_1$ nor $R_2$, we
use the third statistic and assume, as before, that
each value appears with equal probability.
Consider a tuple t of $r_1$, and assume $R_1 \cap R_2 = \{A\}$.
We estimate that there are $n_{r_2} / V(A, r_2)$ tuples in $r_2$
with an A value of t[A]. So tuple t produces
$n_{r_2} / V(A, r_2)$ tuples in $r_1 \bowtie r_2$.
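The three cases can be collected into one small estimator. This is a sketch under stated assumptions: the function name and parameters are hypothetical, Case 2 returns the upper bound (the number of tuples in r2), and Case 3 multiplies the per-tuple estimate above by the number of tuples in r1, a step the slides do not spell out.

```python
def estimate_join_size(n_r1, n_r2, common_attrs, common_is_key_of_r1=False, v_a_r2=None):
    """Estimate the number of tuples in the natural join of r1 and r2."""
    if not common_attrs:
        # Case 1: no common attributes, so the join is a Cartesian product.
        return n_r1 * n_r2
    if common_is_key_of_r1:
        # Case 2: each tuple of r2 joins with at most one tuple of r1 (upper bound).
        return n_r2
    if v_a_r2 is None:
        raise ValueError("V(A, r2) is required when the common attribute is not a key")
    # Case 3: each tuple of r1 is expected to match n_r2 / V(A, r2) tuples of r2.
    return n_r1 * n_r2 / v_a_r2


# Teacher (4 tuples) joined with Teaches (9 tuples) on TID, a key of Teacher.
print(estimate_join_size(4, 9, {"TID"}, common_is_key_of_r1=True))  # 9
```

For the sample data the bound is tight: every Teaches tuple matches exactly one teacher, so the join has 9 tuples.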
3.2 What is an Index?
An index for a file works in much the same way as a
catalog in a library.
Indexes may be classified as dense or sparse.
Searching is done on some attribute called the search
key.
Dense Index:
An index record appears for every search key value in the file.
Sparse Index:
Index records are created for only some of the search key values.
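The difference between the two kinds of index can be sketched as follows. The data, the block layout and the function names are hypothetical; the sketch assumes the file is sorted on a unique search key and that the sparse index holds the first key of each block.

```python
from bisect import bisect_right

# A file sorted on the search key, stored as blocks of (key, value) records.
blocks = [
    [("Brighton", 750), ("Downtown", 500)],
    [("Mianus", 700), ("Perryridge", 400)],
    [("Redwood", 700), ("Round Hill", 350)],
]

# Dense index: one entry per search key value, pointing at (block, offset).
dense_index = {
    key: (b, i)
    for b, block in enumerate(blocks)
    for i, (key, _) in enumerate(block)
}

# Sparse index: one entry per block, holding the first key stored in it.
sparse_keys = [block[0][0] for block in blocks]


def dense_lookup(key):
    """Go straight to the record through its own index entry."""
    pos = dense_index.get(key)
    if pos is None:
        return None
    b, i = pos
    return blocks[b][i]


def sparse_lookup(key):
    """Find the last index entry <= key, then scan that block sequentially."""
    b = bisect_right(sparse_keys, key) - 1
    if b < 0:
        return None
    for record in blocks[b]:
        if record[0] == key:
            return record
    return None


print(dense_lookup("Perryridge"), sparse_lookup("Perryridge"))
```

The sparse index needs three entries instead of six, at the price of a sequential scan inside the block it points to.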
Another index classification is clustering and non-clustering.
Clustering:
In a clustering index, the physical order of the records
in the file is the same as the order of the search key in the index.
Non-clustering:
In a non-clustering index, the physical order of the records
is not the same as the index order, and hence up to one
block access is required for each record.
3.3 Estimation of Cost of Access Using Index:
To illustrate, consider the query: Find the account numbers of customer
Rahim at branch RU with a balance greater than 1000.
The relation is Deposit(cName, bName, aNumber, balance).
SQL expression for the query is:

select aNumber
from deposit
where bName = 'RU' and cName = 'Rahim' and balance > 1000
Assume that we have the following statistical information about the
deposit relation.
1. 20 tuples of deposit fit in one block.
2. V(bName, deposit)=50
3. V(cName, deposit)=200
4. V(balance, deposit)=5000
The deposit relation has 10,000 tuples.
Assumption: The simplifying assumption for the
relation is that the values are uniformly distributed.
Since V(bName, deposit)=50, the
expected number of tuples satisfying the condition
(bName=RU) is 10000/50=200.
Assuming there is a clustering index on bName and
that 20 tuples fit in one block, the number of
block accesses needed to read 200 tuples is 200/20=10.
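The arithmetic above can be reproduced in a few lines. This is a sketch only: the variable names are hypothetical, the cost of traversing the index itself is ignored, and the non-clustering figure is a worst case that assumes one block access per matching record.

```python
import math

n_deposit = 10_000        # tuples in deposit
tuples_per_block = 20     # tuples of deposit per block
v_bname = 50              # V(bName, deposit)

# Uniform distribution: expected tuples with bName = 'RU'.
matching = n_deposit / v_bname

# Clustering index on bName: matching tuples are stored contiguously.
clustered_blocks = math.ceil(matching / tuples_per_block)

# Non-clustering index (worst case): every matching tuple is in a different block.
nonclustered_blocks = math.ceil(matching)

print(matching, clustered_blocks, nonclustered_blocks)  # 200.0 10 200
```

Under these assumptions the clustering index reads 10 blocks where a non-clustering index could read up to 200.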
