Você está na página 1de 43

169

Relational Algebra

After completing this chapter, you should be able to enumerate and explain the operations of relational algebra (there is a core of 5 relational algebra operators), write relational algebra queries of the type joinselectproject, discuss correctness and equivalence of given relational algebra queries.

170

Relational Algebra Overview 1. Introduction; Selection, Projection 2. 3. 4. 5. Cartesian Product, Join Set Operations Outer Join Formal Denitions, A Bit of Theory

171

Example Database (recap)


STUDENTS FIRST LAST Ann Smith Michael Jones Richard Turner Maria Brown EXERCISES ENO TOPIC 1 Rel.Alg. 2 SQL 1 SQL RESULTS CAT ENO H 1 H 2 M 1 H 1 H 2 M 1 H 1 M 1

SID 101 102 103 104

EMAIL ... (null) ... ...

CAT H H M

MAXPT 10 10 14

SID 101 101 101 102 102 102 103 103

POINTS 10 8 12 9 9 10 5 7

172

Relational Algebra (1)

Relational algebra (RA) is a query language for the relational model with a solid theoretical foundation. Relational algebra is not visible at the user interface level (not in any commercial RDBMS, at least). However, almost any RDBMS uses RA to represent queries internally (for query optimization and execution). Knowledge of relational algebra will help in understanding SQL and relational database systems in general.

173

Relational Algebra (2)

In mathematics, an algebra is a set (the carrier), and operations that are closed with respect to the set. Example: (N, {, +}) forms an algebra. In case of RA, the carrier is the set of all nite relations. We will get to know the operations of RA in the sequel (one such operation is, for example, ).

174

Relational Algebra (3)


Another operation of relational algebra is selection.
In contrast to operations like + in N, the selection is parameterized by a simple predicate.

For example, the operation SID=101 selects all tuples in the input relation which have the value 101 in column SID. Relational algebra: selection SID=101
RESULTS SID CAT ENO POINTS 101 H 1 10 101 H 2 8 101 M 1 12 102 H 1 9 102 H 2 9 102 M 1 10 103 H 1 5 103 M 1 7

SID CAT ENO POINTS 101 H 1 10 101 H 2 8 101 M 1 12

175

Relational Algebra (4)

Since the output of any RA operation is some relation R again, R may be the input for another RA operation.
The operations of RA nest to arbitrary depth such that complex queries can be evaluated. The nal results will always be a relation.

A query is a term (or expression) in this relational algebra. A query FIRST,LAST (STUDENTS CAT=M (RESULTS))

176

Relational Algebra (5)

There are some dierence between the two query languages RA and SQL: Null values are usually excluded in the denition of relational algebra, except when operations like outer join are dened. Relational algebra treats relations as sets, i.e., duplicate tuples will never occur in the input/output relations of an RA operator.
Remember: In SQL, relations are multisets (bags) and may contain duplicates. Duplicate elimination is explicit in SQL (SELECT DISTINCT).

177

Relational Algebra (6)


Relational algebra is the query language when it comes to the study of relational query languages (DB Theory): The semantics of RA is much simpler than that of SQL. RA features ve basic operations (and can be completely dened on a single page, if you will). RA is also a yardstick for measuring the expressiveness of query languages. If a query language QL can express all possible RA queries, then QL is said to be relationally complete.
SQL is relationally complete. Vice versa, every SQL query (without null values, aggregation, and duplicates) can also be written in RA.

178

Selection (1)

Selection The selection selects a subset of the tuples of a relation, namely those which satisfy predicate . Selections acts like a lter on a set. Selection

A 1 A=1 1 2

B 3 = 4 5

A 1 1

B 3 4

179

Selection (2)

A simple selection predicate has the form Term ComparisonOperator Term .

Term is an expression that can be evaluated to a data value for a given tuple: an attribute name, a constant value, an expression built from attributes, constants, and data type operations like +, , , /.

180

Selection (3)

ComparisonOperator is = (equals), = (not equals), < (less than), > (greater than), , , or other data type-dependent predicates (e.g., LIKE). Examples for simple selection predicates: LAST = Smith POINTS 8 POINTS = MAXPT.

181

Selection (4)
(R) may be imlemented as: Naive selection
create a new temporary relation T ; foreach t R do p (t); if p then insert t into T ; fi od return T ;

If index structures are present (e.g., a B-tree index), it is possible to evaluate (R) without reading every tuple of R.

182

Selection (5)
A few corner cases C=1 A=A 1=2
A 1 1 2 A 1 1 2 A 1 1 2

B 3 4 5 B 3 4 5 B 3 4 5

= =

(schema error)
A 1 1 2 B 3 4 5 B

= A

183

Selection (6)

(R) corresponds to the following SQL query: SELECT FROM WHERE * R

A dierent relational algebra operation called projection corresponds to the SELECT clause. Source of confusion.

184

Selection (7)
More complex selection predicates may be performed using the Boolean connectives: 1 2 (and), and 1 2 (or), 1 (not).

Note: 1 2 (R) = 1 (2 (R)). Are the Boolean connectives , strictly needed? The selection predicate must permit evaluation for each input tuple in isolation.
Thus, exists () and for all () or nested relational algebra queries are not permitted in selection predicates. Actually, such predicates do not add to the expressiveness of RA.

185

Projection (1)

Projection The projection L eliminates all attributes (columns) of the input relation but those mentioned in the projection list L. Projection
A 1 A,C 2 3

B 4 5 6

C 7 = 8 9

A 1 2 3

C 7 8 9

186

Projection (2)

The projection Ai1 ,...,Aik (R) produces for each input tuple (A1 : d1 , . . . , An : dn ) an output tuple (Ai1 : di1 , . . . , Aik : dik ). may be used to reorder columns. discards rows, discards columns. DB slang: All attributes not in L are projected away.

187

Projection (3)

In general, the cardinalities of the input and output relations are not equal. Projection eliminates duplicates
A 1 B 2 3 B 4 = 5 4

B 4 5

188

Projection (4)
Ai1 ,...,Aik (R) may be imlemented as: Naive projection
create a new temporary relation T ; foreach t = (A1 : d1 , . . . , An : dn ) R do u (Ai1 : di1 , . . . , Aik : dik ); insert u into T ; od eliminate duplicate tuples in T ; return T ;

The necessary duplicate elimination makes L one of the more costly operations in RDBMSs. Thus, query optimizers try hard to prove that the duplicate eliminaton step is not necessary.

189

Projection (5)
If RA is used to formalize the semantics of SQL, the format of the projection list is often generalized: Attribute renaming: B1 Ai1 ,...,Bk Aik (R) . Computations (e.g., string concatenation via ) to derive the value in new columns, e.g.: SID,NAME FIRST
LAST

(STUDENTS) .

Such generalized operators are also referred to as map operators (as in functional programming languages).

190

Projection (6)

A1 ,...,Ak (R) corresponds to the SQL query: SELECT DISTINCT A1 , . . . ,Ak FROM R B1 A1 ,...,Bk Ak (R) is equivalent to the SQL query: SELECT DISTINCT A1 [AS] B1 , . . . ,Ak [AS] Bk FROM R

191

Selection vs. Projection

Selection vs. Projection Selection


A1 A2 A3 A4 A1

Projection
A2 A3 A4

Filter some rows

Maps all rows

192

Combining Operations (1)


Since the result of any relational algebra operation is a relation again, this intermediate result may be the input of a subsequent RA operation. Example: retrieve the exercises solved by student with ID 102: CAT,ENO (SID=102 (RESULTS)) . We can think of the intermediate result to be stored in a named temporary relation (or as a macro denition): S102 SID=102 (RESULTS); CAT,ENO (S102)

193

Combining Operations (2)

Composite RA expressions are typically depicted as operator trees: CAT,ENO SID=102 RESULTS  ???  ??   y +?  ?? ??   x 2

In these trees, computation proceeds bottom-up. The evaluation order of sibling branches is not pre-determined.

194

Combining Operations (3)

SQL-92 permits the nesting of queries (the result of a SQL query may be used in a place of a relation name): Nested SQL Query SELECT DISTINCT CAT, ENO FROM (SELECT * FROM RESULTS WHERE SID = 102) AS S102 Note that this is not the typical style of SQL querying.

195

Combining Operations (4)


Instead, a single SQL query is equivalent to an RA operator tree containing , , and (multiple) (see below): SELECT-FROM-WHERE Block SELECT DISTINCT CAT, ENO FROM RESULTS WHERE SID = 102 Really complex queries may be constructed step-by-step (using SQLs view mechanism), S102 may be used like a relation: SQL View Denition CREATE VIEW S102 AS SELECT * FROM RESULTS WHERE SID = 102
196

Relational Algebra Overview 1. Introduction; Selection, Projection

2. Cartesian Product, Join 3. 4. 5. Set Operations Outer Join Formal Denitions, A Bit of Theory

197

Cartesian Product (1)

In general, queries need to combine information from several tables. In RA, such queries are formulated using , the Cartesian product. Cartesian Product The Cartesian product R S of two relations R, S is computed by concatenating each tuple t R with each tuple u S. ( denotes tuple concatenation.)

198

Cartesian Product (2)


Cartesian Product
A 1 3 B C 2 6 4 8 D 7 9 A 1 1 3 3 B 2 2 4 4 C 6 8 6 8 D 7 9 7 9

Since attribute names must be unique within a tuple, the Cartesian product may only be applied if R, S do not share any attribute names. (This is no real restriction because we have .)

199

Cartesian Product (3)

If t = (A1 : a1 , . . . , An : an ) and u = (B1 : b1 , . . . , Bm : bm ), then t u = (A1 : a1 , . . . , An : an , B1 : b1 , . . . , Bm : bm ). Cartesian Product: Nested Loops create a new temporary relation T ; foreach t R do foreach u S do insert t u into T ; od od return T ;

200

Cartesian Product and Renaming


R S may be computed by the equivalent SQL query (SQL does not impose the unique column name restriction, a column A of relation R may uniquely be identied by R.A): Cartesian Product in SQL SELECT * FROM R, S

In RA, this is often formalized by means of of a renaming operator X (R). If sch(R) = (A1 : D1 , . . . , An : Dn ), then
X (R)

X.A1 A1 ,...,X.An An (R) .

201

Join (1)
The intermediate result generated by a Cartesian product may be quite large in general (|R| = n, |S| = m |R S| = n m). Since the combination of Cartesian product and selection in queries is common, a special operator join has been introduced. Join The (theta-)join R S between relations R, S is dened as R S (R S). The join predicate may refer to attribute names of R and S.

202

Join (2)

S (STUDENTS)

S.SID=R.SID

R (RESULTS)

S.SID 101 101 101 102 102 102 103 103

S.FIRST S.LAST S.EMAIL R.SID R.CAT R.ENO R.POINTS Ann Smith ... 101 H 1 10 Ann Smith ... 101 H 2 8 Ann Smith ... 101 M 1 12 Michael Jones (null) 102 H 1 9 Michael Jones (null) 102 H 2 9 Michael Jones (null) 102 M 1 10 Richard Turner ... 103 H 1 5 Richard Turner ... 103 M 1 7

Note: student Maria Brown does not appear in the join result.

203

Join (3)
R

S can be evaluated by folding the procedures for , :

Nested Loop Join create a new temporary relation T ; foreach t R do foreach u S do if (t u) then insert t u into T ; fi od od return T ;

204

Join (4)
Join combines tuples from two relations and acts like a lter: tuples without join partner are removed. Note: if the join is used to follow a foreign key relationship, then no tuples are ltered: Join follows a foreign key relationship (dereference) RESULTS
SID=S.SID

S.SIDSID,FIRST,LAST,EMAIL (STUDENTS)

There are join variants which act like lters only: left and right semijoin ( , ): R

sch(R) (R

S) ,

or do not lter at all: outer-join (see below).

205

Natural Join
The natural join provides another useful abbreviation (RA macro). In the natural join R S, the join predicate is dened to be an conjunctive equality comparison of attributes sharing the same name in R, S. Natural join handles the necessary attribute renaming and projection. Natural Join Assume R(A, B, C) and S(B, C, D). Then: R S = A,B,C,D (B=B C=C (R B B,C C,D (S)))

(Note: shared columns occur once in the result.)

206

Joins in SQL (1)


In SQL, R

S is normally written as

Join in SQL (classic and SQL-92) SELECT FROM WHERE R,S or SELECT FROM R JOIN S ON

Note: this left query is exactly the SQL equivalent of (R S) we have seen before. SQL is a declarative language: it is the task of the SQL optimizer to infer that this query may be evaluated using a join instead of a Cartesian product.

207

Algebraic Laws (1)

The join satisfy the associativity condition (R S) T R (S T) .

In join chains, parentheses are thus superuous: R S T .

Join is not commutative unless it is followed by a projection, i.e., a column reordering: L (R S) L (S R) .

208

Algebraic Laws (2)


A signicant number of further algebraic laws hold, which are heavily utilized by the query optimizer. Example: selection push-down. If predicate refers to attributes in S only, then (R Selection push-down Why is selection push-down considered one of the most signicant algebraic optimizations? (Such eciency considerations are the subject of Datenbanken II.) S) R (S) .

209

A Common Query Pattern (1)


The following operator tree structure is very common: A1 ,...,Ak ooo OOO O
1

OOO O

R1

Rn

ooo

n1

O O

R2

Rn1

Join all tables needed to answer the query, select the 1 2 relevant tuples, project away all irrelevant columns. 3

210

A Common Query Pattern (2)


The select-project-join query A1 ,...,Ak ( (R1
1

R2

n1

Rn ))

has the obvious SQL equivalent SELECT DISTINCT A1 , . . . ,Ak FROM R1 , . . . ,Rn WHERE AND 1 AND AND n1 It is a common source of errors to forget a join condition: think of the scenario R(A, B), S(B, C), T (C, D) when attributes A, D are relevant for the query output.

211

Relational Algebra Quiz (Level: Novice)


STUDENTS FIRST LAST Ann Smith Michael Jones Richard Turner Maria Brown EXERCISES ENO TOPIC 1 Rel.Alg. 2 SQL 1 SQL RESULTS CAT ENO H 1 H 2 M 1 H 1 H 2 M 1 H 1 M 1

SID 101 102 103 104

EMAIL ... (null) ... ...

CAT H H M

MAXPT 10 10 14

SID 101 101 101 102 102 102 103 103

POINTS 10 8 12 9 9 10 5 7

212

Relational Algebra Quiz (Level: Novice)

Formulate equivalent queries in RA Print all homework results for Ann Smith (print exercise 1 number and points). Who has got the maximum number of points for a 2 homework? Print full name and homework number. (Who has got the maximum number of points for all 3 homework exercises?)

213

Self Joins (1)

Sometimes it is necesary to refer to more than one tuple of the same relation at the same time. Example: Who got more points than the student with ID 101 for any of the exercises? Two answer this query, we need to compare two tuples t, u of the relation RESULTS: tuple t corresponding to the student with ID 101, 1 tuple u, corresponding to the same exercise as the tuple 2 t, in which u.POINTS > t.POINTS.

214

Self Joins (2)

This requires a generalization of the select-project-join query pattern, in which two instances of the same relation are joined (the attributes in at least one instances must be renamed rst): S :=
X (RESULTS) X.CAT=Y.CAT X.ENO=Y.ENO Y.SID=101 )(S) Y

(RESULTS)

X.SID (X.POINTS>Y.POINTS

Such joins are commonly referred to as self joins.

215

Relational Algebra Overview 1. 2. Introduction; Selection, Projection Cartesian Product, Join

3. Set Operations 4. 5. Outer Join Formal Denitions, A Bit of Theory


216

Set Operations (1)


Relations are sets (of tuples). The usual family of binary set operations can also be applied to relations. It is a requirement, that both input relations have the same schema. Set Operations The set operations of relational algebra are R S, R S, and R S (union, intersection, dierence). A minimal set of operations Which of these set operations is redundant (i.e., may be derived using an alternative RA expression, just like )?

217

Set Operations (2)

RS

RS

RS SR

218

Set Operations (3)


R S may be implemented as follows: Union create a new temporary relation T ; foreach t R do insert t into T ; od foreach t S do insert t into T ; od remove duplicates in T ; return T ;

219

Set Operations (4)


R S may be implemented as follows: Dierence create a new temporary relation T ; foreach t R do remove false; foreach u S do remove remove (t = u); od if remove then insert t into T ; fi od return T ;

220

Union (1)

In RA queries, a typical application for is case analysis. Example: Grading MPOINTS := SID,POINTS (CAT=MENO=1 (RESULTS))) SID,GRADEA (POINTS 12 (MPOINTS)) SID,GRADEB (POINTS 10 POINTS<12 (MPOINTS)) SID,GRADEC (POINTS 7 POINTS<10 (MPOINTS)) SID,GRADEF (POINTS 7 (MPOINTS))

221

Union (2)
In SQL, is directly supported: keyword UNION. UNION may be placed between two SELECT-FROM-WHERE blocks:
SQLs UNION SELECT FROM WHERE UNION SELECT FROM WHERE AND UNION ... SID, A AS GRADE RESULTS CAT = M AND ENO = 1 AND POINTS = 12 SID, B AS GRADE RESULTS CAT = M AND ENO = 1 POINTS = 10 AND POINTS 12

222

Set Dierence (1)


Note: the RA operators , , , , are monotic by denition, e.g.: RS = (R) (S) .

Then it follows that every query Q that exclusively uses the above operators behaves monotonically: Let I1 be a database state, and let I2 = I1 {t} (database state after insertion of tuple t). Then every tuple u contained in the answer to Q in state I1 is also contained in the anser to Q in state I2 . Database insertion never invalidates a correct answer.

223

Set Dierence (2)


If we pose non-monotonic queries, e.g., Which student has not solved any exercise? Who got the most points for Homework 1? Who has solved all exercises in the database? then it is obvious that , , , , are not sucient to formulate the query. Such queries require set dierence (). A non-monotonic query Which student has not solved any exercise? (Print full name (FIRST, LAST). (Example database tables repeated on next slide.)

224

Example Database (recap)


STUDENTS FIRST LAST Ann Smith Michael Jones Richard Turner Maria Brown EXERCISES ENO TOPIC 1 Rel.Alg. 2 SQL 1 SQL RESULTS CAT ENO H 1 H 2 M 1 H 1 H 2 M 1 H 1 M 1

SID 101 102 103 104

EMAIL ... (null) ... ...

CAT H H M

MAXPT 10 10 14

SID 101 101 101 102 102 102 103 103

POINTS 10 8 12 9 9 10 5 7

225

Set Dierence (3)


A correct solution? FIRST,LAST (STUDENTS A correct solution? SID,FIRST,LAST (STUDENTS SID (RESULTS)) Correct solution!
SID=SID2

SID2SID (RESULTS))

226

Set Dierence (4)


A typical RA query pattern involving set dierence is the anti-join. Given R(A, B) and S(B, C), retrieve the tuples of R that do not have a (natural) join partner in S (Note: sch(R) sch(S) = {B}): R (B (R) B (S)) . S).)

(The following is equivalent: R sch(R) (R

There is no common symbol for this anti-join, but R S seems appropriate (complemented semi-join).

227

Set Dierence (5)


Suppose that the relations R, S have been computed as: S := SELECT A1 , . . . , An FROM R1 , . . . , Rm WHERE 1 R := SELECT B1 , . . . , Bn FROM S1 , . . . , Sk WHERE 2 Set dierence R S in SQL4
SELECT FROM WHERE A1 , . . . ,An R1 , . . . ,Rm 1 AND NOT EXISTS (SELECT * FROM S1 , . . . ,Sk WHERE 2 AND B1 =A1 AND AND Bn =An )

4 The

subquery in () returns TRUE if it returns 0 tuples.

228

Set Operations and Complex Selections


Note that the availability of , (and ) renders complex selection predicates superuous: Predicate Simplication Rules 1 2 (Q) = 1 2 (Q) (Q) = =

1 (Q) 2 (Q) 1 (Q) 2 (Q) Q (Q)

RDBMS implement complex selection predicates anyway Why?

229

Relational Algebra Quiz (Level: Intermediate)


The RA quiz below refers to the Homework DB. Schema: RESULTS (SID STUDENTS,(CAT, ENO) EXERCISES, POINTS) STUDENTS (SID,FIRST,LAST,EMAIL) EXERCISES (CAT,ENO,TOPIC,MAXPT) Formulate equivalent queries in RA Who got the most points (of all students) for 1 Homework 1? Which students solved all exercises in the database? 2

230

Union vs. Join


Find RA expressions that translate between the two Two alternative representations of the homework, midterm exam, and nal totals of the students are:
RESULTS 1 STUDENT H M Jim Ford 95 60 Ann Smith 80 90 F 75 95 RESULTS 2 STUDENT CAT Jim Ford H Jim Ford M Jim Ford F Ann Smith H Ann Smith M Ann Smith F PCT 95 60 75 80 90 95

231

Summary
The ve basic operations of relational algebra are: 1 L 2 3 4 5 Selection Projection Cartesian Product Union Dierence

Derived (and thus redundant) operations: Theta-Join , Natural Join , Semi-Join and Intersection .

, Renaming ,

232

Relational Algebra Overview 1. 2. 3. Introduction; Selection, Projection Cartesian Product, Join Set Operations

4. Outer Join 5. Formal Denitions, A Bit of Theory

233

Outer Join (1)


Join ( ) eliminates tuples without partner: A a1 a1 B b1 b2 B b2 b3 C c2 c3 = A a2 B b2 C c2

The left outer join preserves all tuples in its left argument, even if a tuple does not team up with a partner in the join: A a1 a1 B b1 b2 B b2 b3 C c2 c3 = A a1 a2 B b1 b2 C (null) c2

234

Outer Join (2)


The right outer join preserves all tuples in its right argument: A a1 a1 B b1 b2 B b2 b3 C c2 c3 = A a2 (null) B b2 b3 C c2 c3

The full outer join preserves all tuples in both arguments: A a1 a1 B b1 b2 B b2 b3 C c2 c3 A a1 a2 (null) B b1 b2 b3 C (null) c2 c3

235

Outer Join (3)


R

create a new temporary relation T ; foreach t R do haspartner false; foreach u S do if (t u) then insert t u into T ; haspartner true; fi od if haspartner then insert t (null, . . . , null) into T ; | {z } fi od # attributes in S return T ;

236

Outer Join (4)


Example: Prepare a full homework results report, including those students who did not hand in any solution at all: SID SID,ENO,POINTS (CAT=H (RESULTS))
EMAIL SID ... 101 ... 101 (null) 102 (null) 102 ... 103 ... (null) ENO 1 2 1 2 1 (null) POINTS 10 8 9 9 5 (null)

STUDENTS
SID 101 101 102 102 103 104

SID=SID

FIRST Ann Ann Michael Michael Richard Maria

LAST Smith Smith Jones Jones Turner Brown

237

Outer Join (5)


Join vs. Outer Join Is there any dierence between STUDENTS RESULTS and RESULTS? STUDENTS (Can you tell without looking at the table states?) Note: Outer join is a derived operation (like , ), i.e., it can simulated using the ve basic relational algebra operations. Consider R(A, B) and S(B, C). Then R S (R S) (R A,B (R S)) {(C:null)}

SQL-92 provides {FULL, LEFT, RIGHT} OUTER JOIN.


238

Relational Algebra Overview 1. 2. 3. 4. Introduction; Selection, Projection Cartesian Product, Join Set Operations Outer Join

5. Formal Denitions, A Bit of Theory

239

Denitions: Syntax (1)

Let the following be given: A set D of data type names and for each D D a set val(D) of values. A set A of valid attribute names (identiers). Relational Database Schema A relational database schema S consists of a nite set of relation names R, and for every R R a relation schema sch(R). (We ill ignore constraints here.)

240

Denitions: Syntax (2)


The set of syntactically correct RA expressions or queries is dened recursively, together with the resulting schema of each expression. Syntax of RA (Base Cases) R (relation name) 1 For every R R, R is an RA expression with schema sch(R). {(A1 :d1 , . . . , An :dn )} (relation constant) 2 A relation constant is an RA expression if A1 , . . . , An A, di val(Di ) for 1 i n with D1 , . . . , Dn D. The schema of this expression is (A1 :D1 , . . . , An :Dn ).

241

Denitions: Syntax (3)


Let Q be an RA expr. with schema s = (A1 :D1 , . . . , An :Dn ). Syntax of RA (Recursive Cases) Ai =Aj (Q) 3 for i, j {1, . . . , n} is an RA expression with schema s. Ai =d (Q) 4 for i {1, . . . , n} and d val(Di ) is an RA expression with schema s. B1 Ai1 ,...,Bm Aim (Q) 5 for i1 , . . . , im {1, . . . , n} and B1 , . . . , Bm A such that Bj = Bk for j = k is an RA expression with schema (B1 :Di1 , . . . , Bm :Dim ).

242

Denitions: Syntax (4)


Let Q1 , Q2 be an RA expressions with the same schema s. Syntax of RA (Recursive Cases) Q1 Q2 and Q1 Q2 6 7 are RA expressions with schema s. Let Q1 , Q2 be RA expressions with schemas (A1 :D1 , . . . , An :Dn ) and (B1 :E1 , . . . , Bm :Em ), respectively. Syntax of RA (Recursive Cases) Q1 Q2 8 is an RA expression with schema (A1 :D1 , . . . , An :Dn , B1 :E1 , . . . , Bm :Em ) if {A1 , . . . , An } {B1 , . . . , Bm } = .

243

Denitions: Semantics (1)

Database State A database state I (instance) denes a relation I(R) for every relation name R in the database schema S. The result of a query Q, i.e., an RA expression, in a database state I is a relation. This relation is denoted by I[Q] and dened recursively corresponding to the syntactic structure of Q.

244

Denition: Semantics (2)


I[Q] If Q is a relation name R, then I[Q] := I(R). If Q is a constant relation {(A1 :d1 , . . . , An :dn )}, then I[Q] := {(d1 , . . . , dn )}. If Q has the form Ai =Aj (Q1 ), then I[Q] := {(d1 , . . . , dn ) I[Q1 ] | di = dj } If Q has the form Ai =d (Q1 ), then I[Q] := {(d1 , . . . , dn ) I[Q1 ] | di = d} If Q has the form B1 Ai1 ,...,Bm Aim (Q1 ), then I[Q] := {(di1 , . . . , dim ) | (d1 , . . . , dn ) I[Q1 ]}

245

Denition: Semantics (3)

I[Q] (continued) If Q has the form Q1 Q2 , then I[Q] := I[Q1 ] I[Q2 ] If Q has the form Q1 Q2 , then I[Q] := I[Q1 ] I[Q2 ] If Q has the form Q1 Q2 , then I[Q] := { (d1 , . . . , dn , e1 , . . . , em ) | (d1 , . . . , dn ) I[Q1 ], (e1 , . . . , em ) I[Q2 ]} .

246

Monotonicity
Smaller Database State A database state I1 is smaller than (or equal to) a database state I2 , written I1 I2 , i I1 (R) I2 (R) for all relation names R R of schema S. Theorem: RA \{} is monotonic If an RA expression Q does not contain the (set dierence) operator, then the following holds for all database states I1 , I2 : I1 I2 = I1 [Q] I2 [Q] .

Formulate proof by induction on syntactic structure of Q (structural induction).

247

Equivalence (1)
Equivalence of RA Expressions Two RA expressions Q1 and Q2 are equivalent i they have the same (result) schema and for all database states I, the following holds: I[Q1 ] = I[Q2 ] .

Examples: 1 (2 (Q)) = 2 (1 (Q)) (Q1 Q2 ) Q3 = Q1 (Q2 Q3 ) If A is an attribute in the result schema of Q1 , then A=d (Q1 Q2 ) = (A=d (Q1 )) Q2 . Theorem: The equivalence of (arbitrary) relational algebra expressions is undecidable.

248

Limitations of RA (1)
Let R be a relation name and assume sch(R) = (A:D, B:D), i.e., both columns share the same data type D. Let val(D) be innite. The transitive closure of I(R) is the set of all (d, e) val(D) val(D) such that there are n N, n and d0 , d1 , . . . , dn val(D) with d = d0 , e = dn and (di1 , di ) I(R) for i = 1, . . . , n. a + b c
R from to a b b c c d

1,

dl

249

Limitations of RA (2)

Theorem: There is no RA expression Q such that I[Q] is the transitive closure of I(R) for all database states I. In the directed graph example, one self-join (of R with itself) is needed, to follow the edges in the graph: S.from,T .to (
S (R) S.to=T .from T (R))

An n-fold self-join will nd all paths in the graph of length n + 1. To compute the transitive closure for arbitrary graphs, i.e., for all database states I, is impossible in RA.

250

Limitations of RA (3)
This of course implies that relational algebra is not computationally complete. There are functions from database states to relations (query results), for which we could write a program5 , but we will not be able to nd an equivalent RA expression to do the same. However, this would have been truly unexpected and actually unwanted, because want a guarantee that query evaluation always terminates. This is guaranteed for RA.
Otherwise, we would have solved the halting problem.
5 Pick

your favourite programming language.

251

Limitations of RA (4)
All RA queries can be evaluated in time that is polynomically in the size of the database state. This implies that certain complex problems cannot be formulated in relational algebra.
For example, if you nd a way to formulate the Traveling Salesman problem in RA, you have solved the famous P = NP problem. (With a solution that nobody expects; contact me to collect your PhD.)

As the transitive closure example shows, even not all problems of polynomial complexity can be formulated in classical RA.

252

Expressive Power (1)

Relational Completeness A query language L for the relational model is called strong relationally complete if, for every DB schema S and for every RA epxression Q1 with respect to S there is a query Q2 L such that for all database states I with respect to S the two queries produce the same results: I[Q1 ] = I[Q2 ] .

Read as: It is possible to write an RA-to-L query compiler.

253

Expressive Power (2)


SQL is strong relationally complete. If we can even write RA-to-L as well as L-to-RA compilers, both query languages are equivalent. SQL and RA are not equivalent. SQL contains concepts, e.g., the aggregate COUNT, which cannot be simulated in RA. Equivalent Query Languages Relational algebra, 1 SQL without aggregations and with mandatory 2 duplicate elimination, Tuple relational calculus, 3 Datalog (a Prolog variant) without recursion. 4

Você também pode gostar