Escolar Documentos
Profissional Documentos
Cultura Documentos
Ken Moody
Computer Laboratory
University of Cambridge, UK
Lecture notes by Timothy G. Grifn
Lent 2012
Databases
DB 2012
1 / 175
DB 2012
2 / 175
DB vs. IR
Relational Databases
ACID properties
Two fundamental trade-offs
OLTP vs. OLAP
Course outline
Databases
Databases
DB 2012
3 / 175
Databases
DB 2012
4 / 175
IR system
fuzzy query results
optimized for concurrent reads
domain often open-ended
search existing documents
reduce information overload
Databases
DB 2012
5 / 175
Databases
DB 2012
6 / 175
Databases
DB 2012
7 / 175
real-world change
Domain of Interest
represent
Database
represent
database update(s)
Database
Databases
DB 2012
8 / 175
SQL implementation
Databases
DB 2012
9 / 175
Number
Employee
ISA
Does
Mechanic
Salesman
Description
Number
Parts
Price
RepairJob
Buys
Date
License
Cost
Date
Value
Sells
Commission
Value
Work
Repairs
Car
seller
Manufacturer
Model
Year
buyer
Client
Name
Address
ID
Phone
By Pvel Calado,
http://www.texample.net/tikz/examples/entity-relationship-diagram
Ken Moody (cl.cam.ac.uk)
Databases
DB 2012
10 / 175
11 / 175
Supports
Data is
Transactions mostly
optimized for
Normal Forms
OLAP
analysis
historical
reads
query processing
not important
Databases
OLTP
day-to-day operations
current
updates
updates
important
DB 2012
12 / 175
fast updates
Extract
Operational Database
Data Warehouse
Databases
DB 2012
13 / 175
DB 2012
14 / 175
Databases
Databases
DB 2012
15 / 175
NoSQL Movement
Technologies
Key-value store
Directed Graph Databases
Main memory stores
Distributed hash tables
Applications
Facebook
Google
iMDB
...
Always remember to ask : What problem am I solving?
Ken Moody (cl.cam.ac.uk)
Databases
DB 2012
16 / 175
Term Outline
Lecture 02 The relational data model.
Lecture 03 Entity-Relationship (E/R) modelling
Lecture 04 Relational algebra and relational calculus
Lecture 05 SQL
Lecture 06 Case Study - Cancer registry for the NHS - challenges
Lecture 07 Schema renement I
Lecture 08 Schema renement II
Lecture 09 Schema renement III and advanced design
Lecture 10 On-line Analytical Processing (OLAP)
Lecture 11 Case Study - Cancer registry for the NHS experiences
Lecture 12 XML as a data exchange format
Databases
DB 2012
17 / 175
Recommended Reading
Textbooks
SKS Silberschatz, A., Korth, H.F. and Sudarshan, S. (2002).
Database system concepts. McGraw-Hill (4th edition).
(Adjust accordingly for other editions)
Chapters 1 (DBMSs)
2 (Entity-Relationship Model)
3 (Relational Model)
4.1 4.7 (basic SQL)
6.1 6.4 (integrity constraints)
7 (functional dependencies and normal
forms)
22 (OLAP)
UW Ullman, J. and Widom, J. (1997). A rst course in
database systems. Prentice Hall.
CJD Date, C.J. (2004). An introduction to database systems.
Addison-Wesley (8th ed.).
Ken Moody (cl.cam.ac.uk)
Databases
DB 2012
18 / 175
Databases
DB 2012
19 / 175
DB 2012
20 / 175
Databases
Databases
DB 2012
21 / 175
Relational Schema
Databases
DB 2012
22 / 175
Databases
DB 2012
23 / 175
Example
A relational schema
Students(name: string, sid: string, age : integer)
A tabular presentation
name
Fatima
Eva
James
Ken Moody (cl.cam.ac.uk)
sid
fm21
ev77
jj25
Databases
age
20
18
19
DB 2012
24 / 175
Key Concepts
Relational Key
Suppose R(X) is a relational schema with Z X. If for any records u
and v in any instance of R we have
u.[Z] = v .[Z] = u.[X] = v .[X],
then Z is a superkey for R. If no proper subset of Z is a superkey, then
Z is a key for R. We write R(Z, Y) to indicate that Z is a key for
R(Z Y).
Note that this is a semantic assertion, and that a relation can have
multiple keys.
Databases
DB 2012
25 / 175
Databases
DB 2012
26 / 175
Databases
DB 2012
27 / 175
Databases
DB 2012
28 / 175
Keys in SQL
A key is a set of attributes that will uniquely identify any record (row) in
a table.
-- with this create table
create table Students
(sid varchar(10),
name varchar(50),
age int,
primary key (sid));
-- if we try to insert this (fourth) student ...
mysql> insert into Students set
name = Flavia, age = 23, sid = fm21;
ERROR 1062 (23000): Duplicate
entry fm21 for key PRIMARY
Ken Moody (cl.cam.ac.uk)
Databases
DB 2012
29 / 175
Output : a single
relation instance
=
Q(R1 , R2 , , Rk )
Databases
DB 2012
30 / 175
Databases
DB 2012
31 / 175
Relational Calculi
Databases
DB 2012
32 / 175
ANSI: SQL-86
ANSI and ISO : SQL-89, SQL-92, SQL:1999, SQL:2003,
SQL:2006, SQL:2008
Query Language
Data Denition Language
System Administration Language
...
Databases
DB 2012
33 / 175
Selection
R
A
20
11
4
77
B C D
10 0 55
10 0 7
99 17 2
25 4 0
Q(R)
=
A B C D
20 10 0 55
77 25 4 0
RA Q = A>12 (R)
TRC Q = {t | t R t.A > 12}
Databases
DB 2012
34 / 175
Projection
R
A
20
11
4
77
Q(R)
B C D
10 0 55
10 0 7
99 17 2
25 4 0
B C
10 0
99 17
25 4
RA Q = B,C (R)
TRC Q = {t | u R t.[B, C] = u.[B, C]}
Databases
DB 2012
35 / 175
R
A
20
11
4
77
B C D
10 0 55
10 0 7
99 17 2
25 4 0
B C
10 0
10 0
99 17
25 4
SQL is actually based on multisets, not sets. We will look into this
more in Lecture 11.
Ken Moody (cl.cam.ac.uk)
Databases
DB 2012
36 / 175
Outline
Entities
Relationships
Their relational implementations
n-ary relationships
Generalization
On the importance of SCOPE
Databases
DB 2012
37 / 175
Databases
Year
1997
1999
2000
2000
Actor
Mike Myers
Mike Myers
Bill Chott
Marc Lynn
DB 2012
38 / 175
Year
MovieID
Title
FirstName
Movie
Person
LastName
PersonID
Databases
DB 2012
39 / 175
Title
Austin Powers: International Man of Mystery
Austin Powers: The Spy Who Shagged Me
Dude, Wheres My Car?
Year
1997
1999
2000
Person
PersonID
6902836
1757556
5882058
FirstName
Mike
Bill
Marc
LastName
Myers
Chott
Lynn
Databases
DB 2012
40 / 175
Relationships
Title
MovieID
Year
Movie
FirstName
ActsIn
Databases
LastName
PersonID
Person
DB 2012
41 / 175
Foreign Key
Suppose we have R(Z, Y). Furthermore, let S(W) be a relational
schema with Z W. We say that Z represents a Foreign Key in S for R
if for any instance we have Z (S) Z (R). This is a semantic
assertion.
Referential integrity
A database is said to have referential integrity when all foreign key
constraints are satised.
Databases
DB 2012
42 / 175
A relational representation
A relational schema
ActsIn(MovieID, PersonID)
With referential integrity constraints
MovieID (ActsIn) MovieID (Movie)
PersonID (ActsIn) PersonID (Person)
ActsIn
PersonID
6902836
6902836
1757556
5882058
Ken Moody (cl.cam.ac.uk)
MovieID
55871
55873
171771
171771
Databases
DB 2012
43 / 175
DB 2012
44 / 175
Databases
meaning
no constraints
one to many
many to one
s S, t1 , t2 T .(R(s, t1 ) R(s, t2 )) = t1 = t2
one to one
Databases
DB 2012
45 / 175
Relation R is
many to many (M : N)
one to many (1 : M)
many to one (M : 1)
one to one (1 : 1)
Databases
DB 2012
46 / 175
Relation R is
Schema
many to many (M : N)
R(X , Z , U)
one to many (1 : M)
R(X , Z , U)
many to one (M : 1)
R(X , Z , U)
one to one (1 : 1)
Databases
DB 2012
47 / 175
W
w1
w2
w3
Z
z1
R
X
x2
Databases
T
U
u1
X
x1
x2
x3
x4
Y
y1
y2
y3
y4
DB 2012
48 / 175
Title
Austin Powers: International Man of Mystery
Austin Powers: International Man of Mystery
Austin Powers: The Spy Who Shagged Me
Austin Powers: The Spy Who Shagged Me
Austin Powers: The Spy Who Shagged Me
Dude, Wheres My Car?
Dude, Wheres My Car?
Year
1997
1997
1999
1999
1999
2000
2000
Actor
Mike Myers
Mike Myers
Mike Myers
Mike Myers
Mike Myers
Bill Chott
Marc Lynn
Role
Austin Powers
Dr. Evil
Austin Powers
Dr. Evil
Fat Bastard
Big Cult Guard 1
Cop with Whips
Databases
DB 2012
49 / 175
Year
MovieID
Title
Movie
Role
ActsIn
FirstName
Person
LastName
PersonID
Databases
DB 2012
50 / 175
Description
Role
Year
MovieID
Title
Movie
FirstName
ActsIn
Person
LastName
PersonID
Databases
DB 2012
51 / 175
Person
ActsIn
Casting
HasCasting
Movie
RequiresRole
Role
Databases
DB 2012
52 / 175
Branch
Works-On
Job
Employee
Assigned-To
Project
Involves
Branch
Requires
Job
Databases
DB 2012
53 / 175
Generalization
Movie
ISA
Comedy
Drama
Questions
Is every movie either comedy or a drama?
Can a movie be a comedy and a drama?
DB 2012
54 / 175
Databases
DB 2012
55 / 175
Databases
Country
USA
Iceland
UK
Brazil
USA
Iceland
UK
Brazil
USA
Iceland
UK
Brazil
Russia
Day
02
24
05
13
08
02
30
08
10
9
9
9
18
Month
05
10
09
02
06
07
07
10
12
02
02
03
09
Year
1997
1997
1997
1998
1999
1999
1999
1999
2000
2001
2001
2001
2001
DB 2012
56 / 175
Year
MovieID
Title
Movie
Year
MovieID
Year
Country
Title
Movie
Date
Released
Databases
MovieRelease
Month
Day
DB 2012
57 / 175
Outline
Constructing new tuples!
Joins
Limitations of Relational Algebra
Databases
DB 2012
58 / 175
Renaming
Q(R)
R
B C D
10 0 55
10 0 7
99 17 2
25 4 0
A
20
11
4
77
A
20
11
4
77
E C F
10 0 55
10 0 7
99 17 2
25 4 0
RA Q = {BE, DF } (R)
Databases
DB 2012
59 / 175
Union
Q(R, S)
A B
20 10
11 10
4 99
A
B
20 10
77 1000
A
B
20 10
11 10
4
99
77 1000
RA Q = R S
TRC Q = {t | t R t S}
Databases
DB 2012
60 / 175
Intersection
R
A B
20 10
11 10
4 99
Q(R)
S
A
B
20 10
77 1000
A B
20 10
RA Q = R S
TRC Q = {t | t R t S}
Databases
DB 2012
61 / 175
Difference
R
A B
20 10
11 10
4 99
Q(R)
S
A
B
20 10
77 1000
A B
11 10
4 99
RA Q = R S
TRC Q = {t | t R t S}
Databases
DB 2012
62 / 175
Databases
DB 2012
63 / 175
Databases
DB 2012
64 / 175
Product
R
A B
20 10
11 10
4 99
D
C
14 99
77 100
A
20
20
11
11
4
4
Q(R, S)
B C
D
10 14 99
10 77 100
10 14 99
10 77 100
99 14 99
99 77 100
Databases
DB 2012
65 / 175
Product is special!
R AC, BD (R)
R
A B
20 10
4 99
A
20
20
4
4
B C D
10 20 10
10 4 99
99 20 10
99 4 99
Databases
DB 2012
66 / 175
Natural Join
Natural Join
Given R(X, Y) and S(Y, Z), we dene the natural join, denoted
R S, as a relation over attributes X, Y, Z dened as
R S {t | u R, v S, u.[Y] = v .[Y] t = u.[X] u.[Y] v .[Z]}
In the Relational Algebra:
R S = X,Y,Z (Y=Y (R YY (S)))
Databases
DB 2012
67 / 175
Join example
Colleges
Students
name
Fatima
Eva
James
sid
fm21
ev77
jj25
age
20
18
19
cid cname
k
Kings
Clare
cl
q Queens
..
..
.
.
cid
cl
k
cl
Databases
DB 2012
68 / 175
Databases
DB 2012
69 / 175
Division
Given R(X, Y) and S(Y), the division of R by S, denoted R S, is the
relation over attributes X dened as (in the TRC)
R S {x | s S, x s R}.
name
Fatima
Fatima
Eva
Eva
Eva
James
award
writing
music
music
writing
dance
dance
award
music
writing
dance
Databases
name
Eva
DB 2012
70 / 175
Division in RA
R S X (R) X ((X (R) S) R)
Ken Moody (cl.cam.ac.uk)
Databases
DB 2012
71 / 175
Query Safety
A query like Q = {t | t R t S} raises some interesting questions.
Should we allow the following query?
Q = {t | t S}
We want our relations to be nite!
Safety
A (TRC) query
Q = {t | P(t)}
Databases
DB 2012
72 / 175
The expressive power of RA, TRC, and DRC are essentially the
same.
stored procedures
recursive queries
ability to embed SQL in standard procedural languages
Databases
DB 2012
73 / 175
DB 2012
74 / 175
Outline
NULL in SQL
three-valued logic
Multisets and aggregation in SQL
Views
General integrity constraints
Databases
Databases
DB 2012
75 / 175
What is NULL?
NULL is a place-holder, not a value!
NULL is not a member of any domain (type),
For records with NULL for age, an expression like age > 20
must unknown!
This means we need (at least) three-valued logic.
Let represent We dont know!
T
F
T
T
F
F
F
F
F
T
F
T
T
T
T
Databases
F
T
F
v v
T F
F T
DB 2012
76 / 175
select ...
where P
Databases
DB 2012
77 / 175
The select statement only returns those records where the where
predicate evaluates to true.
Databases
DB 2012
78 / 175
Databases
DB 2012
79 / 175
Databases
DB 2012
80 / 175
Databases
DB 2012
81 / 175
Databases
DB 2012
82 / 175
Databases
DB 2012
83 / 175
DB 2012
84 / 175
An Example ...
mysql> select * from marks;
+-------+-----------+------+
| sid
| course
| mark |
+-------+-----------+------+
| ev77 | databases |
92 |
| ev77 | spelling |
99 |
| tgg22 | spelling |
3 |
| tgg22 | databases | 100 |
| fm21 | databases |
92 |
| fm21 | spelling | 100 |
| jj25 | databases |
88 |
| jj25 | spelling |
92 |
+-------+-----------+------+
Databases
... of duplicates
mysql> select mark from marks;
+------+
| mark |
+------+
|
92 |
|
99 |
|
3 |
| 100 |
|
92 |
| 100 |
|
88 |
|
92 |
+------+
Databases
DB 2012
85 / 175
Why Multisets?
Duplicates are important for aggregate functions.
mysql> select min(mark),
max(mark),
sum(mark),
avg(mark)
from marks;
+-----------+-----------+-----------+-----------+
| min(mark) | max(mark) | sum(mark) | avg(mark) |
+-----------+-----------+-----------+-----------+
|
3 |
100 |
666 |
83.2500 |
+-----------+-----------+-----------+-----------+
Databases
DB 2012
86 / 175
Databases
DB 2012
87 / 175
Visualizing group by
sid
ev77
ev77
tgg22
tgg22
fm21
fm21
jj25
jj25
course
mark
databases
92
spelling
99
spelling
3
databases 100
databases
92
spelling
100
databases
88
spelling
92
group by
=
course mark
spelling
99
spelling
3
spelling 100
spelling
92
course
mark
databases
92
databases 100
databases
92
databases
88
Databases
DB 2012
88 / 175
Visualizing group by
course mark
spelling
99
spelling
3
spelling 100
spelling
92
min(mark)
course
mark
databases
92
databases 100
databases
92
databases
88
course
min(mark)
spelling
3
databases
88
Databases
DB 2012
89 / 175
Databases
DB 2012
90 / 175
Databases
DB 2012
91 / 175
Materialized Views
Databases
DB 2012
92 / 175
Example
C = Z W, and FD that was not preserved for relation R(X),
Let QR be a join that reconstructs R,
Databases
DB 2012
93 / 175
DB 2012
94 / 175
Assertions in SQL
Databases
Databases
DB 2012
95 / 175
Outline
ER is for top-down and informal (but rigorous) design
FDs are used for bottom-up and formal design and analysis
update anomalies
Reasoning about Functional Dependencies
Heaths rule
Databases
DB 2012
96 / 175
Update anomalies
Big Table
sid name college
course
part term_name
yy88 Yoni New Hall Algorithms I
IA
Easter
Uri
Kings
Algorithms I
IA
Easter
uu99
bb44
Bin New Hall Databases
IB
Lent
Bin New Hall Algorithms II IB Michaelmas
bb44
Zip
Trinity
Databases
IB
Lent
zz70
Zip
Trinity
Algorithms II IB Michaelmas
zz70
How can we tell if an insert record is consistent with current
records?
Can we record data about a course before students enroll?
Will we wipe out information about a college when last student
associated with the college is deleted?
Ken Moody (cl.cam.ac.uk)
Databases
DB 2012
97 / 175
Big Table
sid name college
course
part term_name
yy88 Yoni New Hall Algorithms I
IA
Easter
uu99
Uri
Kings
Algorithms I
IA
Easter
Bin New Hall Databases
IB
Lent
bb44
Bin New Hall Algorithms II IB Michaelmas
bb44
Zip
Trinity
Databases
IB
Lent
zz70
Zip
Trinity
Algorithms II IB Michaelmas
zz70
Change New Hall to Murray Edwards College
Databases
DB 2012
98 / 175
Databases
DB 2012
99 / 175
Functional Dependency
Databases
DB 2012
100 / 175
Example FDs
Big Table
sid name college
course
part term_name
yy88 Yoni New Hall Algorithms I
IA
Easter
Uri
Kings
Algorithms I
IA
Easter
uu99
Bin New Hall Databases
IB
Lent
bb44
Bin New Hall Algorithms II IB Michaelmas
bb44
zz70
Zip
Trinity
Databases
IB
Lent
Zip
Trinity
Algorithms II IB Michaelmas
zz70
sid name
sid college
course part
course term_name
Ken Moody (cl.cam.ac.uk)
Databases
DB 2012
101 / 175
Keys, revisited
Candidate Key
Let R(X) be a relational schema and Y X. Y is a candidate key if
1
2
Databases
DB 2012
102 / 175
Semantic Closure
Notation
F |= Y Z
Databases
DB 2012
103 / 175
Armstrongs Axioms
Reexivity If Z Y, then F Y Z.
Augmentation If F Y Z then F Y, W Z, W.
Databases
DB 2012
104 / 175
Notation
closure(F , X) = {A | F X A}
Claim 1
If Y W F and Y closure(F , X), then W closure(F , X).
Claim 2
Y W F + if and only if W closure(F , Y).
Databases
DB 2012
105 / 175
DB 2012
106 / 175
Soundness
F f = f F +
Completeness
f F + = F f
Databases
Databases
DB 2012
107 / 175
Closure
By soundness and completeness
closure(F , X) = {A | F X A} = {A | X A F + }
F+
for every subset Y atts(F )
output Y Z
Databases
DB 2012
108 / 175
Y := X
Databases
DB 2012
109 / 175
Brute force!
Lets just consider all possible nonempty sets X there are only 15...
Databases
DB 2012
110 / 175
Example (cont.)
F = {A, B C, C D, D A}
For the single attributes we have
{A}+ = {A},
{B}+ = {B},
{C}+ = {A, C, D},
CD
DA
{D}+ = {A, D}
DA
{D} = {A, D}
The only new dependency we get with a single attribute on the left is
C A.
Ken Moody (cl.cam.ac.uk)
Databases
DB 2012
111 / 175
DB 2012
112 / 175
Example (cont.)
F = {A, B C, C D, D A}
Now consider pairs of attributes.
{A, B}+ = {A, B, C, D},
so A, B D is a new dependency
so A, C D is a new dependency
so nothing new.
so B, C A, D is a new dependency
so B, D A, C is a new dependency
so C, D A is a new dependency
Databases
Example (cont.)
F = {A, B C, C D, D A}
For the triples of attributes:
{A, C, D}+ = {A, C, D},
{A, B, D}+ = {A, B, C, D},
so A, B, D C is a new dependency
so A, B, C D is a new dependency
so B, C, D A is a new dependency
Databases
DB 2012
113 / 175
DB 2012
114 / 175
Example (cont.)
We generated 11 new FDs:
C
A, C
B, C
B, D
A, B, C
B, C, D
A
D
D
C
D
A
A, B
B, C
B, D
C, D
A, B, D
D
A
A
A
C
Databases
Databases
DB 2012
115 / 175
DB 2012
116 / 175
By augmentation we have
F |= Y Z,
F |= Y W.
F |= Y, Y Y, Z,
that is,
F |= Y Y, Z.
F |= Y, Z W, Z.
Therefore, by transitivity we obtain
F |= Y W, Z.
Ken Moody (cl.cam.ac.uk)
Databases
Heaths Rule
Suppose R(A, B, C) is a relational schema with functional
dependency A B, then
R = A,B (R) A A,C (R).
Databases
DB 2012
117 / 175
Databases
DB 2012
118 / 175
Closure Example
R(A, B, C, D, E, F ) with
A, B C
B, C D
DE
C, F B
A,BC
B,CD
DE
{A, B, C}
{A, B, C, D}
{A, B, C, D, E}
Databases
DB 2012
119 / 175
DB 2012
120 / 175
Outline
First Normal Form (1NF)
Second Normal Form (2NF)
3NF and BCNF
Multi-valued dependencies (MVDs)
Fourth Normal Form
Databases
The Plan
Databases
DB 2012
121 / 175
Databases
DB 2012
122 / 175
1NF
A schema R(A1 : S1 , A2 : S2 , , An : Sn ) is in First Normal Form
(1NF) if the domains S1 are elementary their values are atomic.
name
Timothy George Grifn
Databases
DB 2012
123 / 175
X is a superkey for R, or
A is a member of some key, or
X is not a proper subset of any key.
Databases
DB 2012
124 / 175
X is a superkey for R, or
A is a member of some key.
X is a superkey for R.
Is something missing?
Ken Moody (cl.cam.ac.uk)
Databases
DB 2012
125 / 175
Databases
DB 2012
126 / 175
R
A
a
a
a
a
B
b1
b2
b1
b2
A,B (R)
C
c1
c2
c2
c1
A B
a b1
a b2
A,C (R)
A C
a c1
a c2
Clearly A B is not an FD of R.
Databases
DB 2012
127 / 175
A concrete example
course_name lecturer
text
Databases
Tim
Ullman and Widom
Fatima
Date
Databases
Databases
Tim
Date
Fatima Ullman and Widom
Databases
Assuming that texts and lecturers are assigned to courses
independently, then a better representation would in two tables:
course_name lecturer
Databases
Tim
Fatima
Databases
text
course_name
Databases
Ullman and Widom
Date
Databases
Databases
DB 2012
128 / 175
Databases
DB 2012
129 / 175
DB 2012
130 / 175
A few observations
Note 1
Every functional dependency is multivalued dependency,
(Z W) = (Z W).
To see this, just let v = u in the above denition.
Note 2
Let R(Z, W, Y) be a relational schema, then
(Z W) (Z Y),
by symmetry of the denition.
Databases
Databases
DB 2012
131 / 175
Databases
DB 2012
132 / 175
v .Z = t.Z = u.Z,
v .W = t.W,
v .Y = u.Y.
Therefore, Z W holds.
Databases
DB 2012
133 / 175
Z W = {}, or
Y = {}.
4NF
A relational schema R(Z, W, Y) is in 4NF if for every MVD Z W
either
Z W is a trivial MVD, or
Z is a superkey for R.
Note : 4NF BCNF 3NF 2NF
Ken Moody (cl.cam.ac.uk)
Databases
DB 2012
134 / 175
Summary
3NF
Yes
Maybe
Maybe
No
BCNF
Maybe
Maybe
Yes
No
4NF
Maybe
Maybe
Yes
Yes
Databases
DB 2012
135 / 175
Inclusions
Clearly BCNF 3NF 2NF . These are proper inclusions:
Databases
DB 2012
136 / 175
Outline
General Decomposition Method (GDM)
The lossless-join condition is guaranteed by GDM
The GDM does not always preserve dependencies!
FDs vs ER models?
Weak entities
Using FDs and MVDs to rene ER models
Another look at ternary relationships
Databases
DB 2012
137 / 175
3
4
Reminder
For Z W, if we assume Z W = {}, then the conditions are
1
Databases
DB 2012
138 / 175
Databases
DB 2012
139 / 175
GDM++
1
Databases
DB 2012
140 / 175
R(A, B, C, D)
F = {A, B C, C D, D A}
A
D
A
D
A
Databases
DB 2012
141 / 175
Exercise : Try starting with any of the other BCNF violations and see
where you end up.
Databases
DB 2012
142 / 175
Databases
DB 2012
143 / 175
Decomposition 1
Databases
DB 2012
144 / 175
Decomposition 2
Databases
DB 2012
145 / 175
Summary
Databases
DB 2012
146 / 175
MovieID
Title
Movie
MovieID
Country
Title
Movie
Date
Released
MovieRelease
Month
Day
Databases
DB 2012
147 / 175
MovieID
Country
Title
Movie
Released
MovieRelease
Date
Month
Day
Denition
Weak entity sets do not have a primary key.
The existence of a weak entity depends on an identifying entity set
through an identifying relationship.
The primary key of the identifying entity together with the weak
entities discriminators (dashed underline in diagram) identify each
weak entity element.
Ken Moody (cl.cam.ac.uk)
Databases
DB 2012
148 / 175
=
=
=
=
MovieID
Title
Date
Country
Databases
DB 2012
149 / 175
Year
1997
1999
2000
Rating
15
12
15
Scope = Earth
Title
Austin Powers: International Man of Mystery
Austin Powers: International Man of Mystery
Austin Powers: International Man of Mystery
Austin Powers: International Man of Mystery
Austin Powers: The Spy Who Shagged Me
Austin Powers: The Spy Who Shagged Me
Austin Powers: The Spy Who Shagged Me
Dude, Wheres My Car?
Dude, Wheres My Car?
Dude, Wheres My Car?
Ken Moody (cl.cam.ac.uk)
Databases
Year
1997
1997
1997
1997
1999
1999
1999
2000
2000
2000
Country
UK
Malaysia
Portugal
USA
UK
Portugal
USA
UK
USA
Malaysia
Rating
15
18SX
M/12
PG-13
12
M/12
PG-13
15
PG-13
18PL
DB 2012
150 / 175
MovieID
Title
RatingReason
Rating
Movie
to multi-country scope:
Year
MovieID
Title
Movie
Reason
Country
RatingValue
Rating
Rated
Databases
DB 2012
151 / 175
Year
2005
2006
2009
2009
Rating
R
R
R
R
RatingReason
drug use
drug use
drug use
drug use
But
Title {Rating, RatingReason}
Databases
DB 2012
152 / 175
Databases
DB 2012
153 / 175
DB 2012
154 / 175
R1
R3
R2
Databases
R1
R2
Databases
DB 2012
155 / 175
Number
Employee
ISA
Does
Mechanic
Salesman
Description
Number
Parts
Price
RepairJob
Buys
Date
License
Cost
Date
Value
Sells
Commission
Value
Work
Repairs
Car
seller
Manufacturer
Model
Year
buyer
Client
Name
Address
ID
Phone
By Pvel Calado,
http://www.texample.net/tikz/examples/entity-relationship-diagram
Ken Moody (cl.cam.ac.uk)
Databases
DB 2012
156 / 175
Outline
Limits of SQL aggregation
OLAP : Online Analytic Processing
Data cubes
Star schema
Databases
DB 2012
157 / 175
Flat tables are great for processing, but hard for people to read
and understand.
Pivot tables and cross tabulations (spreadsheet terminology) are
very useful for presenting data in ways that people can
understand.
SQL does not handle pivot tables and cross tabulations well.
Databases
DB 2012
158 / 175
Databases
DB 2012
159 / 175
Databases
DB 2012
160 / 175
Databases
DB 2012
161 / 175
DB 2012
162 / 175
Databases
Databases
DB 2012
163 / 175
DB 2012
164 / 175
Databases
Cube Operations
Databases
DB 2012
165 / 175
DB 2012
166 / 175
Databases
Databases
DB 2012
167 / 175
DB 2012
168 / 175
Outline
HTML vs. XML
Using XML to solve the data exchange problem
Domain-specic XML schema
Native XML databases
Databases
HTML vs XML
HTML
HTML
XML
XML
XSL
DTD or XSchema
HTML
XML
XSL
CSS
DTD
:
:
:
:
:
=
=
=
Content
denes presentations
denes schema
Databases
DB 2012
169 / 175
Databases
DB 2012
170 / 175
Databases
DB 2012
171 / 175
DB 2012
172 / 175
Databases
Databases
DB 2012
173 / 175
XML-enabled databases
Relational (XML for exchange)
Data-centric
SQL
http://www.mysql.com/
Ken Moody (cl.cam.ac.uk)
DB 2012
174 / 175
The End
(http://xkcd.com/327)
Databases
DB 2012
175 / 175