
Jul 27, 2014


Normalizing DBMS.


System

Normalization Process

This Lecture

Schema Refinement

Normalization

Schema Refinement - Review

Conceptual Modeling is a subjective

process

Therefore, the schema after the logical

database design phase may not be very

good (contain redundancies)

However, there are formalisms to

ensure that the schema is good.

This process is called Normalization

Schema Refinement Review

(contd.)

Relational database schema = set of

relations

Relation = set of attributes

How we group the attributes to

relations is very important

Schema Refinement Review

(contd.)

Too many attributes in a relation

Waste space

Anomalies

Decomposing the relation into too many

smaller relations

Loss-less join property

Dependency preserving property

Schema Refinement Review

(contd.)

Too many attributes

For example,

LECTURER(id, name, address, salary, deptno, dname, building)

Schema Refinement Review

(contd.)

Insertion Anomaly

1. Inserting a new lecturer to the

LECTURER table

- Department information is repeated

(ensure that correct department

information is inserted).

Schema Refinement Review

(contd.)

2. Inserting a department with no lecturers

(impossible because null values for id are not allowed)

Schema Refinement Review

(contd.)

Deletion Anomalies

Deleting the last lecturer from the

department will lose information about

the department

Schema Refinement Review

(contd.)

Update Anomalies

Updating a department's building

needs to be done for all lecturers

working for that department

Schema Refinement Review

(contd.)

When redundancies exist, we should

decompose the relation into smaller

relations

Schema Refinement Review

(contd.)

Decomposing the relation into too many smaller relations

Loss-less join property: joining the smaller relations must give back exactly the original relation; otherwise we lose information when we decompose

Dependency-preserving property: the set of dependencies in S can be verified by the sets of dependencies in R1 and R2

Schema Refinement Review

(contd.)

Loss-less join property:

For example, relation S is decomposed into R1 and R2:

S
S    P    D
S1   P1   D1
S2   P2   D2
S3   P1   D3

R1
S    P
S1   P1
S2   P2
S3   P1

R2
P    D
P1   D1
P2   D2
P1   D3

Schema Refinement Review

(contd.)

Joining them together, we get spurious tuples

R1 join R2
S    P    D
S1   P1   D1
S1   P1   D3   (spurious)
S2   P2   D2
S3   P1   D1   (spurious)
S3   P1   D3
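The spurious tuples can be reproduced with a few lines of Python. This is only a sketch: relations are modelled as sets of tuples, and the variable names (`joined`, `spurious`) are illustrative assumptions rather than anything taken from the slides.

```python
# Relation S(S, P, D) from the slide, modelled as a set of tuples.
S = {("S1", "P1", "D1"), ("S2", "P2", "D2"), ("S3", "P1", "D3")}

# Project S onto (S, P) and (P, D) to obtain R1 and R2.
R1 = {(s, p) for (s, p, d) in S}
R2 = {(p, d) for (s, p, d) in S}


def natural_join(r1, r2):
    """Join r1(S, P) with r2(P, D) on the shared attribute P."""
    return {(s, p, d) for (s, p) in r1 for (p2, d) in r2 if p == p2}


joined = natural_join(R1, R2)

# Tuples produced by the join that were never in S are spurious.
spurious = joined - S
print(sorted(spurious))  # [('S1', 'P1', 'D3'), ('S3', 'P1', 'D1')]
```

Because P does not determine S or D, the join on P cannot tell which supplier goes with which department, so this decomposition is lossy.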

Schema Refinement Review

(contd.)

To avoid the above mentioned issues

in the relational schema, we can apply

a formal process called Normalization

Normalization is based on functional

dependencies

Schema Refinement Review

(contd.)

Key points:

Redundancy is based on functional

dependencies

Therefore, normalization is based on

functional dependencies

Schema Refinement Review

(contd.)

Given some FDs, we can usually infer additional FDs:

A -> B, B -> C implies A -> C

An FD f is implied by a set of FDs F if f holds whenever all FDs in F hold.

F+ = closure of F is the set of all FDs that are implied by F.

How can we get F+?

Schema Refinement Review

(contd.)

Armstrong's Axioms (X, Y, Z are sets of attributes):

Reflexivity: If X ⊆ Y, then Y -> X

Augmentation: If X -> Y, then XZ -> YZ for any Z

Transitivity: If X -> Y and Y -> Z, then X -> Z

These are sound and complete inference rules for FDs!

Schema Refinement Review

(contd.)

Couple of additional rules (that follow from Armstrong's Axioms):

Union: If X -> Y and X -> Z, then X -> YZ

Decomposition: If X -> YZ, then X -> Y and X -> Z

Example: Contracts(cid, sid, jid, did, pid, qty, value), and:

C is the key: C -> CSJDPQV

Project purchases each part using single contract: JP -> C

Dept purchases at most one part from a supplier: SD -> P

JP -> C, C -> CSJDPQV imply JP -> CSJDPQV

SD -> P implies SDJ -> JP

SDJ -> JP, JP -> CSJDPQV imply SDJ -> CSJDPQV

Schema Refinement Review

(contd.)

Why is F+ important?

X -> RHS in relation R

X is a subset of attributes in relation R. If RHS contains all attributes of R, then X is a superkey.

If X is not a superkey, then values for X can repeat in different tuples, resulting in redundancy!

So determining F+ can help us find superkeys and check for any redundancy.

Schema Refinement Review

(contd.)

Computing the closure of a set of FDs can be expensive. (Size of closure is exponential in # attrs!)

Typically, we just want to check if a given FD X -> Y is in F+, the closure of a set of FDs F. An efficient check:

Compute attribute closure of X (denoted X+) wrt F:

Set of all attributes A such that X -> A is in F+

There is a linear time algorithm to compute this.

Check if Y is in X+

Schema Refinement Review

(contd.)

Algorithm to find X+:

closure = X;
repeat until there is no change: {
  if there is an FD U -> V in F such that U ⊆ closure,
  then set closure = closure ∪ V
}

Does F = {A -> B, B -> C, CD -> E} imply A -> E?

i.e., is A -> E in the closure F+? Equivalently, is E in A+?

We can use the attribute closure to find the keys of the relation. If X+ contains all attributes of the relation, then X is a superkey.
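The attribute closure loop above can be sketched in Python as follows. The function name `attribute_closure` and the representation of an FD as a pair of sets are assumptions made for this illustration. It is the simple repeat-until-no-change version, not the linear time algorithm mentioned earlier.

```python
def attribute_closure(attrs, fds):
    """Return the closure of `attrs` under `fds`.

    `fds` is a list of (lhs, rhs) pairs, where lhs and rhs are sets of
    attribute names (a hypothetical representation chosen for this sketch).
    """
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If U is contained in the closure, add V to the closure.
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


# F = {A -> B, B -> C, CD -> E} from the slide.
F = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"C", "D"}, {"E"})]

a_plus = attribute_closure({"A"}, F)
print(sorted(a_plus))  # ['A', 'B', 'C']
print("E" in a_plus)   # False
```

Since E is not in A+, F does not imply A -> E. D is never derived from A, so the FD CD -> E cannot fire.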

Schema Refinement Review

(contd.)

Schema Refinement Steps:

Determine F for relation R

Find all keys of R using attribute closure

Normalize
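As a rough illustration of the key-finding step, the sketch below enumerates attribute subsets in order of size and keeps those whose closure covers the whole relation. The helper names are hypothetical, and the brute-force search is exponential in the number of attributes, so it is only suitable for small schemas such as the Contracts example.

```python
from itertools import combinations


def attribute_closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) pairs of sets."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


def candidate_keys(attributes, fds):
    """Return all minimal attribute sets whose closure is the full relation."""
    all_attrs = set(attributes)
    keys = []
    # Smaller subsets are tried first, so any superkey that does not
    # contain an already found key is itself minimal.
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue
            if attribute_closure(candidate, fds) == all_attrs:
                keys.append(candidate)
    return keys


# Contracts: C -> CSJDPQV, JP -> C, SD -> P
F = [(set("C"), set("CSJDPQV")), (set("JP"), set("C")), (set("SD"), set("P"))]
keys = candidate_keys("CSJDPQV", F)
print(["".join(sorted(k)) for k in keys])  # ['C', 'JP', 'DJS']
```

The result agrees with the earlier derivation that C, JP and SDJ each determine all of CSJDPQV.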

Schema Refinement Review

(contd.)

There are many Normal Forms

proposed to reduce redundancies

Some of the well-known ones are:

1st Normal Form

2nd Normal Form

3rd Normal Form

Boyce-Codd Normal Form

Schema Refinement Review

(contd.)

Review of some terms

Candidate Key: Each key of a relation is called a

candidate key

Primary Key: A candidate key is chosen to be the

primary key

Prime Attribute: an attribute which is a member of

a candidate key

Nonprime Attribute: An attribute which is not

prime

Schema Refinement Review

(contd.)

1st Normal Form

A relation R is in first normal form (1NF)

if domains of all attributes in the

relation are atomic (simple &

indivisible).

Schema Refinement Review

(contd.)

2nd Normal Form:

A relation R is in second normal form (2NF) if no nonprime attribute A in R is partially dependent on any key of R

Schema Refinement Review

(contd.)

Example

EMP_PROJ(NIC, PNUM, HOURS, ENAME, PNAME, PLOC)

FD1: {NIC, PNUM} -> HOURS

FD2: NIC -> ENAME

FD3: PNUM -> {PNAME, PLOC}

Schema Refinement Review

(contd.)

EP1(NIC, PNUM, HOURS)

EP2(NIC, ENAME)

EP3(PNUM, PNAME, PLOC)
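A partial dependency can be detected mechanically by checking whether a proper subset of the key already determines a nonprime attribute. The sketch below assumes the FDs that the EP1, EP2, EP3 decomposition suggests ({NIC, PNUM} -> HOURS, NIC -> ENAME, PNUM -> {PNAME, PLOC}); the function names are hypothetical.

```python
from itertools import combinations


def attribute_closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) pairs of sets."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


def partial_dependencies(key, nonprime, fds):
    """Map each nonprime attribute to a proper subset of `key` that determines it."""
    found = {}
    key_attrs = sorted(key)
    for size in range(1, len(key_attrs)):  # proper, non-empty subsets only
        for combo in combinations(key_attrs, size):
            determined = attribute_closure(set(combo), fds)
            for attr in sorted(nonprime):
                if attr in determined and attr not in found:
                    found[attr] = set(combo)
    return found


# Assumed FDs for EMP_PROJ (see the lead-in above).
F = [
    ({"NIC", "PNUM"}, {"HOURS"}),
    ({"NIC"}, {"ENAME"}),
    ({"PNUM"}, {"PNAME", "PLOC"}),
]
violations = partial_dependencies(
    {"NIC", "PNUM"}, {"HOURS", "ENAME", "PNAME", "PLOC"}, F
)
print(violations)
```

Under these assumptions ENAME depends on NIC alone and PNAME, PLOC depend on PNUM alone, which is why they move to EP2 and EP3, while HOURS stays with the full key in EP1.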

Schema Refinement Review

(contd.)

3rd Normal Form:

A relation R is in 3rd normal form (3NF) if

R is in 2NF, and

no nonprime attribute is transitively dependent on any key

Schema Refinement Review

(contd.)

Example,

EMP_DEPT(ENAME, SSN, BDATE, ADD, DNUM, DNAME, DMGR)

Schema Refinement Review

(contd.)

ED1(ENAME, SSN, BDATE, ADD, DNUM)

ED2(DNUM, DNAME, DMGR)

Schema Refinement Review

(contd.)

Boyce-Codd Normal Form (BCNF):

A relation schema R is in Boyce-Codd Normal Form if,

whenever a nontrivial functional dependency X -> A holds in R, X is a superkey of R

Schema Refinement Review

(contd.)

Keys: PropertyID, (County_Name, Lot#)

Attributes: PROPERTY_ID, COUNTY_NAME, LOT#, AREA, PRICE, TAX_RATE

[Figure: dependency diagram with arrows labeled FD1 through FD5]

Schema Refinement Review

(contd.)

Decomposition into BCNF:

Consider relation R with FDs F. If X -> Y violates BCNF, decompose R into R - Y and XY.

Repeated application of this idea will give us a collection of relations that are in BCNF; lossless join decomposition, and guaranteed to terminate.

e.g., CSJDPQV, key C, JP -> C, SD -> P, J -> S

To deal with SD -> P, decompose into SDP, CSJDQV.

To deal with J -> S, decompose CSJDQV into JS and CJDQV

In general, several dependencies may cause violation of BCNF. The order in which we deal with them could lead to very different sets of relations!
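The decomposition step can be sketched recursively in Python. This is a simplified illustration, not a complete BCNF algorithm: it only tests the left-hand sides of the given FDs, in the order listed, so it can miss violations caused by implied FDs with other left-hand sides. The names `bcnf_decompose` and `attribute_closure` are assumptions for this sketch.

```python
def attribute_closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) pairs of sets."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


def bcnf_decompose(relation, fds):
    """Split `relation` on the first given FD whose LHS violates BCNF."""
    relation = frozenset(relation)
    for lhs, _ in fds:
        if not lhs <= relation:
            continue
        closure = attribute_closure(lhs, fds)
        y = (closure & relation) - lhs  # what X determines inside R
        if y and not relation <= closure:
            # X -> Y is nontrivial and X is not a superkey of R:
            # decompose R into XY and R - Y, then recurse on both parts.
            return bcnf_decompose(lhs | y, fds) + bcnf_decompose(relation - y, fds)
    return [relation]


# CSJDPQV with key C, JP -> C, SD -> P, J -> S
F = [
    (set("C"), set("CSJDPQV")),
    (set("JP"), set("C")),
    (set("SD"), set("P")),
    (set("J"), set("S")),
]
result = ["".join(sorted(r)) for r in bcnf_decompose("CSJDPQV", F)]
print(result)  # ['DPS', 'JS', 'CDJQV']
```

With the FDs listed in this order the sketch arrives at SDP, JS and CJDQV, the same relations as the example. Listing J -> S before SD -> P would split on J -> S first and give a different result, which illustrates the remark about ordering.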

Schema Refinement Review

(contd.)

In general, there may not be a dependency preserving decomposition into BCNF.

e.g., CSZ, CS -> Z, Z -> C

Can't decompose while preserving 1st FD; not in BCNF.

Similarly, decomposition of CSJDPQV into SDP, JS and CJDQV is not dependency preserving (w.r.t. the FDs JP -> C, SD -> P and J -> S).

However, it is a lossless join decomposition.

In this case, adding JPC to the collection of relations gives us a dependency preserving decomposition.

JPC tuples stored only for checking FD! (Redundancy!)

Schema Refinement Review

(contd.)

Obviously, the algorithm for lossless join decomp into BCNF can be used to obtain a lossless join decomp into 3NF (typically, can stop earlier).

To ensure dependency preservation, one idea:

If X -> Y is not preserved, add relation XY.

Problem is that XY may violate 3NF! e.g., consider the addition of CJP to 'preserve' JP -> C. What if we also have J -> C?

Refinement: Instead of the given set of FDs F, use a minimal cover for F.

Schema Refinement Review

(contd.)

Minimal cover G for a set of FDs F:

Closure of F = closure of G.

Right hand side of each FD in G is a single attribute.

If we modify G by deleting an FD or by deleting attributes from an FD in G, the closure changes.

General alg. to obtain minimal cover:

Put the FDs in a standard form (i.e. single attribute in RHS).

Minimize the left side of each FD. For each FD, check if we can delete attributes in LHS while preserving equivalence to F+.

Delete any redundant FDs.

Schema Refinement Review

(contd.)

Intuitively, every FD in G is needed, and as small as possible in order to get the same closure as F.

e.g., A -> B, ABCD -> E, EF -> GH, ACDF -> EG has the following minimal cover:

A -> B, ACD -> E, EF -> G and EF -> H
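The three steps for obtaining a minimal cover can be sketched as below and checked against the example above. The function names and the pair-of-sets encoding of FDs are assumptions for illustration. A minimal cover is not unique in general, so a different processing order may yield a different but equivalent result.

```python
def attribute_closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) pairs of sets."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


def minimal_cover(fds):
    # Step 1: standard form, one attribute on each right-hand side.
    cover = [(frozenset(lhs), frozenset(a)) for lhs, rhs in fds for a in sorted(rhs)]

    # Step 2: drop LHS attributes that are not needed to derive the RHS.
    reduced = []
    for lhs, rhs in cover:
        for attr in sorted(lhs):
            smaller = lhs - {attr}
            if smaller and rhs <= attribute_closure(smaller, cover):
                lhs = smaller
        reduced.append((lhs, rhs))
    cover = list(dict.fromkeys(reduced))  # remove duplicates, keep order

    # Step 3: delete FDs that are implied by the remaining ones.
    for fd in list(cover):
        rest = [g for g in cover if g != fd]
        if fd[1] <= attribute_closure(fd[0], rest):
            cover = rest
    return cover


# A -> B, ABCD -> E, EF -> GH, ACDF -> EG
F = [
    (set("A"), set("B")),
    (set("ABCD"), set("E")),
    (set("EF"), set("GH")),
    (set("ACDF"), set("EG")),
]
G = minimal_cover(F)
shown = ["".join(sorted(lhs)) + " -> " + "".join(sorted(rhs)) for lhs, rhs in G]
print(shown)  # ['A -> B', 'ACD -> E', 'EF -> G', 'EF -> H']
```

In step 2, B is dropped from ABCD -> E because A -> B already supplies it, and F is dropped from ACDF -> E. In step 3, ACDF -> G is deleted because it follows from ACD -> E and EF -> G.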

Dependency Preserving 3NF decomposition:

Let R1, R2, ..., Rn be a lossless-join decomposition of R with a minimal cover F

Let N be the dependencies of F which are not preserved

For each FD X -> A in N, add XA to the decomposition of R

Schema Refinement Review

(contd.)

Refining an ER Diagram

1st diagram translated:

Workers(S,N,L,D,S)

Departments(D,M,B)

Lots associated with workers.

Suppose all workers in a dept are assigned the same lot: D -> L

Redundancy; fixed by:

Workers2(S,N,D,S)

Dept_Lots(D,L)

Can fine-tune this:

Workers2(S,N,D,S)

Departments(D,M,B,L)

[Figure: "Before" and "After" ER diagrams of Employees, Works_In and Departments, with attributes ssn, name, lot, since, did, dname and budget]

Exercise

1. Consider the following two sets of functional

dependencies

F= {A ->C, AC ->D,E ->AD, E ->H}

and

G = {A ->CD, E ->AH}

Check whether or not they are equivalent.

To show equivalence, we prove that G is covered by F

and F is covered by G.

Proof that G is covered by F:

{A} + = {A, C, D} (with respect to F),

which covers A ->CD in G

{E} + = {E, A, D, H, C} (with respect to F),

which covers E ->AH in G

Proof that F is covered by G:

{A} + = {A, C, D} (with respect to G),

which covers A ->C in F

{A, C} + = {A, C, D} (with respect to G),

which covers AC ->D in F

{E} + = {E, A, H, C, D} (with respect to G),

which covers E ->AD and E ->H in F
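The two covering proofs in Exercise 1 can be checked mechanically: F covers G when every FD in G has its right-hand side inside the closure of its left-hand side computed with respect to F. The sketch below uses hypothetical helper names and encodes each FD as a pair of attribute sets.

```python
def attribute_closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) pairs of sets."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


def covers(f, g):
    """True if every FD in g is implied by f."""
    return all(rhs <= attribute_closure(lhs, f) for lhs, rhs in g)


# F = {A -> C, AC -> D, E -> AD, E -> H}
F = [(set("A"), set("C")), (set("AC"), set("D")), (set("E"), set("AD")), (set("E"), set("H"))]
# G = {A -> CD, E -> AH}
G = [(set("A"), set("CD")), (set("E"), set("AH"))]

equivalent = covers(F, G) and covers(G, F)
print(equivalent)  # True
```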

2. Consider the relation schema EMP_DEPT and the following

set F of functional dependencies on EMP_DEPT:

F = {SSN ->{ENAME, BDATE,ADD, DNUM} ,

DNUM ->{DNAME, DMGR} }

Calculate the closures {SSN} + and {DNUM} + with respect to

F.

EMP_DEPT(ENAME, SSN, BDATE, ADD, DNUM, DNAME, DMGR)

Answer:

{SSN} + ={SSN, ENAME, BDATE, ADD, DNUM, DNAME, DMGR}

{DNUM} + ={DNUM, DNAME, DMGR}

3. Is the set of functional dependencies F in Exercise 2

minimal? If not, try to find a minimal set of functional

dependencies that is equivalent to F. Prove that your set is

equivalent to F.

Answer:

The set F of functional dependencies in Exercise 2 is not

minimal, because it violates rule 1 of minimality (every FD has

a single attribute for its right hand side).

The set G is an equivalent minimal set:

G= {SSN ->{ENAME}, SSN ->{BDATE},

SSN->{ADD}, SSN ->{DNUM} ,

DNUM ->{DNAME}, DNUM->{DMGR}}

To show equivalence, we prove that F is covered by G

and G is covered by F.

Proof that F is covered by G:

{SSN} + ={SSN, ENAME, BDATE, ADD, DNUM, DNAME, DMGR}

(with respect to G), which covers

SSN ->{ENAME, BDATE, ADD, DNUM} in F

{DNUM} + ={DNUM, DNAME, DMGR}

(with respect to G), which covers

DNUM ->{DNAME, DMGR} in F

Proof that G is covered by F:

{SSN}+={SSN, ENAME, BDATE, ADD, DNUM, DNAME, DMGR}

(with respect to F), which covers

SSN ->{ENAME}, SSN ->{BDATE}, SSN ->{ADD}, and

SSN ->{DNUM} in G

{DNUM} + ={DNUM, DNAME, DMGR}

(with respect to F), which covers DNUM ->{DNAME} and

DNUM->{DMGR} in G
