Chapter
Deborah Costa Oct 18, 2007
8.1 Normalization
Data redundancy and the consequent modification anomalies (insertion, deletion, and update) can be traced to undesirable functional dependencies (FDs) in a relation schema.
Desirable FD: any FD in a relation schema R whose determinant is a candidate key of R. Such an FD does not cause data redundancy.
Undesirable FD: any FD in R whose determinant is not a candidate key of R. Such an FD causes data redundancy.
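The desirable/undesirable distinction can be checked mechanically. The following is a minimal sketch, assuming FDs are written as (determinant, dependent) pairs of attribute sets; the EMPLOYEE-style schema, its attribute names, and the helper functions are hypothetical illustrations, not taken from the slides.

```python
def closure(attrs, fds):
    """Return every attribute determined by `attrs` under the FDs in `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def is_candidate_key(attrs, schema, fds):
    """A candidate key determines the whole schema and has no redundant attribute."""
    attrs = frozenset(attrs)
    if closure(attrs, fds) != frozenset(schema):
        return False
    return all(closure(attrs - {a}, fds) != frozenset(schema) for a in attrs)

def classify(fd, schema, fds):
    """Label an FD by whether its determinant is a candidate key."""
    determinant, _ = fd
    return "desirable" if is_candidate_key(determinant, schema, fds) else "undesirable"

# Hypothetical schema and FDs, for illustration only.
schema = {"Emp_id", "Name", "Dept", "Dept_head"}
fd_a = (frozenset({"Emp_id"}), frozenset({"Name", "Dept"}))
fd_b = (frozenset({"Dept"}), frozenset({"Dept_head"}))
fds = [fd_a, fd_b]

labels = {name: classify(fd, schema, fds) for name, fd in [("fd_a", fd_a), ("fd_b", fd_b)]}
```

Here `fd_b` is the kind of FD that causes redundancy: Dept_head is repeated for every employee in the same department.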
Database normalization was first proposed by Edgar F. Codd. Of the seven known normal forms, Codd defined the first three, which we'll look into here.
To normalize a schema, we must know the requirements of each of the three normal forms that we'll go over. A key point to remember is that normal forms are progressive: to be in 3NF a relation must be in 2NF, and to be in 2NF it must be in 1NF.
An update anomaly. Employee 519 is shown as having different addresses on different records.
An insertion anomaly. Until the new faculty member is assigned to teach at least one course, his details cannot be recorded.
A deletion anomaly. All information about Dr. Giddens is lost when he temporarily ceases to be assigned to any courses.
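A deletion anomaly of the kind described above can be reproduced in a few lines. This is a sketch under assumptions: the table layout, the office and course values, and the second faculty member are hypothetical stand-ins, since the figure data is not reproduced here.

```python
# One table mixes faculty details with course assignments (hypothetical rows).
teaches = [
    {"Faculty": "Dr. Giddens", "Office": "Room 12", "Course": "Course X"},
    {"Faculty": "Dr. Other", "Office": "Room 30", "Course": "Course Y"},
]

def known_faculty(rows):
    """Faculty members the table still knows anything about."""
    return {r["Faculty"] for r in rows}

# Dr. Giddens temporarily stops teaching Course X, so that row is deleted.
teaches_after = [r for r in teaches if r["Course"] != "Course X"]

before = known_faculty(teaches)
after = known_faculty(teaches_after)
lost = before - after  # faculty whose details vanished with the assignment
```

Deleting one assignment row removes the only record of the faculty member's office as well, which is the deletion anomaly.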
Normalization (cont)
To eliminate the problems caused by undesirable FDs, we must somehow render the undesirable FDs desirable; the process of doing this is called normalization. Normal forms (NFs) provide a stepwise progression towards the goal of a fully normalized relation schema that, from a functional dependency perspective, is guaranteed to be free of the data redundancies that cause modification anomalies.
Normalization (cont)
A relation schema is said to be in a particular normal form if it satisfies certain prescribed criteria; otherwise the relation is said to violate the normal form. The violation of each of these normal forms signals the presence of a specific type of undesirable FD. It is important to note that the normalization process is anchored to the candidate key of a relation schema, R. We will use the primary key as the basis for evaluating and normalizing a relation schema.
As you can see, this schema violates 1NF because multiple Artist_nm values are associated with a single Album_no; that is, the domain of Artist_nm does not have atomic values. In fact, by definition, ALBUM is not even a relation.
In order to fix ALBUM we must expand the relation so that there is a tuple for each (atomic) Artist_nm for a given Album_no. The primary key for this is {Album_no, Artist_nm} as we all should hopefully know by now.
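The expansion into one tuple per atomic Artist_nm can be sketched as follows. The album numbers and artist names are hypothetical placeholders, and `to_1nf` is an illustrative helper rather than anything defined in the slides.

```python
# ALBUM before the fix: Artist_nm holds a list, so its values are not atomic.
album = [
    {"Album_no": 1, "Artist_nm": ["Artist A", "Artist B"]},
    {"Album_no": 2, "Artist_nm": ["Artist C"]},
]

def to_1nf(rows, multi_attr):
    """Emit one row per value of the multi-valued attribute `multi_attr`."""
    flat = []
    for row in rows:
        for value in row[multi_attr]:
            new_row = dict(row)          # copy the other attributes unchanged
            new_row[multi_attr] = value  # replace the list with one atomic value
            flat.append(new_row)
    return flat

album_1nf = to_1nf(album, "Artist_nm")

# {Album_no, Artist_nm} identifies each expanded tuple uniquely.
keys = [(r["Album_no"], r["Artist_nm"]) for r in album_1nf]
key_is_unique = len(keys) == len(set(keys))
```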
All requirements for 1st NF must be met. Redundant data across multiple rows of a table must be moved to a separate table.
The resulting tables must be related to each other by the use of foreign keys.
2nd NF Example
The only candidate key is (Employee, Skill).
The table is not in 2NF: Current Work Location is dependent on Employee alone, which is only part of the key.
This can cause an anomaly.
Suppose Jones's work location is updated in the Typing and Shorthand rows but not in the Whittling row. Asking "What is Jones's current work location?" then gives a contradictory answer, because there are two different locations.
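The update anomaly described here can be reproduced with a small in-memory table. This is only a sketch: the location values "Site A" and "Site B" are hypothetical, as the slide's actual data is not shown.

```python
# Key is (Employee, Skill); Location depends on Employee alone (hypothetical values).
rows = [
    {"Employee": "Jones", "Skill": "Typing", "Location": "Site A"},
    {"Employee": "Jones", "Skill": "Shorthand", "Location": "Site A"},
    {"Employee": "Jones", "Skill": "Whittling", "Location": "Site A"},
]

# A partial update: two of Jones's three rows are changed, one is missed.
for r in rows:
    if r["Employee"] == "Jones" and r["Skill"] in {"Typing", "Shorthand"}:
        r["Location"] = "Site B"

def locations_of(employee, table):
    """All distinct locations the table records for one employee."""
    return {r["Location"] for r in table if r["Employee"] == employee}

jones_locations = locations_of("Jones", rows)
is_contradictory = len(jones_locations) > 1
```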
2nd NF Example
Both tables are in 2NF: they meet the 1NF requirements, and no non-primary-key attribute is dependent on part of a key.
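Once the location is stored in a table keyed by Employee alone, the same update touches exactly one value. A minimal sketch, assuming hypothetical table names and location values:

```python
# Table 1: Employee -> Current Work Location (key: Employee).
employee_location = {"Jones": "Site A"}

# Table 2: which skills each employee has (key: Employee, Skill).
employee_skill = {
    ("Jones", "Typing"),
    ("Jones", "Shorthand"),
    ("Jones", "Whittling"),
}

# The location is recorded once, so a single assignment updates it everywhere.
employee_location["Jones"] = "Site B"

# Joining the two tables on Employee rebuilds the original wide rows.
joined = sorted((emp, skill, employee_location[emp]) for emp, skill in employee_skill)
locations_seen = {loc for _, _, loc in joined}
```

The join plays the role of the foreign key relationship between the two tables: every Employee value in the skill table must exist in the location table.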
All requirements for 2nd NF must be met. Eliminate fields that do not depend on the primary key.
That is, any field that is dependent not only on the primary key but also on another field must be moved to another table.
Eliminate columns not dependent on the key; i.e., if a column is in a relation, then it must be dependent on the key.
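One way to look for such fields is to test each FD against the usual 3NF condition: the determinant is a superkey, or the dependent attribute is part of some candidate key. The sketch below assumes that formulation and uses a hypothetical schema; it finds candidate keys by brute force, so it is only practical for small examples. Note that it also reports partial dependencies, since a 2NF violation is a 3NF violation as well.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return every attribute determined by `attrs` under the FDs in `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def candidate_keys(schema, fds):
    """Brute-force search for minimal attribute sets that determine the schema."""
    schema = frozenset(schema)
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            combo = frozenset(combo)
            if any(k <= combo for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(combo, fds) == schema:
                keys.append(combo)
    return keys

def third_nf_violations(schema, fds):
    """List (determinant, attribute) pairs that break the 3NF condition."""
    schema = frozenset(schema)
    prime = frozenset().union(*candidate_keys(schema, fds))
    bad = []
    for lhs, rhs in fds:
        for attr in sorted(rhs - lhs):
            if closure(lhs, fds) != schema and attr not in prime:
                bad.append((lhs, attr))
    return bad

# Hypothetical schema: Dept_head depends on the key only through Dept.
schema = {"Emp_id", "Dept", "Dept_head"}
fds = [
    (frozenset({"Emp_id"}), frozenset({"Dept"})),
    (frozenset({"Dept"}), frozenset({"Dept_head"})),
]
violations = third_nf_violations(schema, fds)
```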
We can see from Figure 8.8a that STOCK is in 1NF because there are no composite or multi-valued attributes in it.
Now that we have the primary key for STOCK we can see that:
fd1, fd2, fd3, and fd4 violate 2NF in STOCK.
fd6 violates 3NF in STOCK.
fd7 violates BCNF in STOCK.
To fix all of the violations above, we must decompose the relation schema into the set D: {R1, R2, R3, R4, R5}.
This section is, in my opinion, very confusing, so for better understanding please read it more than once. After a couple of readings we should know how to decompose the base relation schema under investigation, and how to tell whether our decomposition is complete and correct without looking at the same data. A decomposition is complete when it is a dependency-preserving, lossless-join decomposition. Preservation of FDs is a verification process: inspect the decomposition to see whether the union of the FDs that hold on the individual relation schemas of D is a cover for F. This is demonstrated in Section 8.1.5.1. You should also test for the lossless-join property; the method for testing is presented in Section 8.1.5.2.
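Both checks can be sketched in code. What follows is not the procedure from Sections 8.1.5.1 and 8.1.5.2, which are not reproduced here; it assumes the common textbook tests instead: a two-way decomposition is lossless if the shared attributes determine all of one side, and an FD is preserved if its right-hand side can be reached using only closures restricted to the individual schemas. The schema and FDs are hypothetical.

```python
def closure(attrs, fds):
    """Return every attribute determined by `attrs` under the FDs in `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def is_lossless_binary(r1, r2, fds):
    """Two-schema test: the common attributes must determine r1 or r2."""
    common_closure = closure(r1 & r2, fds)
    return r1 <= common_closure or r2 <= common_closure

def is_preserved(fd, decomposition, fds):
    """Check one FD using only what each decomposed schema can enforce locally."""
    lhs, rhs = fd
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for ri in decomposition:
            gained = closure(result & ri, fds) & ri
            if not gained <= result:
                result |= gained
                changed = True
    return rhs <= result

def is_dependency_preserving(decomposition, fds):
    return all(is_preserved(fd, decomposition, fds) for fd in fds)

# Hypothetical base schema R(Emp_id, Dept, Dept_head) with two FDs.
fds = [
    (frozenset({"Emp_id"}), frozenset({"Dept"})),
    (frozenset({"Dept"}), frozenset({"Dept_head"})),
]
good = [frozenset({"Emp_id", "Dept"}), frozenset({"Dept", "Dept_head"})]
poor = [frozenset({"Emp_id", "Dept"}), frozenset({"Emp_id", "Dept_head"})]

good_result = (is_lossless_binary(*good, fds), is_dependency_preserving(good, fds))
poor_result = (is_lossless_binary(*poor, fds), is_dependency_preserving(poor, fds))
```

The second decomposition is lossless but loses Dept → Dept_head, which shows why the two properties have to be tested separately.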