[ 1- 6 ] NF, Boyce-codd.
PDF generated using the open source mwlib toolkit. See http://code.pediapress.com/ for more information.
PDF generated at: Mon, 21 Mar 2011 05:14:41 UTC
Contents
Articles
Database normalization
First normal form
Second normal form
Third normal form
Fourth normal form
Fifth normal form
Sixth normal form
Boyce–Codd normal form
References
Article Sources and Contributors
Image Sources, Licenses and Contributors
Article Licenses
License
Database normalization
In the design of a relational database, the process of organizing data to minimize
redundancy is called normalization. The goal of database normalization is to decompose relations with anomalies in
order to produce smaller, well-structured relations. Normalization usually involves dividing large, badly formed
tables into smaller, well-formed tables and defining relationships between them. The objective is to isolate data so
that additions, deletions, and modifications of a field can be made in just one table and then propagated through the
rest of the database via the defined relationships.
Edgar F. Codd, the inventor of the relational model, introduced the concept of normalization and what we now know
as the First Normal Form (1NF) in 1970.[1] Codd went on to define the Second Normal Form (2NF) and Third
Normal Form (3NF) in 1971,[2] and Codd and Raymond F. Boyce defined the Boyce-Codd Normal Form (BCNF) in
1974.[3] Higher normal forms were defined by other theorists in subsequent years, the most recent being the Sixth
Normal Form (6NF) introduced by Chris Date, Hugh Darwen, and Nikos Lorentzos in 2002.[4]
Informally, a relational database table (the computerized representation of a relation) is often described as
"normalized" if it is in the Third Normal Form.[5] Most 3NF tables are free of insertion, update, and deletion
anomalies, i.e. in most cases 3NF tables adhere to BCNF, 4NF, and 5NF (but typically not 6NF).
A standard piece of database design guidance is that the designer should create a fully normalized design; selective
denormalization can subsequently be performed for performance reasons.[6] However, some modeling disciplines,
such as the dimensional modeling approach to data warehouse design, explicitly recommend non-normalized
designs, i.e. designs that in large part do not adhere to 3NF.[7]
Objectives of normalization
A basic objective of the first normal form defined by Codd in 1970 was to permit data to be queried and manipulated
using a "universal data sub-language" grounded in first-order logic.[8] (SQL is an example of such a data
sub-language, albeit one that Codd regarded as seriously flawed.)[9]
The objectives of normalization beyond 1NF (First Normal Form) were stated as follows by Codd:
1. To free the collection of relations from undesirable insertion, update and deletion dependencies;
2. To reduce the need for restructuring the collection of relations as new types of data are introduced,
and thus increase the life span of application programs;
3. To make the relational model more informative to users;
4. To make the collection of relations neutral to the query statistics, where these statistics are liable to
change as time goes by.
—E.F. Codd, "Further Normalization of the Data Base Relational Model"[10]
The sections below give details of each of these objectives.
Example
Querying and manipulating the data within an unnormalized data structure, such as the following non-1NF
representation of customers' credit card transactions, involves more complexity than is really necessary:
Columns: Customer, Transactions. Rows: Jones, Wilkins, Stevens (each with a nested group of transactions).
To each customer there corresponds a repeating group of transactions. The automated evaluation of any query
relating to customers' transactions therefore would broadly involve two stages:
1. Unpacking one or more customers' groups of transactions allowing the individual transactions in a group to be
examined, and
2. Deriving a query result based on the results of the first stage
For example, in order to find out the monetary sum of all transactions that occurred in October 2003 for all
customers, the system would have to know that it must first unpack the Transactions group of each customer, then
sum the Amounts of all transactions thus obtained where the Date of the transaction falls in October 2003.
One of Codd's important insights was that this structural complexity could always be removed completely, leading to
much greater power and flexibility in the way queries could be formulated (by users and applications) and evaluated
(by the DBMS). The normalized equivalent of the structure above would look like this:
Now each row represents an individual credit card transaction, and the DBMS can obtain the answer of interest
simply by finding all rows with a Date falling in October 2003 and summing their Amounts. The data structure places all
of the values on an equal footing, exposing each to the DBMS directly, so each can potentially participate directly in
queries; whereas in the previous situation some values were embedded in lower-level structures that had to be
handled specially. Accordingly, the normalized design lends itself to general-purpose query processing, whereas the
unnormalized design does not.
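As a minimal sketch of the point above, the October 2003 total can be computed with a single filter-and-sum once each transaction has its own row. The table name, column names, and all rows below are invented for illustration; they are not the article's data.

```python
import sqlite3

# Hypothetical normalized table: one row per credit card transaction.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions ("
    " customer TEXT, tr_id INTEGER, tr_date TEXT, amount INTEGER)"
)
rows = [
    ("Jones", 1, "2003-10-14", -87),
    ("Jones", 2, "2003-10-15", -50),
    ("Wilkins", 3, "2003-10-14", -21),
    ("Stevens", 4, "2003-10-17", -18),
    ("Stevens", 5, "2003-11-02", -33),  # outside October 2003
]
conn.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?)", rows)

# No "unpacking" stage is needed: every value is directly visible to the query.
october_total = conn.execute(
    "SELECT SUM(amount) FROM transactions"
    " WHERE tr_date >= '2003-10-01' AND tr_date < '2003-11-01'"
).fetchone()[0]
print(october_total)
```

Dates are stored as ISO 8601 strings here so that a plain string comparison selects the month; a real schema might use a dedicated date type instead.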
Address}.
Full functional dependency
An attribute is fully functionally dependent on a set of attributes X if it is
• functionally dependent on X, and
• not functionally dependent on any proper subset of X.
{Employee Address} has a functional dependency on {Employee ID, Skill}, but not a full functional dependency,
because it is also dependent on {Employee ID}.
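The two conditions can be sketched as a check over sample rows. The function names and the three rows are hypothetical, and a sample can only refute a dependency, never prove that it holds for every possible state of the table.

```python
from itertools import combinations

def holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True

def fully_dependent(rows, lhs, rhs):
    """True if lhs -> rhs holds and no proper subset of lhs determines rhs."""
    if not holds(rows, lhs, rhs):
        return False
    # Proper subsets have fewer attributes than lhs (the empty set included).
    for size in range(len(lhs)):
        for subset in combinations(lhs, size):
            if holds(rows, subset, rhs):
                return False
    return True

# Invented sample rows for illustration only.
rows = [
    {"Employee ID": 1, "Skill": "Typing", "Employee Address": "12 Oak Road"},
    {"Employee ID": 1, "Skill": "Filing", "Employee Address": "12 Oak Road"},
    {"Employee ID": 2, "Skill": "Typing", "Employee Address": "9 Elm Street"},
]

print(holds(rows, ["Employee ID", "Skill"], ["Employee Address"]))
print(fully_dependent(rows, ["Employee ID", "Skill"], ["Employee Address"]))
print(fully_dependent(rows, ["Employee ID"], ["Employee Address"]))
```

In this sample the dependency on {Employee ID, Skill} holds but is not full, because {Employee ID} alone already determines the address.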
Transitive dependency
A transitive dependency is an indirect functional dependency, one in which X→Z only by virtue of X→Y and
Y→Z.
Multivalued dependency
A multivalued dependency is a constraint according to which the presence of certain rows in a table implies
the presence of certain other rows.
Join dependency
A table T is subject to a join dependency if T can always be recreated by joining multiple tables each having a
subset of the attributes of T.
Superkey
A superkey is a combination of attributes that can be used to uniquely identify a database record. A table
might have many superkeys.
Candidate key
A candidate key is a minimal superkey: a superkey from which no attribute can be removed without losing uniqueness.
Examples: Imagine a table with the fields <Name>, <Age>, <SSN> and <Phone Extension>. This table has many
possible superkeys. Three of these are <SSN>, <Phone Extension, Name> and <SSN, Name>. Of those listed, only
<SSN> is a candidate key, as the others contain information not necessary to uniquely identify records ('SSN' here
refers to Social Security Number, which is unique to each person).
Non-prime attribute
A non-prime attribute is an attribute that does not occur in any candidate key. Employee Address would be a
non-prime attribute in the "Employees' Skills" table.
Primary key
Most DBMSs require a table to be defined as having a single unique key, rather than a number of possible
unique keys. A primary key is a key which the database designer has designated for this purpose.
Normal forms
The normal forms (abbrev. NF) of relational database theory provide criteria for determining a table's degree of
vulnerability to logical inconsistencies and anomalies. The higher the normal form applicable to a table, the less
vulnerable it is to inconsistencies and anomalies. Each table has a "highest normal form" (HNF): by definition, a
table always meets the requirements of its HNF and of all normal forms lower than its HNF; also by definition, a
table fails to meet the requirements of any normal form higher than its HNF.
The normal forms are applicable to individual tables; to say that an entire database is in normal form n is to say that
all of its tables are in normal form n.
Newcomers to database design sometimes suppose that normalization proceeds in an iterative fashion, i.e. a 1NF
design is first normalized to 2NF, then to 3NF, and so on. This is not an accurate description of how normalization
typically works. A sensibly designed table is likely to be in 3NF on the first attempt; furthermore, if it is 3NF, it is
overwhelmingly likely to have an HNF of 5NF. Achieving the "higher" normal forms (above 3NF) does not usually
require an extra expenditure of effort on the part of the designer, because 3NF tables usually need no modification to
meet the requirements of these higher normal forms.
The main normal forms are summarized below.
First normal form (1NF)
Defined by: two versions, E.F. Codd (1970) and C.J. Date (2003).[11]
Brief definition: table faithfully represents a relation and has no repeating groups.
Fourth normal form (4NF)
Defined by: Ronald Fagin (1977).[14]
Brief definition: every non-trivial multivalued dependency in the table is a dependency on a superkey.
Fifth normal form (5NF)
Defined by: Ronald Fagin (1979).[15]
Brief definition: every non-trivial join dependency in the table is implied by the superkeys of the table.
Sixth normal form (6NF)
Defined by: C.J. Date, Hugh Darwen, and Nikos Lorentzos (2002).[4]
Brief definition: table features no non-trivial join dependencies at all (with reference to generalized join operator).
Denormalization
Databases intended for online transaction processing (OLTP) are typically more normalized than databases intended
for online analytical processing (OLAP). OLTP applications are characterized by a high volume of small
transactions such as updating a sales record at a supermarket checkout counter. The expectation is that each
transaction will leave the database in a consistent state. By contrast, databases intended for OLAP operations are
primarily "read mostly" databases. OLAP applications tend to extract historical data that has accumulated over a
long period of time. For such databases, redundant or "denormalized" data may facilitate business intelligence
applications. Specifically, dimensional tables in a star schema often contain denormalized data. The denormalized or
redundant data must be carefully controlled during extract, transform, load (ETL) processing, and users should not
be permitted to see the data until it is in a consistent state. The normalized alternative to the star schema is the
snowflake schema. In many cases, the need for denormalization has waned as computers and RDBMS software have
become more powerful, but since data volumes have generally increased along with hardware and software
performance, OLAP databases often still use denormalized schemas.
Denormalization is also used to improve performance on smaller computers as in computerized cash-registers and
mobile devices, since these may use the data for look-up only (e.g. price lookups). Denormalization may also be
used when no RDBMS exists for a platform (such as Palm), or no changes are to be made to the data and a swift
response is crucial.
Person | Favorite Color
Bob | blue
Bob | red
Jane | green
Jane | yellow
Jane | red
Assume a person has several favorite colors. Obviously, favorite colors consist of a set of colors modeled by the
given table. To transform a 1NF into an NF² table a "nest" operator is required which extends the relational algebra
of the higher normal forms. Applying the "nest" operator to the 1NF table yields the following NF² table:
Person | Favorite Colors
Bob | Favorite Color: blue, red
Jane | Favorite Color: green, yellow, red
To transform this NF² table back into a 1NF an "unnest" operator is required which extends the relational algebra of
the higher normal forms. The unnest, in this case, would make "colors" into its own table.
Although "unnest" is the mathematical inverse to "nest", the operator "nest" is not always the mathematical inverse
of "unnest". Another constraint required is for the operators to be bijective, which is covered by the Partitioned
Normal Form (PNF).
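The asymmetry between the two operators can be sketched as follows. The functions `nest` and `unnest` are illustrative stand-ins written for this example, not a standard library API, and the nested relation is modeled simply as a set of (person, set of colors) pairs.

```python
def nest(flat):
    """Group (person, color) pairs into one row per person."""
    groups = {}
    for person, color in flat:
        groups.setdefault(person, set()).add(color)
    return {(person, frozenset(colors)) for person, colors in groups.items()}

def unnest(nested):
    """Flatten (person, set_of_colors) rows back into (person, color) pairs."""
    return {(person, color) for person, colors in nested for color in colors}

flat = {
    ("Bob", "blue"), ("Bob", "red"),
    ("Jane", "green"), ("Jane", "yellow"), ("Jane", "red"),
}

# unnest undoes nest: the original 1NF table is recovered.
round_trip_ok = unnest(nest(flat)) == flat

# nest does not always undo unnest: a nested table that holds two rows
# for the same person is merged into a single row on the way back.
split = {("Bob", frozenset({"blue"})), ("Bob", frozenset({"red"}))}
restored = nest(unnest(split))
nest_inverts_unnest = restored == split

print(round_trip_ok, nest_inverts_unnest)
```

The second case fails because `split` stores Bob's colors in two separate nested rows; requiring at most one nested row per person is the kind of restriction that makes the operators invert each other.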
• Paper: "Non First Normal Form Relations" by G. Jaeschke and H.-J. Schek, IBM Heidelberg Scientific Center. The
paper studies the normalization and denormalization operators nest and unnest briefly described at the end of this
article.
Further reading
• Litt's Tips: Normalization (http://www.troubleshooters.com/littstip/ltnorm.html)
• Date, C. J. (1999), An Introduction to Database Systems (http://www.aw-bc.com/catalog/academic/product/0,1144,0321197844,00.html) (8th ed.). Addison-Wesley Longman. ISBN 0-321-19784-4.
• Kent, W. (1983) A Simple Guide to Five Normal Forms in Relational Database Theory (http://www.bkent.net/Doc/simple5.htm), Communications of the ACM, vol. 26, pp. 120–125
• Date, C.J., & Darwen, H., & Pascal, F. Database Debunkings (http://www.dbdebunk.com)
• H.-J. Schek, P. Pistor Data Structures for an Integrated Data Base Management and Information Retrieval System
External links
• Database Normalization Basics (http://databases.about.com/od/specificproducts/a/normalization.htm) by
Mike Chapple (About.com)
• Database Normalization Intro (http://www.databasejournal.com/sqletc/article.php/1428511), Part 2 (http://www.databasejournal.com/sqletc/article.php/26861_1474411_1)
• An Introduction to Database Normalization (http://dev.mysql.com/tech-resources/articles/intro-to-normalization.html) by Mike Hillyer.
• Normalization (http://www.utexas.edu/its/windows/database/datamodeling/rm/rm7.html) by ITS,
University of Texas.
• A tutorial on the first 3 normal forms (http://phlonx.com/resources/nf3/) by Fred Coulson
• DB Normalization Examples (http://www.dbnormalization.com/)
• Description of the database normalization basics (http://support.microsoft.com/kb/283878) by Microsoft
• Database Normalization and Design Techniques (http://www.barrywise.com/2008/01/database-normalization-and-design-techniques/) by Barry Wise, recommended reading for the Harvard MIS.
• A Simple Guide to Five Normal Forms in Relational Database Theory (http://www.bkent.net/Doc/simple5.htm)
• A view whose definition mandates that results be returned in a particular order, so that the row-ordering is an
intrinsic and meaningful aspect of the view.[5] This violates condition 1. The tuples in true relations are not
ordered with respect to each other.
• A table with at least one nullable attribute. A nullable attribute would be in violation of condition 4, which
requires every field to contain exactly one value from its column's domain. It should be noted, however, that this
aspect of condition 4 is controversial. It marks an important departure from Codd's later vision of the relational
model,[6] which made explicit provision for nulls.[7]
Repeating groups
Date's fourth condition, which expresses "what most people think of as the defining feature of 1NF",[8] is concerned
with repeating groups. The following scenario illustrates how a database design might incorporate repeating groups,
in violation of 1NF.
Customer
Customer ID First Name Surname Telephone Number
The designer then becomes aware of a requirement to record multiple telephone numbers for some customers. He
reasons that the simplest way of doing this is to allow the "Telephone Number" field in any given record to contain
more than one value:
Customer
Customer ID First Name Surname Telephone Number
Assuming, however, that the Telephone Number column is defined on some Telephone Number-like domain (e.g.
the domain of strings 12 characters in length), the representation above is not in 1NF. 1NF (and, for that matter, the
RDBMS) prevents a single field from containing more than one value from its column's domain.
Customer
Customer ID First Name Surname Tel. No. 1 Tel. No. 2 Tel. No. 3
This representation, however, makes use of nullable columns, and therefore does not conform to Date's definition of
1NF. Even if the view is taken that nullable columns are allowed, the design is not in keeping with the spirit of 1NF.
Tel. No. 1, Tel. No. 2., and Tel. No. 3. share exactly the same domain and exactly the same meaning; the splitting of
Telephone Number into three headings is artificial and causes logical problems. These problems include:
• Difficulty in querying the table. Answering such questions as "Which customers have telephone number X?" and
"Which pairs of customers share a telephone number?" is awkward.
• Inability to enforce uniqueness of Customer-to-Telephone Number links through the RDBMS. Customer 789
might mistakenly be given a Tel. No. 2 value that is exactly the same as her Tel. No. 1 value.
• Restriction of the number of telephone numbers per customer to three. If a customer with four telephone numbers
comes along, we are constrained to record only three and leave the fourth unrecorded. This means that the
database design is imposing constraints on the business process, rather than (as should ideally be the case)
vice-versa.
Customer
Customer ID First Name Surname Telephone Numbers
This design is consistent with 1NF, but still presents several design issues. The Telephone Number heading becomes
semantically non-specific, as it can now represent either a telephone number, a list of telephone numbers, or indeed
anything at all. A query such as "Which pairs of customers share a telephone number?" is more difficult to
formulate, given the necessity to cater for lists of telephone numbers as well as individual telephone numbers.
Meaningful constraints on telephone numbers are also very difficult to define in the RDBMS with this design.
Customer Name
Customer ID First Name Surname
Customer ID | Telephone Number
123 | 555-861-2025
456 | 555-403-1659
456 | 555-776-4100
789 | 555-808-9633
Repeating groups of telephone numbers do not occur in this design. Instead, each Customer-to-Telephone Number
link appears on its own record. With Customer ID as the key field linking the two tables, a "parent-child" or one-to-many (1:M) relationship
exists between the two tables, since a customer record (in the "parent" table) can have many telephone number
records (in the "child" table), but each telephone number usually has one, and only one customer. It is worth noting
that this design meets the additional requirements for second and third normal form (3NF).
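A minimal sketch of this two-table design in SQLite follows. The table and column names are assumptions made for the example, and the name columns are left out because only the Customer ID and Telephone Number values are used. It shows that the questions that were awkward with repeating columns become ordinary queries, and that the RDBMS can now reject a duplicate customer-to-number link.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Parent table (name columns omitted in this sketch).
conn.execute("CREATE TABLE customer_name (customer_id INTEGER PRIMARY KEY)")
# Child table: one row per customer-to-telephone-number link.
conn.execute(
    "CREATE TABLE customer_phone ("
    " customer_id INTEGER NOT NULL REFERENCES customer_name(customer_id),"
    " phone TEXT NOT NULL,"
    " PRIMARY KEY (customer_id, phone))"
)
conn.executemany("INSERT INTO customer_name VALUES (?)", [(123,), (456,), (789,)])
conn.executemany(
    "INSERT INTO customer_phone VALUES (?, ?)",
    [
        (123, "555-861-2025"),
        (456, "555-403-1659"),
        (456, "555-776-4100"),
        (789, "555-808-9633"),
    ],
)

def customers_with(phone):
    """Which customers have telephone number X?"""
    cur = conn.execute(
        "SELECT customer_id FROM customer_phone"
        " WHERE phone = ? ORDER BY customer_id",
        (phone,),
    )
    return [row[0] for row in cur]

# Which pairs of customers share a telephone number? (a self-join)
shared = conn.execute(
    "SELECT a.customer_id, b.customer_id, a.phone"
    " FROM customer_phone a JOIN customer_phone b"
    " ON a.phone = b.phone AND a.customer_id < b.customer_id"
).fetchall()

# The composite primary key rejects a repeated link for customer 789.
try:
    conn.execute("INSERT INTO customer_phone VALUES (789, '555-808-9633')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(customers_with("555-403-1659"), shared, duplicate_rejected)
```

There is also no built-in limit of three numbers per customer: a fourth number is simply a fourth row in the child table.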
Atomicity
Some definitions of 1NF, most notably that of Edgar F. Codd, make reference to the concept of atomicity. Codd
states that the "values in the domains on which each relation is defined are required to be atomic with respect to the
DBMS."[9] Codd defines an atomic value as one that "cannot be decomposed into smaller pieces by the DBMS
(excluding certain special functions)."[10] That is, a field should not be divided into parts holding more than one kind
of data, such that what one part means to the DBMS depends on another part of the same field.
Hugh Darwen and Chris Date have suggested that Codd's concept of an "atomic value" is ambiguous, and that this
ambiguity has led to widespread confusion about how 1NF should be understood.[11] [12] In particular, the notion of a
"value that cannot be decomposed" is problematic, as it would seem to imply that few, if any, data types are atomic:
• A character string would seem not to be atomic, as the RDBMS typically provides operators to decompose it into
substrings.
• A fixed-point number would seem not to be atomic, as the RDBMS typically provides operators to decompose it
into integer and fractional components.
Date suggests that "the notion of atomicity has no absolute meaning":[13] a value may be considered atomic for some
purposes, but may be considered an assemblage of more basic elements for other purposes. If this position is
accepted, 1NF cannot be defined with reference to atomicity. Columns of any conceivable data type (from string
types and numeric types to array types and table types) are then acceptable in a 1NF table, although perhaps not
always desirable. For example, it would be more desirable to separate a Customer Name field into two separate
fields, First Name and Surname. Date argues that relation-valued attributes, by means of which a field within a table
can contain a table, are useful in rare cases.[14]
References
[1] "[T]he overriding requirement, to the effect that the table must directly and faithfully represent a relation, follows from the fact that 1NF was
originally defined as a property of relations, not tables." Date, C. J. "What First Normal Form Really Means"
(http://www.dbdebunk.com/page/page/629796.htm) in Date on Database: Writings 2000-2006 (Springer-Verlag, 2006), p. 128.
[2] "First normal form excludes variable repeating fields and groups." Kent, William. "A Simple Guide to Five Normal Forms in Relational
Database Theory" (http://www.bkent.net/Doc/simple5.htm), Communications of the ACM 26 (2), Feb. 1983, pp. 120–125.
[3] Elmasri, Ramez and Navathe, Shamkant B. Fundamentals of Database Systems, Fourth Edition (Addison-Wesley, 2003), p. 315.
[4] Date, C. J. "What First Normal Form Really Means" (http://www.dbdebunk.com/page/page/629796.htm) pp. 127–128.
[5] Such views cannot be created using SQL that conforms to the SQL:2003 standard.
[6] "Codd first defined the relational model in 1969 and didn't introduce nulls until 1979" Date, C. J. SQL and Relational Theory (O'Reilly,
2009), Appendix A.2.
[7] The third of Codd's 12 rules states that "Null values ... [must be] supported in a fully relational DBMS for representing missing information
and inapplicable information in a systematic way, independent of data type." Codd, E. F. "Is Your DBMS Really Relational?" Computerworld,
October 14, 1985.
[8] Date, C. J. "What First Normal Form Really Means" (http://www.dbdebunk.com/page/page/629796.htm) p. 128.
[9] Codd, E. F. The Relational Model for Database Management Version 2 (Addison-Wesley, 1990).
[10] Codd, E. F. The Relational Model for Database Management Version 2 (Addison-Wesley, 1990), p. 6.
[11] Darwen, Hugh. "Relation-Valued Attributes; or, Will the Real First Normal Form Please Stand Up?", in C. J. Date and Hugh Darwen,
Relational Database Writings 1989-1991 (Addison-Wesley, 1992).
[12] "[F]or many years," writes Date, "I was as confused as anyone else. What's worse, I did my best (worst?) to spread that confusion through
my writings, seminars, and other presentations." Date, C. J. "What First Normal Form Really Means"
(http://www.dbdebunk.com/page/page/629796.htm) in Date on Database: Writings 2000-2006 (Springer-Verlag, 2006), p. 108.
[13] Date, C. J. "What First Normal Form Really Means" (http://www.dbdebunk.com/page/page/629796.htm) p. 112.
[14] Date, C. J. "What First Normal Form Really Means" (http://www.dbdebunk.com/page/page/629796.htm) pp. 121–126.
Further reading
• Litt's Tips: Normalization (http://www.troubleshooters.com/littstip/ltnorm.html)
• Rules Of Data Normalization (http://www.datamodel.org/NormalizationRules.html)
• Date, C. J., & Lorentzos, N., & Darwen, H. (2002). Temporal Data & the Relational Model (http://www.elsevier.com/wps/product/cws_home/680662) (1st ed.). Morgan Kaufmann. ISBN 1-55860-855-9.
• Date, C. J. (1999), An Introduction to Database Systems (http://www.aw-bc.com/catalog/academic/product/0,1144,0321197844,00.html) (8th ed.). Addison-Wesley Longman. ISBN 0-321-19784-4.
• Kent, W. (1983) A Simple Guide to Five Normal Forms in Relational Database Theory (http://www.bkent.net/Doc/simple5.htm), Communications of the ACM, vol. 26, pp. 120–125
• Date, C. J., & Darwen, H., & Pascal, F. Database Debunkings (http://www.dbdebunk.com)
External links
• Database Normalization Basics (http://databases.about.com/od/specificproducts/a/normalization.htm) by
Mike Chapple (About.com)
• An Introduction to Database Normalization (http://dev.mysql.com/tech-resources/articles/intro-to-normalization.html) by Mike Hillyer.
• Normalization (http://www.utexas.edu/its/windows/database/datamodeling/rm/rm7.html) by ITS,
University of Texas.
• Rules of Data Normalization (http://www.datamodel.org/NormalizationRules.html) by Data Model.org
• A tutorial on the first 3 normal forms (http://phlonx.com/resources/nf3/) by Fred Coulson
• Description of the database normalization basics (http://support.microsoft.com/kb/283878) by Microsoft
Second normal form
Example
Consider a table describing employees' skills:
Employees' Skills
Employee Skill Current Work Location
Neither {Employee} nor {Skill} is a candidate key for the table. This is because a given Employee might need to
appear more than once (he might have multiple Skills), and a given Skill might need to appear more than once (it
might be possessed by multiple Employees). Only the composite key {Employee, Skill} qualifies as a candidate key
for the table.
The remaining attribute, Current Work Location, is dependent on only part of the candidate key, namely Employee.
Therefore the table is not in 2NF. Note the redundancy in the way Current Work Locations are represented: we are
told three times that Jones works at 114 Main Street, and twice that Ellis works at 73 Industrial Way. This
redundancy makes the table vulnerable to update anomalies: it is, for example, possible to update Jones' work
location on his "Typing" and "Shorthand" records and not update his "Whittling" record. The resulting data would
imply contradictory answers to the question "What is Jones' current work location?"
A 2NF alternative to this design would represent the same information in two tables: an "Employees" table with
candidate key {Employee}, and an "Employees' Skills" table with candidate key {Employee, Skill}:
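The update anomaly and its removal can be sketched in a few lines. The rows for Jones and Ellis follow the facts stated in the text; the replacement address "9 Harbour Road" is invented for the example, and plain Python structures stand in for database tables.

```python
# Single-table design: the work location is repeated on every skill row.
rows = [
    {"Employee": "Jones", "Skill": "Typing", "Current Work Location": "114 Main Street"},
    {"Employee": "Jones", "Skill": "Shorthand", "Current Work Location": "114 Main Street"},
    {"Employee": "Jones", "Skill": "Whittling", "Current Work Location": "114 Main Street"},
    {"Employee": "Ellis", "Skill": "Alchemy", "Current Work Location": "73 Industrial Way"},
    {"Employee": "Ellis", "Skill": "Flying", "Current Work Location": "73 Industrial Way"},
]

def locations_of(table, employee):
    """All distinct answers to 'What is this employee's current work location?'"""
    return {r["Current Work Location"] for r in table if r["Employee"] == employee}

# An incomplete update: the "Whittling" record is forgotten.
for r in rows:
    if r["Employee"] == "Jones" and r["Skill"] in ("Typing", "Shorthand"):
        r["Current Work Location"] = "9 Harbour Road"  # hypothetical new address

anomaly = locations_of(rows, "Jones")  # two contradictory answers

# 2NF design: the location is stored once per employee.
employees = {"Jones": "114 Main Street", "Ellis": "73 Industrial Way"}
employee_skills = {
    ("Jones", "Typing"), ("Jones", "Shorthand"), ("Jones", "Whittling"),
    ("Ellis", "Alchemy"), ("Ellis", "Flying"),
}
employees["Jones"] = "9 Harbour Road"  # a single update, nothing to forget

print(sorted(anomaly), employees["Jones"])
```

In the two-table version there is only one place where Jones' location is recorded, so a partial update of this kind cannot be expressed.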
Employees
Employee Current Work Location
Employees' Skills
Employee Skill
Jones Typing
Jones Shorthand
Jones Whittling
Ellis Alchemy
Ellis Flying
Tournament Winners
Tournament Year Winner Winner Date of Birth
Even though Winner and Winner Date of Birth are determined by the whole key {Tournament / Year} and not part
of it, particular Winner / Winner Date of Birth combinations are shown redundantly on multiple records. This leads
to an update anomaly: if updates are not carried out consistently, a particular winner could be shown as having two
different dates of birth.
The underlying problem is the transitive dependency to which the Winner Date of Birth attribute is subject. Winner
Date of Birth actually depends on Winner, which in turn depends on the key Tournament / Year.
This problem is addressed by third normal form (3NF).
Even if the designer has specified the primary key as {Model Full Name}, the table is not in 2NF. {Manufacturer,
Model} is also a candidate key, and Manufacturer Country is dependent on a proper subset of it: Manufacturer. To
make the design conform to 2NF, it is necessary to have two tables:
Manufacturer | Manufacturer Country
Forte | Italy
Dent-o-Fresh | USA
Kobayashi | Japan
Hoch | Germany
References
[1] Codd, E.F. "Further Normalization of the Data Base Relational Model." (Presented at Courant Computer Science Symposia Series 6, "Data
Base Systems," New York City, May 24th-25th, 1971.) IBM Research Report RJ909 (August 31st, 1971). Republished in Randall J. Rustin
(ed.), Data Base Systems: Courant Computer Science Symposia Series 6. Prentice-Hall, 1972.
Further reading
• Litt's Tips: Normalization (http://www.troubleshooters.com/littstip/ltnorm.html)
• Date, C. J., & Lorentzos, N., & Darwen, H. (2002). Temporal Data & the Relational Model (http://www.elsevier.com/wps/product/cws_home/680662) (1st ed.). Morgan Kaufmann. ISBN 1-55860-855-9.
• Date, C. J. (2004). Introduction to Database Systems (8th ed.). Boston: Addison-Wesley. ISBN 9780321197849.
• Kent, W. (1983) A Simple Guide to Five Normal Forms in Relational Database Theory (http://www.bkent.net/Doc/simple5.htm), Communications of the ACM, vol. 26, pp. 120–125
• Date, C.J., & Darwen, H., & Pascal, F. Database Debunkings (http://www.dbdebunk.com)
External links
• Database Normalization Basics (http://databases.about.com/od/specificproducts/a/normalization.htm) by
Mike Chapple (About.com)
• An Introduction to Database Normalization (http://dev.mysql.com/tech-resources/articles/intro-to-normalization.html) by Mike Hillyer.
• Normalization (http://www.utexas.edu/its/windows/database/datamodeling/rm/rm7.html) by ITS,
University of Texas.
• A tutorial on the first 3 normal forms (http://phlonx.com/resources/nf3/) by Fred Coulson
• Description of the database normalization basics (http://support.microsoft.com/kb/283878) by Microsoft
Third normal form
Tournament Winners
Tournament Year Winner Winner Date of Birth
Because each row in the table needs to tell us who won a particular Tournament in a particular Year, the composite
key {Tournament, Year} is a minimal set of attributes guaranteed to uniquely identify a row. That is, {Tournament,
Year} is a candidate key for the table.
The breach of 3NF occurs because the non-prime attribute Winner Date of Birth is transitively dependent on the
candidate key {Tournament, Year} via the non-prime attribute Winner. The fact that Winner Date of Birth is
functionally dependent on Winner makes the table vulnerable to logical inconsistencies, as there is nothing to stop
the same person from being shown with different dates of birth on different records.
In order to express the same facts without violating 3NF, it is necessary to split the table into two:
Tournament Winners
Tournament Year Winner
Update anomalies cannot occur in these tables, which are both in 3NF.
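The split can be sketched as two projections of the original table, followed by a join that recovers it. All tournament names, winners, and dates below are invented for illustration; the point is only that each winner's date of birth ends up stored exactly once.

```python
# Hypothetical rows: (Tournament, Year, Winner, Winner Date of Birth)
original = {
    ("Spring Open", 2001, "Ada North", "1975-07-21"),
    ("Spring Open", 2002, "Ben South", "1968-03-14"),
    ("Autumn Cup", 2001, "Ada North", "1975-07-21"),
}

# Projection 1: who won which tournament in which year.
winners = {(t, y, w) for t, y, w, _ in original}
# Projection 2: each winner's date of birth, recorded once per person.
birth_dates = {(w, dob) for _, _, w, dob in original}

# Joining the projections on Winner reproduces the original table,
# so no information is lost by the decomposition.
rejoined = {
    (t, y, w, dob)
    for t, y, w in winners
    for w2, dob in birth_dates
    if w == w2
}
lossless = rejoined == original

print(len(birth_dates), lossless)
```

Because Winner determines Winner Date of Birth, the join introduces no spurious rows, and a person can no longer be shown with two different dates of birth.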
References
[1] Codd, E.F. "Further Normalization of the Data Base Relational Model." (Presented at Courant Computer Science Symposia Series 6, "Data
Base Systems," New York City, May 24th–25th, 1971.) IBM Research Report RJ909 (August 31st, 1971). Republished in Randall J. Rustin
(ed.), Data Base Systems: Courant Computer Science Symposia Series 6. Prentice-Hall, 1972.
[2] Codd, p. 43.
[3] Codd, p. 45–46.
[4] Zaniolo, Carlo. "A New Normal Form for the Design of Relational Database Schemata." ACM Transactions on Database Systems 7(3),
September 1982.
[5] Kent, William. "A Simple Guide to Five Normal Forms in Relational Database Theory" (http://www.bkent.net/Doc/simple5.htm),
Communications of the ACM 26 (2), Feb. 1983, pp. 120–125.
[6] The author of a 1989 book on database management credits one of his students with coming up with the "so help me Codd" addendum. Diehr,
George. Database Management (Scott, Foresman, 1989), p. 331.
[7] Date, C.J. An Introduction to Database Systems (7th ed.) (Addison Wesley, 2000), p. 379.
[8] Zaniolo, p. 494.
Further reading
• Date, C. J. (1999), An Introduction to Database Systems (http://www.aw-bc.com/catalog/academic/product/0,1144,0321197844,00.html) (8th ed.). Addison-Wesley Longman. ISBN 0-321-19784-4.
• Kent, W. (1983) A Simple Guide to Five Normal Forms in Relational Database Theory (http://www.bkent.net/Doc/simple5.htm), Communications of the ACM, vol. 26, pp. 120–125
External links
• Litt's Tips: Normalization (http://www.troubleshooters.com/littstip/ltnorm.html)
• Database Normalization Basics (http://databases.about.com/od/specificproducts/a/normalization.htm) by
Mike Chapple (About.com)
• An Introduction to Database Normalization (http://dev.mysql.com/tech-resources/articles/intro-to-normalization.html) by Mike Hillyer.
• Normalization (http://www.utexas.edu/its/windows/database/datamodeling/rm/rm7.html) by ITS,
University of Texas.
• A tutorial on the first 3 normal forms (http://phlonx.com/resources/nf3/) by Fred Coulson
• Description of the database normalization basics (http://support.microsoft.com/kb/283878) by Microsoft
Fourth normal form
Multivalued dependencies
If the column headings in a relational database table are divided into three disjoint groupings X, Y, and Z, then, in the
context of a particular row, we can refer to the data beneath each group of headings as x, y, and z respectively. A
multivalued dependency X →→ Y signifies that if we choose any x actually occurring in the table (call this choice
xc), and compile a list of all the xcyz combinations that occur in the table, we will find that xc is associated with the
same y entries regardless of z.
A trivial multivalued dependency X →→ Y is one in which Y consists of all columns not belonging to X. That is, a
subset of attributes in a table has a trivial multivalued dependency on the remaining subset of attributes.
A functional dependency is a special case of multivalued dependency. In a functional dependency X → Y, every x
determines exactly one y, never more than one.
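The definition of a multivalued dependency given above can be checked mechanically on a small table. The following is a minimal Python sketch, not a standard library routine: the function name holds_mvd, the column names X, Y, Z, and the sample rows are all hypothetical and chosen only for illustration.

```python
from itertools import product

def holds_mvd(rows, x_cols, y_cols):
    """Return True if the multivalued dependency X ->> Y holds in `rows`.

    `rows` is a list of dicts with identical keys; Z is taken to be every
    column that belongs to neither X nor Y.
    """
    if not rows:
        return True
    z_cols = [c for c in rows[0] if c not in x_cols and c not in y_cols]
    groups = {}
    for row in rows:
        x = tuple(row[c] for c in x_cols)
        y = tuple(row[c] for c in y_cols)
        z = tuple(row[c] for c in z_cols)
        ys, zs, yzs = groups.setdefault(x, (set(), set(), set()))
        ys.add(y)
        zs.add(z)
        yzs.add((y, z))
    # X ->> Y holds iff, for every x, the observed (y, z) pairs form the full
    # cross product of the y values and the z values seen with that x.
    return all(yzs == set(product(ys, zs)) for ys, zs, yzs in groups.values())

# Hypothetical sample data: x = 1 is paired with every (y, z) combination.
rows = [
    {"X": 1, "Y": "a", "Z": "p"},
    {"X": 1, "Y": "a", "Z": "q"},
    {"X": 1, "Y": "b", "Z": "p"},
    {"X": 1, "Y": "b", "Z": "q"},
    {"X": 2, "Y": "c", "Z": "p"},
]
```

With these rows, holds_mvd(rows, ["X"], ["Y"]) is true; dropping the row (1, b, q) makes the y entries associated with x = 1 depend on z, so the dependency no longer holds.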
Example
Consider the following example:
Each row indicates that a given restaurant can deliver a given variety of pizza to a given area.
The table has no non-key attributes because its only key is {Restaurant, Pizza Variety, Delivery Area}. Therefore it
meets all normal forms up to BCNF. If we assume, however, that pizza varieties offered by a restaurant are not
affected by delivery area, then it does not meet 4NF. The problem is that the table features two non-trivial
multivalued dependencies on the {Restaurant} attribute (which is not a superkey). The dependencies are:
• {Restaurant} →→ {Pizza Variety}
• {Restaurant} →→ {Delivery Area}
Varieties By Restaurant
Restaurant Pizza Variety
A1 Pizza Springfield
A1 Pizza Shelbyville
In contrast, if the pizza varieties offered by a restaurant sometimes did legitimately vary from one delivery area to
another, the original three-column table would satisfy 4NF.
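The contrast between the two situations can be sketched in Python. The rows below are hypothetical (the pizza variety names in particular are assumptions, not data from the example table), and decompose_and_rejoin is an illustrative helper that splits the three-column table into its two two-column projections and joins them back on Restaurant.

```python
def decompose_and_rejoin(rows):
    """Project (restaurant, variety, area) rows onto two tables and rejoin them."""
    varieties_by_restaurant = {(r, v) for r, v, _ in rows}
    areas_by_restaurant = {(r, a) for r, _, a in rows}
    # Natural join of the two projections on the Restaurant column.
    return {
        (r, v, a)
        for r, v in varieties_by_restaurant
        for r2, a in areas_by_restaurant
        if r == r2
    }

# Hypothetical case 1: varieties do not depend on delivery area.
deliveries = {
    ("A1 Pizza", "Thick Crust", "Springfield"),
    ("A1 Pizza", "Thick Crust", "Shelbyville"),
    ("A1 Pizza", "Stuffed Crust", "Springfield"),
    ("A1 Pizza", "Stuffed Crust", "Shelbyville"),
}

# Hypothetical case 2: one variety is not delivered to one area.
varying = deliveries - {("A1 Pizza", "Stuffed Crust", "Shelbyville")}

independent_is_lossless = decompose_and_rejoin(deliveries) == deliveries
varying_is_lossless = decompose_and_rejoin(varying) == varying
```

In the first case the join reproduces the original rows, so the two smaller tables carry the same information. In the second case the join reintroduces the missing row, which is why the three-column table must be kept when varieties legitimately vary by area.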
Ronald Fagin demonstrated that it is always possible to achieve 4NF.[2] Rissanen's theorem is also applicable to
multivalued dependencies.
4NF in practice
A 1992 paper by Margaret S. Wu notes that the teaching of database normalization typically stops short of 4NF,
perhaps because of a belief that tables violating 4NF (but meeting all lower normal forms) are rarely encountered in
business applications. This belief may not be accurate, however. Wu reports that in a study of forty organizational
databases, over 20% contained one or more tables that violated 4NF while meeting all lower normal forms.[3]
References
[1] "A relation schema R* is in fourth normal form (4NF) if, whenever a nontrivial multivalued dependency X →→ Y holds for R*, then so does
the functional dependency X → A for every column name A of R*. Intuitively all dependencies are the result of keys." Fagin, Ronald
(September 1977). "Multivalued Dependencies and a New Normal Form for Relational Databases"
(http://www.almaden.ibm.com/cs/people/fagin/tods77.pdf). ACM Transactions on Database Systems 2 (1): 267. doi:10.1145/320557.320571.
[2] Fagin, p. 268
[3] Wu, Margaret S. (March 1992). "The Practical Need for Fourth Normal Form". ACM SIGCSE Bulletin 24 (1): 19–23.
doi:10.1145/135250.134515.
Further reading
• Date, C. J. (1999), An Introduction to Database Systems (http://www.aw-bc.com/catalog/academic/product/
0,1144,0321197844,00.html) (8th ed.). Addison-Wesley Longman. ISBN 0-321-19784-4.
• Kent, W. (1983) A Simple Guide to Five Normal Forms in Relational Database Theory (http://www.bkent.net/
Doc/simple5.htm), Communications of the ACM, vol. 26, pp. 120–125
• Date, C.J., & Darwen, H., & Pascal, F. Database Debunkings (http://www.dbdebunk.com)
• Advanced Normalization (http://www.utexas.edu/its/windows/database/datamodeling/rm/rm8.html) by
ITS, University of Texas.
Example
Consider the following example:
Fifth normal form 25
The table's predicate is: Products of the type designated by Product Type, made by the brand designated by Brand,
are available from the travelling salesman designated by Travelling Salesman.
In the absence of any rules restricting the valid possible combinations of Travelling Salesman, Brand, and Product
Type, the three-attribute table above is necessary in order to model the situation correctly.
Suppose, however, that the following rule applies: A Travelling Salesman has certain Brands and certain Product
Types in his repertoire. If Brand B is in his repertoire, and Product Type P is in his repertoire, then (assuming
Brand B makes Product Type P), the Travelling Salesman must offer products of Product Type P made by Brand B.
In that case, it is possible to split the table into three:
Acme Breadbox
Robusto Breadbox
Robusto Telescope
Note how this setup helps to remove redundancy. Suppose that Jack Schneider starts selling Robusto's products. In
the previous setup we would have to add two new entries since Jack Schneider is able to sell two Product Types
covered by Robusto: Breadboxes and Vacuum Cleaners. With the new setup we need only add a single entry (in
Brands By Travelling Salesman).
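Whether a three-way split of this kind is safe can be tested by joining the three two-column projections and comparing the result with the original table. The Python sketch below uses hypothetical rows ("Salesman B" is an invented name, and the assignments of brands and product types are assumptions); join_of_projections is an illustrative helper, not an established API.

```python
def join_of_projections(rows):
    """Join the three binary projections of (salesman, brand, product_type) rows."""
    brands_by_salesman = {(s, b) for s, b, _ in rows}
    types_by_salesman = {(s, p) for s, _, p in rows}
    types_by_brand = {(b, p) for _, b, p in rows}
    return {
        (s, b, p)
        for s, b in brands_by_salesman
        for b2, p in types_by_brand
        if b == b2 and (s, p) in types_by_salesman
    }

# Hypothetical data that obeys the rule stated above.
offers = {
    ("Jack Schneider", "Acme", "Breadbox"),
    ("Jack Schneider", "Acme", "Vacuum Cleaner"),
    ("Salesman B", "Robusto", "Breadbox"),
    ("Salesman B", "Robusto", "Telescope"),
}

# Hypothetical data that breaks the rule: Jack now sells Robusto telescopes
# but not Robusto breadboxes, although breadboxes are in his repertoire.
violating = offers | {("Jack Schneider", "Robusto", "Telescope")}

spurious = join_of_projections(violating) - violating
```

For offers the join returns exactly the original rows. For violating it also produces the row (Jack Schneider, Robusto, Breadbox), which is the combination the rule requires and the three-table design therefore implies.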
Usage
Only in rare situations does a 4NF table not conform to 5NF. These are situations in which a complex real-world
constraint governing the valid combinations of attribute values in the 4NF table is not implicit in the structure of that
table. If such a table is not normalized to 5NF, the burden of maintaining the logical consistency of the data within
the table must be carried partly by the application responsible for insertions, deletions, and updates to it; and there is
a heightened risk that the data within the table will become inconsistent. In contrast, the 5NF design excludes the
possibility of such inconsistencies.
References
[1] Analysis of normal forms for anchor-tables (http://www.anchormodeling.com/tiedostot/6nf.pdf)
Further reading
• Kent, W. (1983) A Simple Guide to Five Normal Forms in Relational Database Theory (http://www.bkent.net/
Doc/simple5.htm), Communications of the ACM, vol. 26, pp. 120–125
• Date, C.J., & Darwen, H., & Pascal, F. Database Debunkings (http://www.dbdebunk.com)
• Advanced Normalization (http://www.utexas.edu/its/windows/database/datamodeling/rm/rm8.html)
DKNF
Some authors use the term sixth normal form differently, namely, as a synonym for Domain/key normal form
(DKNF). This usage predates Date et al.'s work.[6]
Usage
The sixth normal form is currently being used in some data warehouses where the benefits outweigh the
drawbacks,[7] for example using Anchor Modeling. Although using 6NF leads to an explosion of tables, modern
databases can prune the tables from select queries (using a process called 'table elimination') where they are not
required. Queries that access only a few attributes can then be faster than similar queries against databases modelled in
third normal form.
Sixth normal form 28
References
[1] Date et al., 2003
[2] op. cit., chapter 9: Generalizing the relational operators
[3] op. cit., section 10.4, p. 176
[4] Zimanyi 2005
[5] Snodgrass, Richard T. TSQL2 Temporal Query Language (http://www.cs.arizona.edu/~rts/tsql2.html). Describes history, gives
references to standard and original book.
[6] See www.dbdebunk.com for a discussion on this topic (http://www.dbdebunk.com/page/page/621935.htm)
[7] See the Anchor Modeling website (http://www.anchormodeling.com) for a website that describes a data warehouse modelling method
based on the sixth normal form
Further reading
• Date, C.J. (2006). The relational database dictionary: a comprehensive glossary of relational terms and concepts,
with illustrative examples. O'Reilly Series Pocket references. O'Reilly Media, Inc. p. 90. ISBN 9780596527983.
• Date, Chris J.; Hugh Darwen, Nikos A. Lorentzos (January 2003). Temporal Data and the Relational Model: A
Detailed Investigation into the Application of Interval and Relation Theory to the Problem of Temporal Database
Management. Oxford: Elsevier LTD. ISBN 1558608559.
• Zimanyi, E. (June 2006). "Temporal Aggregates and Temporal Universal Quantification in Standard SQL"
(http://www.sigmod.org/publications/sigmod-record/0606/sigmod-record.june2006.pdf) (PDF). ACM SIGMOD
Record, volume 35, number 2, page 16. ACM.
• Date, Chris J. "ON DK/NF NORMAL FORM" (http://www.dbdebunk.com/page/page/621935.htm).
• Each row in the table represents a court booking at a tennis club that has one hard court (Court 1) and one grass
court (Court 2)
• A booking is defined by its Court and the period for which the Court is reserved
• Additionally, each booking has a Rate Type associated with it. There are four distinct rate types:
• SAVER, for Court 1 bookings made by members
• STANDARD, for Court 1 bookings made by non-members
• PREMIUM-A, for Court 2 bookings made by members
• PREMIUM-B, for Court 2 bookings made by non-members
The table's candidate keys are:
• {Court, Start Time}
• {Court, End Time}
• {Rate Type, Start Time}
• {Rate Type, End Time}
Recall that 2NF prohibits partial functional dependencies of non-prime attributes on candidate keys, and that 3NF
prohibits transitive functional dependencies of non-prime attributes on candidate keys. In the Today's Court
Bookings table, there are no non-prime attributes: that is, all attributes belong to candidate keys. Therefore the table
adheres to both 2NF and 3NF.
The table does not adhere to BCNF. This is because of the dependency Rate Type → Court, in which the
determining attribute (Rate Type) is neither a candidate key nor a superset of a candidate key.
The dependency Rate Type → Court holds because a Rate Type should only ever apply to a single Court.
The design can be amended so that it meets BCNF:
Rate Types
Rate Type   Court  Member Flag
SAVER       1      Yes
STANDARD    1      No
PREMIUM-A   2      Yes
PREMIUM-B   2      No
Boyce–Codd normal form 30
Today's Bookings
Rate Type Start Time End Time
The candidate keys for the Rate Types table are {Rate Type} and {Court, Member Flag}; the candidate keys for the
Today's Bookings table are {Rate Type, Start Time} and {Rate Type, End Time}. Both tables are in BCNF. Having
one Rate Type associated with two different Courts is now impossible, so the anomaly affecting the original table
has been eliminated.
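The BCNF test used in this example (every non-trivial functional dependency must have a superkey as its determinant) can be sketched with an attribute-closure computation. The Python below is an illustrative sketch: the helper names closure and bcnf_violations are hypothetical, and the functional dependency lists are assumptions written down from the candidate keys and the Rate Type → Court dependency stated in the text.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(attributes, fds):
    """List the non-trivial FDs whose determinant is not a superkey."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and closure(lhs, fds) != set(attributes)
    ]

# Assumed dependencies for the original Today's Court Bookings table.
bookings_attrs = {"Court", "Start Time", "End Time", "Rate Type"}
bookings_fds = [
    ({"Court", "Start Time"}, {"End Time", "Rate Type"}),
    ({"Court", "End Time"}, {"Start Time", "Rate Type"}),
    ({"Rate Type"}, {"Court"}),
]

# Assumed dependencies for the two tables of the amended design.
rate_types_attrs = {"Rate Type", "Court", "Member Flag"}
rate_types_fds = [
    ({"Rate Type"}, {"Court", "Member Flag"}),
    ({"Court", "Member Flag"}, {"Rate Type"}),
]
todays_bookings_attrs = {"Rate Type", "Start Time", "End Time"}
todays_bookings_fds = [
    ({"Rate Type", "Start Time"}, {"End Time"}),
    ({"Rate Type", "End Time"}, {"Start Time"}),
]
```

Under these assumptions, bcnf_violations(bookings_attrs, bookings_fds) reports only Rate Type → Court, while both tables of the amended design yield an empty list.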
Achievability of BCNF
In some cases, a non-BCNF table cannot be decomposed into tables that satisfy BCNF and preserve the
dependencies that held in the original table. Beeri and Bernstein showed in 1979 that, for example, a set of functional
dependencies {AB → C, C → B} cannot be represented by a BCNF schema.[5] Thus, unlike the first three normal
forms, BCNF is not always achievable.
Consider the following non-BCNF table whose functional dependencies follow the {AB → C, C → B} pattern:
Nearest Shops
Person Shop Type Nearest Shop
For each Person / Shop Type combination, the table tells us which shop of this type is geographically nearest to the
person's home. We assume for simplicity that a single shop cannot be of more than one type.
The candidate keys of the table are:
• {Person, Shop Type}
• {Person, Nearest Shop}
Because all three attributes are prime attributes (i.e. belong to candidate keys), the table is in 3NF. The table is not in
BCNF, however, as the Shop Type attribute is functionally dependent on a non-superkey: Nearest Shop.
The violation of BCNF means that the table is subject to anomalies. For example, Eagle Eye might have its Shop
Type changed to "Optometrist" on its "Fuller" record while retaining the Shop Type "Optician" on its "Davidson"
record. This would imply contradictory answers to the question: "What is Eagle Eye's Shop Type?" Holding each
shop's Shop Type only once would seem preferable, as doing so would prevent such anomalies from occurring:
Davidson Snippets
Fuller Doughy's
Shop
Shop Shop Type
Snippets Hairdresser
Doughy's Bakery
In this revised design, the "Shop Near Person" table has a candidate key of {Person, Shop}, and the "Shop" table has
a candidate key of {Shop}. Unfortunately, although this design adheres to BCNF, it is unacceptable on different
grounds: it allows us to record multiple shops of the same type against the same person. In other words, its candidate
keys do not guarantee that the functional dependency {Person, Shop Type} → {Shop} will be respected.
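The lost dependency can be made concrete with a short Python sketch. The shop "Hypothetical Cuts" and its row are invented for illustration, and violates_fd is an illustrative helper rather than an established API; the other rows follow the two tables of the revised design.

```python
def violates_fd(rows, lhs, rhs):
    """Return True if two rows agree on the `lhs` columns but differ on `rhs`."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        value = tuple(row[c] for c in rhs)
        if seen.setdefault(key, value) != value:
            return True
    return False

# "Shop Near Person" rows; the third row is a hypothetical addition.
shop_near_person = [
    ("Davidson", "Snippets"),
    ("Fuller", "Doughy's"),
    ("Davidson", "Hypothetical Cuts"),
]
# "Shop" table keyed by Shop; "Hypothetical Cuts" is an invented shop.
shop = {
    "Snippets": "Hairdresser",
    "Doughy's": "Bakery",
    "Hypothetical Cuts": "Hairdresser",
}

# The candidate key {Person, Shop} is respected: no duplicate pairs.
keys_respected = len(set(shop_near_person)) == len(shop_near_person)

# Joining the two tables exposes two hairdressers recorded for Davidson.
joined = [
    {"Person": p, "Shop": s, "Shop Type": shop[s]} for p, s in shop_near_person
]
fd_broken = violates_fd(joined, ["Person", "Shop Type"], ["Shop"])
```

Both tables satisfy their candidate keys, yet the joined data records two hairdressers for Davidson, so {Person, Shop Type} → {Shop} is not enforced by this design.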
A design that eliminates all of these anomalies (but does not conform to BCNF) is possible.[6] This design consists of
the original "Nearest Shops" table supplemented by the "Shop" table described above.
Nearest Shops
Person Shop Type Nearest Shop
Shop
Shop Shop Type
Snippets Hairdresser
Doughy's Bakery
If a referential integrity constraint is defined to the effect that {Shop Type, Nearest Shop} from the first table must
refer to a {Shop Type, Shop} from the second table, then the data anomalies described previously are prevented.
References
[1] Codd, E. F. "Recent Investigations into Relational Data Base Systems." IBM Research Report RJ1385 (April 23rd, 1974). Republished in
Proc. 1974 Congress (Stockholm, Sweden, 1974). New York, N.Y.: North-Holland (1974).
[2] Heath, I. "Unacceptable File Operations in a Relational Database." Proc. 1971 ACM SIGFIDET Workshop on Data Description, Access, and
Control, San Diego, Calif. (November 11th–12th, 1971).
[3] Date, C.J. Database in Depth: Relational Theory for Practitioners. O'Reilly (2005), p. 142.
[4] Vincent, M.W. and B. Srinivasan. "A Note on Relation Schemes Which Are in 3NF But Not in BCNF." Information Processing Letters
48(6), 1993, pp. 281–83.
[5] Beeri, Catriel and Bernstein, Philip A. "Computational problems related to the design of normal form relational schemas." ACM Transactions
on Database Systems 4(1), March 1979, p. 50.
[6] Zaniolo, Carlo. "A New Normal Form for the Design of Relational Database Schemata." ACM Transactions on Database Systems 7(3),
September 1982, p. 493.
Bibliography
• Date, C. J. (1999). An Introduction to Database Systems (8th ed.). Addison-Wesley Longman. ISBN
0-321-19784-4.
External links
• Rules Of Data Normalization (http://web.archive.org/web/20080805014412/http://www.datamodel.org/
NormalizationRules.html)
• Advanced Normalization (http://web.archive.org/web/20080423014733/http://www.utexas.edu/its/
archive/windows/database/datamodeling/rm/rm8.html) by ITS, University of Texas.
Article Sources and Contributors 33
First normal form Source: http://en.wikipedia.org/w/index.php?oldid=417434759 Contributors: Aeonx, Alansohn, Alxndr, Ambuj.Saxena, Bernard Ladenthin, BillyPreset, Boson, Brianga,
Brick Thrower, Burner0718, Closedmouth, Davidhorman, Dfass, Dreftymac, Eallik, Ebraminio, Eibcga, General Wesc, GermanX, GregorB, Gwernol, Hamidrizeh, Heathcliff, Isnow, Jacobolus,
Jason Quinn, Jgzheng, Kwetal, LarRan, Lordmwesh, M.r santosh kumar., Mfpinhal, Montchav, Mystagogue, Nabav, NawlinWiki, ReformatMe, Vegpuff, VictorAnyakin, VinceBowdren, 파핀,
139 anonymous edits
Second normal form Source: http://en.wikipedia.org/w/index.php?oldid=417759225 Contributors: Ak786, Apugazh, Benjamin.Cramphorn, Bernard Ladenthin, Boson, Btilm, Carlhoerberg,
Chrislk02, DARTH SIDIOUS 2, DVdm, ESkog, Ebraminio, GermanX, GregorB, Haffasoul, Ijliao, Jason Quinn, Javert16, JianzhouZhou, JimpsEd, Mordashov, Nabav, Sanchitideas,
Shreyasjoshis, SqlPac, Uncle Dick, VinceBowdren, 파핀, 59 anonymous edits
Third normal form Source: http://en.wikipedia.org/w/index.php?oldid=414029712 Contributors: Alvin-cs, Amalthea, Anabus, Arcturus, Azrich, Bernard Ladenthin, Blahma, Boson, Bxn1358,
CapitalR, Centrx, Codeculturist, DVdm, Dorfl, Ebraminio, Edward Z. Yang, Furrykef, GermanX, Gingerjoos, Ijliao, Jason Quinn, Jcsalterego, Jitse Niesen, Joseph Dwayne, Jswhitten,
Kitkatbeard, Leasabp, MsHyde, Nabav, Natural Cut, Ollie, Pinethicket, Semaphorite, Shreyasjoshis, Sleske, Someusername222, THEN WHO WAS PHONE?, Thingg, Toyota prius 2, Unara,
Vegpuff, VinceBowdren, Vlad2000Plus, Wikimiro, Willking1979, Wyadbb, 80 anonymous edits
Fourth normal form Source: http://en.wikipedia.org/w/index.php?oldid=412588144 Contributors: Akerans, Bernard Ladenthin, Britannica, Ebraminio, Fetchcomms, Geeoharee, GermanX,
Jason Quinn, Jmabel, Nabav, Patrick, Savh, Selfworm, VinceBowdren, Vjosullivan, WikHead, Winterst, 25 anonymous edits
Fifth normal form Source: http://en.wikipedia.org/w/index.php?oldid=414829532 Contributors: Andy M. Wang, Bernard Ladenthin, Brick Thrower, Cool Blue, Dugo, Ebraminio, FineganCJ,
GermanX, Jason Quinn, Libcub, MarcosWozniak, Nabav, Quarl, RonaldKunenborg, Siryendor, SqlPac, Stamfest, Systemparadox, VinceBowdren, 31 anonymous edits
Sixth normal form Source: http://en.wikipedia.org/w/index.php?oldid=414422499 Contributors: Boson, DePiep, Emurphy42, Esran, Favonian, GregorB, Jason Quinn, Nabav, Quarl,
Roenbaeck, RonaldKunenborg, Rp, 9 anonymous edits
Boyce–Codd normal form Source: http://en.wikipedia.org/w/index.php?oldid=411396753 Contributors: Anugrah atreya, Bernard Ladenthin, Briangregory2000, Chitransh saxena,
CiudadanoGlobal, Ebraminio, Eggman64, Fctseng, Fieldday-sunday, JForget, Jgzheng, JimpsEd, Leflyman, Mikeblas, Nabav, Nay Min Thu, NeerajKawathekar, Niddriesteve, Njsg, Obradovic
Goran, Oxymoron83, Quarl, Raztus, Simetrical, Smurfix, Solomon423, SqlPac, Su30, Torzsmokus, Uzume, VinceBowdren, Yachtsman1, ZenSaohu, 62 anonymous edits
Image Sources, Licenses and Contributors 34
License
Creative Commons Attribution-Share Alike 3.0 Unported
http://creativecommons.org/licenses/by-sa/3.0/