
COMP200

Database Normalization

Prepared by: Marvin De Leon

By now some of you are familiar with the basics of using databases in your CGI scripts. Many of your databases will be small, with one or two tables. But as you become braver and tackle bigger projects, you may find that the design of your tables is proving problematic. The SQL you write starts to become unwieldy, and data anomalies start to creep in. It is time to learn about database normalization, or the optimization of tables.

Example 1

Let's begin by creating a sample set of data. Imagine we are working on a system to keep track of employees working on certain projects.

Project Number | Project Name           | Employee Number | Employee Name    | Rate Category | Hourly Rate
1023           | Madagascar travel site | 11              | Vincent Radebe   | A             | $60
               |                        | 12              | Pauline James    | B             | $50
               |                        | 16              | Charles Ramoraz  | C             | $40
1056           | Online estate agency   | 11              | Vincent Radebe   | A             | $60
               |                        | 17              | Monique Williams | B             | $50

Tables in relational databases, which include most databases you'll work with, are laid out as a simple grid. Here, however, each project has a set of employees, so we couldn't even enter this data into such a table. And if we tried to use null fields for the cells that have no value, we could not use the project number, or any other field, as a primary key (a primary key is a field, or list of fields, that uniquely identifies one record). There is not much use in having a table if we can't uniquely identify each record in it.

Filling in the blank cells so that every row is complete gives us a flat table:

Project Number | Project Name           | Employee Number | Employee Name    | Rate Category | Hourly Rate
1023           | Madagascar travel site | 11              | Vincent Radebe   | A             | $60
1023           | Madagascar travel site | 12              | Pauline James    | B             | $50
1023           | Madagascar travel site | 16              | Charles Ramoraz  | C             | $40
1056           | Online estate agency   | 11              | Vincent Radebe   | A             | $60
1056           | Online estate agency   | 17              | Monique Williams | B             | $50

Notice that the project number cannot be a primary key on its own. It does not uniquely identify a row of data. So our primary key must be a combination of project number and employee number. Together these two fields uniquely identify one row of data. (Think about it. You would never add the same employee more than once to a project. If for some reason this could occur, you'd need to add something else to the key to make it unique.)

Now suppose "Madagascar" is misspelled as "Madagascat" in the third record, and imagine trying to spot this error in a table with thousands of records! With the structure above, the chances of the data being corrupted increase drastically. The solution is simply to take out the duplication. What we are doing formally is looking for partial dependencies, i.e. fields that are dependent on a part of a key, and not the entire key. Since both project number and employee number make up the key, we look for fields that are dependent only on project number, or only on employee number. We find fields of both kinds. Project name is dependent on project number only (employee number is irrelevant in determining project name), and the same applies to employee name, hourly rate and rate category, which are dependent on employee number only. So we take out these fields, as follows:

Employee-Project Table
Project Number | Employee Number
1023           | 11
1023           | 12
1023           | 16
1056           | 11
1056           | 17
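The search for partial dependencies described above can be made concrete with a small check. The Python sketch below tests whether one set of columns functionally determines another in the sample rows. The function name `determines` and the column names are illustrative assumptions and are not part of the original example. Keep in mind that sample data can only disprove a dependency, never prove it, which is why dependencies are ultimately decided from the business rules.

```python
# A minimal functional-dependency check. Column names are assumptions
# made for this sketch; the hourly rate is stored as whole dollars.

def determines(rows, lhs, rhs):
    """Return True if every distinct value of the lhs columns maps to
    exactly one value of the rhs columns (i.e. lhs -> rhs holds)."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        value = tuple(row[c] for c in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True

COLS = ("project_number", "project_name", "employee_number",
        "employee_name", "rate_category", "hourly_rate")
FLAT = [dict(zip(COLS, r)) for r in [
    (1023, "Madagascar travel site", 11, "Vincent Radebe", "A", 60),
    (1023, "Madagascar travel site", 12, "Pauline James", "B", 50),
    (1023, "Madagascar travel site", 16, "Charles Ramoraz", "C", 40),
    (1056, "Online estate agency", 11, "Vincent Radebe", "A", 60),
    (1056, "Online estate agency", 17, "Monique Williams", "B", 50),
]]

# Partial dependencies: each non-key field depends on only one half of the key.
print(determines(FLAT, ["project_number"], ["project_name"]))        # True
print(determines(FLAT, ["employee_number"],
                 ["employee_name", "rate_category", "hourly_rate"]))  # True

# The "Madagascat" typo in the third record breaks the dependency.
bad = [dict(r) for r in FLAT]
bad[2]["project_name"] = "Madagascat travel site"
print(determines(bad, ["project_number"], ["project_name"]))         # False
```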

Clearly we can't simply take out the data and leave it out of our database. We take it out and put it into a new table, consisting of the fields that have the partial dependency and the field they are dependent on. We identified employee name, hourly rate and rate category as being dependent on employee number, so the new table will consist of employee number as a key, plus employee name, rate category and hourly rate, as follows:

Employee Table
Employee Number | Employee Name    | Rate Category | Hourly Rate
11              | Vincent Radebe   | A             | $60
12              | Pauline James    | B             | $50
16              | Charles Ramoraz  | C             | $40
17              | Monique Williams | B             | $50

Project Table
Project Number | Project Name
1023           | Madagascar travel site
1056           | Online estate agency

Note the reduction of duplication. The text "Madagascar travel site" is stored once only, not once for each employee working on that project. The link is made through the key, the project number. Obviously there is no way to remove the duplication of this number without losing the relation altogether, but it is far more efficient to store a short number repeatedly than a large piece of text.

We're still not perfect. There is still room for anomalies in the data. Look carefully at the data below.

Employee Number | Employee Name    | Rate Category | Hourly Rate
11              | Vincent Radebe   | A             | $60
12              | Pauline James    | B             | $50
16              | Charles Ramoraz  | C             | $40
17              | Monique Williams | B             | $40

The problem above is that Monique Williams has been awarded an hourly rate of $40, when she is actually category B and should be earning $50. (In the case of this company, the relationship between rate category and hourly rate is fixed. This may not always be the case.) Once again we are storing data redundantly: the rate category/hourly rate relationship is being stored in its entirety for each employee.
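Because the rate category/hourly rate pair is repeated on every employee row, a conflict like Monique Williams's can at least be detected with a query. The sketch below uses Python's built-in `sqlite3` module with an in-memory database. The table and column names are assumptions chosen to mirror the example. The query lists every rate category that appears with more than one hourly rate.

```python
import sqlite3

# In-memory copy of the Employee table above, including the anomaly
# (Monique Williams, category B, stored at $40). Names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE employee (
    employee_number INTEGER PRIMARY KEY,
    employee_name   TEXT,
    rate_category   TEXT,
    hourly_rate     INTEGER)""")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?, ?)", [
    (11, "Vincent Radebe", "A", 60),
    (12, "Pauline James", "B", 50),
    (16, "Charles Ramoraz", "C", 40),
    (17, "Monique Williams", "B", 40),
])

# A category mapped to more than one rate violates the fixed
# category -> rate relationship.
conflicts = conn.execute("""
    SELECT rate_category, COUNT(DISTINCT hourly_rate)
    FROM employee
    GROUP BY rate_category
    HAVING COUNT(DISTINCT hourly_rate) > 1
""").fetchall()
print(conflicts)  # [('B', 2)]
```

Detection is only a stopgap. The next normalization step removes the possibility of the conflict altogether.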

The solution, as before, is to move this excess data into its own table. Formally, what we are doing is looking for transitive dependencies, or relationships where a non-key attribute is dependent on another non-key attribute. Hourly rate, while in one sense dependent on Employee number (we probably identified this dependency earlier, when looking for partial dependencies), is actually dependent on Rate category. So we remove it and place it in a new table, with its actual key, as follows:

Employee Table
Employee Number | Employee Name    | Rate Category
11              | Vincent Radebe   | A
12              | Pauline James    | B
16              | Charles Ramoraz  | C
17              | Monique Williams | B

Rate Table
Rate Category | Hourly Rate
A             | $60
B             | $50
C             | $40

We've cut down once again. It is now impossible to mistakenly assume rate category "B" is associated with an hourly rate of anything but $50. These relationships are stored in only one place, our new table, where their accuracy can be ensured.

Let's run through the example we've just done again, this time without the data tables to guide us. After all, when you're designing a system, you usually won't have test data available at this stage. The tables were there to show you the consequences of storing data in unnormalized tables, but without them we can focus on dependency issues, which are the key to database normalization. In the beginning, the data structure we had was as follows:

Project number
Project name
1-n Employee numbers (1-n indicates that there are many occurrences of this field; it is a repeating group)
1-n Employee names
1-n Rate categories
1-n Hourly rates
So, to begin the normalization process, we start by moving from zero normal form to 1st normal form.

The definition of 1st normal form
- there are no repeating groups
- all the key attributes are defined
- all attributes are dependent on the primary key

So far, we have no keys, and there are repeating groups. So we remove the repeating groups and define the primary key, and are left with the following:

Employee project table
Project number (primary key)
Project name
Employee number (primary key)
Employee name
Rate category
Hourly rate

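Removing the repeating group can be sketched in a few lines of Python. The nested layout below is an assumed stand-in for the unnormalized structure: a list of projects, each holding a list of employee tuples. The hypothetical function `to_first_normal_form` emits one flat row per project/employee pair. The sketch then checks that project number plus employee number identifies each row uniquely.

```python
# Zero normal form: each project carries a repeating group of employees.
# The nested-dict layout is an assumption used only for illustration.
UNNORMALIZED = [
    {"project_number": 1023, "project_name": "Madagascar travel site",
     "employees": [(11, "Vincent Radebe", "A", 60),
                   (12, "Pauline James", "B", 50),
                   (16, "Charles Ramoraz", "C", 40)]},
    {"project_number": 1056, "project_name": "Online estate agency",
     "employees": [(11, "Vincent Radebe", "A", 60),
                   (17, "Monique Williams", "B", 50)]},
]

def to_first_normal_form(projects):
    """Remove the repeating group: emit one flat row per (project, employee)."""
    rows = []
    for p in projects:
        for number, name, category, rate in p["employees"]:
            rows.append((p["project_number"], p["project_name"],
                         number, name, category, rate))
    return rows

rows = to_first_normal_form(UNNORMALIZED)

# The composite key (project_number, employee_number) must be unique.
keys = [(r[0], r[2]) for r in rows]
assert len(keys) == len(set(keys))
print(len(rows))  # 5
```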
This table is in 1st normal form.

A table is in 2nd normal form if
- it's in 1st normal form
- it includes no partial dependencies (where an attribute is dependent on only a part of a primary key)

So, we go through all the fields. Project name is only dependent on Project number. Employee name, Rate category and Hourly rate are dependent only on Employee number. So we remove them and place these fields in a separate table, with the key being the part of the original key they are dependent on. We are left with the following 3 tables:

Employee-Project table
Project number (primary key)
Employee number (primary key)

Employee table
Employee number (primary key)
Employee name
Rate category
Hourly rate

Project table
Project number (primary key)
Project name
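The move to 2nd normal form is a decomposition: each non-key field is projected onto the part of the key it depends on. The Python sketch below builds the three tables from the five flat rows, using sets to drop the duplicates. It then joins the tables back on their keys to confirm that no information was lost. The tuple layout and the variable names are assumptions for illustration.

```python
# Flat 1NF rows: (project_number, project_name, employee_number,
#                 employee_name, rate_category, hourly_rate).
# This tuple layout is an assumption made for the sketch.
ROWS = [
    (1023, "Madagascar travel site", 11, "Vincent Radebe", "A", 60),
    (1023, "Madagascar travel site", 12, "Pauline James", "B", 50),
    (1023, "Madagascar travel site", 16, "Charles Ramoraz", "C", 40),
    (1056, "Online estate agency", 11, "Vincent Radebe", "A", 60),
    (1056, "Online estate agency", 17, "Monique Williams", "B", 50),
]

# Project each group of fields onto the key part it depends on;
# building a set removes the duplicated rows.
employee_project = sorted({(r[0], r[2]) for r in ROWS})
employee = sorted({(r[2], r[3], r[4], r[5]) for r in ROWS})
project = sorted({(r[0], r[1]) for r in ROWS})

# Join the pieces back together on their keys.
emp_by_no = {e[0]: e for e in employee}
proj_by_no = {p[0]: p for p in project}
rebuilt = sorted(
    (pn, proj_by_no[pn][1], en, *emp_by_no[en][1:])
    for pn, en in employee_project
)

assert rebuilt == sorted(ROWS)  # lossless: nothing was thrown away
print(len(employee_project), len(employee), len(project))  # 5 4 2
```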


The tables are now in 2nd normal form. Are they in 3rd normal form?

The definition of 3rd normal form
- it's in 2nd normal form
- it contains no transitive dependencies (where a non-key attribute is dependent on another non-key attribute)

We can narrow our search down to the Employee table, which is the only one with more than one non-key attribute. Employee name is not dependent on either Rate category or Hourly rate, and the same applies to Rate category, but Hourly rate is dependent on Rate category. So, as before, we remove it, placing it in its own table, with the attribute it was dependent on as key, as follows:

Employee-Project table
Project number (primary key)
Employee number (primary key)

Employee table
Employee number (primary key)
Employee name
Rate category

Rate table
Rate category (primary key)
Hourly rate

Project table
Project number (primary key)
Project name

Table Notation:
EMPLOYEE_PROJECT (project_number, employee_number)
EMPLOYEE (employee_number, employee_name, rate_category)
RATE (rate_category, hourly_rate)
PROJECT (project_number, project_name)
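The sketch below shows one way the four 3rd normal form tables might be declared in SQL. It runs through Python's standard `sqlite3` module against an in-memory database. The column types, the `NOT NULL` constraints and the foreign key clauses are assumptions; the text above gives only the table notation. A join across all four tables rebuilds the project/employee/rate view we started with, while each hourly rate is stored exactly once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
CREATE TABLE rate (
    rate_category TEXT PRIMARY KEY,
    hourly_rate   INTEGER NOT NULL
);
CREATE TABLE employee (
    employee_number INTEGER PRIMARY KEY,
    employee_name   TEXT NOT NULL,
    rate_category   TEXT NOT NULL REFERENCES rate(rate_category)
);
CREATE TABLE project (
    project_number INTEGER PRIMARY KEY,
    project_name   TEXT NOT NULL
);
CREATE TABLE employee_project (
    project_number  INTEGER REFERENCES project(project_number),
    employee_number INTEGER REFERENCES employee(employee_number),
    PRIMARY KEY (project_number, employee_number)
);
INSERT INTO rate VALUES ('A', 60), ('B', 50), ('C', 40);
INSERT INTO employee VALUES (11, 'Vincent Radebe', 'A'), (12, 'Pauline James', 'B'),
    (16, 'Charles Ramoraz', 'C'), (17, 'Monique Williams', 'B');
INSERT INTO project VALUES (1023, 'Madagascar travel site'), (1056, 'Online estate agency');
INSERT INTO employee_project VALUES (1023, 11), (1023, 12), (1023, 16), (1056, 11), (1056, 17);
""")

# Rebuild the original view: the rate comes from RATE via the category.
report = conn.execute("""
    SELECT p.project_name, e.employee_name, r.hourly_rate
    FROM employee_project ep
    JOIN project  p ON p.project_number  = ep.project_number
    JOIN employee e ON e.employee_number = ep.employee_number
    JOIN rate     r ON r.rate_category   = e.rate_category
    ORDER BY ep.project_number, ep.employee_number
""").fetchall()
for row in report:
    print(row)
```

With this layout, changing the category B rate is a single UPDATE on the rate table. Every employee in that category picks up the new rate through the join.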


These tables are all now in 3rd normal form, and ready to be implemented. There are other normal forms, such as Boyce-Codd normal form and 4th normal form, but these are very rarely used for business applications. In most cases, tables in 3rd normal form are already in these normal forms anyway.

Example 2

Basically, the Rules of Normalization are enforced by eliminating redundancy and inconsistent dependency in your table designs. Let's say we want to create a table of user information, and we want to store each user's Name, Company, Company Address, and some personal bookmarks, or URLs. You might start by defining a table structure like this:

Zero Form

USERS
Name | Company | Company Address | URL1    | URL2
Joe  | ABC     | 1 Work Lane     | abc.com | xyz.com
Jill | XYZ     | 1 Job Street    | abc.com | xyz.com

We would say this table is in Zero Form because none of our rules of normalization have been applied yet. Notice the url1 and url2 fields: what do we do when our application needs to ask for a third URL? Do you want to keep adding columns to your table and hard-coding that form input field into your PHP code? Obviously not; you want a functional system that can grow with new development requirements. Let's look at the rules for the First Normal Form, and then apply them to this table.

First Normal Form
1. Eliminate repeating groups in individual tables.
2. Create a separate table for each set of related data.
3. Identify each set of related data with a primary key.

Notice how we're breaking that first rule by repeating the url1 and url2 fields? And what about Rule Three, primary keys? Here, Rule Three means we want to put some form of unique, auto-incrementing integer value into every one of our records. Otherwise, what would happen if we had two users named Joe and we wanted to tell them apart? When we apply the rules of the First Normal Form we come up with the following table:

USERS
User ID | Name | Company | Company Address | URL
1       | Joe  | ABC     | 1 Work Lane     | abc.com
1       | Joe  | ABC     | 1 Work Lane     | xyz.com
2       | Jill | XYZ     | 1 Job Street    | abc.com
2       | Jill | XYZ     | 1 Job Street    | xyz.com

Now our table is said to be in the First Normal Form.

Second Normal Form
1. Create separate tables for sets of values that apply to multiple records.
2. Relate these tables with a foreign key.

USERS
User ID | Name | Company | Company Address
1       | Joe  | ABC     | 1 Work Lane
2       | Jill | XYZ     | 1 Job Street

URLS
URL ID | User ID | URL
1      | 1       | abc.com
2      | 1       | xyz.com
3      | 2       | abc.com
4      | 2       | xyz.com
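The two Second Normal Form tables can be sketched in SQLite (via Python's `sqlite3` module) to show what the foreign key buys us. The schema details are assumptions, and "example.org" is a hypothetical third bookmark. Note that SQLite leaves foreign key enforcement off by default. It has to be switched on per connection with `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enable FK enforcement
conn.executescript("""
CREATE TABLE users (
    user_id         INTEGER PRIMARY KEY,
    name            TEXT NOT NULL,
    company         TEXT,
    company_address TEXT
);
CREATE TABLE urls (
    url_id  INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(user_id),
    url     TEXT NOT NULL
);
INSERT INTO users VALUES (1, 'Joe', 'ABC', '1 Work Lane'), (2, 'Jill', 'XYZ', '1 Job Street');
INSERT INTO urls (user_id, url) VALUES (1, 'abc.com'), (1, 'xyz.com'), (2, 'abc.com'), (2, 'xyz.com');
""")

# A third bookmark is just another row, not another column.
conn.execute("INSERT INTO urls (user_id, url) VALUES (1, 'example.org')")

# The foreign key rejects a bookmark for a user that does not exist.
try:
    conn.execute("INSERT INTO urls (user_id, url) VALUES (99, 'abc.com')")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

joe_urls = [u for (u,) in conn.execute(
    "SELECT url FROM urls WHERE user_id = 1 ORDER BY url_id")]
print(joe_urls, fk_rejected)  # ['abc.com', 'xyz.com', 'example.org'] True
```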

OK, we've created separate tables, and the primary key in the USERS table, User ID, is now related to the foreign key in the URLS table, User ID. We're in much better shape. But what happens when we want to add another employee of company ABC? Or 200 employees? Now we've got company names and addresses duplicating themselves all over the place, a situation just rife for introducing errors into our data. So we'll want to look at applying the Third Normal Form:

Third Normal Form
1. Eliminate fields that do not depend on the key.


Our Company Name and Address have nothing to do with the User ID, so they should have their own Company ID:

USERS
User ID | Name | Company ID
1       | Joe  | 1
2       | Jill | 2

COMPANIES
Company ID | Company | Company Address
1          | ABC     | 1 Work Lane
2          | XYZ     | 1 Job Street

URLS
URL ID | User ID | URL
1      | 1       | abc.com
2      | 1       | xyz.com
3      | 2       | abc.com
4      | 2       | xyz.com
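The short SQLite sketch below (Python's `sqlite3` module, in-memory database) shows the effect of giving companies their own key. The URLS table is left out, and the schema details are assumptions. The 200 employee names and the new address "2 Work Lane" are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE companies (
    company_id      INTEGER PRIMARY KEY,
    company         TEXT NOT NULL,
    company_address TEXT
);
CREATE TABLE users (
    user_id    INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    company_id INTEGER REFERENCES companies(company_id)
);
INSERT INTO companies VALUES (1, 'ABC', '1 Work Lane'), (2, 'XYZ', '1 Job Street');
INSERT INTO users VALUES (1, 'Joe', 1), (2, 'Jill', 2);
""")

# Add 200 more ABC employees (hypothetical names): only the key repeats.
conn.executemany("INSERT INTO users (name, company_id) VALUES (?, 1)",
                 [(f"Employee {i}",) for i in range(200)])

abc_rows = conn.execute(
    "SELECT COUNT(*) FROM companies WHERE company = 'ABC'").fetchone()[0]
abc_staff = conn.execute(
    "SELECT COUNT(*) FROM users WHERE company_id = 1").fetchone()[0]

# Moving ABC's office is a one-row update, not 201 updates.
conn.execute("UPDATE companies SET company_address = '2 Work Lane' WHERE company_id = 1")
print(abc_rows, abc_staff)  # 1 201
```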

Now we've got the primary key Company ID in the COMPANIES table related to the foreign key in the USERS table called Company ID, and we can add 200 users while still only inserting the name "ABC" once. Our USERS and URLS tables can grow as large as they want without unnecessary duplication or corruption of data. Most developers will say the Third Normal Form is far enough, and that our data schema could easily handle the load of an entire enterprise, and in most cases they would be correct.

But look at our URL fields: do you notice the duplication of data? This is perfectly acceptable if we are not pre-defining these fields. If the HTML input page which our users are filling out allows free-form text input, there's nothing we can do about this, and it's just a coincidence that Joe and Jill both entered the same bookmarks. But what if it's a drop-down menu which we know only allows those two URLs, or maybe 20 or even more? We can take our database schema to the next level, the Fourth Form, one which many developers overlook because it depends on a very specific type of relationship, the many-to-many relationship, which we have not yet encountered in our application.

Data Relationships


Before we define the Fourth Normal Form, let's look at the three basic data relationships: one-to-one, one-to-many, and many-to-many. Look at the USERS table in the First Normal Form example above. For a moment let's imagine we put the URL fields in a separate table, and every time we input one record into the USERS table we would input one row into the URLS table. We would then have a one-to-one relationship: each row in the USERS table would have exactly one corresponding row in the URLS table. For the purposes of our application this would be neither useful nor normalized.

Now look at the tables in the Second Normal Form example. Our tables allow one user to have many URLs associated with his user record. This is a one-to-many relationship, the most common type, and until we reached the dilemma presented in the Third Normal Form, the only kind we needed. The many-to-many relationship, however, is slightly more complex. Notice in our Third Normal Form example we have one user related to many URLs. As mentioned, we want to change that structure to allow many users to be related to many URLs, and thus we want a many-to-many relationship. Let's take a look at what that would do to our table structure before we discuss it:

USERS
User ID | Name | Company ID
1       | Joe  | 1
2       | Jill | 2

COMPANIES
Company ID | Company | Company Address
1          | ABC     | 1 Work Lane
2          | XYZ     | 1 Job Street

URLS
URL ID | URL
1      | abc.com
2      | xyz.com

URL_RELATIONS
Relation ID | URL ID | User ID
1           | 1      | 1
2           | 1      | 2
3           | 2      | 1
4           | 2      | 2


In order to decrease the duplication of data (and in the process bring ourselves to the Fourth Form of Normalization), we've created a table full of nothing but primary and foreign keys in URL_RELATIONS. We've been able to remove the duplicate entries in the URLS table by creating the URL_RELATIONS table. We can now accurately express the relationship that both Joe and Jill are related to each one of, and both of, the URLs.

Table Notation:
USERS (user_id, name, company_id)
COMPANIES (company_id, company, company_address)
URLS (url_id, url)
URL_RELATIONS (relation_id, url_id, user_id)
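The junction table can also be sketched in SQLite (Python's `sqlite3` module, in-memory database). The company columns are left out to keep the sketch short. The UNIQUE constraints on `url` and on the (url_id, user_id) pair are additions that are not part of the design above; they stop the same URL, or the same bookmark, from being stored twice. The helper `users_for` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE urls  (url_id  INTEGER PRIMARY KEY, url  TEXT NOT NULL UNIQUE);
-- Junction table: nothing but keys, one row per (url, user) pairing.
CREATE TABLE url_relations (
    relation_id INTEGER PRIMARY KEY,
    url_id      INTEGER NOT NULL REFERENCES urls(url_id),
    user_id     INTEGER NOT NULL REFERENCES users(user_id),
    UNIQUE (url_id, user_id)
);
INSERT INTO users VALUES (1, 'Joe'), (2, 'Jill');
INSERT INTO urls  VALUES (1, 'abc.com'), (2, 'xyz.com');
INSERT INTO url_relations VALUES (1, 1, 1), (2, 1, 2), (3, 2, 1), (4, 2, 2);
""")

def users_for(url):
    """All users who bookmarked `url`, found through the junction table."""
    return [name for (name,) in conn.execute("""
        SELECT u.name FROM users u
        JOIN url_relations r ON r.user_id = u.user_id
        JOIN urls l ON l.url_id = r.url_id
        WHERE l.url = ? ORDER BY u.user_id""", (url,))]

print(users_for("abc.com"))  # ['Joe', 'Jill']
```

Each URL string is now stored once, and adding a bookmark for a user is a single row of keys in url_relations.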
