
SUPPORTING DATA MANAGEMENT INFRASTRUCTURE FOR THE HUMANITIES (SUDAMIH)

Relational Databases: A Beginners Guide

What is a relational database?


A relational database is a means of storing, manipulating, and analysing structured data.

- Databases store data in tables.
- A table consists of a series of records:
  o A record is a collection of information about a given entity.
  o The entity might be a person, an object, a text, an image, a transaction, an event, or almost anything else you care to name.
  o Each row of a database table contains one record.

- The table columns are known as fields:
  o Each field contains one piece of information about the entity being described.

A simple database table might look like this:

ID  Title                       Author       Date  Publisher
1   Introduction to Ethics      Smith, Beth  1988  Blackwell
2   A History of London         Brown, Adam  2001  OUP
3   Political Theory: A Primer  Jones, Jane  2005  OUP

This table contains three records and has five fields. A flat file database has just a single table. In some cases, this may be all that's needed. A relational database, however, can have multiple tables, with (as the name suggests) relationships between them. This allows you to record information about multiple types of entity, and to show how these are connected to each other.
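As a minimal sketch of the sample table, the snippet below builds it with Python's built-in sqlite3 module and an in-memory database. The choice of SQLite, the table name `books`, and the column names are assumptions made for illustration; the guide itself does not prescribe a particular database system.

```python
import sqlite3

# In-memory database, so the sketch leaves no file behind.
conn = sqlite3.connect(":memory:")

# A single table (flat file style) with five fields per record.
conn.execute(
    """
    CREATE TABLE books (
        id        INTEGER PRIMARY KEY,
        title     TEXT,
        author    TEXT,
        date      INTEGER,
        publisher TEXT
    )
    """
)

# Each tuple is one record, i.e. one row of the table.
records = [
    (1, "Introduction to Ethics", "Smith, Beth", 1988, "Blackwell"),
    (2, "A History of London", "Brown, Adam", 2001, "OUP"),
    (3, "Political Theory: A Primer", "Jones, Jane", 2005, "OUP"),
]
conn.executemany("INSERT INTO books VALUES (?, ?, ?, ?, ?)", records)

cursor = conn.execute("SELECT * FROM books")
field_names = [column[0] for column in cursor.description]
rows = cursor.fetchall()

print(len(rows), "records;", len(field_names), "fields:", field_names)
```

Counting the rows and the column names confirms the figures given above: three records and five fields.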

Features

The use of keys

Each row of data in a table is identified by a unique "key", called the primary key. The primary key is often an automatically incrementing number like 1, 2, 3, 4, etc. Using the Structured Query Language (SQL), data from different tables that are linked by keys can be selected at once.
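The following sketch shows one way keys can link two tables, again using Python's sqlite3 module. The tables `publishers` and `books` and the column `publisher_id` are hypothetical names chosen for this example; in SQLite an `INTEGER PRIMARY KEY` column is filled in automatically when no value is supplied.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE publishers (
        id   INTEGER PRIMARY KEY,   -- primary key, assigned automatically
        name TEXT NOT NULL
    );
    CREATE TABLE books (
        id           INTEGER PRIMARY KEY,
        title        TEXT NOT NULL,
        publisher_id INTEGER REFERENCES publishers(id)  -- key linking to publishers
    );
    INSERT INTO publishers (name) VALUES ('Blackwell'), ('OUP');
    INSERT INTO books (title, publisher_id) VALUES
        ('Introduction to Ethics', 1),
        ('A History of London', 2),
        ('Political Theory: A Primer', 2);
    """
)

# The JOIN follows the key from each book to its publisher,
# so data from both tables is selected in one query.
linked = conn.execute(
    """
    SELECT books.title, publishers.name
    FROM books
    JOIN publishers ON books.publisher_id = publishers.id
    ORDER BY books.id
    """
).fetchall()

for title, publisher in linked:
    print(title, "-", publisher)
```

Each publisher name is stored once in `publishers`; the books refer to it by key rather than repeating the text.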

Avoiding data redundancy

In a database design that adheres to the rules of the relational model, each data item, a username for example, is stored only once, that is, in one location. This avoids having to maintain the same data in multiple locations. The duplication of data is called data redundancy, and it should be avoided in a good database design.

Computing Services | Relational Databases: A Beginners Guide

This material was produced as part of the JISC-funded Sudamih Project (http://sudamih.oucs.ox.ac.uk/), and is made available under a Creative Commons Attribution Non-Commercial Share Alike License: http://creativecommons.org/licenses/by-nc-sa/2.0/uk/

Constraining the input

Using a relational database you can specify what sort of data a database column is allowed to contain. You can create fields that contain numbers, decimal numbers, small texts, large texts, dates, etc.

Besides data types, database systems allow you to apply further constraints, such as limiting the length of a field or enforcing the uniqueness of a certain field. The unique constraint is often used for fields that contain usernames and email addresses. These constraints give you control over the integrity of your data. They prevent situations like:
- entering an address (text) in a field where you were expecting a number
- entering a zip code of one hundred characters
- ending up with two users with the same username
- ending up with two users with the same email address
- entering a weight (number) in a birthday (date) field
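The situations listed above can be sketched with SQLite constraints, as below. This is an illustration under assumptions: the `users` table, its columns, and the ten-character limit are invented for the example, and because SQLite is lenient about column types by default, a `CHECK` on `typeof()` is used here to reject text in the numeric field. Other database systems enforce types and lengths in their own ways.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE users (
        id       INTEGER PRIMARY KEY,
        username TEXT NOT NULL UNIQUE,
        email    TEXT NOT NULL UNIQUE,
        zip_code TEXT CHECK (length(zip_code) <= 10),
        weight   REAL CHECK (typeof(weight) IN ('real', 'integer', 'null'))
    )
    """
)

insert_sql = "INSERT INTO users (username, email, zip_code, weight) VALUES (?, ?, ?, ?)"
conn.execute(insert_sql, ("beth", "beth@example.org", "12345", 61.5))

bad_rows = [
    ("beth", "other@example.org", "12345", 70.0),     # duplicate username
    ("adam", "beth@example.org", "12345", 70.0),      # duplicate email address
    ("jane", "jane@example.org", "9" * 100, 70.0),    # zip code of one hundred characters
    ("john", "john@example.org", "12345", "heavy"),   # text where a number is expected
]

rejected = 0
for row in bad_rows:
    try:
        conn.execute(insert_sql, row)
    except sqlite3.IntegrityError as error:
        # The database refuses the row instead of storing bad data.
        rejected += 1
        print("rejected:", error)

user_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(rejected, "rows rejected;", user_count, "row stored")
```

Only the first, valid row is stored; each of the four problem rows is refused by the database rather than by application code.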

Maintaining data integrity

By setting field properties, by linking tables and by setting constraints you can increase the reliability of your data.

Why use a relational database?


When working with structured data, there are several reasons to use a relational database:
- Some datasets are too complex to be adequately represented using a flat file.
- Database management systems include tools which can help make entering data easier and more accurate.
- Databases allow you to sort, filter, and manipulate your data in sophisticated ways.
- Databases allow you to present your data (or a subset of it) in a wide range of ways.
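To make "sort, filter, and manipulate" concrete, here is a small sketch using the sample book data and Python's sqlite3 module. The table layout and the particular queries are illustrative assumptions, not part of the original guide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT, date INTEGER, publisher TEXT)"
)
conn.executemany(
    "INSERT INTO books (title, date, publisher) VALUES (?, ?, ?)",
    [
        ("Introduction to Ethics", 1988, "Blackwell"),
        ("A History of London", 2001, "OUP"),
        ("Political Theory: A Primer", 2005, "OUP"),
    ],
)

# Sort: newest book first.
newest_first = [
    row[0] for row in conn.execute("SELECT title FROM books ORDER BY date DESC")
]

# Filter: only books published from 2000 onwards.
since_2000 = [
    row[0]
    for row in conn.execute("SELECT title FROM books WHERE date >= 2000 ORDER BY date")
]

# Summarise: number of books per publisher.
per_publisher = conn.execute(
    "SELECT publisher, COUNT(*) FROM books GROUP BY publisher ORDER BY publisher"
).fetchall()

print(newest_first)
print(since_2000)
print(per_publisher)
```

The same stored data is presented three different ways without being copied or rearranged by hand.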

Comparison of Traditional File-Based Approach and Database Approach

To begin with, you should understand the rationale for replacing the traditional file-based system with a database system.


File-based System

File-based systems were an early attempt to computerize the manual filing system. A file-based system is a collection of application programs that perform services for the end-users. Each program defines and manages its own data. However, five types of problem occur when using the file-based approach:

1. Separation and isolation of data
When data is isolated in separate files, it is more difficult to access data that should be available. The application programmer is required to synchronize the processing of two or more files to ensure the correct data is extracted.

2. Duplication of data
When employing the decentralized file-based approach, uncontrolled duplication of data occurs. Uncontrolled duplication of data is undesirable because:
   i. Duplication is wasteful.
   ii. Duplication can lead to loss of data integrity.

3. Data dependence
In a file-based system, the physical structure and storage of the data files and records are defined in the application program code. This characteristic is known as program-data dependence. Making changes to an existing structure is rather difficult and requires the programs to be modified. Such maintenance activities are time-consuming and subject to error.

4. Incompatible file formats
The structure of a file depends on the application programming language. A file structure provided by one programming language, such as the direct file or indexed-sequential file available in COBOL, may differ from the structure generated by another programming language such as C. This incompatibility makes the files difficult to process jointly.

5. Fixed queries / proliferation of application programs
File-based systems are very dependent upon the application programmer. Any required queries or reports have to be written by the application programmer. Normally, only fixed-format queries or reports can be supported, and no facility for ad-hoc queries is offered.

Database Approach:

In order to overcome the limitations of the file-based approach, the concepts of the database and the Database Management System (DBMS) emerged in the 1960s.

Advantages

Applying the database approach in an application system brings a number of advantages, including:

1. Control of data redundancy
The database approach attempts to eliminate redundancy by integrating the files. Although the database approach does not eliminate redundancy entirely, it controls the amount of redundancy inherent in the database.

2. Data consistency
By eliminating or controlling redundancy, the database approach reduces the risk of inconsistencies occurring. It ensures all copies of the data are kept consistent.

3. More information from the same amount of data
With the integration of operational data in the database approach, it may be possible to derive additional information from the same data.

4. Sharing of data
The database belongs to the entire organization and can be shared by all authorized users.

5. Improved data integrity
Database integrity refers to the validity and consistency of stored data. Integrity is usually expressed in terms of constraints, which are consistency rules that the database is not permitted to violate.

6. Improved security
The database approach protects the data from unauthorized users. This may take the form of user names and passwords that identify the type of user and their access rights for operations including retrieval, insertion, updating and deletion.

7. Enforcement of standards
The integration of the database enforces the necessary standards, including data formats, naming conventions, documentation standards, update procedures and access rules.

8. Economy of scale
Cost savings can be obtained by combining all of the organization's operational data into one database, with applications working on one source of data.

9. Balance of conflicting requirements
By having a structural design in the database, the conflicts between users or departments can be resolved. Decisions will be based on the best use of resources for the organization as a whole rather than for an individual entity.

10. Improved data accessibility and responsiveness
Because the data is integrated, data access can cross departmental boundaries. This provides more functionality and better services to the users.

11. Increased productivity
The database approach provides all the low-level file-handling routines. The provision of these functions allows the programmer to concentrate more on the specific functionality required by the users. The fourth-generation environment provided by the database can simplify database application development.

12. Improved maintenance
The database approach provides data independence. Because a change of data structure in the database need not affect the application programs, it simplifies database application maintenance.

13. Increased concurrency
The database can manage concurrent data access effectively. It ensures that users do not interfere with one another in ways that would result in loss of information or loss of integrity.

14. Improved backup and recovery services
Modern database management systems provide facilities to minimize the amount of processing that can be lost following a failure by using the transaction approach.

Disadvantages


In spite of the large number of advantages of the database approach, it is not without challenges. The disadvantages include:

1. Complexity
A database management system is an extremely complex piece of software. All parties must be familiar with its functionality to take full advantage of it. Therefore, training for the administrators, designers and users is required.

2. Size
The database management system consumes a substantial amount of main memory as well as a large amount of disk space in order to run efficiently.

3. Cost of DBMS
A multi-user database management system may be very expensive. Even after the installation, there is a high recurrent annual maintenance cost on the software.

4. Cost of conversion
When moving from a file-based system to a database system, the company incurs additional expenses for hardware acquisition and training.

5. Performance
As the database approach is designed to cater for many applications rather than exclusively for a particular one, some applications may not run as fast as before.

6. Higher impact of a failure
The database approach increases the vulnerability of the system due to centralization. As all users and applications rely on the availability of the database, the failure of any component can bring operations to a halt and seriously affect the services to the customer.

DBMS Architecture:

1. External view: This is the highest level of abstraction, as seen by the user. This level of abstraction describes only part of the entire database. It is based on the conceptual model and is the end user's view of the data environment. Each external view is described by means of a schema called an external schema or subschema.

2. Conceptual level: At this level of database abstraction all the database entities and the relationships among them are included. One conceptual view represents the entire database. The conceptual schema defines this conceptual view.

3. Internal (physical) level: This is the lowest level of abstraction, closest to the physical storage device. It describes how data are actually stored on the storage medium. The internal schema, which contains the definition of the stored record, the method of representing the data fields, and the access aids used, expresses the internal view.
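One rough way to picture the three levels is sketched below with Python's sqlite3 module. The mapping is an analogy chosen for this example rather than a definition: the table stands in for the conceptual level, a view for an external schema, and an index for an access aid at the internal level. The names `staff`, `phone_list` and `staff_name_idx` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Conceptual level: the full definition of the entity.
    CREATE TABLE staff (
        id     INTEGER PRIMARY KEY,
        name   TEXT NOT NULL,
        phone  TEXT,
        salary INTEGER
    );
    INSERT INTO staff (name, phone, salary) VALUES
        ('Beth Smith', '555-0101', 42000),
        ('Adam Brown', '555-0102', 38000);

    -- External level: a view that exposes only part of the database.
    CREATE VIEW phone_list AS SELECT name, phone FROM staff;
    """
)

query = "SELECT name, phone FROM phone_list ORDER BY name"
before = conn.execute(query).fetchall()

# Internal level: an index is an access aid. Adding it changes how rows
# are found, not what the view or the query returns.
conn.execute("CREATE INDEX staff_name_idx ON staff (name)")
after = conn.execute(query).fetchall()

print(before == after, before)
```

The user of `phone_list` never sees the salary column, and the query is unchanged by the new index.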


Data Independence:

1. The ability to modify a schema definition at one level without affecting a schema definition at a higher level is called data independence.
2. There are two kinds:
   o Physical data independence
     - The ability to modify the physical schema without causing application programs to be rewritten
     - Modifications at this level are usually made to improve performance
   o Logical data independence
     - The ability to modify the conceptual schema without causing application programs to be rewritten
     - Usually done when the logical structure of the database is altered
3. Logical data independence is harder to achieve, as the application programs are usually heavily dependent on the logical structure of the data. An analogy is made to abstract data types in programming languages.

Types of Database Users:

Users are differentiated by the way they expect to interact with the system:

1. Application programmers - interact with the system through DML calls.
2. Sophisticated users - form requests in a database query language.
3. Systems developers - these are programmers who write the application code to meet the specification. They are typically well versed in the use of the DBMS tools and in the function that they are programming. Seldom are system developers the same people that use the system. This is partly due to the specialized nature of the work and partly due to a need to provide security by separating the users and the designers of the system.
4. End-users - the end users are those that need to access the data. End users are non-technical and do not use most of the DBMS components (at least they don't know they are using them). To the end user, the DBMS should look no worse than the file-based system that it replaced (or you haven't done your job correctly).

Database Administrator Roles and Responsibilities:

A Database Administrator, Database Analyst or Database Developer is the person responsible for managing the information within an organization. As most companies continue to experience inevitable growth of their databases, these positions are probably the most solid within the IT industry. In most cases, it is not an area that is targeted for layoffs or downsizing. On the downside, however, most database departments are understaffed, requiring administrators to perform a multitude of tasks.

Depending on the company and the department, this role can either be highly specialized or incredibly diverse. The primary role of the Database Administrator is to administer, develop, maintain and implement the policies and procedures necessary to ensure the security and integrity of the corporate database. Sub-roles within the Database Administrator classification may include security, architecture, warehousing and/or business analysis. Other primary roles will include:
- Implementation of data models
- Database design
- Database accessibility
- Performance issues
- Capacity issues
- Data replication
- Table maintenance

Elements of Database System:
- Database schema
- Schema objects
- Indexes
- Tables
- Fields and columns
- Records and rows
- Keys
- Relationships
- Data types

Database Management System (DBMS):

The Database Management System (DBMS) is the generic name for the collection of sub-systems used to create, maintain, and provide controlled access to data. They range in complexity from small PC-DBMS systems (Access, dBase IV, Rbase...) costing a few hundred dollars to large mainframe products (ORACLE, IBM DB2) costing several hundred thousand dollars.

Database management systems typically offer a number of features designed to reduce errors and improve consistency. For example, it's possible to restrict the type of information that can be entered in a given field, such as specifying that it has to be a date in a particular format. This is particularly helpful if multiple people are going to be using the database. When there are a limited number of options, you can create a drop-down list from which you can select the appropriate option: this both saves time and reduces the risk of typos which might make it harder to locate information when you search for it later.

DBMS Engine

The central component of the DBMS: a module that provides access to the physical structure and the data, and coordinates all of the other functions performed by the DBMS. It is the central control module that receives requests from users, determines where the data is and how to get it, and issues physical I/O requests to the computer operating system. It also provides some miscellaneous services such as memory and buffer management, index maintenance, and disk management.

1. Interface subsystem
Provides facilities for users and applications to access the various components of the DBMS. Most DBMS products provide a variety of languages and interfaces to satisfy the different types of users and the different sub-systems that must be accessed. The following are common interfaces that are provided (some are missing in the smaller/cheaper products). Note that some DBMSs combine the functions of several interfaces into a single sub-system (e.g., SQL is DDL, DML, and DCL combined).
a) DDL
Used to define and maintain the database structures (e.g., records, tables, files, views, indexing, ...). Specifically, the DDL defines:
- all data items (type, specification...)
- all record types (tables in the relational model)
- the relationships among record types (not in the relational model)
- user views (or subschemas)
The DDL is used to define the conceptual database and turn it into a physical database.

b) DML
Used to manipulate and query the data itself. Typically used by a host program or as ad hoc commands in interactive mode. For example, you could select a subset of data based upon some query criteria using the DML. In some database systems the DML also provides the commands to "navigate" the data structure.

c) DCL
Used to grant and revoke access rights to individuals (and groups). A "right" is the privilege to perform a data manipulation operation. For example, the DBA can grant a clerk the right to access and delete INVENTORY records (but not to update them). Another related concept is a database "role". A role is a predefined set of access rights and privileges that can be assigned to a user. When the definition of the role changes, all users assigned to that role get the updated access rights.

Again, not all DBMSs call the DDL, DML, and DCL separate interfaces. However, all 3 functions must be present in the DBMS.

d) QBE
Optional. Some modern DBMSs provide a graphical representation of the data structure (usually a table) that allows you to select which data items to query on and the conditions for selection. Normally this feature is found in Query By Example sub-systems of relational DBMSs. The graphic interface makes it easier for non-technical users to make complex queries. It is also handy because it is a common interface that can cross diverse DBMS systems.

e) Forms interface
Optional. A screen-oriented form is presented to the user, who responds by filling in the blanks. The result is that the DBMS uses the form that you design to input and output data.

f) Hi-level interface
Programmers need to be able to access data via a high-level language. This could be old-style (3rd generation) languages like COBOL, FORTRAN, Pascal. Or it could be newer 4th generation languages like Toolbox, Mapper, Easytrieve, ... . Most big mainframe products (Ingres, Oracle, Informix...) include a 4GL as part of the DBMS. Studies have shown that applications can be developed 10 to 20 times faster using 4GLs than using traditional 3rd generation languages. (Note: the code does not run faster; it is just debugged sooner.) The interface is usually achieved by adding a few extra commands (verbs) to the standard language and having a pre-processor translate these verbs into DBMS calls (using the CALL format of the specific operating system). This method works well because the user does not need to know the complexities of operating system calls and the resulting code is somewhat portable. The interfaces to "old" languages are needed because there is a lot of code and programmer experience out there that cannot be ignored.

2. Dictionary
The word "repository" is used to relate back to the Information Resource Management concept mentioned earlier. The data in the database should be treated as a corporate resource. This resource must be managed. The repository is more than a "data dictionary" or "catalog". It is the central place where you store:
- system documentation
- data structure
- project life cycle information
- conceptual model information
- etc.


CASE tools use it extensively. A DBMS system component must be present to manage and control access to the repository. To a certain extent, the DCL does part and CASE tools do part. It provides facilities for recording, storing, and processing descriptions of an organization's data and data processing resources. This is a fairly new idea (still being defined by industry).

3. Data integrity subsystem
There are 4 important functions:
- intrarecord integrity - enforce constraints on data item values and types within each record in the database
- referential integrity - enforce the validity of references between records in the database
- user-defined integrity - business rules (arbitrary) that must be upheld (e.g., an employee can't make more than their boss)
- concurrency control - assuring the validity of data when multiple users access it simultaneously (covered in more detail later)

4. Security mgt. subsystem
A subsystem that provides facilities to protect and control access to the database. The 2 most important aspects of security are:
- securing data from unauthorized access
- protecting it against disasters
The first is done through passwords, views, and protection levels. Encryption is also widely used. The second aspect uses backups, logs, before and after images, disaster recovery plans, etc.

5. Backup/Recovery
A subsystem that logs transactions and database changes and periodically makes backup copies of the database. This is done so you don't lose data in the event of a problem. There are different levels of problems that backup/recovery prepares for. They range from redoing a transaction that was rolled back due to a concurrency conflict (minor) to totally restoring the database after the computer center is destroyed (major).

6. Application development
Optional. Provides facilities so that end users and/or programmers can develop complete database applications. Some use elaborate CASE tools as well as screen and report generators to create full applications with minimal work. Others help write code from sketchy specifications. In any event, this is an aid to non-technical users and a way to increase programmer productivity.

7. Performance management
The DBA needs some way to determine if the DBMS is performing well. These tools (often called "monitoring utilities") give the DBA the information needed to tune DBMS performance. Example: a monitor utility can find data items that are accessed frequently enough to need an index. They can also be used to determine if a data item needs to be on a faster disk drive or possibly replicated.

Designing Databases

A database is usually a fundamental component of the information system, especially in business oriented systems. Thus database design is part of system development. The following picture shows how database design is involved in the system development lifecycle. The phases in the middle of the picture (Database Design, Database Implementation) are the phases that you concentrate on in the Database Design course. The other phases are briefly described. They are part of the contents of the Systems Analysis and Design courses, for example. There are various methods for carrying out the different phases of information system design, analysis and implementation. Here the main tasks or goals are described but no method is introduced.

Database Planning

Database planning includes the activities that allow the stages of the database system development lifecycle to be realized as efficiently and effectively as possible. This phase must be integrated with the overall Information System strategy of the organization. The very first step in database planning is to define the mission statement and objectives for the database system. That is the definition of:
- the major aims of the database system
- the purpose of the database system
- the supported tasks of the database system
- the resources of the database system

Systems Definition


In the systems definition phase, the scope and boundaries of the database application are described. This description includes:
- links with the other information systems of the organization
- what the planned system is going to do now and in the future
- who the users are now and in the future
The major user views are also described, i.e. what is required of a database system from the perspectives of particular job roles or enterprise application areas.

Requirements Collection and Analysis

During the requirements collection and analysis phase, information about the part of the enterprise to be served by the database is collected and analysed. The results may include, e.g.:
- the description of the data used or generated
- the details of how the data is to be used or generated
- any additional requirements for the new database system

Database Design

The database design phase is divided into three steps:
- conceptual database design
- logical database design
- physical database design

Fig: Model of database development

In the conceptual database design phase, a model of the data to be used is constructed, independent of all physical considerations. The model is based on the requirements specification of the system.

In the logical database design phase, a model of the data is constructed that is based on a specific data model but is independent of a particular database management system. This is based on the target data model for the database, e.g. the relational data model.

In the physical database design phase, the description of the implementation of the database on secondary storage is created. The base relations, indexes, integrity constraints, security, etc. are defined using the SQL language.

Database Management System Selection


This is an optional phase. It is carried out when there is a need for a new database management system (DBMS). DBMS means a database system like Access, SQL Server, MySQL, Oracle. In this phase the criteria for the new DBMS are defined. Then several products are evaluated according to the criteria. Finally a recommendation for the selection is made.

Application Design

In the application design phase, the user interface and the application programs that use and process the database are defined and designed.

Prototyping

The purpose of a prototype is to allow users to try out the system on the computer and so identify the features it should have. There are horizontal and vertical prototypes. A horizontal prototype has many features (e.g. user interfaces) but they are not working. A vertical prototype has very few features but they are working. See the following picture.

Implementation

During the implementation phase, the physical realization of the database and application designs is carried out. This is the programming phase of the systems development.

Data Conversion and Loading

This phase is needed when a new database is replacing an old system. During this phase the existing data will be transferred into the new database.

Testing

Before the new system goes live, it should be thoroughly tested. The goal of testing is to find errors! The goal is not to prove the software is working well.

Operational Maintenance

Operational maintenance is the process of monitoring and maintaining the database system. Monitoring means that the performance of the system is observed. If the performance of the system falls below an acceptable level, tuning or reorganization of the database may be required. Maintaining and upgrading the database system means that, when new requirements arise, the development lifecycle is carried out again.


(Source: Connolly, Begg. 2005. Database Systems. A Practical Approach to Design, Implementation, and Management. Addison Wesley. Chapter 9. Database Planning, Design and Administration.)

Entity Relationship Diagrams (ERDs)

An entity-relationship diagram is a data modeling technique that creates a graphical representation of the entities, and the relationships between entities, within an information system.

The three main components of an ERD are:
- The entity is a person, object, place or event for which data is collected. For example, if you consider the information system for a business, entities would include not only customers, but the customer's address, and orders as well. The entity is represented by a rectangle and labelled with a singular noun.
- The relationship is the interaction between the entities. In the example above, the customer places an order, so the word "places" defines the relationship between that instance of a customer and the order or orders that they place. A relationship may be represented by a diamond shape, or more simply, by the line connecting the entities. In either case, verbs are used to label the relationships.
- The relationship cardinality defines the relationship between the entities in terms of numbers. An entity may be optional (for example, a sales rep could have no customers or could have one or many customers) or mandatory (for example, there must be at least one product listed in an order). There are several different types of cardinality notation; crow's foot notation, used here, is a common one. In crow's foot notation, a single bar indicates one, a double bar indicates one and only one (for example, a single instance of a product can only be stored in one warehouse), a circle indicates zero, and a crow's foot indicates many. The three main cardinal relationships are: one-to-one, expressed as 1:1; one-to-many, expressed as 1:M; and many-to-many, expressed as M:N.

[Figure: example ERD labelling an entity, a relationship and attributes. The CUSTOMER entity (attributes: Name, Phone) is linked to the ORDER entity by the relationship "places".]

The steps involved in creating an ERD are:
1) Identify the Business Rules.
2) Make a list of the Entities.
3) List ALL the simple relationships between Entities and define the connectivities, cardinalities, weak and optional relationships.
4) Merge the simple ERDs into one ERD, placing the most occurring ENTITY in the middle if possible.

Step 1

Let us take a very simple example and try to reach a fully organized database from it. Consider the following simple statement: A boy eats an ice cream. This is a description of a real-world activity, and we may treat the statement as a written document (a very short one, of course).

Identify the Business Rules

Business rules are read off a written description of the activity. A one-sentence statement yields very few, so as an illustration, a longer description of a company might yield the following Business Rules (BR):

a) A department employs many employees, but each employee is employed by one department.
b) Some employees, known as "rovers," are not assigned to any department.
c) A division operates many departments, but each department is operated by one division.
d) An employee may be assigned to many projects and a project may have many employees assigned to it.
e) A project must have at least one employee assigned to it.
f) One of the employees manages each department.
g) One of the employees runs each division.
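As a rough illustration of how business rules like these turn into table structure, the sketch below encodes rules (a) to (d) in SQLite through Python's standard sqlite3 module. The table names, column names and sample rows are assumptions made for this example and do not come from the guide; rules (e) to (g) are left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE division (div_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- Rule (c): every department belongs to exactly one division.
CREATE TABLE department (
    dept_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    div_id  INTEGER NOT NULL REFERENCES division(div_id)
);
-- Rules (a) and (b): at most one department per employee; NULL marks a rover.
CREATE TABLE employee (
    emp_id  INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    dept_id INTEGER REFERENCES department(dept_id)
);
CREATE TABLE project (proj_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
-- Rule (d): many-to-many, one row per (employee, project) pair.
CREATE TABLE assignment (
    emp_id  INTEGER NOT NULL REFERENCES employee(emp_id),
    proj_id INTEGER NOT NULL REFERENCES project(proj_id),
    PRIMARY KEY (emp_id, proj_id)
);
INSERT INTO division   VALUES (1, 'Humanities');
INSERT INTO department VALUES (10, 'History', 1);
INSERT INTO employee   VALUES (100, 'Ann', 10);
INSERT INTO employee   VALUES (101, 'Rob', NULL);
""")

# Rovers are the employees with no department.
rovers = [name for (name,) in conn.execute(
    "SELECT name FROM employee WHERE dept_id IS NULL")]

try:
    # Department 99 does not exist, so the foreign key rejects this row.
    conn.execute("INSERT INTO employee VALUES (102, 'Eve', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

A nullable foreign key is one common way to model an optional relationship such as rule (b); a NOT NULL foreign key, as on department.div_id, makes the relationship mandatory.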

Step 2

Now we have to prepare the ERD. Before doing that we have to process the statement a little. The sentence contains a subject (boy), an object (ice cream) and a verb (eats) that defines the relationship between the subject and the object. Treat the nouns as entities (boy and ice cream) and the verb (eats) as a relationship. To plot them in the diagram, put the nouns within rectangles and the relationship within a diamond. Also, show the relationship with a directed arrow, starting from the subject entity (boy) and pointing towards the object entity (ice cream).

Up to this point the ERD shows how boy and ice cream are related. Now, every boy must have a name, address, phone number etc., and every ice cream has a manufacturer, flavor, price etc. Without these the diagram is not complete. These items are known as attributes, and they must be incorporated in the ERD as connected ovals.


But can only entities have attributes? Certainly not. If we want, a relationship can have attributes of its own too. These attributes tell us nothing more about either the boy or the ice cream, but they provide additional information about the relationship between the boy and the ice cream.

Step 3

We are almost done. If you look carefully, we have now defined structures for at least three tables, like the following:

Boy: Name, Address, Phone
Ice Cream: Manufacturer, Flavor, Price
Eats: Date, Time

However, this is still not a working database because, by definition, a database should be a "collection of related tables." To connect them, the tables must have some attributes in common. If we choose the attribute Name of the Boy table to play the role of the common attribute, the revised structure of the tables becomes something like the following:

Boy: Name, Address, Phone
Ice Cream: Manufacturer, Flavor, Price, Name
Eats: Date, Time, Name
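The revised structure can be tried out directly. The following is a minimal sketch, assuming SQLite through Python's sqlite3 module; the lower-case names, the eat_date and eat_time columns, and the sample row are invented for illustration. It shows how the shared Name attribute lets the three tables be read together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE boy (name TEXT PRIMARY KEY, address TEXT, phone TEXT);
-- name is the common attribute that ties each table back to boy.
CREATE TABLE ice_cream (manufacturer TEXT, flavor TEXT, price REAL,
                        name TEXT REFERENCES boy(name));
CREATE TABLE eats (eat_date TEXT, eat_time TEXT,
                   name TEXT REFERENCES boy(name));

-- Made-up sample data.
INSERT INTO boy       VALUES ('Tom', '1 High Street', '555-0100');
INSERT INTO ice_cream VALUES ('Acme', 'vanilla', 1.50, 'Tom');
INSERT INTO eats      VALUES ('2011-06-01', '15:30', 'Tom');
""")

# Join the three tables on the common attribute.
rows = conn.execute("""
    SELECT b.name, i.flavor, e.eat_date, e.eat_time
    FROM boy b
    JOIN ice_cream i ON i.name = b.name
    JOIN eats e      ON e.name = b.name
""").fetchall()
```

With the sample row above, the query returns one combined record per boy, ice cream and eating event that share the same name.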

This is as complete as it can be. We now have information about the boy, about the ice cream he has eaten, and about the date and time when the eating was done.

Cardinality of Relationship

When creating a relationship between two entities, we often have to face the cardinality problem. This simply means asking how many entities of the first set are related to how many entities of the second set. Cardinality can be of the following four types.

One-to-One

Only one entity of the first set is related to only one entity of the second set. E.g. a teacher teaches a student: only one teacher is teaching only one student.

[Diagram: one-to-one relationship between Teacher and Student]

One-to-Many

Only one entity of the first set is related to multiple entities of the second set. E.g. a teacher teaches students: only one teacher is teaching many students.

[Diagram: one-to-many relationship between Teacher and Students]

Many-to-One

Multiple entities of the first set are related to only one entity of the second set. E.g. teachers teach a student: many teachers are teaching only one student.

[Diagram: many-to-one relationship between Teachers and Student]


Many-to-Many

Multiple entities of the first set are related to multiple entities of the second set. E.g. teachers teach students: in any school or college many teachers are teaching many students. This can be considered a two-way one-to-many relationship.

[Diagram: many-to-many relationship between Teachers and Students]
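In a relational database, a many-to-many relationship is usually stored in a linking table that holds one row per related pair. The sketch below assumes SQLite through Python's sqlite3 module; the table teaches, its columns and the sample names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE teacher (teacher_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT);
-- Linking table: one row per (teacher, student) pair.
CREATE TABLE teaches (
    teacher_id INTEGER REFERENCES teacher(teacher_id),
    student_id INTEGER REFERENCES student(student_id),
    PRIMARY KEY (teacher_id, student_id)
);
INSERT INTO teacher VALUES (1, 'Ms Gray'), (2, 'Mr Hill');
INSERT INTO student VALUES (1, 'Asha'), (2, 'Ben');
INSERT INTO teaches VALUES (1, 1), (1, 2), (2, 1);
""")

# How many students each teacher has, and how many teachers each student has.
students_per_teacher = dict(conn.execute("""
    SELECT t.name, COUNT(*) FROM teacher t
    JOIN teaches x ON x.teacher_id = t.teacher_id GROUP BY t.name"""))
teachers_per_student = dict(conn.execute("""
    SELECT s.name, COUNT(*) FROM student s
    JOIN teaches x ON x.student_id = s.student_id GROUP BY s.name"""))
```

Under the same assumptions, adding a UNIQUE constraint on teaches.student_id would allow each student only one teacher (one-to-many), and a UNIQUE constraint on each of the two columns would give one-to-one.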

In this discussion we have not included the attributes, but they can be added in the same way if we want to.

The Concept of Keys

A key is an attribute of a table which helps to identify a row. There are several different types of keys, which are explained here.

Super Key or Candidate Key: An attribute of a table that can uniquely identify a row in the table. Such attributes contain unique values and can never contain NULL values. There can be more than one super key or candidate key in a table, e.g. within a STUDENT table Roll and Mobile No. can both serve to uniquely identify a student. (Strictly, a super key is any set of attributes that uniquely identifies a row, and a candidate key is a super key with no unnecessary attributes; this guide treats the two together.)

Primary Key: The one candidate key that is chosen to be the identifying key for the entire table. E.g. although there are two candidate keys in the STUDENT table, the college would obviously use Roll as the primary key of the table.

Alternate Key: A candidate key which is not chosen as the primary key of the table. Alternate keys are so named because, although not the primary key, they can still identify a row.

Composite Key: Sometimes one attribute is not enough to uniquely identify a row. E.g. in a single class Roll is enough to find a student, but in the entire school, searching by Roll alone is not enough, because there could be 10 classes in the school and each one of them may contain a roll no 5. To uniquely identify the student we have to say something like "class VII, roll no 5". So two or more attributes are combined to create a unique combination of values, such as Class + Roll.

Foreign Key: Sometimes we have to work with a table that does not have a primary key of its own. To identify its rows, we have to use the primary key of a related table. Such a copy of a related table's primary key is called a foreign key.

Strong and Weak Entity

Based on the concept of the foreign key, a situation may arise in which we have to relate an entity that has a primary key of its own to an entity that does not. In such a case, the entity having its own primary key is called a strong entity and the entity not having its own primary key is called a weak entity. Whenever we need to relate a strong and a weak entity, the ERD changes just a little.
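The key types described above can be seen at work in a small schema. This is a sketch only, assuming SQLite through Python's sqlite3 module; the library_loan table, the column names and the sample rows are hypothetical. The composite primary key is Class + Roll, Mobile No. acts as an alternate key, and library_loan carries a foreign key back to student.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for SQLite to enforce foreign keys
conn.executescript("""
CREATE TABLE student (
    class     TEXT    NOT NULL,
    roll      INTEGER NOT NULL,
    mobile_no TEXT    NOT NULL UNIQUE,  -- alternate key: unique but not primary
    name      TEXT,
    PRIMARY KEY (class, roll)           -- composite primary key
);
CREATE TABLE library_loan (
    loan_id INTEGER PRIMARY KEY,
    class   TEXT    NOT NULL,
    roll    INTEGER NOT NULL,
    book    TEXT,
    FOREIGN KEY (class, roll) REFERENCES student(class, roll)
);
INSERT INTO student VALUES ('VII', 5, '555-0101', 'Asha');
INSERT INTO student VALUES ('VIII', 5, '555-0102', 'Ben');  -- same roll, other class
""")

def rejected(sql):
    """Return True if the database refuses the statement as a key violation."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

dup_primary = rejected("INSERT INTO student VALUES ('VII', 5, '555-0103', 'Cara')")
dup_alternate = rejected("INSERT INTO student VALUES ('IX', 1, '555-0101', 'Dev')")
bad_foreign = rejected("INSERT INTO library_loan VALUES (1, 'IX', 5, 'Atlas')")
good_foreign = rejected("INSERT INTO library_loan VALUES (2, 'VII', 5, 'Atlas')")
```

Roll 5 appears twice without conflict because the pair (class, roll) is what must be unique, which is exactly the "class VII, roll no 5" situation.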


Say, for example, we have the statement "A Student lives in a Home." STUDENT is obviously a strong entity having a primary key Roll. But HOME may not have a unique primary key, as its only attribute, Address, may be shared by many homes (what if it is a housing estate?). HOME is a weak entity in this case. The ERD of this statement would look like the following:

[Diagram: ERD of STUDENT "lives in" HOME, with HOME drawn as a weak entity]

As you can see, the weak entity itself and the relationship linking a strong and a weak entity must have a double border.

Examples

(1) A software program manages rental property. A rental agreement is drawn up with the customer for each property rented. A customer may rent more than one property.

(2) A travel company specializes in offering camping sites in popular resorts. Each site has a number of plots that can be booked by travelers for a number of days. One individual in each party is responsible for the booking, which may request more than one plot if the party is big. The company likes to keep details about all the individuals on a booking for marketing and payment purposes.

Discussion Questions

1) Create a data model for the following scenario.


A charter company owns boats that are used to charter trips to islands. Each trip lasts between 2 and 5 hours. The company has created a computer system to track the boats it owns, including each boat's ID number, name, and seating capacity. The company also tracks information about the various islands, such as the island names and populations. Every time a boat is chartered, it is important to know the date and time that the boat is to depart and the number of people planning to take the trip. The company also keeps information about each captain, such as Social Security number, name, and birth date. Boats travel to only one island per trip. Captains are not permanently assigned to a boat; rather, they are randomly assigned as the need arises. When the time for the trip comes, the company wants to keep track of the actual number of people taking the trip, in case it is different from the planned count.

2) The Busy B Company wants to store data about its employees' skills. Each employee possesses one or more specific skills. In addition, several employees may have the same skill. Draw an ER diagram for the above information.

3) Draw an ERD for the following scenario using the notation discussed in class. State any reasonable assumptions you make (not contradicting any provided in the scenario). Feel free to add attributes of your own. Identify the primary key for each entity.

i) Foothills Athletics (FA) is an athletic facility in the greater Highlands Ranch area in Colorado. FA has a number of employees, primarily fitness course instructors and administrative personnel (e.g., billing clerks, equipment managers, facility supervisors).

ii) Courses are offered by instructors on a quarterly basis: fall, winter, spring and summer. Multiple sections of each course are offered each quarter at different times, with different start and end dates. In any quarter, any instructor can teach up to three sections, assuming there are no scheduling conflicts. In a specific quarter, each section of a course has only one instructor on record. The salary of instructors varies depending on level of experience, competency and popularity among customers. Contact information for each employee is stored on file, and includes an emergency contact phone number.

iii) A facility supervisor can be assigned to supervise up to four facilities at a time. At a given time, each facility has only one facility supervisor.

iv) Records are kept on each employee, past and present, detailing employee name, address, phone number, date of hire, position, and status as either a current or former employee. Employees are assigned a unique four-digit Employee ID number when they are hired.

v) When joining the FA center, customers are assigned a unique four-digit Member ID number. This information, along with their name, address, phone number, gender, birth date, and date of membership, is recorded. At the time of enrollment, each member decides on one of three available membership types, each with a fixed membership fee: Platinum ($400), Gold ($300), and Silver ($200). This is a one-time fee that establishes a lifetime membership.


Normalization

Normalization is a method for organizing data elements in a database into tables. We talk about normalization in terms of "normal forms" (NF). The normal forms are cumulative, i.e. for a database to be in 2NF (second normal form), it must also meet the requirements of 1NF. By normalizing your data, you eliminate redundant information and organize your tables to make it easier to manage the data and to make future changes to the table and database structure. This process removes the insertion, deletion, and modification anomalies you may otherwise see. In normalizing your data, you usually divide large tables into smaller, easier to maintain tables. You can then add foreign keys to enable connections between the tables.

Normalization Avoids

- Duplication of Data: the same data is listed in multiple lines of the database.
- Insert Anomaly: a record about an entity cannot be inserted into the table without first inserting information about another entity. For example, a customer cannot be entered without a sales order.
- Delete Anomaly: a record cannot be deleted without deleting a record about a related entity. For example, a sales order cannot be deleted without deleting all of the customer's information.
- Update Anomaly: information cannot be updated without changing it in many places. For example, to update customer information, it must be updated for each sales order the customer has placed.

Normalization is a three-stage process: after the first stage, the data is said to be in first normal form; after the second, it is in second normal form; after the third, it is in third normal form. There are, in order, first, second, third, Boyce-Codd, fourth, and fifth normal forms. Each normal form represents an increasingly stringent set of rules; that is, each normal form assumes that the requirements of the preceding forms have been met. Many relational database designers feel that, if their tables are in third normal form, most common design problems have been addressed. However, the higher-level normal forms can also be of use. The normal forms covered in our class are defined as follows:

1. First normal form (1NF) sets the very basic rules for an organized database:
   a. Eliminate duplicative columns from the same table.
   b. Create separate tables for each group of related data and identify each row with a unique column or set of columns (the primary key).
2. Second normal form (2NF) further addresses the concept of removing duplicative data:
   a. Remove subsets of data that apply to multiple rows of a table and place them in separate tables.
   b. Create relationships between these new tables and their predecessors through the use of foreign keys.
3. Third normal form (3NF) goes one large step further:
   a. Remove columns that are not dependent upon the primary key.


Before Normalization

1. Begin with a list of all of the fields that must appear in the database. Think of this as one big table.
2. Do not include computed fields.
3. One place to begin getting this information is a printed document used by the system.
4. Additional attributes, besides those for the entities described on the document, can be added to the database.

Before Normalization - Example

See the Sales Order form below:

[Figure: sample Sales Order form]

Fields in the original data table will be as follows:

SalesOrderNo, Date, CustomerNo, CustomerName, CustomerAdd, ClerkNo, ClerkName, ItemNo, Description, Qty, UnitPrice

Think of this as the baseline: one large table.

Normalization: First Normal Form

Separate Repeating Groups into New Tables.

- Repeating Groups: fields that may be repeated several times for one document/entity.
- Create a new table containing the repeating data.
- The primary key of the new table (repeating group) is always a composite key, usually the document number and a field uniquely describing the repeating line, like an item number.

First Normal Form Example

The new table is as follows:

SalesOrderNo, ItemNo, Description, Qty, UnitPrice

The repeating fields will be removed from the original data table, leaving the following:

SalesOrderNo, Date, CustomerNo, CustomerName, CustomerAdd, ClerkNo, ClerkName

These two tables are a database in first normal form.
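The split into first normal form can be sketched in code. The example assumes SQLite through Python's sqlite3 module; the sample rows are invented, the table names flat, SalesOrder and SalesOrderDetail are chosen for this illustration, and the Date field is named OrderDate here only to keep clear of the DATE keyword used by some database systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The baseline: one large table, one row per line of each sales order.
CREATE TABLE flat (SalesOrderNo, OrderDate, CustomerNo, CustomerName, CustomerAdd,
                   ClerkNo, ClerkName, ItemNo, Description, Qty, UnitPrice);
INSERT INTO flat VALUES
 (1001, '2011-03-01', 7, 'J. Smith', '12 Mill Lane', 3, 'A. Brown', 'A1', 'Notebook', 2, 3.50),
 (1001, '2011-03-01', 7, 'J. Smith', '12 Mill Lane', 3, 'A. Brown', 'B2', 'Pen',      5, 1.20),
 (1002, '2011-03-02', 7, 'J. Smith', '12 Mill Lane', 4, 'C. Jones', 'A1', 'Notebook', 1, 3.25);

-- The repeating group moves to its own table with a composite key.
CREATE TABLE SalesOrderDetail (SalesOrderNo, ItemNo, Description, Qty, UnitPrice,
                               PRIMARY KEY (SalesOrderNo, ItemNo));
INSERT INTO SalesOrderDetail
    SELECT SalesOrderNo, ItemNo, Description, Qty, UnitPrice FROM flat;

-- The header fields remain, now stored once per sales order.
CREATE TABLE SalesOrder (SalesOrderNo PRIMARY KEY, OrderDate, CustomerNo,
                         CustomerName, CustomerAdd, ClerkNo, ClerkName);
INSERT INTO SalesOrder
    SELECT DISTINCT SalesOrderNo, OrderDate, CustomerNo, CustomerName,
                    CustomerAdd, ClerkNo, ClerkName
    FROM flat;
""")

header_rows = conn.execute("SELECT COUNT(*) FROM SalesOrder").fetchone()[0]
detail_rows = conn.execute("SELECT COUNT(*) FROM SalesOrderDetail").fetchone()[0]
```

Three flat rows become two header rows and three detail rows: the header data for order 1001 is no longer repeated for each of its lines.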

What if we did not Normalize the Database to First Normal Form?

- Repetition of Data: the sales order (SO) header data is repeated for every line in the sales order.

Normalization: Second Normal Form

Remove Partial Dependencies.

- Functional Dependency: the value of one attribute in a table is determined entirely by the value of another.
- Partial Dependency: a type of functional dependency where an attribute is functionally dependent on only part of the primary key (so the primary key must be a composite key).
- Create a separate table with the functionally dependent data and the part of the key on which it depends. Tables created at this step will usually contain descriptions of resources.
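To make the idea of a functional dependency concrete, here is a small Python sketch. The helper name holds and the sample rows are assumptions for this illustration. It checks whether one set of fields determines another in a given set of rows; note that sample data can only contradict a dependency, while whether a dependency really holds is a matter of the business rules.

```python
def holds(rows, determinant, dependent):
    """Return True if every value of `determinant` maps to one value of `dependent`."""
    seen = {}
    for row in rows:
        key = tuple(row[field] for field in determinant)
        value = tuple(row[field] for field in dependent)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True

# Invented sales order lines; the primary key is (SalesOrderNo, ItemNo).
detail = [
    {"SalesOrderNo": 1001, "ItemNo": "A1", "Description": "Notebook", "Qty": 2, "UnitPrice": 3.50},
    {"SalesOrderNo": 1001, "ItemNo": "B2", "Description": "Pen", "Qty": 5, "UnitPrice": 1.20},
    {"SalesOrderNo": 1002, "ItemNo": "A1", "Description": "Notebook", "Qty": 1, "UnitPrice": 3.25},
]

# Description depends on ItemNo alone: a partial dependency on the composite key.
description_on_item = holds(detail, ["ItemNo"], ["Description"])
# UnitPrice does not: item A1 was sold at two different prices.
price_on_item = holds(detail, ["ItemNo"], ["UnitPrice"])
# Qty is consistent with depending on the whole key.
qty_on_full_key = holds(detail, ["SalesOrderNo", "ItemNo"], ["Qty"])
```

In this sample, Description is the field that would move to a separate table keyed by ItemNo.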

Second Normal Form Example

The new table will contain the following fields:

ItemNo, Description

All of these fields except the primary key will be removed from the original table. The primary key will be left in the original table to allow linking of data:

SalesOrderNo, ItemNo, Qty, UnitPrice

Never treat price as dependent on item. Price may be different for different sales orders (discounts, special customers, etc.).

Along with the unchanged table below, these tables make up a database in second normal form:

SalesOrderNo, Date, CustomerNo, CustomerName, CustomerAdd, ClerkNo, ClerkName
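The second normal form step can be sketched as follows, again assuming SQLite through Python's sqlite3 module with invented sample rows; detail_1nf and InventoryItem are names chosen for this illustration. Description moves to a table keyed by ItemNo, while UnitPrice stays with the order line.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Detail table as it stands in first normal form.
CREATE TABLE detail_1nf (SalesOrderNo, ItemNo, Description, Qty, UnitPrice,
                         PRIMARY KEY (SalesOrderNo, ItemNo));
INSERT INTO detail_1nf VALUES
 (1001, 'A1', 'Notebook', 2, 3.50),
 (1001, 'B2', 'Pen',      5, 1.20),
 (1002, 'A1', 'Notebook', 1, 3.25);

-- Description depends on ItemNo only, so it gets its own table.
CREATE TABLE InventoryItem (ItemNo PRIMARY KEY, Description);
INSERT INTO InventoryItem SELECT DISTINCT ItemNo, Description FROM detail_1nf;

-- ItemNo stays behind as the link; UnitPrice stays because it can vary per order.
CREATE TABLE SalesOrderDetail (SalesOrderNo, ItemNo REFERENCES InventoryItem(ItemNo),
                               Qty, UnitPrice, PRIMARY KEY (SalesOrderNo, ItemNo));
INSERT INTO SalesOrderDetail
    SELECT SalesOrderNo, ItemNo, Qty, UnitPrice FROM detail_1nf;
""")

item_count = conn.execute("SELECT COUNT(*) FROM InventoryItem").fetchone()[0]

# A description is now changed in one place only.
conn.execute("UPDATE InventoryItem SET Description = 'A5 notebook' WHERE ItemNo = 'A1'")
shown = conn.execute("""
    SELECT d.SalesOrderNo, i.Description
    FROM SalesOrderDetail d JOIN InventoryItem i ON i.ItemNo = d.ItemNo
    WHERE d.ItemNo = 'A1' ORDER BY d.SalesOrderNo
""").fetchall()
```

After the single UPDATE, both sales orders that include item A1 show the new description.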

What if we did not Normalize the Database to Second Normal Form?

- Repetition of Data: the description would appear every time we had an order for the item.
- Delete Anomalies: all information about inventory items is stored in the SalesOrderDetail table. Delete a sales order, and you delete the item.
- Insert Anomalies: to insert an inventory item, you must insert a sales order.
- Update Anomalies: to change the description, you must change it on every sales order (SO).

Normalization: Third Normal Form

Remove transitive dependencies.

- Transitive Dependency: a type of functional dependency where an attribute is functionally dependent on an attribute other than the primary key. Thus its value is only indirectly determined by the primary key.
- Create a separate table containing the attribute and the fields that are functionally dependent on it. Tables created at this step will usually contain descriptions of either resources or agents. Keep a copy of the key attribute in the original file.

Third Normal Form Example

The new tables would be:

CustomerNo, CustomerName, CustomerAdd
ClerkNo, ClerkName

All of these fields except the primary key will be removed from the original table. The primary key will be left in the original table to allow linking of data, as follows:

SalesOrderNo, Date, CustomerNo, ClerkNo

Together with the unchanged tables below, these tables make up the database in third normal form:

ItemNo, Description
SalesOrderNo, ItemNo, Qty, UnitPrice

What if we did not Normalize the Database to Third Normal Form?

- Repetition of Data: the details for the customer and clerk would appear on every sales order (SO).
- Delete Anomalies: delete a sales order, and you delete the customer/clerk.
- Insert Anomalies: to insert a customer/clerk, you must insert a sales order.
- Update Anomalies: to change the name, address, etc., you must change it on every SO.
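The delete anomaly can be demonstrated side by side. This is a sketch assuming SQLite through Python's sqlite3 module; SalesOrder2NF is a hypothetical name for the un-split header table, OrderDate stands in for the Date field, and the sample data is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Second normal form: customer details live inside the sales order row.
CREATE TABLE SalesOrder2NF (SalesOrderNo INTEGER PRIMARY KEY, OrderDate TEXT,
                            CustomerNo INTEGER, CustomerName TEXT, CustomerAdd TEXT,
                            ClerkNo INTEGER, ClerkName TEXT);
INSERT INTO SalesOrder2NF VALUES
    (1001, '2011-03-01', 7, 'J. Smith', '12 Mill Lane', 3, 'A. Brown');

-- Third normal form: customers are stored once, in their own table.
CREATE TABLE Customers (CustomerNo INTEGER PRIMARY KEY, CustomerName TEXT,
                        CustomerAdd TEXT);
CREATE TABLE SalesOrders (SalesOrderNo INTEGER PRIMARY KEY, OrderDate TEXT,
                          CustomerNo INTEGER REFERENCES Customers(CustomerNo),
                          ClerkNo INTEGER);
INSERT INTO Customers   VALUES (7, 'J. Smith', '12 Mill Lane');
INSERT INTO SalesOrders VALUES (1001, '2011-03-01', 7, 3);

-- Delete the customer's only sales order in both designs.
DELETE FROM SalesOrder2NF WHERE SalesOrderNo = 1001;
DELETE FROM SalesOrders   WHERE SalesOrderNo = 1001;
""")

def count(sql):
    """Run a COUNT query and return the single number it produces."""
    return conn.execute(sql).fetchone()[0]

known_in_2nf = count("SELECT COUNT(*) FROM SalesOrder2NF WHERE CustomerNo = 7")
known_in_3nf = count("SELECT COUNT(*) FROM Customers WHERE CustomerNo = 7")
```

In the un-split design the customer's name and address disappear with the order; in the third normal form design the customer record survives.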

Completed Tables in Third Normal Form

Customers: CustomerNo, CustomerName, CustomerAdd
Clerks: ClerkNo, ClerkName
Inventory Items: ItemNo, Description
Sales Orders: SalesOrderNo, Date, CustomerNo, ClerkNo
SalesOrderDetail: SalesOrderNo, ItemNo, Qty, UnitPrice
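As a check that nothing has been lost, the five tables can be created and joined to rebuild the lines of the original sales order. The sketch assumes SQLite through Python's sqlite3 module; the column types, the OrderDate name (standing in for Date) and the sample rows are assumptions for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # ask SQLite to enforce the links
conn.executescript("""
CREATE TABLE Customers (CustomerNo INTEGER PRIMARY KEY, CustomerName TEXT, CustomerAdd TEXT);
CREATE TABLE Clerks (ClerkNo INTEGER PRIMARY KEY, ClerkName TEXT);
CREATE TABLE InventoryItems (ItemNo TEXT PRIMARY KEY, Description TEXT);
CREATE TABLE SalesOrders (
    SalesOrderNo INTEGER PRIMARY KEY,
    OrderDate    TEXT,
    CustomerNo   INTEGER NOT NULL REFERENCES Customers(CustomerNo),
    ClerkNo      INTEGER NOT NULL REFERENCES Clerks(ClerkNo)
);
CREATE TABLE SalesOrderDetail (
    SalesOrderNo INTEGER NOT NULL REFERENCES SalesOrders(SalesOrderNo),
    ItemNo       TEXT    NOT NULL REFERENCES InventoryItems(ItemNo),
    Qty          INTEGER,
    UnitPrice    REAL,
    PRIMARY KEY (SalesOrderNo, ItemNo)
);
INSERT INTO Customers VALUES (7, 'J. Smith', '12 Mill Lane');
INSERT INTO Clerks VALUES (3, 'A. Brown');
INSERT INTO InventoryItems VALUES ('A1', 'Notebook'), ('B2', 'Pen');
INSERT INTO SalesOrders VALUES (1001, '2011-03-01', 7, 3);
INSERT INTO SalesOrderDetail VALUES (1001, 'A1', 2, 3.50), (1001, 'B2', 5, 1.20);
""")

# Rebuild the printed sales order by following the foreign keys.
lines = conn.execute("""
    SELECT o.SalesOrderNo, c.CustomerName, k.ClerkName, i.Description, d.Qty, d.UnitPrice
    FROM SalesOrders o
    JOIN Customers c        ON c.CustomerNo = o.CustomerNo
    JOIN Clerks k           ON k.ClerkNo = o.ClerkNo
    JOIN SalesOrderDetail d ON d.SalesOrderNo = o.SalesOrderNo
    JOIN InventoryItems i   ON i.ItemNo = d.ItemNo
    ORDER BY d.ItemNo
""").fetchall()

# The order total is a computed field, so it is calculated rather than stored.
total = sum(row[4] * row[5] for row in lines)
```

Computing the total on demand follows the earlier advice not to include computed fields in the tables.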


Structured Query Language (SQL)

Structured Query Language (SQL) is a computer programming language intended for managing data in relational database management systems. It was originally developed by Donald Chamberlin and Raymond Boyce in 1974 to work on large mainframe systems. The advent of the personal computer revolution in the late 1980s led to a redesign of SQL. SQL was created to define and manipulate objects and data in the database. SQL is important because it is portable: SQL written on a minicomputer can be used directly on a PC, and a database created on a PC can be queried from a minicomputer, because the language is the same.

Tools for running SQL commands include, among others:

1. SQL Plus (Windows) is part of the Oracle Database products. This tool can be run on Microsoft Windows XP/98/2000. To run it: Start -> Programs -> Oracle for Windows NT -> SQL Plus.
2. SQL Plus (DOS) executes SQL commands from cmd or DOS. To run it: Start -> Run, type cmd. Once cmd is open, type sqlplus and press Enter, then enter your user ID and password.
3. SVRMGRL (DOS) is the same as SQL Plus (DOS) and differs only in how it is started. To run it: Start -> Run, type cmd. When cmd is open, type svrmgrl and press Enter. After that, type connect internal and enter the password.
4. Open Database Connectivity (ODBC) is a standard interface that lets other applications work with various databases. To set it up: Start -> Settings -> Control Panel -> Administrative Tools -> Data Sources. Press the Add button, then select the driver for your database.
5. Other tools can also be used, such as Toad from Quest Software.

SQL has advantages and disadvantages, which are listed here.

Advantages

- Interactive SQL and SQL embedded in a program differ very little, so as well as being easy for the user, SQL makes it easy for the programmer to trace errors in a program (debugging).
- It facilitates communication between users, programmers and DBAs.
- Embedded SQL is nearly the same across different host languages (COBOL, FORTRAN, PL/I).
- Because embedded SQL is compiled, it runs much faster than interactive SQL.
- Enterprise-grade database software gives excellent support for data recovery.
- Portable: SQL runs in programs on mainframes, PCs, laptops, servers and even mobile phones. It runs on local systems, intranets and the internet. Databases using SQL can generally be moved from one device to another with little difficulty.


- Used with any DBMS from any vendor: SQL is used by virtually all the vendors who develop relational DBMS.
- SQL Standard: the first standard for SQL was published in 1986 by ANSI (American National Standards Institute) and adopted by ISO (International Organization for Standardization) in 1987. It was later expanded in 1989, 1992 and 1999.
- Easy to learn and understand: SQL mainly consists of English-like statements, so an SQL query is easy to learn and understand.
- Interactive language: SQL can be used to communicate with databases and get answers to complex questions in seconds.
- Both a programming language and an interactive language: SQL can do both jobs at the same time.
- Complete language for a database: SQL is used to create databases and to manage the security of a database. It can also be used for updating, retrieving and sharing data with users.
- Multiple data views: with SQL, different views of the structure and content of a database can be provided for different users.
- Client-Server language: SQL is used for linking front-end computers and back-end databases, thus providing a client-server architecture.
- Dynamic database language: with SQL, the database structure can be changed dynamically, even while the contents of the database are being accessed by users.
- Supports object-based programming: recent versions of SQL support object-based programming, and the language is highly flexible.
- Supports enterprise applications: SQL is the database language used by businesses and enterprises throughout the globe, and it is well suited to enterprise applications.
- Used on the internet: SQL is used in three-tiered Internet architecture, which includes a client, an application server and a database.

Disadvantages

- Overall cost of implementation.
- The form of the language is very different from that of the host language, which makes writing the code difficult for a programmer.
- Lack of object reference: in relational algebra, the relations between objects depend completely on foreign keys. This not only makes looking up related records inefficient, but also makes it impossible to treat the record pointed to by a foreign key directly as an attribute of the primary record.
- Lack of support for ordered sets: SQL inherits the unordered set of mathematics, which makes computations relating to sequence rather difficult. One can imagine how common such computations are (for example, over the preceding month, over the same period last year, the first 20%, and rankings).
- Set orientation is incomplete: though SQL has the concept of a set, it does not provide the set as a basic data type, which makes it necessary to rework a lot of natural set computations when thinking about and writing them.
- No support for computation by steps: dividing a complex computation into several steps can greatly reduce the difficulty of a problem. Conversely, forcing many steps of computation into one step can greatly increase it.

Note: SQL is divided into Data Definition Language (DDL) and Data Manipulation Language (DML).

Some Basic SQL Examples

1. Creating a table:

    CREATE TABLE "CUSTOMERS" (
        "ID" NUMBER(6,0),
        "LASTNAME" VARCHAR2(30),
        "FIRSTNAME" VARCHAR2(30),
        "ADDRESS" VARCHAR2(50),
        "CITY" VARCHAR2(20),
        "STATE" VARCHAR2(2),
        "ZIP" VARCHAR2(5),
        CONSTRAINT "CUSTOMER_PK" PRIMARY KEY ("ID") ENABLE
    );

2. Inserting data into a table:

    INSERT INTO "CUSTOMERS" VALUES (1, 'Simpson', 'Homer', '242 Evergreen Terrace', 'Springfield', 'MM', '05050');

3. Selecting data from a table:

    SELECT * FROM CUSTOMERS;
    SELECT ID, LASTNAME, FIRSTNAME FROM CUSTOMERS WHERE ID=1;

4. Updating data in a table:

    UPDATE CUSTOMERS SET ZIP='12345', FIRSTNAME='Marge' WHERE LASTNAME='Simpson';

5. Deleting data from a table:

    -- Removes every row in the table.
    DELETE FROM CUSTOMERS;
    -- Removes only the row whose ID is 1.
    DELETE FROM CUSTOMERS WHERE ID=1;

6. Deleting a table:

    DROP TABLE "CUSTOMERS";

7. Using a sequence:

a. Creating a sequence:

    CREATE SEQUENCE cust_seq START WITH 1 INCREMENT BY 1 NOMAXVALUE;

b. Using a sequence (this assumes a table CUSTOMERS2 with three columns):

    INSERT INTO "CUSTOMERS2" VALUES (cust_seq.nextval, 'Simpson', 'Marge');
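The statements above are written for Oracle. For readers without an Oracle installation, the same cycle of create, insert, select, update, delete and drop can be tried with SQLite, which ships with Python. This is a sketch under that assumption: SQLite's column types differ from Oracle's, and SQLite has no CREATE SEQUENCE, so an INTEGER PRIMARY KEY column that is left out of the INSERT plays the role of cust_seq.nextval here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# 1. Creating a table (SQLite types instead of NUMBER and VARCHAR2).
cur.execute("""
    CREATE TABLE CUSTOMERS (
        ID INTEGER PRIMARY KEY,  -- SQLite fills this in when no ID is given
        LASTNAME TEXT, FIRSTNAME TEXT, ADDRESS TEXT,
        CITY TEXT, STATE TEXT, ZIP TEXT
    )
""")

# 2. Inserting data; the ? placeholders keep values separate from the SQL text.
cur.execute("INSERT INTO CUSTOMERS VALUES (?, ?, ?, ?, ?, ?, ?)",
            (1, "Simpson", "Homer", "242 Evergreen Terrace", "Springfield", "MM", "05050"))

# 7. Stand-in for the sequence: omit ID and let the database number the row.
cur.execute("INSERT INTO CUSTOMERS (LASTNAME, FIRSTNAME) VALUES (?, ?)",
            ("Simpson", "Marge"))
new_id = cur.lastrowid

# 4. Updating one row, then 3. selecting to see the result.
cur.execute("UPDATE CUSTOMERS SET ZIP = ? WHERE ID = ?", ("12345", 1))
after_update = cur.execute(
    "SELECT ID, FIRSTNAME, ZIP FROM CUSTOMERS ORDER BY ID").fetchall()

# 5. Deleting one row.
cur.execute("DELETE FROM CUSTOMERS WHERE ID = ?", (1,))
remaining = cur.execute("SELECT COUNT(*) FROM CUSTOMERS").fetchone()[0]

# 6. Deleting the table.
cur.execute("DROP TABLE CUSTOMERS")
conn.commit()
```

The UPDATE here is restricted by ID rather than by LASTNAME so that only one of the two Simpson rows changes.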
