
NATIONAL OPEN UNIVERSITY OF NIGERIA

COURSE CODE: MBA 758

COURSE TITLE: DATABASE MANAGEMENT SYSTEM

COURSE GUIDE MBA 758 DATABASE MANAGEMENT SYSTEM


Course Writer: Gerald C. Okereke, Eco Communications Inc., Lagos, Ikeja
Course Editor: Mr. E. Eseyin, National Open University of Nigeria
Programme Leader: Dr. O. J. Onwe, National Open University of Nigeria
Course Coordinator: Bimbola, E.U. Adegbola, National Open University of Nigeria

NATIONAL OPEN UNIVERSITY OF NIGERIA

National Open University of Nigeria
Headquarters
14/16 Ahmadu Bello Way
Victoria Island
Lagos

Abuja Office
5, Dar es Salaam Street
Off Aminu Kano Crescent
Wuse II, Abuja
Nigeria

e-mail: centralinfo@nou.edu.ng
URL: www.nou.edu.ng

Published by
National Open University of Nigeria

Printed 2009

ISBN: 978-058-331-9

All Rights Reserved

CONTENTS

Introduction
Course Aim
Course Objectives
Course Materials
Study Units
Assignment File
Assessment
Credit Units
Presentation Schedule
Course Overview


Introduction

This course, Database Management System (DBMS), is designed for students pursuing a Master's degree in business, finance, marketing and related fields of study. It can also be studied by Postgraduate Diploma students in business, sciences and education. The course is relevant to students of business because information and data form the foundation of any business enterprise, so a thorough understanding of how to design, manipulate and manage databases is essential. The course is primarily intended for students who are already graduates or postgraduates in any field of study. Students who did not have exposure to computer science in their first degrees will need to put in extra effort to grasp this course properly.

This course guide takes you through the nature of the course, the materials you are going to use and how to use them to your maximum benefit. You are expected to devote at least two hours to the study of each unit. For each unit there are assessments in the form of tutor-marked assignments, and you are advised to carry out these exercises immediately after studying the unit.

Tutorial lectures will be organized for this course. They serve as an avenue to interact with course instructors, who will communicate more clearly with you regarding the course. You are advised to attend the tutorial lectures because they will enhance your understanding of the course. Note that it is also through these tutorial lectures that you will submit your tutor-marked assignments and be assessed accordingly.

Course Aim
The aim of this course is to teach you how to design, manipulate and manage databases. Course participants are exposed to the various forms, types and models of database systems to enable them to make viable choices. Supportive and complementary concepts of managing data and documents are thoroughly examined to give a holistic view of data and information management. The ultimate aim is to encourage the use of database management systems for effective data management.


Course Objectives

The following are the major objectives of this course:
define a Database Management System
give a description of the Database Management structure
define a Database
define basic foundational terms of Database
understand the applications of Databases
know the advantages and disadvantages of the different models
compare the relational model with the Structured Query Language (SQL)
know the constraints and controversies associated with the relational database model
know the rules guiding transactions (ACID)
identify the major types of relational database management systems
compare and contrast the types of RDBMS based on several criteria
understand the concept of data planning and Database design
know the steps in the development of Databases
trace the history and development process of SQL
know the scope and extensions of SQL
differentiate Discretionary and Mandatory Access Control Policies
know the proposed OODBMS Security Models
identify the various functions of the Database Administrator
trace the history and development process of the data warehouse
list various benefits of the data warehouse
compare and contrast document management systems and content management systems
know the basic components of document management systems

Course Materials

1. Course Guide
2. Study Units
3. Textbooks
4. Assignment File
5. Tutorials

Study Units
This course consists of thirteen (13) units, divided into 3 modules. Each module deals with a major aspect of the course.

Module 1
Unit 1 Overview
Unit 2 Database
Unit 3 Database Concepts
Unit 4 Database Models 1
Unit 5 Database Models: Relational Model
Unit 6 Basic Components of DBMS

Module 2
Unit 1 Development and Design of Database
Unit 2 Structured Query Language (SQL)
Unit 3 Database and Information Systems Security
Unit 4 Database Administrator and Administration

Module 3
Unit 1 Relational Database Management Systems
Unit 2 Data Warehouse
Unit 3 Document Management System

In studying the units, you are expected to spend a minimum of 2 hours on each. Start by going through the unit objectives so that you know what you need to learn in the course of studying the unit. At the end of the unit, evaluate yourself to find out whether you have achieved its objectives; if not, go through the unit again. To help you ascertain how well you have understood the course, there will be exercises, mainly in the form of tutor-marked assignments, at the end of each unit. At the first attempt, try to answer the questions without going through the unit. However, if you cannot proffer solutions offhand, go through the unit to answer the questions.

Assignment File

For each unit, you will find one (1) or two (2) tutor-marked assignments. These assignments serve two purposes:
1. Self Evaluation: The tutor-marked assignments will help you to go through each unit thoroughly, because you are advised to attempt the questions immediately after studying each unit. The questions are designed so that at least one question serves as a typical self-assessment test.
2. Obtain Valuable Marks: The tutor-marked assignments are also a valid means of obtaining marks that form part of your total score in this course. They constitute 30% of the total marks obtainable in this course.

You are advised to go through the units thoroughly so that you can proffer correct solutions to the tutor-marked assignments.

Assessment
You will be assessed and graded in this course through tutor-marked assignments and a formal written examination. The allocation of marks is as indicated below:
Assignments = 30%
Examination = 70%

Final Examination and Grading
The final examination will consist of two (2) sections:
1. Section 1: This is compulsory and weighs 40 marks.
2. Section 2: This consists of six (6) questions, out of which you are to answer four (4). It weighs 60 marks.

The duration of the examination will be 3 hours.

Credit Units
This course attracts 3 credit units only.

Presentation Schedule
The presentation schedule gives the dates and venues of tutorial classes, as well as how and when to submit the tutor-marked assignments. All of this will be communicated to you in due course.

Course Overview

The course overview indicates the units/topics to be studied each week. It also includes the duration of the course, the revision week and the examination week. The details are provided below:


Unit | Title of Work | Week | Assessment (end of unit)
 | Course Guide | |
Module 1
1 | Overview | 1 | TMA
2 | Database | 2 | TMA
3 | Database Concepts | 3 | TMA
4 | Database Models 1 | 4 | TMA
5 | Database Models: Relational Model | 5 | TMA
6 | Basic Components of DBMS | 6 | TMA
Module 2
1 | Development and Design of Database | 7 | TMA
2 | Structured Query Language (SQL) | 8 | TMA
3 | Database and Information Systems Security | 9 | TMA
4 | Database Administrator and Administration | 10 | TMA
Module 3
1 | Relational Database Management Systems | 11 | TMA
2 | Data Warehouse | 12 | TMA
3 | Document Management System | 13 | TMA
 | Revision and Examination | 14 |

TMA = tutor-marked assignment.


CONTENTS

Module 1
Unit 1 Overview
Unit 2 Database
Unit 3 Database Concepts
Unit 4 Database Models 1
Unit 5 Database Models: Relational Model
Unit 6 Basic Components of DBMS

Module 2
Unit 1 Development and Design of Database
Unit 2 Structured Query Language (SQL)
Unit 3 Database and Information Systems Security
Unit 4 Database Administrator and Administration

Module 3
Unit 1 Relational Database Management Systems
Unit 2 Data Warehouse
Unit 3 Document Management System


MODULE 1

Unit 1 Overview
Unit 2 Database
Unit 3 Database Concepts
Unit 4 Database Models 1
Unit 5 Database Models: Relational Model
Unit 6 Basic Components of DBMS

UNIT 1 OVERVIEW

CONTENTS
1.0 Introduction
2.0 Objectives
3.0 Main Content
3.1 Description
3.2 DBMS Benefits
3.3 Features and Capabilities of DBMS
3.4 Uses of DBMS
3.5 Models
3.6 List of Database Management Systems Software
4.0 Conclusion
5.0 Summary
6.0 Tutor-Marked Assignment
7.0 References/Further Readings

1.0 INTRODUCTION

A Database Management System (DBMS) is computer software designed for the purpose of managing databases based on a variety of data models.

2.0 OBJECTIVES

At the end of this unit, you should be able to:
define a Database Management System
give a description of the Database Management structure
enumerate the benefits of a Database Management System
describe the features and capabilities of a typical DBMS
identify and differentiate the different types and models of DBMS.


3.0 MAIN CONTENT

3.1 Description

A DBMS is a complex set of software programs that controls the organization, storage, management and retrieval of data in a database. DBMSs are categorized according to their data structures or types. A DBMS is sometimes also known as a database manager: a set of prewritten programs used to store, update and retrieve a database. A DBMS includes:

A modeling language to define the schema of each database hosted in the DBMS, according to the DBMS data model. The four most common types of organization are the hierarchical, network, relational and object models; inverted lists and other methods are also used. A given database management system may provide one or more of the four models. The optimal structure depends on the natural organization of the application's data and on the application's requirements, which include transaction rate (speed), reliability, maintainability, scalability and cost. The dominant model in use today is the ad hoc one embedded in SQL, despite the objections of purists who believe this model is a corruption of the relational model, since it violates several of its fundamental principles for the sake of practicality and performance. Many DBMSs also support the Open Database Connectivity (ODBC) API, which provides a standard way for programmers to access the DBMS.

Data structures (fields, records, files and objects) optimized to deal with very large amounts of data stored on a permanent data storage device, which implies relatively slow access compared to volatile main memory.

A database query language and report writer that allow users to interactively interrogate the database, analyze its data and update it according to their privileges on the data. The DBMS also controls the security of the database.

Data security prevents unauthorized users from viewing or updating the database. Using passwords, users are allowed access to the entire database or to subsets of it called subschemas. For example, an employee database can contain all the data about an individual employee, but one group of users may be authorized to view only payroll data, while others are allowed access only to work history and medical data.

If the DBMS provides a way to interactively enter and update the database, as well as interrogate it, this capability allows for managing personal databases. However, it may not leave an audit trail of actions or provide the kinds of controls necessary in a multi-user organization. These controls are only available when a set of application programs is customized for each data entry and updating function.

A DBMS also includes a transaction mechanism that ideally guarantees the ACID properties (atomicity, consistency, isolation and durability), in order to ensure data integrity despite concurrent user accesses (concurrency control) and faults (fault tolerance). The DBMS maintains the integrity of the data in the database, for example by not allowing more than one user to update the same record at the same time. It can also help prevent duplicate records via unique index constraints; for example, no two customers with the same customer number (key field) can be entered into the database. The DBMS accepts requests for data from the application program and instructs the operating system to transfer the appropriate data.

When a DBMS is used, information systems can be changed much more easily as the organization's information requirements change. New categories of data can be added to the database without disruption to the existing system. Organizations may use one kind of DBMS for daily transaction processing and then move the detail onto another computer that uses another DBMS better suited for random inquiries and analysis. Overall systems design decisions are made by data administrators and systems analysts, while detailed database design is performed by database administrators.

Database servers are specially designed computers that hold the actual databases and run only the DBMS and related software. They are usually multiprocessor computers, with RAID disk arrays used for stable storage. Hardware database accelerators, connected to one or more servers via a high-speed channel, are also used in large-volume transaction processing environments. DBMSs are found at the heart of most database applications. Sometimes DBMSs are built around a private multitasking kernel with built-in networking support, although nowadays these functions are left to the operating system.
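A minimal sketch of two integrity mechanisms described above, assuming Python's `sqlite3` module and hypothetical `customer` and `account` tables: a UNIQUE constraint on the key field rejects a duplicate customer number, and a transaction makes two related inserts all-or-nothing (atomicity). Concurrency control between many simultaneous users is not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# customer_no is the key field; UNIQUE stops two customers sharing a number.
conn.execute(
    "CREATE TABLE customer (customer_no INTEGER NOT NULL UNIQUE, name TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE account (customer_no INTEGER NOT NULL, balance REAL NOT NULL)"
)

def open_customer_account(customer_no, name, opening_balance):
    """Add an account and its customer as one all-or-nothing transaction."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "INSERT INTO account VALUES (?, ?)", (customer_no, opening_balance)
            )
            conn.execute("INSERT INTO customer VALUES (?, ?)", (customer_no, name))
        return True
    except sqlite3.IntegrityError:
        return False

first = open_customer_account(101, "Chidi", 5000.0)
duplicate = open_customer_account(101, "Ngozi", 700.0)  # same customer number
print(first, duplicate)  # True False
# The account row from the failed attempt was rolled back with the transaction.
print(conn.execute("SELECT COUNT(*) FROM account").fetchone()[0])  # 1
```

Because the rejected customer row aborts the whole transaction, the account row inserted just before it is undone as well, so no orphan account is left behind.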


3.2 DBMS Benefits

The benefits of a DBMS include:
Improved strategic use of corporate data
Reduced complexity of the organization's information systems environment
Reduced data redundancy and inconsistency
Enhanced data integrity
Application-data independence
Improved security
Reduced application development and maintenance costs
Improved flexibility of information systems
Increased access to and availability of data and information
Logical and physical data independence
Control of concurrent access anomalies
Handling of atomicity problems
Central control of the system through the database administrator (DBA)

Figure 1: An example of a database management approach in a banking information system.

Note how the savings, checking and installment loan programs use a database management system to share a customer database. Note also that the DBMS allows a user to make a direct, ad hoc interrogation of the database without using application programs.

3.3 Features and Capabilities of DBMS

A DBMS can be characterized as an "attribute management system", where attributes are small chunks of information that describe something. For example, "colour" is an attribute of a car. The value of the attribute may be a colour such as "red", "blue" or "silver".

Alternatively, and especially in connection with the relational model of database management, the relation between attributes drawn from a specified set of domains can be seen as being primary. For instance, the database might indicate that a car that was originally "red" might fade to "pink" in time, provided it was of some particular "make" with an inferior paint job. Such higher-arity relationships provide information on all of the underlying domains at the same time, with none of them being privileged above the others.

Throughout recent history, specialized databases have existed for scientific, geospatial, imaging, document storage and similar uses. Functionality drawn from such applications has lately begun appearing in mainstream DBMSs as well. However, the main focus there, at least when aimed at the commercial data processing market, is still on descriptive attributes in repetitive record structures. Thus, the DBMSs of today roll together frequently needed services or features of attribute management. By externalizing such functionality to the DBMS, applications effectively share code with each other and are relieved of much internal complexity. Features commonly offered by database management systems include:

Query Ability
Querying is the process of requesting attribute information from various perspectives and combinations of factors. Example: "How many 2-door cars in Texas are green?" A database query language and report writer allow users to interactively interrogate the database, analyze its data and update it according to their privileges on the data.
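The example question can be put to a DBMS as a query. This sketch assumes a hypothetical `car` table in SQLite (via Python's `sqlite3`); the second query looks at the same attributes from a different perspective, as the definition of querying above describes.

```python
import sqlite3

# Hypothetical car records: each column is one attribute of a car.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE car (reg_no TEXT PRIMARY KEY, make TEXT, doors INTEGER, "
    "colour TEXT, state TEXT)"
)
conn.executemany(
    "INSERT INTO car VALUES (?, ?, ?, ?, ?)",
    [
        ("TX-001", "Ford", 2, "green", "Texas"),
        ("TX-002", "Honda", 4, "green", "Texas"),
        ("TX-003", "Ford", 2, "green", "Texas"),
        ("TX-004", "Toyota", 2, "red", "Texas"),
        ("OH-001", "Ford", 2, "green", "Ohio"),
    ],
)

# "How many 2-door cars in Texas are green?" combines three attributes.
green_two_door_in_texas = conn.execute(
    "SELECT COUNT(*) FROM car WHERE doors = ? AND state = ? AND colour = ?",
    (2, "Texas", "green"),
).fetchone()[0]
print(green_two_door_in_texas)  # 2

# The same data from another perspective: green cars per state.
green_by_state = conn.execute(
    "SELECT state, COUNT(*) FROM car WHERE colour = 'green' "
    "GROUP BY state ORDER BY state"
).fetchall()
print(green_by_state)  # [('Ohio', 1), ('Texas', 3)]
```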


Backup and Replication
Copies of attributes need to be made regularly in case primary disks or other equipment fail. A periodic copy of attributes may also be created for a distant organization that cannot readily access the original. DBMSs usually provide utilities to facilitate the process of extracting and disseminating attribute sets. When data is replicated between database servers so that the information remains consistent throughout the database system, and users cannot tell or even know which server in the DBMS they are using, the system is said to exhibit replication transparency.

Rule Enforcement
Often one wants to apply rules to attributes so that the attributes are clean and reliable. For example, we may have a rule that says each car can have only one engine associated with it (identified by Engine Number). If somebody tries to associate a second engine with a given car, we want the DBMS to deny such a request and display an error message. However, with changes in the model specification such as, in this example, hybrid gas-electric cars, rules may need to change. Ideally, such rules should be able to be added and removed as needed without significant data layout redesign.

Security
Often it is desirable to limit who can see or change a given attribute or group of attributes. This may be managed directly by individual, or by the assignment of individuals and privileges to groups, or (in the most elaborate models) through the assignment of individuals and groups to roles which are then granted entitlements.

Computation
There are common computations requested on attributes, such as counting, summing, averaging, sorting, grouping and cross-referencing. Rather than have each computer application implement these from scratch, applications can rely on the DBMS to supply such calculations. All arithmetical work performed by a computer is called a computation.

Change and Access Logging
Often one wants to know who accessed what attributes, what was changed, and when it was changed.
Logging services allow this by keeping a record of access occurrences and changes.
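Several of the features above can be seen in a few lines of SQL. The sketch below uses Python's `sqlite3` module with hypothetical `car`, `engine` and `change_log` tables: a UNIQUE constraint enforces the one-engine-per-car rule, GROUP BY hands counting and averaging to the DBMS, a trigger provides simple change logging, and `Connection.backup` copies the whole database. SQLite has no user accounts, so this log records what changed and when, but not who made the change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE car (car_id INTEGER PRIMARY KEY, make TEXT NOT NULL);
    -- Rule enforcement: UNIQUE on car_id allows at most one engine per car.
    CREATE TABLE engine (
        engine_no   TEXT PRIMARY KEY,
        car_id      INTEGER NOT NULL UNIQUE REFERENCES car(car_id),
        capacity_cc INTEGER CHECK (capacity_cc > 0)
    );
    -- Change logging: every update to an engine row is recorded here.
    CREATE TABLE change_log (
        engine_no    TEXT,
        old_capacity INTEGER,
        new_capacity INTEGER,
        changed_at   TEXT DEFAULT CURRENT_TIMESTAMP
    );
    CREATE TRIGGER log_engine_update AFTER UPDATE ON engine
    BEGIN
        INSERT INTO change_log (engine_no, old_capacity, new_capacity)
        VALUES (OLD.engine_no, OLD.capacity_cc, NEW.capacity_cc);
    END;
    INSERT INTO car VALUES (1, 'Toyota'), (2, 'Toyota'), (3, 'Honda');
    INSERT INTO engine VALUES ('E-100', 1, 1800), ('E-200', 2, 2000), ('E-300', 3, 1500);
    """
)

# Rule enforcement: a second engine for car 1 is refused with an error message.
try:
    conn.execute("INSERT INTO engine VALUES ('E-101', 1, 1600)")
    second_engine_accepted = True
except sqlite3.IntegrityError as error:
    second_engine_accepted = False
    print("Rejected:", error)
conn.rollback()  # close the implicit transaction opened by the failed INSERT

# Computation: counting and averaging are done by the DBMS, not the application.
summary = conn.execute(
    "SELECT c.make, COUNT(*), AVG(e.capacity_cc) FROM engine e "
    "JOIN car c ON c.car_id = e.car_id GROUP BY c.make ORDER BY c.make"
).fetchall()
print(summary)  # [('Honda', 1, 1500.0), ('Toyota', 2, 1900.0)]

# Change logging: the trigger fills change_log without any application code.
with conn:
    conn.execute("UPDATE engine SET capacity_cc = 1900 WHERE engine_no = 'E-100'")
log_rows = conn.execute(
    "SELECT engine_no, old_capacity, new_capacity FROM change_log"
).fetchall()
print(log_rows)  # [('E-100', 1800, 1900)]

# Backup: copy the entire database into a second (here in-memory) database.
backup_conn = sqlite3.connect(":memory:")
conn.backup(backup_conn)
print(backup_conn.execute("SELECT COUNT(*) FROM engine").fetchone()[0])  # 3
```

Dropping the UNIQUE keyword (for example, to allow hybrid cars with two power units) would change the rule without redesigning the rest of the tables, which is the flexibility the Rule Enforcement paragraph asks for.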


Automated Optimization
If there are frequently occurring usage patterns or requests, some DBMSs can adjust themselves to improve the speed of those interactions. In some cases the DBMS will merely provide tools to monitor performance, allowing a human expert to make the necessary adjustments after reviewing the statistics collected.
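Where the DBMS only provides monitoring tools, the human expert's adjustment often means adding an index. This sketch assumes SQLite (via Python's `sqlite3`) and a hypothetical `car` table, and uses `EXPLAIN QUERY PLAN` to compare the access path before and after an index is created. The exact wording of the plan text varies between SQLite versions, so the comments describe it only loosely.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE car (reg_no TEXT PRIMARY KEY, colour TEXT, state TEXT)")
# 1,000 hypothetical cars: odd numbers are registered in Texas, even in Ohio.
conn.executemany(
    "INSERT INTO car VALUES (?, ?, ?)",
    [
        (f"REG-{i}", "green" if i % 3 else "red", "Texas" if i % 2 else "Ohio")
        for i in range(1000)
    ],
)

QUERY = "SELECT COUNT(*) FROM car WHERE state = 'Texas'"

def query_plan(sql):
    """Return the planner's description of how SQLite would run `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(str(row[-1]) for row in rows)  # last column holds the detail text

before = query_plan(QUERY)   # typically reports a full SCAN of car
conn.execute("CREATE INDEX idx_car_state ON car(state)")
after = query_plan(QUERY)    # typically reports a SEARCH using idx_car_state

print(before)
print(after)
print(conn.execute(QUERY).fetchone()[0])  # 500
```

The query text and its answer are unchanged; only the access path differs, which is also a small illustration of the physical data independence listed among the DBMS benefits.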

3.4 Uses of Database Management Systems

The four major uses of database management systems are:
1. Database Development
2. Database Interrogation
3. Database Maintenance
4. Application Development

Database Development
Database packages such as Microsoft Access and Lotus Approach allow end users to develop the databases they need. However, large organizations with client/server or mainframe-based systems usually place control of enterprise-wide database development in the hands of database administrators and other database specialists. This improves the integrity and security of the organizational database. Database developers use the data definition language (DDL) in database management systems such as Oracle 9i or IBM's DB2 to develop and specify the data contents, relationships and structure of each database, and to modify these database specifications, which are catalogued in a data dictionary.

Figure 2: The Four Major Uses of DBMS (diagram labels: Operating System; Database Management System; Database; Data Dictionary; Application Programs; Database Development; Database Interrogation; Database Maintenance; Application Development)

Database Interrogation
The database interrogation capability is a major use of a database management system. End users can interrogate a database management system by asking for information from a database using a query language or a report generator. They can receive an immediate response in the form of video displays or printed reports. No difficult programming ideas are required.

Database Maintenance
The databases of organizations need to be updated continually to reflect new business transactions and other events. Other miscellaneous changes must also be made to ensure the accuracy of the data in the database. This database maintenance process is accomplished by transaction processing programs and other end-user application packages, with the support of the database management system. End users and information specialists can also employ various utilities provided by a DBMS for database maintenance.

Application Development
Database management system packages play major roles in application development. End users, systems analysts and other application developers can use the fourth-generation programming languages (4GLs) and built-in software development tools provided by many DBMS packages to develop custom application programs. For example, you can use a DBMS to easily develop the data entry screens, forms, reports or web pages for a business application. A database management system also makes the job of application programmers easier, since they do not have to develop detailed data handling procedures using a conventional programming language every time they write a program.
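The four uses can be walked through on one small example. This is a sketch assuming SQLite (via Python's `sqlite3`) and a hypothetical `product` table; SQLite's `sqlite_master` catalog and `PRAGMA table_info` play the role of the data dictionary here, and the `stock_report` function stands in for an application program. Real 4GL and report-generator tools are not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# 1. Database development: DDL defines the structure of the data.
conn.execute(
    """CREATE TABLE product (
           product_id  INTEGER PRIMARY KEY,
           description TEXT NOT NULL,
           unit_price  REAL NOT NULL CHECK (unit_price >= 0),
           qty_on_hand INTEGER NOT NULL DEFAULT 0
       )"""
)
# The definitions are recorded in SQLite's catalog, its built-in data dictionary.
dictionary = conn.execute(
    "SELECT type, name FROM sqlite_master WHERE name = 'product'"
).fetchall()
column_names = [row[1] for row in conn.execute("PRAGMA table_info(product)")]

# 3. Database maintenance: record business transactions as they happen.
with conn:
    conn.execute("INSERT INTO product VALUES (1, 'Printer paper', 2500.0, 40)")
    conn.execute("INSERT INTO product VALUES (2, 'Toner cartridge', 18000.0, 5)")
    conn.execute("UPDATE product SET qty_on_hand = qty_on_hand - 3 WHERE product_id = 2")

# 2. Database interrogation: an ad hoc query answered immediately.
low_stock = conn.execute(
    "SELECT description, qty_on_hand FROM product WHERE qty_on_hand < 10"
).fetchall()

# 4. Application development: a tiny application that leaves data handling to the DBMS.
def stock_report(threshold):
    """Return one line of text per product below the given stock threshold."""
    rows = conn.execute(
        "SELECT description, qty_on_hand FROM product "
        "WHERE qty_on_hand < ? ORDER BY description",
        (threshold,),
    ).fetchall()
    return [f"{description}: {qty} left" for description, qty in rows]

print(dictionary)         # [('table', 'product')]
print(column_names)       # ['product_id', 'description', 'unit_price', 'qty_on_hand']
print(low_stock)          # [('Toner cartridge', 2)]
print(stock_report(50))   # ['Printer paper: 40 left', 'Toner cartridge: 2 left']
```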

3.5 Models

The various models of database management systems are:
1. Hierarchical
2. Network
3. Object-oriented
4. Associative
5. Column-oriented
6. Navigational
7. Distributed
8. Real-time
9. Relational
10. SQL

These models will be discussed in detail in subsequent units of this course.

3.6 List of Database Management Systems Software

Examples of DBMSs include:



Oracle
DB2
Sybase Adaptive Server Enterprise
FileMaker
Firebird
Ingres
Informix
Microsoft Access
Microsoft SQL Server
Microsoft Visual FoxPro
MySQL
PostgreSQL
Progress
SQLite
Teradata
CSQL
OpenLink Virtuoso

4.0 CONCLUSION

Database management systems have continued to make data arrangement and storage much easier than they used to be. With the emergence of the relational model of database management systems, much of the challenge of handling large databases has been reduced. More database management products are expected on the market as the existing ones continue to improve.

5.0 SUMMARY

A Database Management System (DBMS) is computer software designed for the purpose of managing databases based on a variety of data models. A DBMS is a complex set of software programs that controls the organization, storage, management and retrieval of data in a database. When a DBMS is used, information systems can be changed much more easily as the organization's information requirements change. New categories of data can be added to the database without disruption to the existing system. Often it is desirable to limit who can see or change which attributes or groups of attributes. This may be managed directly by individual, or by the assignment of individuals and privileges to groups, or (in the most elaborate models) through the assignment of individuals and groups to roles which are then granted entitlements.

A DBMS can be characterized as an "attribute management system", where attributes are small chunks of information that describe something. For example, "colour" is an attribute of a car, and the value of the attribute may be a colour such as "red", "blue" or "silver". Querying is the process of requesting attribute information from various perspectives and combinations of factors, for example: "How many 2-door cars in Texas are green?" As computers grew in capability, earlier design trade-offs became increasingly unnecessary, and a number of general-purpose database systems emerged; by the mid-1960s there were a number of such systems in commercial use. Interest in a standard began to grow, and Charles Bachman, author of one such product, IDS, founded the Database Task Group within CODASYL.

6.0 TUTOR-MARKED ASSIGNMENT

1. Name ten (10) database management system software products.
2. Briefly describe the backup and replication capabilities of database management systems.

7.0 REFERENCES/FURTHER READINGS

Codd, E.F. (1970). "A Relational Model of Data for Large Shared Data Banks". Communications of the ACM, 13(6), 377-387.
O'Brien, James A. (2003). Introduction to Information Systems (11th ed.). McGraw-Hill.

UNIT 2 DATABASE

CONTENTS
1.0 Introduction
2.0 Objectives
3.0 Main Content
3.1 Foundations of Database Terms
3.2 History
3.3 Database Types
3.4 Database Storage Structures
3.5 Database Servers
3.6 Database Replication
3.7 Relational Database
4.0 Conclusion
5.0 Summary
6.0 Tutor-Marked Assignment
7.0 References/Further Readings

1.0 INTRODUCTION

A database is a structured collection of data that is managed to meet the needs of a community of users. The structure is achieved by organizing the data according to a database model. The model in most common use today is the relational model. Other models, such as the hierarchical model and the network model, use a more explicit representation of relationships (the various database models are explained in subsequent units). A computer database relies upon software to organize the storage of data. This software is known as a database management system (DBMS). Database management systems are categorized according to the database model that they support, and the model tends to determine the query languages that are available to access the database. A great deal of the internal engineering of a DBMS, however, is independent of the data model and is concerned with managing factors such as performance, concurrency, integrity and recovery from hardware failures. In these areas there are large differences between products.

2.0 OBJECTIVES

At the end of this unit, you should be able to:
define a database
define basic foundational terms of database
know a little of the history of the development of databases
know and differentiate the different types of database
describe the structure of a database.

3.0 MAIN CONTENT

3.1 Foundations of Database Terms

File


A file is an ordered arrangement of records in which each record is stored in a uniquely identifiable location. The sequence of the records is then the means by which a record is located. In most computer systems, the sequence of records is either alphabetic or numeric, based on a field common to all records such as name or number.

Records
A record or tuple is a complete set of related fields. For example, Table 1 below shows a set of related fields, which is a record. In other words, if this were part of a table, we would call it a row of data. Therefore, a row of data is also a record.

Table 1
Sr No | Icode | Ord No | Ord Date | Qty
1 | RKSK-T | 0083/99 | 3/3/2008 | 120
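Locating a record by its position in an ordered file can be sketched with binary search. The records, the choice of `name` as the ordering field and the use of Python's `bisect` module are assumptions for illustration; the text above does not prescribe a particular search method.

```python
from bisect import bisect_left

# A hypothetical sequential file: records kept in alphabetic order of the name field.
# Each record is a tuple of related fields: (name, phone, city).
FILE = [
    ("Adamu", "0803-111-0001", "Kano"),
    ("Bisi", "0803-111-0002", "Lagos"),
    ("Chika", "0803-111-0003", "Enugu"),
    ("Dayo", "0803-111-0004", "Ibadan"),
]
KEYS = [record[0] for record in FILE]  # the field common to all records

def locate(name):
    """Use the file's sort order to find a record's position by binary search."""
    position = bisect_left(KEYS, name)
    if position < len(KEYS) and KEYS[position] == name:
        return position, FILE[position]
    return None  # no record with this key

print(locate("Chika"))  # (2, ('Chika', '0803-111-0003', 'Enugu'))
print(locate("Emeka"))  # None
```

Because the records are kept in sequence, each lookup halves the remaining search range instead of reading every record in turn.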

Field
A field is a property or characteristic that holds some piece of information about an entity. It is also a category of information within a set of records; for example, the first names, addresses or phone numbers of people listed in an address book.

Relations
In the relational data model, the data in a database is organized in relations. A relation is synonymous with a 'table'. A table consists of columns and rows, which are referred to as fields and records in DBMS terms, and as attributes and tuples in relational DBMS terms.

Attributes
An attribute is a property or characteristic that holds some information about an entity. A 'Customer', for example, has attributes such as a name and an address.


Table 2: DBMS and Relational DBMS Terms in Comparison

Common Term | DBMS Terminology | RDBMS Terminology
Database | Database | Database
Table | Table | Relation
Column | Field | Attribute
Row | Record | Tuple
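The correspondence in Table 2 can be checked in SQLite. This sketch assumes a hypothetical `order_line` table built from the Table 1 row: the relation's column names (fields, attributes) come from `cursor.description`, and each fetched row (record, tuple) is returned as a Python tuple.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Relation (table): the columns mirror the headings of Table 1.
conn.execute(
    "CREATE TABLE order_line ("
    "sr_no INTEGER, icode TEXT, ord_no TEXT, ord_date TEXT, qty INTEGER)"
)
conn.execute(
    "INSERT INTO order_line VALUES (?, ?, ?, ?, ?)",
    (1, "RKSK-T", "0083/99", "3/3/2008", 120),
)

cursor = conn.execute("SELECT * FROM order_line")
attributes = [column[0] for column in cursor.description]  # column = field = attribute
rows = cursor.fetchall()                                    # row = record = tuple

print(attributes)  # ['sr_no', 'icode', 'ord_no', 'ord_date', 'qty']
print(rows)        # [(1, 'RKSK-T', '0083/99', '3/3/2008', 120)]
print(dict(zip(attributes, rows[0]))["qty"])  # 120
```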

3.2 History

The earliest known use of the term database was in November 1963, when the System Development Corporation sponsored a symposium under the title Development and Management of a Computer-centered Data Base. Database as a single word became common in Europe in the early 1970s, and by the end of the decade it was being used in major American newspapers. (The abbreviation DB, however, survives.)

The first database management systems were developed in the 1960s. A pioneer in the field was Charles Bachman. Bachman's early papers show that his aim was to make more effective use of the new direct access storage devices becoming available: until then, data processing had been based on punched cards and magnetic tape, so that serial processing was the dominant activity. Two key data models arose at this time: CODASYL developed the network model based on Bachman's ideas, and (apparently independently) the hierarchical model was used in a system developed by North American Rockwell, later adopted by IBM as the cornerstone of its IMS product. While IMS, along with the CODASYL IDMS, were the big, high-visibility databases developed in the 1960s, several others were also born in that decade, some of which have a significant installed base today.

The relational model was proposed by E. F. Codd in 1970. He criticized existing models for confusing the abstract description of information structure with descriptions of physical access mechanisms. For a long while, however, the relational model remained of academic interest only. While CODASYL (network model) products such as IDMS and hierarchical model products such as IMS were conceived as practical engineering solutions taking account of the technology as it existed at the time, the relational model took a much more theoretical perspective, arguing (correctly) that hardware and software technology would catch up in time. Among the first implementations were Michael Stonebraker's Ingres at Berkeley and the System R project at IBM. Both of these were research prototypes, announced during 1976. The first commercial products, Oracle and DB2, did not appear until around 1980.

During the 1980s, research activity focused on distributed database systems and database machines. Another important theoretical idea was the Functional Data Model, but apart from some specialized applications in genetics, molecular biology, and fraud investigation, the world took little notice. In the 1990s, attention shifted to object-oriented databases. These had some success in fields where it was necessary to handle more complex data than relational systems could easily cope with, such as spatial databases, engineering data (including software repositories), and multimedia data. In the 2000s, the fashionable area for innovation was the XML database. As with object databases, this has spawned a new collection of start-up companies, but at the same time the key ideas are being integrated into the established relational products.

3.3 Database Types

Developments in information technology and business applications have resulted in the evolution of several major types of databases. Figure 1 illustrates several major conceptual categories of databases that may be found in many organizations.

Operational Databases
These databases store detailed data needed to support the business processes and operations of the e-business enterprise. They are also called subject area databases (SADB), transaction databases and production databases. Examples are a customer database, human resources databases, inventory databases, and other databases containing data generated by business operations. This includes databases on Internet and e-commerce activity, such as click stream data describing the online behaviour of customers or visitors to a company website.

Distributed Databases
Many organizations replicate and distribute copies or parts of databases to network servers at a variety of sites. These distributed databases can reside on network servers on the World Wide Web, on corporate intranets or extranets, or on any other company networks. Distributed databases may be copies of operational or analytical databases, hypermedia or discussion databases, or any other type of database. Replication and distribution of databases is done to improve database performance and security. Ensuring that all of the data in an organization's distributed databases are consistently and currently updated is a major challenge of distributed database management.

Figure 1: Examples of the major types of databases used by organizations and end users. [Diagram elements: client PC or NC; network server; external databases on the Internet and online services; operational databases of the organization; distributed databases on intranets and other networks; end user databases; data warehouse; data marts.]

External Databases
Access to a wealth of information from external databases is available for a fee from conventional online services, and with or without charges from many sources on the Internet, especially the World Wide Web. Websites provide an endless variety of hyperlinked pages of multimedia documents in hypermedia databases for you to access. Data are available in the form of statistics on economic and demographic activity from statistical data banks. Or you can view or download abstracts or complete copies of newspapers, magazines, newsletters, research papers, and other periodicals and published materials from bibliographic and full text databases.

3.4 Database Storage Structures

Database tables/indexes are typically stored in memory or on hard disk in one of many forms: ordered/unordered flat files, ISAM, heaps, hash buckets or B+ trees. These have various advantages and disadvantages, discussed below. The most commonly used are B+ trees and ISAM.

Methods

Flat Files
A flat file database describes any of various means to encode a data model (most commonly a table) as a plain text file. A flat file is a file that contains records, and in which each record is specified in a single line. Fields from each record may simply have a fixed width with padding, or may be delimited by whitespace, tabs, commas (CSV) or other characters. Extra formatting may be needed to avoid delimiter collision. There are no structural relationships. The data are "flat" as in a sheet of paper, in contrast to more complex models such as a relational database. The classic example of a flat file database is a basic name-and-address list, where the database consists of a small, fixed number of fields: Name, Address, and Phone Number. Another example is a simple HTML table, consisting of rows and columns. This type of database is routinely encountered, although often not expressly recognized as a database.

Implementation: It is possible to write out by hand, on a sheet of paper, a list of names, addresses, and phone numbers; this is a flat file database. This can also be done with any typewriter or word processor. But many pieces of computer software are designed to implement flat file databases.

Unordered storage typically stores the records in the order they are inserted. It has good insertion efficiency, and although it may seem that retrieval would be inefficient, this is usually not the case, as most databases use indexes on the primary keys, resulting in efficient retrieval times.

Ordered (sorted) or linked list storage typically stores the records in order, and may have to rearrange the records or increase the file size when a record is inserted, which is very inefficient. However, it is better for retrieval, as the records are pre-sorted (a sorted file can be binary-searched in O(log n) time).

Structured Files (Heaps)
- simplest and most basic method
- insert efficient: records are added at the end of the file, in "chronological" order
- retrieval inefficient, as searching has to be linear
- deletion: deleted records are marked
- requires periodic reorganization if the file is very volatile
Advantages
- good for bulk loading data
- good for relatively small relations, as indexing overheads are avoided
- good when retrievals involve a large proportion of records
Disadvantages
- not efficient for selective retrieval using key values, especially if large
- sorting may be time-consuming
- not suitable for "volatile" tables

Hash Buckets
Hash functions calculate the address of the page in which the record is to be stored, based on one or more fields in the record.
- hashing functions are chosen to ensure that addresses are spread evenly across the address space
- "occupancy" is generally 40-60% of total file size
- a unique address is not guaranteed, so collision detection and collision resolution mechanisms are required:
  - open addressing
  - chained/unchained overflow
Pros and cons
- efficient for exact matches on the key field
- not suitable for range retrieval, which requires sequential storage
- calculates where the record is stored based on fields in the record
- hash functions ensure an even spread of data
- collisions are possible, so collision detection and resolution are required
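The hash-bucket idea can be sketched in Python as below. This is a minimal in-memory illustration, not how a DBMS lays out disk pages: the class `HashFile` and its methods are hypothetical names, a Python list stands in for each bucket (page), and collisions are resolved by chaining records inside the bucket (one of the overflow strategies listed above). It shows why an exact match touches a single bucket while a range query has to scan them all.

```python
class HashFile:
    """Toy hash file: a fixed number of buckets (pages), chained overflow on collision."""

    def __init__(self, n_buckets=8):
        self.buckets = [[] for _ in range(n_buckets)]

    def _address(self, key):
        # The hash function maps the key field to a bucket (page) address.
        return hash(key) % len(self.buckets)

    def insert(self, key, record):
        bucket = self.buckets[self._address(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:                  # key already stored: replace the record
                bucket[i] = (key, record)
                return
        bucket.append((key, record))      # collision with other keys: chain in the bucket

    def get(self, key):
        """Exact-match lookup: only one bucket is examined."""
        for k, record in self.buckets[self._address(key)]:
            if k == key:
                return record
        return None

    def range(self, lo, hi):
        """Range query: hashing scatters keys, so every bucket must be scanned."""
        return sorted((k, r) for b in self.buckets for k, r in b if lo <= k <= hi)


hf = HashFile(n_buckets=4)
for emp_id, name in [(101, "Ada"), (105, "Bola"), (109, "Chidi"), (230, "Dayo")]:
    hf.insert(emp_id, name)

print(hf.get(105))          # Bola
print(hf.range(100, 110))   # [(101, 'Ada'), (105, 'Bola'), (109, 'Chidi')]
```

With four buckets, the integer keys 101, 105 and 109 all map to bucket 1, so they share one chain; `get` still looks in only that bucket, whereas `range` must visit every bucket and sort what it finds.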

B+ Trees
These are the most used in practice.
- the time taken to access any tuple is the same, because the tree is balanced and the same number of nodes is searched
- the index is a full index, so the data file does not have to be ordered
Pros and cons
- versatile data structure: sequential as well as random access
- access is fast
- supports exact, range, part key and pattern matches efficiently
- "volatile" files are handled efficiently, because the index is dynamic: it expands and contracts as the table grows and shrinks
- less well suited to relatively stable files; in this case, ISAM is more efficient
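A full B+ tree implementation is too long for this unit, but the property that matters here, keys held in sorted order so that exact, range and partial-key (prefix) matches are all cheap, can be sketched with a sorted Python list and the standard `bisect` module. The `SortedIndex` class is a hypothetical stand-in for the leaf level of a B+ tree only; unlike a real B+ tree, inserting into a Python list costs O(n), so it does not model the dynamic node splitting described above.

```python
import bisect


class SortedIndex:
    """Stand-in for the leaf level of a B+ tree: (key, row_id) pairs kept in key order."""

    def __init__(self):
        self.keys = []
        self.row_ids = []

    def insert(self, key, row_id):
        # The index stays ordered even though the data rows themselves are not.
        pos = bisect.bisect_left(self.keys, key)
        self.keys.insert(pos, key)
        self.row_ids.insert(pos, row_id)

    def find(self, key):
        """Exact match: binary search, O(log n) comparisons."""
        pos = bisect.bisect_left(self.keys, key)
        if pos < len(self.keys) and self.keys[pos] == key:
            return self.row_ids[pos]
        return None

    def range(self, lo, hi):
        """Range match: locate lo, then read entries sequentially up to hi."""
        start = bisect.bisect_left(self.keys, lo)
        end = bisect.bisect_right(self.keys, hi)
        return self.row_ids[start:end]

    def prefix(self, p):
        """Partial-key (prefix) match on string keys, done as a short range scan."""
        start = bisect.bisect_left(self.keys, p)
        end = start
        while end < len(self.keys) and self.keys[end].startswith(p):
            end += 1
        return self.row_ids[start:end]


idx = SortedIndex()
for row_id, surname in enumerate(["Okafor", "Adeyemi", "Okeke", "Bello", "Okoro"]):
    idx.insert(surname, row_id)

print(idx.find("Bello"))          # 3
print(idx.prefix("Ok"))           # [0, 2, 4]
print(idx.range("B", "Okeke"))    # [3, 0, 2]
```

A hash index like the one sketched for hash buckets could answer `find`, but not `range` or `prefix`, without scanning every entry.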

3.5 Database Servers

A database server is a computer program that provides database services to other computer programs or computers, as defined by the client-server model. The term may also refer to a computer dedicated to running such a program. Database management systems frequently provide database server functionality, and some DBMSs (e.g., MySQL) rely exclusively on the client-server model for database access. In a master-slave model, database master servers are central and primary locations of data, while database slave servers are synchronized backups of the master acting as proxies.

3.6 Database Replication

Database replication can be used on many database management systems, usually with a master/slave relationship between the original and the copies. The master logs the updates, which then ripple through to the slaves. The slave outputs a message stating that it has received the update successfully, thus allowing the sending (and potentially re-sending until successfully applied) of subsequent updates.

Multi-master replication, where updates can be submitted to any database node and then ripple through to other servers, is often desired, but introduces substantially increased costs and complexity, which may make it impractical in some situations. The most common challenge in multi-master replication is transactional conflict prevention or resolution. Most synchronous (eager) replication solutions do conflict prevention, while asynchronous (lazy) solutions have to do conflict resolution. For instance, if a record is changed on two nodes simultaneously, an eager replication system would detect the conflict before confirming the commit and abort one of the transactions. A lazy replication system would allow both transactions to commit and run a conflict resolution during resynchronization.

Database replication becomes difficult when it scales up. Usually, scale-up goes in two dimensions, horizontal and vertical: horizontal scale-up has more data replicas, while vertical scale-up has data replicas located further away in distance. Problems raised by horizontal scale-up can be alleviated by a multi-layer, multi-view access protocol. Vertical scale-up runs into less trouble as Internet reliability and performance improve.
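The master/slave cycle described above (the master logs updates, each slave acknowledges what it has applied, and unacknowledged updates are re-sent) can be simulated in a few lines. This is a toy, single-process sketch under stated assumptions: the `Master` and `Slave` classes are hypothetical, a method call stands in for a network message, and an offline slave simply returns no acknowledgement. It does not model multi-master conflicts.

```python
from dataclasses import dataclass, field


@dataclass
class Slave:
    name: str
    data: dict = field(default_factory=dict)
    applied: int = 0      # how many log entries this slave has applied
    online: bool = True   # False simulates an unreachable slave

    def receive(self, seq, key, value):
        """Apply log entry `seq` if it is the next one; return an acknowledgement."""
        if not self.online:
            return None                  # message lost: no acknowledgement
        if seq == self.applied:          # apply strictly in log order
            self.data[key] = value
            self.applied += 1
        return self.applied              # ack: number of entries now held


@dataclass
class Master:
    slaves: list
    data: dict = field(default_factory=dict)
    log: list = field(default_factory=list)
    acked: dict = field(default_factory=dict)  # slave name -> entries confirmed

    def update(self, key, value):
        self.data[key] = value
        self.log.append((key, value))    # the master logs every update
        self.ship()

    def ship(self):
        """Send each slave the entries it has not acknowledged; retry later if it is down."""
        for s in self.slaves:
            sent = self.acked.get(s.name, 0)
            while sent < len(self.log):
                key, value = self.log[sent]
                ack = s.receive(sent, key, value)
                if ack is None:
                    break                # unreachable: keep the entry for re-sending
                sent = ack
            self.acked[s.name] = sent


a, b = Slave("a"), Slave("b")
m = Master(slaves=[a, b])
m.update("x", 1)
b.online = False
m.update("y", 2)          # slave b misses this update
print(b.data)             # {'x': 1}
b.online = True
m.ship()                  # the unacknowledged entry is re-sent to b
print(a.data == b.data == m.data)   # True
```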

3.7 Relational Database

A relational database is a database that conforms to the relational model, and refers to a database's data and schema (the database's structure of how those data are arranged). The term "relational database" is sometimes informally used to refer to a relational database management system, which is the software used to create and use a relational database. The term relational database was originally defined and coined by Edgar Codd at IBM in 1970.

Strictly, a relational database is a collection of relations (frequently called tables). Other items are frequently considered part of the database, as they help to organize and structure the data, in addition to forcing the database to conform to a set of requirements.

Terminology
Relational database theory uses a different set of mathematically based terms, which are equivalent, or roughly equivalent, to SQL database terminology. The table below summarizes some of the most important relational database terms and their SQL database equivalents.

Relational term          SQL equivalent
relation, base relvar    table
derived relvar           view, query result, result set
tuple                    row
attribute                column

Relations or Tables
A relation is defined as a set of tuples that have the same attributes. A tuple usually represents an object and information about that object. Objects are typically physical objects or concepts. A relation is usually described as a table, which is organized into rows and columns. All the data referenced by an attribute are in the same domain and conform to the same constraints. The relational model specifies that the tuples of a relation have no specific order and that the tuples, in turn, impose no order on the attributes. Applications access data by specifying queries, which use operations such as select to identify tuples, project to identify attributes, and join to combine relations. Relations can be modified using the insert, delete, and update operators. New tuples can supply explicit values or be derived from a query. Similarly, queries identify tuples for updating or deleting.

Base and Derived Relations
In a relational database, all data are stored and accessed via relations. Relations that store data are called "base relations", and in implementations are called "tables". Other relations do not store data, but are computed by applying relational operations to other relations. These relations are sometimes called "derived relations". In implementations these are called "views" or "queries". Derived relations are convenient in that, though they may grab information from several relations, they act as a single relation. Derived relations can also be used as an abstraction layer.

Keys
A unique key is a kind of constraint that ensures that an object, or critical information about the object, occurs in at most one tuple in a given relation. For example, a school might want each student to have a separate locker. To ensure this, the database designer creates a key on the locker attribute of the student relation. Keys can include more than one attribute; for example, a nation may impose a restriction that no province can have two cities with the same name. The key would include province and city name. This would still allow two different provinces to have a town called Springfield, because their province is different. A key over more than one attribute is called a compound key.
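As a small illustration of base relations, derived relations and compound keys, the snippet below uses Python's standard `sqlite3` module with an in-memory database. The table and view names (`city`, `large_city`) and the province names are hypothetical; the point is that the compound key (province, city_name) rejects a second Springfield in the same province but accepts one in a different province, and that the view is computed from the table rather than stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Base relation (table). The compound key (province, city_name) forbids two
# cities with the same name in one province.
conn.execute("""
    CREATE TABLE city (
        province   TEXT NOT NULL,
        city_name  TEXT NOT NULL,
        population INTEGER,
        PRIMARY KEY (province, city_name)
    )
""")
conn.executemany("INSERT INTO city VALUES (?, ?, ?)", [
    ("North", "Springfield", 8000),
    ("South", "Springfield", 25000),   # allowed: same city name, different province
])
try:
    conn.execute("INSERT INTO city VALUES ('North', 'Springfield', 500)")
except sqlite3.IntegrityError as err:
    print("rejected duplicate key:", err)

# Derived relation (view): stores no data of its own; it is computed from `city`
# by a select (the WHERE clause) and a project (the column list).
conn.execute("""
    CREATE VIEW large_city AS
    SELECT province, city_name FROM city WHERE population > 10000
""")
print(conn.execute("SELECT * FROM large_city").fetchall())  # [('South', 'Springfield')]
```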
Foreign Keys
A foreign key is a reference to a key in another relation, meaning that the referencing tuple has, as one of its attributes, the values of a key in the referenced tuple. Foreign keys need not have unique values in the referencing relation. Foreign keys effectively use the values of attributes in the referenced relation to restrict the domain of one or more attributes in the referencing relation.

A foreign key could be described formally as: "For all tuples in the referencing relation projected over the referencing attributes, there must exist a tuple in the referenced relation projected over those same attributes such that the values in each of the referencing attributes match the corresponding values in the referenced attributes."
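The formal definition above translates almost word for word into a set-based check. In this sketch (a hypothetical function with invented sample data), each relation is a list of Python dictionaries; the referenced relation is projected over its key attributes, and every referencing tuple, projected over the foreign-key attributes, must find a match. NULL handling is deliberately left out.

```python
def satisfies_foreign_key(referencing, referenced, fk_attrs, key_attrs):
    """Check the formal foreign-key condition between two relations.

    Each relation is a list of dicts (one dict per tuple); fk_attrs[i] in the
    referencing relation corresponds to key_attrs[i] in the referenced one.
    """
    # Project the referenced relation over its key attributes.
    referenced_keys = {tuple(t[a] for a in key_attrs) for t in referenced}
    # For all referencing tuples, projected over the foreign-key attributes,
    # a referenced tuple with matching values must exist.
    return all(tuple(t[a] for a in fk_attrs) in referenced_keys for t in referencing)


customers = [{"customer_id": 1, "name": "Ada"}, {"customer_id": 2, "name": "Obi"}]
orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 1},   # foreign key values need not be unique
    {"order_id": 12, "customer_id": 2},
]
print(satisfies_foreign_key(orders, customers, ["customer_id"], ["customer_id"]))  # True

orders.append({"order_id": 13, "customer_id": 3})   # no customer 3 exists
print(satisfies_foreign_key(orders, customers, ["customer_id"], ["customer_id"]))  # False
```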

4.0 CONCLUSION

Database applications are used to store and manipulate data. A database application can be used in many business functions, including sales and inventory tracking, accounting, employee benefits, payroll, production and more. Database programs for personal computers come in various shapes and sizes. A database remains fundamental to the implementation of any database management system.

5.0 SUMMARY

- A database is a structured collection of data that is managed to meet the needs of a community of users. The structure is achieved by organizing the data according to a database model.
- The earliest known use of the term data base was in November 1963, when the System Development Corporation sponsored a symposium under the title Development and Management of a Computer-centered Data Base.
- Developments in information technology and business applications have resulted in the evolution of several major types of databases.
- Database tables/indexes are typically stored in memory or on hard disk in one of many forms: ordered/unordered flat files, ISAM, heaps, hash buckets or B+ trees.
- A database server is a computer program that provides database services to other computer programs or computers, as defined by the client-server model.
- Database replication can be used on many database management systems, usually with a master/slave relationship between the original and the copies.
- A relational database is a database that conforms to the relational model, and refers to a database's data and schema.

6.0 TUTOR-MARKED ASSIGNMENT

1. Define the terms: field, record, relation and attribute.

2. Briefly describe a flat file.


7.0 REFERENCES/FURTHER READINGS

Codd, E. F. (1970). "A Relational Model of Data for Large Shared Data Banks". Communications of the ACM 13 (6): 377-387. doi:10.1145/362384.362685.

O'Brien, James A. (2003). Introduction to Information Systems (11th Edition). McGraw-Hill.


UNIT 3 DATABASE CONCEPTS

CONTENTS
1.0 Introduction
2.0 Objectives
3.0 Main Content
3.1 Create, Read, Update and Delete
3.2 ACID
3.3 Keys
4.0 Conclusion
5.0 Summary
6.0 Tutor-Marked Assignment
7.0 References/Further Readings

1.0 INTRODUCTION

There are basic and standard concepts associated with all databases, and these are what we will discuss in detail in this unit. These include the concept of Creating, Reading, Updating and Deleting (CRUD) data, ACID (Atomicity, Consistency, Isolation, Durability), and keys of different kinds.

2.0 OBJECTIVES

At the end of this unit, you should be able to:
- know the meaning of the acronym CRUD
- understand the applications of databases
- know the meaning of the acronym ACID and how each member of ACID differs from the others
- understand the structure of a database
- know the types of keys associated with databases.

3.0 MAIN CONTENT

3.1 Create, Read, Update and Delete

Create, read, update and delete (CRUD) are the four basic functions of persistent storage, a major part of nearly all computer software. Sometimes CRUD is expanded with the words retrieve instead of read, or destroy instead of delete. It is also sometimes used to describe user interface conventions that facilitate viewing, searching, and changing information, often using computer-based forms and reports. Alternate terms for CRUD (one initialism and three acronyms):
- ABCD: add, browse, change, delete
- ACID: add, change, inquire, delete (this can be confused with the transactional use of the acronym ACID)
- BREAD: browse, read, edit, add, delete
- VADE(R): view, add, delete, edit (and restore, for systems supporting transaction processing)

Database Applications
The acronym CRUD refers to all of the major functions that need to be implemented in a relational database application for it to be considered complete. Each letter in the acronym can be mapped to a standard SQL statement:

Operation            SQL
Create               INSERT
Read (Retrieve)      SELECT
Update               UPDATE
Delete (Destroy)     DELETE

Although a relational database is a common persistence layer in software applications, there are numerous others. CRUD can be implemented with an object database, an XML database, flat text files, custom file formats, tape, or card, for example. Google Scholar lists the first reference to create-read-update-delete as by Kilov in 1990. The concept also seems to be described in more detail in Kilov's 1998 book.

User Interface
CRUD is also relevant at the user interface level of most applications. For example, in address book software, the basic storage unit is an individual contact entry. As a bare minimum, the software must allow the user to:
- Create or add new entries
- Read, retrieve, search, or view existing entries
- Update or edit existing entries
- Delete existing entries
Without at least these four operations, the software cannot be considered complete. Because these operations are so fundamental, they are often documented and described under one comprehensive heading, such as "contact management" or "contact maintenance" (or "document management" in general, depending on the basic storage unit for the particular application).
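The mapping of CRUD onto SQL statements can be seen in a minimal address-book example using Python's built-in `sqlite3` module. The `contact` table and the sample names and numbers are invented for illustration; each step issues the SQL statement that corresponds to one letter of CRUD.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contact (id INTEGER PRIMARY KEY, name TEXT, phone TEXT)")

# Create -> INSERT
conn.execute("INSERT INTO contact (name, phone) VALUES (?, ?)", ("Ada Obi", "0801-000-0001"))
conn.execute("INSERT INTO contact (name, phone) VALUES (?, ?)", ("Tunde Bello", "0802-000-0002"))

# Read (Retrieve) -> SELECT
print(conn.execute("SELECT name, phone FROM contact ORDER BY name").fetchall())
# [('Ada Obi', '0801-000-0001'), ('Tunde Bello', '0802-000-0002')]

# Update -> UPDATE
conn.execute("UPDATE contact SET phone = ? WHERE name = ?", ("0803-000-0003", "Ada Obi"))

# Delete (Destroy) -> DELETE
conn.execute("DELETE FROM contact WHERE name = ?", ("Tunde Bello",))

conn.commit()
print(conn.execute("SELECT name, phone FROM contact").fetchall())
# [('Ada Obi', '0803-000-0003')]
```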

3.2 ACID

In computer science, ACID (Atomicity, Consistency, Isolation, Durability) is a set of properties that guarantee that database transactions are processed reliably. In the context of databases, a single logical operation on the data is called a transaction. An example of a transaction is a transfer of funds from one account to another, even though it might consist of multiple individual operations (such as debiting one account and crediting another).

Atomicity
Atomicity refers to the ability of the DBMS to guarantee that either all of the tasks of a transaction are performed or none of them are. For example, the transfer of funds can be completed or it can fail for a multitude of reasons, but atomicity guarantees that one account won't be debited if the other is not credited. Atomicity states that database modifications must follow an "all or nothing" rule. Each transaction is said to be "atomic". If one part of the transaction fails, the entire transaction fails. It is critical that the database management system maintain the atomic nature of transactions in spite of any DBMS, operating system or hardware failure.

Consistency
The consistency property ensures that the database remains in a consistent state before the start of the transaction and after the transaction is over (whether successful or not). Consistency states that only valid data will be written to the database. If, for some reason, a transaction is executed that violates the database's consistency rules, the entire transaction will be rolled back and the database will be restored to a state consistent with those rules. On the other hand, if a transaction successfully executes, it will take the database from one state that is consistent with the rules to another state that is also consistent with the rules.

Isolation
Isolation refers to the requirement that other operations cannot access or see the data in an intermediate state during a transaction.

Durability
Durability refers to the guarantee that once the user has been notified of success, the transaction will persist, and not be undone. This means it will survive system failure, and that the database system has checked the integrity constraints and won't need to abort the transaction. Many databases implement durability by writing all transactions into a log that can be played back to recreate the system state right before the failure. A transaction can only be deemed committed after it is safely in the log.
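Atomicity can be observed directly with SQLite through Python's `sqlite3` module. In this sketch the `account` table, the `transfer` helper and the balances are hypothetical; a `CHECK` constraint plays the role of a consistency rule (no negative balances). The credit is applied first on purpose, so that when the debit then fails, the rollback visibly undoes a change that had already been made.

```python
import sqlite3

# isolation_level=None: the module does not open transactions itself; we issue BEGIN.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("""CREATE TABLE account (
    id      TEXT PRIMARY KEY,
    balance INTEGER NOT NULL CHECK (balance >= 0))""")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one all-or-nothing transaction."""
    conn.execute("BEGIN")
    try:
        conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
        conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        conn.execute("COMMIT")
    except sqlite3.IntegrityError:
        conn.execute("ROLLBACK")   # undoes the credit as well: nothing is applied
        raise


transfer(conn, "A", "B", 30)       # succeeds
try:
    transfer(conn, "A", "B", 500)  # the debit would make A negative, violating CHECK
except sqlite3.IntegrityError:
    pass

print(conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall())
# [('A', 70), ('B', 80)]
```

Without the transaction, the failed transfer would have left B credited with 500 while A was never debited.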

Implementation
Implementing the ACID properties correctly is not simple. Processing a transaction often requires a number of small changes to be made, including updating indices that are used by the system to speed up searches. This sequence of operations is subject to failure for a number of reasons; for instance, the system may have no room left on its disk drives, or it may have used up its allocated CPU time. ACID suggests that the database be able to perform all of these operations at once. In fact this is difficult to arrange. There are two popular families of techniques: write-ahead logging and shadow paging. In both cases, locks must be acquired on all information that is updated and, depending on the implementation, on all data that is being read. In write-ahead logging, atomicity is guaranteed by ensuring that information about all changes is written to a log before it is written to the database. That allows the database to return to a consistent state in the event of a crash. In shadowing, updates are applied to a copy of the database, and the new copy is activated when the transaction commits. The copy refers to unchanged parts of the old version of the database, rather than being an entire duplicate.

Until recently almost all databases relied upon locking to provide ACID capabilities. This means that a lock must always be acquired before processing data in a database, even on read operations. Maintaining a large number of locks, however, results in substantial overhead as well as hurting concurrency. If user A is running a transaction that has read a row of data that user B wants to modify, for example, user B must wait until user A's transaction is finished.

An alternative to locking is multiversion concurrency control, in which the database maintains separate copies of any data that is modified. This allows users to read data without acquiring any locks. Going back to the example of user A and user B, when user A's transaction gets to data that user B has modified, the database is able to retrieve the exact version of that data that existed when user A started their transaction. This ensures that user A gets a consistent view of the database even if other users are changing data that user A needs to read. A natural implementation of this idea results in a relaxation of the isolation property, namely snapshot isolation.

It is difficult to guarantee ACID properties in a network environment. Network connections might fail, or two users might want to use the same part of the database at the same time. Two-phase commit is typically applied in distributed transactions to ensure that each participant in the transaction agrees on whether the transaction should be committed or not.

Care must be taken when running transactions in parallel. Two-phase locking is typically applied to guarantee full isolation.
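The multiversion idea described above, where a reader sees the version that existed when its transaction began, can be illustrated with a toy key-value store. This is a simplified sketch, not how any particular DBMS implements MVCC: `MVCCStore` is a hypothetical class, a counter plays the role of a logical clock, every write commits immediately, and old versions are never garbage-collected.

```python
import itertools


class MVCCStore:
    """Toy multiversion store: each write adds a new version stamped with a commit time."""

    def __init__(self):
        self._clock = itertools.count(1)   # logical clock
        self.versions = {}                 # key -> list of (commit_ts, value), oldest first

    def begin(self):
        """Start a transaction; its snapshot is the current logical time."""
        return next(self._clock)

    def write(self, key, value):
        """Commit a new version of `key` at once; older versions are kept, not overwritten."""
        ts = next(self._clock)
        self.versions.setdefault(key, []).append((ts, value))
        return ts

    def read(self, snapshot_ts, key):
        """Return the newest version committed at or before the reader's snapshot (no locks)."""
        visible = [v for ts, v in self.versions.get(key, []) if ts <= snapshot_ts]
        return visible[-1] if visible else None


store = MVCCStore()
store.write("row1", "original")
snap_a = store.begin()                     # user A starts a transaction
store.write("row1", "changed by B")        # user B modifies the row without waiting
print(store.read(snap_a, "row1"))          # original
print(store.read(store.begin(), "row1"))   # changed by B
```

User A keeps reading the version from its snapshot, while a transaction that starts after B's write sees the new value; neither reader had to wait for a lock.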

3.3 Keys

3.3.1 Foreign Key

In the context of relational databases, a foreign key is a referential constraint between two tables. The foreign key identifies a column or a set of columns in one (referencing) table that refers to a column or set of columns in another (referenced) table. The referenced columns must be the primary key or another candidate key of the referenced table. The values in one row of the referencing columns must occur in a single row in the referenced table. Thus, a row in the referencing table cannot contain values that don't exist in the referenced table (except potentially NULL). In this way, references can be made to link information together, and this is an essential part of database normalization. Multiple rows in the referencing table may refer to the same row in the referenced table. Most of the time, this reflects a one (master table, or referenced table) to many (child table, or referencing table) relationship.

The referencing and referenced table may be the same table, i.e. the foreign key refers back to the same table. Such a foreign key is known in SQL:2003 as a self-referencing or recursive foreign key.

A table may have multiple foreign keys, and each foreign key can have a different referenced table. Each foreign key is enforced independently by the database system. Therefore, cascading relationships between tables can be established using foreign keys. Improper foreign key/primary key relationships, or failure to enforce those relationships, are often the source of many database and data modeling problems.

Referential Actions
Because the DBMS enforces referential constraints, it must ensure data integrity if rows in a referenced table are to be deleted (or updated). If dependent rows in referencing tables still exist, those references have to be considered. SQL:2003 specifies 5 different referential actions that shall take place in such occurrences:
- CASCADE
- RESTRICT
- NO ACTION
- SET NULL
- SET DEFAULT

CASCADE
Whenever rows in the master (referenced) table are deleted, the respective rows of the child (referencing) table with a matching foreign key column will be deleted as well. A foreign key with a cascade delete means that if a record in the parent table is deleted, then the corresponding records in the child table will automatically be deleted. This is called a cascade delete.
Example tables: Customer(customer_id, cname, caddress) and Order(customer_id, products, payment)
Customer is the master table and Order is the child table, where customer_id is the foreign key in Order and represents the customer who placed the order. When a row of Customer is deleted, any Order row matching the deleted Customer's customer_id will also be deleted.

RESTRICT
A row in the referenced table cannot be updated or deleted if dependent rows still exist. In that case, no data change is even attempted and should not be allowed.

NO ACTION
The UPDATE or DELETE SQL statement is executed on the referenced table. The DBMS verifies at the end of the statement execution whether any of the referential relationships is violated. The major difference from RESTRICT is that triggers or the statement semantics itself may give a result in which no foreign key relationship is violated. Then, the statement can be executed successfully.

SET NULL
The foreign key values in the referencing row are set to NULL when the referenced row is updated or deleted. This is only possible if the respective columns in the referencing table are nullable. Due to the semantics of NULL, a referencing row with NULLs in the foreign key columns does not require a referenced row.


SET DEFAULT
Similarly to SET NULL, the foreign key values in the referencing row are set to the column default when the referenced row is updated or deleted.
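The CASCADE example above (Customer and Order) can be tried with SQLite through Python's `sqlite3` module. Two assumptions to note: SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON` has been issued on the connection, and because ORDER is an SQL keyword, the child table is named `orders` here. The sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces foreign keys only when this is on

conn.execute("""CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    cname TEXT,
    caddress TEXT)""")
# Child (referencing) table; "orders" because ORDER is an SQL keyword.
conn.execute("""CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL
                REFERENCES customer(customer_id) ON DELETE CASCADE,
    products    TEXT,
    payment     REAL)""")

conn.execute("INSERT INTO customer VALUES (1, 'Ada', 'Lagos'), (2, 'Obi', 'Abuja')")
conn.execute("""INSERT INTO orders (customer_id, products, payment) VALUES
    (1, 'pens', 500), (1, 'paper', 1200), (2, 'ink', 800)""")

# Deleting a master row cascades to its child rows.
conn.execute("DELETE FROM customer WHERE customer_id = 1")
print(conn.execute("SELECT customer_id, products FROM orders").fetchall())  # [(2, 'ink')]

# The constraint also blocks orphans: an order for a missing customer is rejected.
try:
    conn.execute("INSERT INTO orders (customer_id, products, payment) VALUES (99, 'x', 1)")
except sqlite3.IntegrityError as err:
    print("rejected:", err)
```

Replacing `ON DELETE CASCADE` with `ON DELETE RESTRICT` would make the `DELETE FROM customer` statement itself fail while customer 1 still has orders.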

3.3.2 Candidate Key

In the relational model, a candidate key of a relvar (relation variable) is a set of attributes of that relvar such that (1) at all times it holds in the relation assigned to that variable that there are no two distinct tuples with the same values for these attributes, and (2) there is no proper subset of this set of attributes for which (1) holds. Since a superkey is defined as a set of attributes for which (1) holds, we can also define a candidate key as a minimal superkey, i.e. a superkey of which no proper subset is also a superkey.

The importance of candidate keys is that they tell us how we can identify individual tuples in a relation. As such, they are one of the most important types of database constraint that should be specified when designing a database schema. Since a relation is a set (no duplicate elements), every relation will have at least one candidate key (because the entire heading is always a superkey). Since in some RDBMSs tables may also represent multisets (which strictly means these DBMSs are not relational), it is an important design rule to specify explicitly at least one candidate key for each relation. For practical reasons, RDBMSs usually require that for each relation one of its candidate keys is declared as the primary key, which means that it is considered the preferred way to identify individual tuples. Foreign keys, for example, are usually required to reference such a primary key and not any of the other candidate keys.

Determining Candidate Keys
The definition above illustrates what a candidate key is, but not how candidate keys are determined in practice. Since most relations have a large number or even infinitely many instances, it would be impossible to determine all the sets of attributes with the uniqueness property for each instance. Instead, it is easier to consider the sets of real-world entities that are represented by the relation and determine which attributes of the entities uniquely identify them. For example, a relation Employee(Name, Address, Dept) probably represents employees, and these are likely to be uniquely identified by a combination of Name and Address, which is therefore a superkey; unless the same holds for Name alone or Address alone, this combination is also a candidate key.

In order to determine the candidate keys correctly, it is important to determine all superkeys, which is especially difficult if the relation represents a set of relationships rather than a set of entities.

3.3.3 Unique Key

In relational database design, a unique key or primary key is a candidate key used to uniquely identify each row in a table. A unique key or primary key comprises a single column or a set of columns. No two distinct rows in a table can have the same value (or combination of values) in those columns. Depending on its design, a table may have arbitrarily many unique keys but at most one primary key.

A unique key must uniquely identify all possible rows that could exist in a table, not only the currently existing rows. Examples of unique keys are Social Security numbers (associated with a specific person) or ISBNs (associated with a specific book). Telephone books and dictionaries cannot use names, words or Dewey Decimal system numbers as candidate keys, because they do not uniquely identify telephone numbers or words.

A primary key is a special case of unique keys. The major difference is that for unique keys the implicit NOT NULL constraint is not automatically enforced, while for primary keys it is. Thus, the values in a unique key column may or may not be NULL. Another difference is that primary keys must be defined using a different syntax.

The relational model, as expressed through relational calculus and relational algebra, does not distinguish between primary keys and other kinds of keys. Primary keys were added to the SQL standard mainly as a convenience to the application programmer. Unique keys as well as primary keys can be referenced by foreign keys.

3.3.4 Superkey
A superkey is defined in the relational model of database organization as a set of attributes of a relation variable (relvar) for which it holds that, in all relations assigned to that variable, there are no two distinct tuples (rows) that have the same values for the attributes in this set. Equivalently, a superkey can be defined as a set of attributes of a relvar upon which all attributes of the relvar are functionally dependent. Note that if attribute set K is a superkey of relvar R, then at all times the projection of R over K has the same cardinality as R itself.


Informally, a superkey is a set of columns within a table whose values can be used to uniquely identify a row. A candidate key is a minimal set of columns necessary to identify a row; this is also called a minimal superkey. For example, given an employee table consisting of the columns employeeID, name, job and departmentID, we could use the employeeID in combination with any or all other columns of this table to uniquely identify a row in the table. Examples of superkeys in this table would be {employeeID, Name}, {employeeID, Name, job} and {employeeID, Name, job, departmentID}. In a real database we do not need values for all of those columns to identify a row. We only need, per our example, the set {employeeID}. This is a minimal superkey – that is, a minimal set of columns that can be used to identify a single row. So, employeeID is a candidate key.

Example: English Monarchs

Monarch Name    Monarch Number    Royal House
Edward          II                Plantagenet
Edward          III               Plantagenet
Richard         II                Plantagenet
Henry           IV                Lancaster

In this example, the possible superkeys are {Monarch Name, Monarch Number} and {Monarch Name, Monarch Number, Royal House}.
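The superkey and candidate-key definitions can be checked mechanically against a table instance. The sketch below is plain Python, and the helper names are hypothetical. It applies the projection test from the formal definition to the English Monarchs rows. A snapshot of rows can only prove that a set of columns is *not* a superkey. Whether a set is a superkey for all possible rows remains a design decision.

```python
from itertools import combinations

# The English Monarchs table as a list of rows. A relation is a set, so the
# rows are assumed to be distinct.
ATTRIBUTES = ("monarch_name", "monarch_number", "royal_house")
MONARCHS = [
    {"monarch_name": "Edward", "monarch_number": "II", "royal_house": "Plantagenet"},
    {"monarch_name": "Edward", "monarch_number": "III", "royal_house": "Plantagenet"},
    {"monarch_name": "Richard", "monarch_number": "II", "royal_house": "Plantagenet"},
    {"monarch_name": "Henry", "monarch_number": "IV", "royal_house": "Lancaster"},
]


def is_superkey(rows, attrs):
    """True if no two distinct rows agree on every attribute in attrs, i.e. the
    projection over attrs has the same cardinality as the relation itself."""
    projection = {tuple(row[a] for a in attrs) for row in rows}
    return len(projection) == len(rows)


def superkeys(rows, attributes):
    """Every attribute combination that is a superkey of this instance."""
    return [
        combo
        for size in range(1, len(attributes) + 1)
        for combo in combinations(attributes, size)
        if is_superkey(rows, combo)
    ]


def candidate_keys(rows, attributes):
    """Minimal superkeys: no proper subset is itself a superkey."""
    keys = superkeys(rows, attributes)
    return [k for k in keys if not any(set(other) < set(k) for other in keys)]


print(superkeys(MONARCHS, ATTRIBUTES))
# [('monarch_name', 'monarch_number'), ('monarch_name', 'monarch_number', 'royal_house')]
print(candidate_keys(MONARCHS, ATTRIBUTES))
# [('monarch_name', 'monarch_number')]
```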

3.3.5 Surrogate Key


A surrogate key in a database is a unique identifier for either an entity in the modeled world or an object in the database. The surrogate key is not derived from application data.

Definition
There appear to be two definitions of a surrogate in the literature. We shall call these surrogate (1) and surrogate (2):

Surrogate (1): This definition is based on that given by Hall, Owlett and Todd (1976). Here a surrogate represents an entity in the outside world. The surrogate is internally generated by the system but is nevertheless visible to the user or application.

Surrogate (2): This definition is based on that given by Wieringa and de Jung (1991). Here a surrogate represents an object in the database itself. The surrogate is internally generated by the system and is invisible to the user or application.

We shall adopt the surrogate (1) definition throughout this unit, largely because it is oriented more towards the data model than towards the storage model. See Date (1998).

An important distinction exists between a surrogate and a primary key, depending on whether the database is a current database or a temporal database. A current database stores only currently valid data. There is therefore a one-to-one correspondence between a surrogate in the modelled world and the primary key of some object in the database. In this case the surrogate may be used as a primary key, which gives rise to the term surrogate key. However, in a temporal database there is a many-to-one relationship between primary keys and the surrogate. Since there may be several objects in the database corresponding to a single surrogate, we cannot use the surrogate as a primary key; another attribute is required, in addition to the surrogate, to uniquely identify each object.

Although Hall et alia (1976) say nothing about this, other authors have argued that a surrogate should have the following constraints:
- the value is unique system-wide, hence never reused;
- the value is system generated;
- the value is not manipulable by the user or application;
- the value contains no semantic meaning;
- the value is not visible to the user or application;
- the value is not composed of several values from different domains.

Surrogates in Practice
In a current database, the surrogate key can be the primary key, generated by the database management system and not derived from any application data in the database. The only significance of the surrogate key is to act as the primary key. It is also possible for the surrogate key to exist in addition to the database-generated UUID, e.g. an HR number for each employee alongside the UUID of each employee. A surrogate key is frequently a sequential number, but it does not have to be. Examples are a Sybase or SQL Server "identity column", a PostgreSQL serial, an Oracle SEQUENCE, or a column defined with AUTO_INCREMENT in MySQL. Having the key independent of all other columns insulates the database relationships from changes in data values or database design (making the database more agile) and guarantees uniqueness.

In a temporal database, it is necessary to distinguish between the surrogate key and the primary key. Typically, every row would have both a primary key and a surrogate key. The primary key identifies the unique row in the database, while the surrogate key identifies the unique entity in the modelled world; these two keys are not the same. For example, table Staff may contain two rows for "John Smith": one row for when he was employed between 1990 and 1999, and another for when he was employed between 2001 and 2006. The surrogate key is identical (non-unique) in both rows; however, the primary key will be unique.

Some database designers use surrogate keys religiously, regardless of the suitability of other candidate keys, while others will use a key already present in the data, if there is one. A surrogate may also be called a surrogate key, entity identifier, system-generated key, database sequence number, synthetic key, technical key, or arbitrary unique identifier. Some of these terms describe the way new surrogate values are generated rather than the nature of the surrogate concept.
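The John Smith example can be sketched in SQLite through Python's sqlite3 module. This is an illustration under assumptions. The person and staff tables and the new_surrogate helper are hypothetical. The composite primary key (person_key, employed_from) is one possible choice for the extra attribute that, as the text says, a temporal database needs in addition to the surrogate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when asked
conn.executescript(
    """
    -- One row per real-world person. AUTOINCREMENT stops SQLite from ever
    -- reusing a key value, even after rows are deleted.
    CREATE TABLE person (person_key INTEGER PRIMARY KEY AUTOINCREMENT);

    -- Temporal history: the surrogate repeats across rows, so the primary key
    -- needs a second attribute (here, the start of the employment period).
    CREATE TABLE staff (
        person_key    INTEGER NOT NULL REFERENCES person(person_key),
        name          TEXT    NOT NULL,
        employed_from INTEGER NOT NULL,
        employed_to   INTEGER,
        PRIMARY KEY (person_key, employed_from)
    );
    """
)


def new_surrogate():
    """Ask the DBMS for a fresh key that carries no meaning of its own."""
    return conn.execute("INSERT INTO person DEFAULT VALUES").lastrowid


def periods(person_key):
    """All employment periods recorded for one real-world person."""
    return conn.execute(
        "SELECT employed_from, employed_to FROM staff "
        "WHERE person_key = ? ORDER BY employed_from",
        (person_key,),
    ).fetchall()


john = new_surrogate()
namesake = new_surrogate()  # a different person who is also called John Smith
conn.executemany(
    "INSERT INTO staff VALUES (?, ?, ?, ?)",
    [
        (john, "John Smith", 1990, 1999),
        (john, "John Smith", 2001, 2006),
        (namesake, "John Smith", 2003, None),
    ],
)

print(periods(john))      # [(1990, 1999), (2001, 2006)]
print(periods(namesake))  # [(2003, None)]
```

The name alone cannot tell the two people apart, but the surrogate can. The surrogate alone, in turn, cannot tell John Smith's two periods of employment apart.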

4.0

CONCLUSION

The fundamental concepts that guide the operation of a database, namely CRUD and ACID, remain the same irrespective of the types and models of databases that emerge by the day. However, other concepts may well emerge in the near future.

5.0

SUMMARY

Create, read, update and delete (CRUD) are the four basic functions of persistent storage, a major part of nearly all computer software. In computer science, ACID (Atomicity, Consistency, Isolation, Durability) is a set of properties that guarantee that database transactions are processed reliably. In the context of databases, a single logical operation on the data is called a transaction.


- In the context of relational databases, a foreign key is a referential constraint between two tables.
- In the relational model, a candidate key of a relvar (relation variable) is a set of attributes of that relvar such that, at all times, no two distinct tuples in the relation assigned to that variable have the same values for these attributes, and no proper subset of the set has this property.
- In relational database design, a unique key or primary key is a candidate key used to uniquely identify each row in a table.
- A superkey is defined in the relational model of database organization as a set of attributes of a relation variable (relvar) for which it holds that, in all relations assigned to that variable, there are no two distinct tuples (rows) that have the same values for the attributes in this set.
- A surrogate key in a database is a unique identifier for either an entity in the modeled world or an object in the database.
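The sketch below illustrates the CRUD functions and atomicity (the A in ACID). It uses Python's sqlite3 module with a hypothetical account table. A CHECK constraint forces the second transfer to fail halfway. The rollback then undoes the half that had already succeeded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (owner TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)

# Create
with conn:
    conn.executemany("INSERT INTO account VALUES (?, ?)", [("Ada", 100), ("Ben", 50)])


def transfer(src, dst, amount):
    """Update two rows as one transaction: both changes happen or neither does."""
    with conn:  # commits on success, rolls back if an exception escapes the block
        # The credit is applied first, so only atomicity protects the data
        # when the debit below fails.
        conn.execute("UPDATE account SET balance = balance + ? WHERE owner = ?", (amount, dst))
        conn.execute("UPDATE account SET balance = balance - ? WHERE owner = ?", (amount, src))


def balances():
    # Read
    return dict(conn.execute("SELECT owner, balance FROM account ORDER BY owner"))


transfer("Ada", "Ben", 30)       # Update: succeeds
try:
    transfer("Ben", "Ada", 500)  # Ben would go below zero, so the CHECK fails
except sqlite3.IntegrityError:
    pass                         # the credit to Ada was rolled back as well
after_transfers = balances()     # {'Ada': 70, 'Ben': 80}

with conn:
    conn.execute("DELETE FROM account WHERE owner = ?", ("Ben",))  # Delete
after_delete = balances()        # {'Ada': 70}
```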

6.0

TUTOR-MARKED ASSIGNMENT

1. What do the acronyms CRUD and ACID stand for?
2. What are the constraints associated with surrogate keys?

7.0

REFERENCES/FURTHER READINGS

Nijssen, G.M. (1976). Modelling in Data Base Management Systems. North-Holland Pub. Co. ISBN 0-7204-0459-2.

Engles, R.W. (1972). A Tutorial on Data-Base Organization. Annual Review in Automatic Programming, Vol. 7, Part 1, Pergamon Press, Oxford, pp. 1–64.

Langefors, B. (1968). Elementary Files and Elementary File Records. Proceedings of File 68, an IFIP/IAG International Seminar on File Organisation, Amsterdam, November, pp. 89–96.

Wieringa and de Jung (1991). The Identification of Objects and Roles: Object Identifiers Revisited.

Date, C.J. (1998). Relational Database Writings 1994–1997, Chapters 11 and 12.

Carter, Breck. "Intelligent Versus Surrogate Keys". Retrieved on 2006-12-03.

Richardson, Lee. "Create Data Disaster: Avoid Unique Indexes – (Mistake 3 of 10)".

Berkus, Josh. "Database Soup: Primary Keyvil, Part I".

Gray, Jim (September 1981). "The Transaction Concept: Virtues and Limitations". Proceedings of the 7th International Conference on Very Large Data Bases, pp. 144–154. 19333 Vallco Parkway, Cupertino CA 95014: Tandem Computers.

Gray, Jim and Reuter, Andreas (1993). Distributed Transaction Processing: Concepts and Techniques. Morgan Kaufmann. ISBN 1-55860-190-2.

Date, Christopher (2003). "5: Integrity". An Introduction to Database Systems. Addison-Wesley, pp. 268–276. ISBN 978-0-321-18956-1.


UNIT 8
DATABASE MODELS 1

CONTENTS

1.0 Introduction
2.0 Objectives
3.0 Main Content
3.1 Hierarchical Model
3.2 Network Model
3.3 Object-Relational Database
3.4 Object Database
3.5 Associative Model of Data
3.6 Column-Oriented DBMS
3.7 Navigational Database
3.8 Distributed Database
3.9 Real Time Database
4.0 Conclusion
5.0 Summary
6.0 Tutor-Marked Assignment
7.0 References/Further Readings

1.0

INTRODUCTION

Several models have evolved in the course of the development of databases and database management systems. As a result, users deploy different models depending on their needs and understanding. In this unit we begin to examine these models in detail, and we conclude the survey in the subsequent unit.

2.0

OBJECTIVES

At the end of this unit, you should be able to:
- know and define the different types of database models
- differentiate the database models from each other
- sketch the framework of hierarchical and network models
- understand the concepts behind the models
- know the advantages and disadvantages of the different models.

3.0

MAIN CONTENT

3.1 Hierarchical Model

In a hierarchical model, data is organized into an inverted tree-like structure. Each node has multiple downward links to describe the nesting, and a sort field keeps the records in a particular order in each same-level list. This structure arranges the various data elements in a hierarchy and helps to establish logical relationships among data elements of multiple files. Each unit in the model is a record, which is also known as a node. In such a model, each record on one level can be related to multiple records on the next lower level. A record that has subsidiary records is called a parent, and the subsidiary records are called children. Data elements in this model are well suited for one-to-many relationships with other data elements in the database.

Figure 1: A Hierarchical Structure (diagram of data elements: Department; Project A and Project B; Employee 1 and Employee B)

This model is advantageous when the data elements are inherently hierarchical. The disadvantage is that, in order to prepare the database, it becomes necessary to identify the requisite groups of files that are to be logically integrated. Hence, a hierarchical data model may not always be flexible enough to accommodate the dynamic needs of an organization.

Example
An example of a hierarchical data model would be an organization that keeps records of employees in a table (entity type) called "Employees". In the table there would be attributes/columns such as First Name, Last Name, Job Name and Wage. The company also has data about the employees' children in a separate table called "Children", with attributes such as First Name, Last Name and date of birth. The Employee table represents a parent segment and the Children table represents a child segment. These two segments form a hierarchy where an employee may have many children, but each child may have only one parent.
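A few lines of plain Python can sketch the parent/child segment idea. This is not the API of any hierarchical DBMS such as IMS. The Segment class and the sample names and dates are hypothetical. The point is that each segment stores a single parent pointer, so every record has exactly one path back to the root.

```python
class Segment:
    """One record (node) in a hierarchical database.

    Each segment keeps a single parent pointer, so there is exactly one
    path from any record back to the root.
    """

    def __init__(self, segment_type, parent=None, **fields):
        self.segment_type = segment_type
        self.parent = parent
        self.fields = fields
        self.children = []

    def add_child(self, segment_type, **fields):
        """Create a subsidiary (child) segment under this parent."""
        child = Segment(segment_type, parent=self, **fields)
        self.children.append(child)
        return child

    def path_from_root(self):
        """Segment types met on the way from the root down to this segment."""
        path, node = [], self
        while node is not None:
            path.append(node.segment_type)
            node = node.parent
        return list(reversed(path))


# Hypothetical data for the Employees (parent) / Children (child) example.
employee = Segment("Employee", first_name="Amina", last_name="Bello",
                   job_name="Accountant", wage=5000)
employee.add_child("Child", first_name="Tunde", last_name="Bello", date_of_birth="2012-03-04")
employee.add_child("Child", first_name="Zainab", last_name="Bello", date_of_birth="2015-07-19")

first_child = employee.children[0]
print([c.fields["first_name"] for c in employee.children])  # ['Tunde', 'Zainab']
print(first_child.parent.fields["first_name"])                # Amina
print(first_child.path_from_root())                           # ['Employee', 'Child']
```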


Consider the following structure:

EmpNo    Designation       ReportsTo
10       Director
20       Senior Manager    10
30       Typist            20
40       Programmer        20

Here, the "child" is the same type as the "parent". The hierarchy, in which EmpNo 10 is the boss of 20 and 30 and 40 each report to 20, is represented by the "ReportsTo" column. In relational database terms, the ReportsTo column is a foreign key referencing the EmpNo column. If the "child" data type were different, it would be in a different table, but there would still be a foreign key referencing the EmpNo column of the employees table. This simple model is commonly known as the adjacency list model. It was introduced by Dr. Edgar F. Codd after initial criticisms surfaced that the relational model could not model hierarchical data.
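The adjacency list can be queried top-down in SQL. The sketch below loads the EmpNo/ReportsTo table into an in-memory SQLite database and walks it with a recursive common table expression (WITH RECURSIVE). Recursive CTEs come from later SQL standards, not from Codd's original proposal. The column names are adapted from the table above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE employee (
        emp_no      INTEGER PRIMARY KEY,
        designation TEXT NOT NULL,
        reports_to  INTEGER REFERENCES employee(emp_no)  -- NULL at the top of the tree
    )
    """
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(10, "Director", None), (20, "Senior Manager", 10),
     (30, "Typist", 20), (40, "Programmer", 20)],
)

# Start from the root (no ReportsTo), then repeatedly join children onto the
# rows found so far, tracking how deep each one sits.
ORG_CHART = """
    WITH RECURSIVE chain(emp_no, designation, depth) AS (
        SELECT emp_no, designation, 0 FROM employee WHERE reports_to IS NULL
        UNION ALL
        SELECT e.emp_no, e.designation, c.depth + 1
        FROM employee AS e JOIN chain AS c ON e.reports_to = c.emp_no
    )
    SELECT emp_no, designation, depth FROM chain ORDER BY depth, emp_no
"""
for emp_no, designation, depth in conn.execute(ORG_CHART):
    print("    " * depth + f"{emp_no} {designation}")
# 10 Director
#     20 Senior Manager
#         30 Typist
#         40 Programmer
```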

3.2 Network Model

In the network model, records can participate in any number of named relationships. Each relationship associates a record of one type (called the owner) with multiple records of another type (called the member). These relationships are, somewhat confusingly, called sets. For example, a student might be a member of one set whose owner is the course they are studying, and a member of another set whose owner is the college they belong to. At the same time, the student might be the owner of a set of email addresses and the owner of another set containing phone numbers. The main difference between the network model and the hierarchical model is that in a network model a child can have a number of parents, whereas in a hierarchical model a child can have only one parent. The hierarchical model is therefore a subset of the network model.


Figure 3: Network Structure (diagram of data elements: Department A, Department B; Student A, Student B, Student C; Project A, Project B)

Programmatic access to network databases is traditionally by means of a navigational data manipulation language, in which programmers navigate from a current record to other related records using verbs such as find owner, find next and find prior. The most common example of such an interface is the COBOL-based Data Manipulation Language defined by CODASYL. Network databases are traditionally implemented using chains of pointers between related records. These pointers can be node numbers or disk addresses. The network model became popular because it provided considerable flexibility in modelling complex data relationships. It also offered high performance, because the access verbs used by programmers mapped directly to pointer-following in the implementation.

The network model has an advantage over the hierarchical model in that it promotes greater flexibility and data accessibility, since records at a lower level can be accessed without accessing the records above them. This model is more efficient than the hierarchical model, easier to understand, and can be applied to many real-world problems that require routine transactions. The disadvantages are that:
- it is a complex process to design and develop a network database;
- it has to be refined frequently;
- it requires that the relationships among all the records be defined before development starts, and changes often demand major programming efforts;
- operation and maintenance of the network model is expensive and time consuming.

Examples of database engines that have network model capabilities are RDM Embedded and RDM Server.


However, the model had several disadvantages. Network programming proved error-prone as data models became more complex, and small changes to the data structure could require changes to many programs. Also, because of the use of physical pointers, operations such as database loading and restructuring could be very time-consuming.

Concept and History
The network model is a database model conceived as a flexible way of representing objects and their relationships. Its original inventor was Charles Bachman, and it was developed into a standard specification published in 1969 by the CODASYL Consortium. Where the hierarchical model structures data as a tree of records, with each record having one parent record and many children, the network model allows each record to have multiple parent and child records, forming a lattice structure. The chief argument in favour of the network model, in comparison to the hierarchical model, was that it allowed a more natural modeling of relationships between entities. Although the model was widely implemented and used, it failed to become dominant for two main reasons. Firstly, IBM chose to stick to the hierarchical model with semi-network extensions in its established products such as IMS and DL/I. Secondly, it was eventually displaced by the relational model, which offered a higher-level, more declarative interface. Until the early 1980s the performance benefits of the low-level navigational interfaces offered by hierarchical and network databases were persuasive for many large-scale applications. As hardware became faster, however, the extra productivity and flexibility of the relational model led to the gradual obsolescence of the network model in corporate enterprise usage.
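The owner/member sets and the find owner / find next / find prior verbs described in this section can be imitated in Python with explicit pointers. This is only a sketch. The classes, the set names (Studies, BelongsTo), the student names and the find_* functions are hypothetical and are not CODASYL DML syntax. The sketch shows a student record that sits in two sets at once, and therefore has two owners.

```python
class Record:
    """A record that may be a member of several named sets at once."""

    def __init__(self, record_type, name):
        self.record_type = record_type
        self.name = name
        # Keyed by set type: pointers to the owner, next member and prior member.
        self.owner = {}
        self.next = {}
        self.prior = {}


class SetOccurrence:
    """One occurrence of a named set: an owner plus a chain of member pointers."""

    def __init__(self, set_type, owner):
        self.set_type = set_type
        self.owner = owner
        self.first = None
        self.last = None

    def connect(self, member):
        """Append member to the end of this set's pointer chain."""
        member.owner[self.set_type] = self.owner
        member.prior[self.set_type] = self.last
        member.next[self.set_type] = None
        if self.last is None:
            self.first = member
        else:
            self.last.next[self.set_type] = member
        self.last = member


# Hypothetical navigational verbs, loosely modelled on FIND OWNER / NEXT / PRIOR.
def find_owner(record, set_type):
    return record.owner[set_type]


def find_next(record, set_type):
    return record.next[set_type]


def find_prior(record, set_type):
    return record.prior[set_type]


course = Record("Course", "Databases")
college = Record("College", "Science")
ann, bob, cy = (Record("Student", n) for n in ("Ann", "Bob", "Cy"))

studies = SetOccurrence("Studies", course)        # owner: the course
belongs_to = SetOccurrence("BelongsTo", college)  # owner: the college
for student in (ann, bob, cy):
    studies.connect(student)
for student in (ann, cy):
    belongs_to.connect(student)

# Ann has two owners (parents), which a strict hierarchy would not allow.
print(find_owner(ann, "Studies").name, find_owner(ann, "BelongsTo").name)  # Databases Science
print(find_next(ann, "Studies").name)    # Bob
print(find_next(ann, "BelongsTo").name)  # Cy
print(find_prior(cy, "Studies").name)    # Bob
```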

3.3 Object-Relational Database

An object-relational database (ORD) or object-relational database management system (ORDBMS) is a database management system (DBMS) similar to a relational database, but with an object-oriented database model: objects, classes and inheritance are directly supported in database schemas and in the query language. In addition, it supports extension of the data model with custom data types and methods. One aim for this type of system is to bridge the gap between relational databases, which do not directly support classes and inheritance, and conceptual data modeling techniques that often use them, such as the entity-relationship diagram (ERD) and object-relational mapping (ORM). Another, related aim is to bridge the gap between relational databases and the object-oriented modeling techniques used in programming languages such as Java, C++ or C#. However, a more popular alternative for achieving such a bridge is to use a standard relational database system with some form of ORM software.

Whereas traditional RDBMS or SQL-DBMS products focused on the efficient management of data drawn from a limited set of data types (defined by the relevant language standards), an object-relational DBMS allows software developers to integrate their own types, and the methods that apply to them, into the DBMS. ORDBMS technology aims to allow developers to raise the level of abstraction at which they view the problem domain. This goal is not universally shared; proponents of relational databases often argue that object-oriented specification lowers the abstraction level.

An object-relational database can be said to provide a middle ground between relational databases and object-oriented databases (OODBMS). In object-relational databases, the approach is essentially that of relational databases: the data resides in the database and is manipulated collectively with queries in a query language. At the other extreme are OODBMSes, in which the database is essentially a persistent object store for software written in an object-oriented programming language, with a programming API for storing and retrieving objects and little or no specific support for querying.

Many SQL ORDBMSs on the market today are extensible with user-defined types (UDT) and custom-written functions (e.g. stored procedures). Some (e.g. SQL Server) allow such functions to be written in object-oriented programming languages, but this by itself does not make them object-oriented databases; in an object-oriented database, object orientation is a feature of the data model.
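SQLite is not an object-relational DBMS. Python's sqlite3 module does, however, let an application register its own type (through adapters and converters) and its own SQL-callable function. This gives a rough feel for the user-defined types and custom functions mentioned above. The Point type, the POINT column type and the distance function are hypothetical. In a real ORDBMS these would be declared inside the database rather than in the client program.

```python
import math
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    """A hypothetical application-level type stored in a column."""
    x: float
    y: float


def adapt_point(p):
    # Python object -> stored SQLite value (TEXT of the form "x;y")
    return f"{p.x};{p.y}"


def convert_point(raw):
    # Stored bytes -> Python object, used for columns declared as POINT
    x, y = map(float, raw.split(b";"))
    return Point(x, y)


def sql_distance(a, b):
    # SQL-callable function; inside SQL the values arrive as the stored text
    ax, ay = map(float, a.split(";"))
    bx, by = map(float, b.split(";"))
    return math.hypot(ax - bx, ay - by)


sqlite3.register_adapter(Point, adapt_point)
sqlite3.register_converter("POINT", convert_point)

conn = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
conn.create_function("distance", 2, sql_distance)
conn.execute("CREATE TABLE site (name TEXT PRIMARY KEY, location POINT)")
conn.executemany(
    "INSERT INTO site VALUES (?, ?)",
    [("depot", Point(0.0, 0.0)), ("store", Point(3.0, 4.0))],
)

store = conn.execute("SELECT location FROM site WHERE name = 'store'").fetchone()[0]
gap = conn.execute(
    "SELECT distance(a.location, b.location) FROM site AS a, site AS b "
    "WHERE a.name = 'depot' AND b.name = 'store'"
).fetchone()[0]
print(store)  # Point(x=3.0, y=4.0)
print(gap)    # 5.0
```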

3.4 Object Database

In an object database (also object-oriented database), information is represented in the form of objects, as used in object-oriented programming. When database capabilities are combined with object programming language capabilities, the result is an object database management system (ODBMS). An ODBMS makes database objects appear as programming language objects in one or more object programming languages. An ODBMS extends the programming language with transparently persistent data, concurrency control, data recovery, associative queries and other capabilities. Some object-oriented databases are designed to work well with object-oriented programming languages such as Python, Java, C#, Visual Basic .NET, C++, Objective-C and Smalltalk. Others have their own programming languages. ODBMSs use exactly the same model as object-oriented programming languages. Object databases are generally recommended when there is a business need for high-performance processing on complex data.

Adoption of Object Databases
Object databases based on persistent programming acquired a niche in application areas such as engineering and spatial databases, telecommunications, and scientific areas such as high-energy physics and molecular biology. They have made little impact on mainstream commercial data processing, though there is some usage in specialized areas of financial services. It is also worth noting that object databases held the record for the world's largest database (being the first to hold over 1000 terabytes, at the Stanford Linear Accelerator Center, "Lessons Learned From Managing A Petabyte"). They also held the record for the highest ingest rate ever recorded for a commercial database, at over one terabyte per hour. Another group of object databases focuses on embedded use in devices, packaged software and real-time systems.

Advantages and Disadvantages
Benchmarks between ODBMSs and RDBMSs have shown that an ODBMS can be clearly superior for certain kinds of tasks. The main reason for this is that many operations are performed using navigational rather than declarative interfaces, and navigational access to data is usually implemented very efficiently by following pointers. Critics of navigational database-based technologies like ODBMS suggest that pointer-based techniques are optimized for very specific "search routes" or viewpoints. However, for general-purpose queries on the same information, pointer-based techniques will tend to be slower and more difficult to formulate than relational ones. Thus, navigation appears to simplify specific known uses at the expense of general, unforeseen and varied future uses.
However, with suitable language support, direct object references may be maintained in addition to normalised, indexed aggregations, allowing both kinds of access. Furthermore, a persistent language may index aggregations on whatever is returned by some arbitrary object access method, rather than only on attribute value, which can simplify some queries. Another factor that works against ODBMSs is their lack of interoperability with many tools and features that are taken for granted in the SQL world. These include, but are not limited to, industry-standard connectivity, reporting tools, OLAP tools, and backup and recovery standards. Additionally, object databases lack a formal mathematical foundation, unlike the relational model, and this in turn leads to weaknesses in their query support. However, this objection is offset by the fact that some ODBMSs fully support SQL in addition to navigational access, e.g. Objectivity/SQL++, Matisse and InterSystems Caché. Effective use may require compromises to keep both paradigms in sync.

In fact there is an intrinsic tension between two ideas. Encapsulation hides data and makes it available only through a published set of interface methods. Much database technology, by contrast, assumes that data should be accessible to queries based on data content rather than predefined access paths. Database-centric thinking tends to view the world through a declarative and attribute-driven viewpoint, while OOP tends to view the world through a behavioral viewpoint, maintaining entity identity independently of changing attributes. This is one of the many impedance mismatch issues surrounding OOP and databases. Although some commentators have written off object database technology as a failure, the essential arguments in its favor remain valid. Attempts to integrate database functionality more closely into object programming languages continue in both the research and the industrial communities.
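Python's standard shelve module gives a taste of database objects appearing as programming-language objects: it can persist an object graph and hand it back as ordinary objects. It is only a minimal persistent object store, not an ODBMS, since it has no concurrency control, recovery or query language. The Department and Employee classes and the sample names are hypothetical.

```python
import os
import shelve
import tempfile


class Employee:
    def __init__(self, name):
        self.name = name


class Department:
    def __init__(self, title, staff):
        self.title = title
        self.staff = staff
        self.manager = staff[0]  # a direct object reference, not a key value


with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "objects")

    # Store: the whole object graph is pickled under a single key.
    with shelve.open(path) as db:
        db["sales"] = Department("Sales", [Employee("Ada"), Employee("Ben")])

    # Reopen (a new "session") and navigate by following references.
    with shelve.open(path) as db:
        sales = db["sales"]
        manager_name = sales.manager.name
        staff_names = [e.name for e in sales.staff]
        same_object = sales.manager is sales.staff[0]  # identity survives the round trip

print(manager_name, staff_names, same_object)  # Ada ['Ada', 'Ben'] True
```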

3.5 Associative Model of Data

The associative model of data is an alternative data model for database systems. Other data models, such as the relational model and the object data model, are record-based. These models group the attributes of a thing, such as a car, into a record structure. Such attributes might be registration, colour, make, model, etc. In the associative model, everything that has "discrete independent existence" is modeled as an entity, and relationships between entities are modeled as associations. The granularity at which data is represented is similar to the schemes presented by Chen (Entity-Relationship Model); Bracchi, Paolini and Pelagatti (Binary Relations); and Senko (The Entity Set Model).
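The entity/association split can be sketched in plain Python. This is a hypothetical illustration of the idea, not the interface of any associative-model product. Each thing (the car, its registration, its colour) is an entity with its own identifier. Each fact is a link (source, verb, target), and a link may itself be the source of another link. The owner "Alice" and the year are invented sample data.

```python
from itertools import count


class AssociativeStore:
    """Hypothetical sketch of the associative model: entities plus links."""

    def __init__(self):
        self._ids = count(1)
        self.entities = {}  # id -> name
        self.links = {}     # id -> (source_id, verb, target_id)

    def entity(self, name):
        new_id = next(self._ids)
        self.entities[new_id] = name
        return new_id

    def link(self, source, verb, target):
        new_id = next(self._ids)
        self.links[new_id] = (source, verb, target)
        return new_id

    def label(self, item_id):
        """Readable text for an entity or (recursively) for a link."""
        if item_id in self.entities:
            return self.entities[item_id]
        source, verb, target = self.links[item_id]
        return f"{self.label(source)} {verb} {self.label(target)}"

    def facts_about(self, item_id):
        """Every link whose source is the given entity or link."""
        return [self.label(lid) for lid, (src, _, _) in self.links.items() if src == item_id]


db = AssociativeStore()
car = db.entity("Car 1")
db.link(car, "has registration", db.entity("ABC 123"))
db.link(car, "has colour", db.entity("Red"))
owns = db.link(db.entity("Alice"), "owns", car)
db.link(owns, "since", db.entity("2020"))  # a link about a link

print(db.facts_about(car))   # ['Car 1 has registration ABC 123', 'Car 1 has colour Red']
print(db.facts_about(owns))  # ['Alice owns Car 1 since 2020']
```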

3.6 Column-Oriented DBMS

A column-oriented DBMS is a database management system (DBMS) that stores its content by column rather than by row. This has advantages for databases such as data warehouses and library catalogues, where aggregates are computed over large numbers of similar data items.

Benefits
Comparisons between row-oriented and column-oriented systems are typically concerned with the efficiency of hard-disk access for a given workload, since seek time is very long compared to the other delays in computers. Further, because seek time is improving at a slow rate relative to CPU power (see Moore's Law), this focus will likely continue for systems that rely on hard disks for storage. The following over-simplified observations attempt to paint a picture of the trade-offs between column- and row-oriented organizations.
1. Column-oriented systems are more efficient when an aggregate needs to be computed over many rows but only for a notably smaller subset of all columns of data, because reading that smaller subset of data can be faster than reading all data.
2. Column-oriented systems are more efficient when new values of a column are supplied for all rows at once, because that column data can be written efficiently and replace old column data without touching any other columns for the rows.
3. Row-oriented systems are more efficient when many columns of a single row are required at the same time, and when row size is relatively small, as the entire row can be retrieved with a single disk seek.
4. Row-oriented systems are more efficient when writing a new row if all of the column data is supplied at the same time, as the entire row can be written with a single disk seek.

In practice, row-oriented architectures are well suited for OLTP-like workloads, which are more heavily loaded with interactive transactions. Column stores are well suited for OLAP-like workloads (e.g. data warehouses), which typically involve a smaller number of highly complex queries over all data (possibly terabytes).

Storage Efficiency vs. Random Access
Column data is of uniform type; therefore, there are some opportunities for storage-size optimizations in column-oriented data that are not available in row-oriented data. For example, many popular modern compression schemes, such as LZW, make use of the similarity of adjacent data to compress. While the same techniques may be used on row-oriented data, a typical implementation will achieve less effective results. Further, this behavior becomes more dramatic when a large percentage of adjacent column data is either the same or not present, such as in a sparse column (similar to a sparse matrix). The opposing trade-off is random access. Retrieving all data from a single row is more efficient when that data is located in a single location, as in a row-oriented architecture. Further, the greater the adjacent compression achieved, the more difficult random access may become, as data might need to be uncompressed to be read.

Implementations
For many years, only the Sybase IQ product was commonly available in the column-oriented DBMS class. However, that has changed rapidly in the last few years, with many open source and commercial implementations.
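The row-versus-column trade-offs can be seen in miniature with Python lists standing in for disk storage, so seek times are not modelled. The sales data is hypothetical. Run-length encoding is used as a simpler stand-in for schemes such as LZW: like them, it gains from similar adjacent values, which are common within a single column.

```python
from itertools import groupby

# Hypothetical sales data: (region, month, amount)
ROWS = [
    ("north", "2024-01", 120),
    ("north", "2024-02", 95),
    ("south", "2024-01", 80),
    ("south", "2024-02", 130),
]
COLUMNS = ("region", "month", "amount")

# Row-oriented layout: each record is stored contiguously.
row_store = list(ROWS)

# Column-oriented layout: each column is stored contiguously.
column_store = {name: [row[i] for row in ROWS] for i, name in enumerate(COLUMNS)}


def total_amount_row_store(store):
    # Visits every whole record although only one field is needed.
    return sum(record[2] for record in store)


def total_amount_column_store(store):
    # Reads only the one column that the aggregate needs.
    return sum(store["amount"])


def fetch_row(store, i):
    # Rebuilding a single record touches every column: the random-access cost.
    return tuple(store[name][i] for name in COLUMNS)


def run_length_encode(values):
    """Collapse runs of equal adjacent values into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]


print(total_amount_row_store(row_store), total_amount_column_store(column_store))  # 425 425
print(fetch_row(column_store, 2))                 # ('south', '2024-01', 80)
print(run_length_encode(column_store["region"]))  # [('north', 2), ('south', 2)]
```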

3.7 Navigational Database

Navigational databases are characterized by the fact that objects in the database are found primarily by following references from other objects. Traditionally, navigational interfaces are procedural, though some modern systems, such as XPath, could be characterized as simultaneously navigational and declarative. Navigational access is traditionally associated with the network and hierarchical models of database interfaces, which have evolved into set-oriented systems. Navigational techniques use "pointers" and "paths" to navigate among data records (also known as "nodes"). This is in contrast to the relational model (implemented in relational databases), which strives to use "declarative" or logic programming techniques, in which you ask the system for *what* you want instead of *how* to navigate to it. For example, to give directions to a house, the navigational approach would resemble something like: "Get on highway 25 for 8 miles, turn onto Horse Road, left at the red barn, then stop at the 3rd house down the road". The declarative approach, by contrast, would resemble: "Visit the green house(s) within the following coordinates...". Hierarchical models are also considered navigational because one "goes" up (to the parent) and down (to the leaves), and there are "paths", such as the familiar file/folder paths in hierarchical file systems. In general, navigational systems use combinations of paths and prepositions such as "next", "previous", "first", "last", "up" and "down".


Some also suggest that navigational database engines are easier to build and take up less memory (RAM) than relational equivalents. However, there were relational or relational-based products in the late 1980s that had small engines (by today's standards) because they did not use SQL, which suggests this is not necessarily the case. Whatever the reason, navigational techniques are still the preferred way to handle smaller-scale structures. A current example of navigational structuring can be found in the Document Object Model (DOM), often used in web browsers and closely associated with JavaScript. The DOM "engine" is essentially a lightweight navigational database. The World Wide Web itself and Wikipedia could even be considered forms of navigational databases. (On a large scale, the Web is a network model; on smaller or local scales, such as domain and URL partitioning, it uses hierarchies.)
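The directions analogy maps quite directly onto the DOM. The sketch below uses only the Python standard library, and the road and houses are hypothetical. xml.dom.minidom is used navigationally, following the firstChild, nextSibling and parentNode pointers. xml.etree.ElementTree answers a declarative path query instead, although it supports only a limited subset of XPath.

```python
import xml.dom.minidom
import xml.etree.ElementTree as ET

# A hypothetical road with three houses. There is no whitespace between tags,
# so the DOM contains only element nodes.
ROAD = (
    '<road name="Horse Road">'
    '<house number="1" colour="white"/>'
    '<house number="2" colour="red"/>'
    '<house number="3" colour="green"/>'
    "</road>"
)

# Navigational: start at a node and follow pointers step by step.
dom = xml.dom.minidom.parseString(ROAD)
house = dom.documentElement.firstChild  # the 1st house
for _ in range(2):
    house = house.nextSibling           # move along to the 3rd house
third_house = house.getAttribute("number")
road_name = house.parentNode.getAttribute("name")  # go "up" to the parent

# Declarative: state *what* is wanted and let the library find it.
root = ET.fromstring(ROAD)
green_houses = [h.get("number") for h in root.findall("house[@colour='green']")]

print(third_house, road_name, green_houses)  # 3 Horse Road ['3']
```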

3.8 Distributed Database

A distributed database is a database that is under the control of a central database management system (DBMS) in which storage devices are not all attached to a common CPU. It may be stored on multiple computers in the same physical location, or dispersed over a network of interconnected computers. Collections of data (e.g. in a database) can thus be distributed across multiple physical locations. A distributed database is divided into separate partitions (fragments). Each partition or fragment may be replicated (i.e. kept as redundant fail-over copies, in a manner similar to RAID). Besides replication and fragmentation, there are many other distributed database design technologies, for example local autonomy and synchronous and asynchronous distributed database technologies. How these technologies are implemented depends on the needs of the business and on the sensitivity/confidentiality of the data to be stored in the database, and hence on the price the business is willing to pay to ensure data security, consistency and integrity.

Important considerations

Care must be taken with a distributed database to ensure the following:

The distribution is transparent: users must be able to interact with the system as if it were one logical system. This applies to the system's performance and methods of access, among other things.

Transactions are transparent: each transaction must maintain database integrity across multiple databases. Transactions must also be divided into subtransactions, each subtransaction affecting one database system.

Advantages of Distributed Databases

Reflects organizational structure: database fragments are located in the departments they relate to.
Local autonomy: a department can control the data about itself (as its staff are the ones most familiar with it).
Improved availability: a fault in one database system will affect only one fragment, instead of the entire database.
Improved performance: data is located near the site of greatest demand, and the database systems themselves are parallelized, allowing load on the databases to be balanced among servers. (A high load on one module of a distributed database will not affect other modules.)
Economics: it costs less to create a network of smaller computers with the combined power of a single large computer than to buy that large computer.
Modularity: systems can be modified, added to and removed from the distributed database without affecting other modules (systems).
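The fragmentation, replication and availability ideas above can be illustrated with a small in-memory sketch. It is not how any particular distributed DBMS works: the site names, the choice of `dept` as the fragmentation attribute and the replication factor of 2 are all hypothetical, and networking, failures during a query and distributed transactions are ignored. Rows are fragmented horizontally by department, each fragment is allocated to two sites, and a global query reads one available replica of each fragment, so the caller never has to name a site.

```python
from collections import defaultdict

SITES = ["lagos", "abuja", "kano"]   # hypothetical site names
REPLICAS = 2                         # hypothetical: each fragment kept on 2 sites

def fragment(rows, key):
    """Horizontal fragmentation: group rows by the value of `key`."""
    frags = defaultdict(list)
    for row in rows:
        frags[row[key]].append(row)
    return dict(frags)

def allocate(fragment_ids, sites=SITES, replicas=REPLICAS):
    """Place each fragment on `replicas` consecutive sites, round robin."""
    return {fid: [sites[(i + k) % len(sites)] for k in range(replicas)]
            for i, fid in enumerate(sorted(fragment_ids))}

def build_sites(frags, plan):
    """Replication: copy every fragment to each site in its allocation."""
    storage = defaultdict(dict)
    for fid, site_list in plan.items():
        for site in site_list:
            storage[site][fid] = list(frags[fid])
    return storage

def global_select(storage, plan, predicate, down=frozenset()):
    """Answer a query over the whole database by reading one live replica
    of each fragment; the caller never names a site (transparency)."""
    result = []
    for fid, site_list in plan.items():
        live = [s for s in site_list if s not in down]
        if not live:
            raise RuntimeError(f"fragment {fid!r} has no available replica")
        result.extend(r for r in storage[live[0]][fid] if predicate(r))
    return result

rows = [{"emp": "Ada", "dept": "Sales"},
        {"emp": "Ben", "dept": "Research"},
        {"emp": "Chi", "dept": "Sales"}]
frags = fragment(rows, "dept")
plan = allocate(frags)
storage = build_sites(frags, plan)
print(plan)   # {'Research': ['lagos', 'abuja'], 'Sales': ['abuja', 'kano']}
# Even with the "abuja" site down, every fragment is still reachable.
print([r["emp"] for r in global_select(storage, plan, lambda r: True, down={"abuja"})])
```

With the hypothetical "abuja" site marked as down, the query still returns every row, which is the "improved availability" advantage; if both replicas of a fragment were unavailable, `global_select` raises an error instead of silently returning partial results.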

Disadvantages of Distributed Databases

Complexity: extra work must be done by the DBAs to ensure that the distributed nature of the system is transparent. Extra work must also be done to maintain multiple disparate systems instead of one big one. Extra database design work is needed to account for the disconnected nature of the database; for example, joins become prohibitively expensive when performed across multiple systems.
Economics: increased complexity and a more extensive infrastructure mean extra labour costs.
Security: remote database fragments must be secured, and because they are not centralized, the remote sites must be secured as well. The infrastructure must also be secured (e.g. by encrypting the network links between remote sites).
Difficulty maintaining integrity: in a distributed database, enforcing integrity over a network may require too much of the network's resources to be feasible.
Inexperience: distributed databases are difficult to work with, and because the field is young there is little readily available experience of proper practice.
Lack of standards: there are no tools or methodologies yet to help users convert a centralized DBMS into a distributed DBMS.
More complex database design: besides the normal difficulties, the design of a distributed database has to consider fragmentation of data, allocation of fragments to specific sites, and data replication.

3.9

Real-Time Database

A real-time database is a processing system designed to handle workloads whose state is constantly changing (Buchmann). This differs from traditional databases, which contain persistent data mostly unaffected by time. For example, a stock market changes very rapidly and is dynamic. The graphs of the different markets appear to be very unstable, and yet a database has to keep track of current values for all of the markets of the New York Stock Exchange (Kanitkar). Real-time processing means that a transaction is processed fast enough for the result to come back and be acted on right away (Capron). Real-time databases are useful for accounting, banking, law, medical records, multimedia, process control, reservation systems, and scientific data analysis (Snodgrass). As computers increase in power and can store more data, they are becoming integrated into society and are employed in many applications.

Overview

Real-time databases are traditional databases that use an extension to give them the additional power to yield reliable responses. They use timing constraints that represent a certain range of values for which the data are valid. This range is called temporal validity. A conventional database cannot work under these circumstances because the inconsistencies between the real-world objects and the data that represent them are too severe for simple modifications. An effective system needs to be able to handle time-sensitive queries, return only temporally valid data, and support priority scheduling. To enter the data in the records, a sensor or an input device often monitors the state of the physical system and updates the database with new information so that it reflects the physical system more accurately (Abbot). When designing a real-time database system, one should consider how to represent valid time and how facts are associated with the real-time system.
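Temporal validity and priority scheduling can be illustrated with a short sketch. It assumes, for illustration only, that every stored reading carries the time it was written and the length of its validity interval, that a read returns a value only while it is still temporally valid, and that pending transactions are ordered earliest-deadline-first (EDF, one of the policies covered in the Stankovic et al. reference). The class and function names, the key format and all numbers are hypothetical; the current time is passed in explicitly so the example is deterministic.

```python
import heapq
from dataclasses import dataclass

@dataclass
class Reading:
    value: float
    timestamp: float   # time the sensor wrote the value (seconds)
    validity: float    # length of the temporal validity interval (seconds)

class RealTimeStore:
    """Toy store that only returns temporally valid data."""

    def __init__(self):
        self._data = {}

    def write(self, key, value, now, validity):
        self._data[key] = Reading(value, now, validity)

    def read(self, key, now):
        reading = self._data.get(key)
        if reading is None or now - reading.timestamp > reading.validity:
            return None   # missing or stale: not temporally valid
        return reading.value

def edf_order(transactions):
    """Order (name, deadline) pairs earliest-deadline-first."""
    heap = [(deadline, name) for name, deadline in transactions]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]

store = RealTimeStore()
store.write("altitude:flight42", 3200.0, now=100.0, validity=2.0)
print(store.read("altitude:flight42", now=101.5))   # 3200.0 (still valid)
print(store.read("altitude:flight42", now=103.0))   # None (expired)
print(edf_order([("update_fuel", 5.0), ("assign_runway", 1.5), ("write_log", 30.0)]))
# ['assign_runway', 'update_fuel', 'write_log']
```

Returning `None` for an expired reading is one possible design choice; a real system might instead trigger a fresh sensor read or flag the transaction as having missed its deadline.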
Also consider how to represent attribute values in the database so that transactions and data consistency are not violated (Abbot). When designing a system, it is important to consider what the system should do when deadlines are not met. For example, an air-traffic control system constantly monitors hundreds of aircraft, makes decisions about incoming flight paths and determines the order in which aircraft should land based on data such as fuel, altitude, and speed. If any of this information is late, the result could be devastating (Sivasankaran). To address issues of obsolete data, timestamps can support transactions by providing clear time references (Sivasankaran).

SQL DBMS

IBM started working on a prototype system loosely based on Codd's concepts as System R in the early 1970s. Unfortunately, System R was conceived as a way of proving Codd's ideas unimplementable, and thus the project was delivered to a group of programmers who were not under Codd's supervision, never fully understood his ideas and ended up violating several fundamentals of the relational model. The first "quickie" version was ready in 1974/5, and work then started on multi-table systems in which the data could be broken down so that all of the data for a record (much of which is often optional) did not have to be stored in a single large "chunk". Subsequent multi-user versions were tested by customers in 1978 and 1979, by which time a standardized query language, SQL, had been added. Codd's ideas were establishing themselves as both workable and superior to Codasyl, pushing IBM to develop a true production version of System R, known as SQL/DS, and, later, Database 2 (DB2). Many of the people involved with INGRES became convinced of the future commercial success of such systems, and formed their own companies to commercialize the work, but with an SQL interface. Sybase, Informix, NonStop SQL and eventually Ingres itself were all being sold as offshoots of the original INGRES product in the 1980s. Even Microsoft SQL Server is actually a rebuilt version of Sybase, and thus of INGRES. Only Larry Ellison's Oracle started from a different chain, based on IBM's papers on System R, and beat IBM to market when the first version was released in 1978. Stonebraker went on to apply the lessons from INGRES to develop a new database, Postgres, which is now known as PostgreSQL.
PostgreSQL is used for mission-critical applications worldwide: the .org and .info domain name registries use it as their primary data store, as do many large companies and financial institutions. In Sweden, Codd's paper was also read, and Mimer SQL was developed from the mid-1970s at Uppsala University. In 1984, this project was consolidated into an independent enterprise. In the early 1980s, Mimer introduced transaction handling for high robustness in applications, an idea that was subsequently implemented in most other DBMSs.

4.0

CONCLUSION


Database models will continue to evolve until an ideal model emerges that meets all the requirements of end users. This sounds impossible, because there can never be a system that is completely fault-free, so we will see yet more database models. The flat and hierarchical models set the tone for the models that followed.

5.0

SUMMARY
In a hierarchical model, data is organized into an inverted tree-like structure, implying multiple downward links in each node to describe the nesting, and a sort field to keep the records in a particular order in each same-level list.

In the network model, records can participate in any number of named relationships. Each relationship associates a record of one type (called the owner) with multiple records of another type (called the member).

An object-relational database (ORD), or object-relational database management system (ORDBMS), is a database management system (DBMS) similar to a relational database, but with an object-oriented database model: objects, classes and inheritance are directly supported in database schemas and in the query language.

In an object database (also object-oriented database), information is represented in the form of objects as used in object-oriented programming.

The associative model of data is an alternative data model for database systems. Other data models, such as the relational model and the object data model, are record-based.

A column-oriented DBMS is a database management system (DBMS) that stores its content by column rather than by row. This has advantages for databases such as data warehouses and library catalogues, where aggregates are computed over large numbers of similar data items.

Navigational databases are characterized by the fact that objects in the database are found primarily by following references from other objects.

A distributed database is a database that is under the control of a central database management system (DBMS) in which storage devices are not all attached to a common CPU.

A real-time database is a processing system designed to handle workloads whose state is constantly changing (Buchmann). This differs from traditional databases, which contain persistent data mostly unaffected by time.

6.0

TUTOR-MARKED ASSIGNMENT

1. Mention five models of databases.
2. Briefly discuss the advantages and disadvantages of distributed databases.

7.0

REFERENCES/FURTHER READINGS

Bachman, Charles W. (1973). "The Programmer as Navigator". ACM Turing Award Lecture, Communications of the ACM, Volume 16, Issue 11, pp. 653-658. ISSN 0001-0782. doi:10.1145/355611.362534.
Stonebraker, Michael with Moore, Dorothy (1996). Object-Relational DBMSs: The Next Great Wave. Morgan Kaufmann Publishers. ISBN 1-55860-397-2. (There was, at the time, some dispute whether the term was coined by Michael Stonebraker of Illustra or Won Kim of UniSQL.)
Kim, Won (1990). Introduction to Object-Oriented Databases. The MIT Press. ISBN 0-262-11124-1.
Bancilhon, François; Delobel, Claude; and Kanellakis, Paris (1992). Building an Object-Oriented Database System: The Story of O2. Morgan Kaufmann Publishers. ISBN 1-55860-169-4.
Stonebraker et al. (2005). "C-Store: A Column-oriented DBMS". Proceedings of the 31st VLDB Conference, Trondheim, Norway.
Błażewicz, Jacek; Królikowski, Zbyszko; Morzy, Tadeusz (2003). Handbook on Data Management in Information Systems. Springer, p. 18. ISBN 3540438939.
Özsu, M. T. and Valduriez, P. Principles of Distributed Databases (2nd edition). Prentice-Hall. ISBN 0-13-659707-6.
Federal Standard 1037C.
Elmasri and Navathe. Fundamentals of Database Systems (3rd edition). Addison-Wesley Longman. ISBN 0-201-54263-3.
Abbot, Robert K., and Hector Garcia-Molina (1992). Scheduling Real-Time Transactions: a Performance Evaluation. Stanford University and Digital Equipment Corp. ACM. 13 Dec. 2006.
Buchmann, A. (2005). "Real Time Database Systems". Encyclopedia of Database Technologies and Applications. Ed. Laura C. Rivero, Jorge H. Doorn, and Viviana E. Ferraggine. Idea Group.
Stankovic, John A., Marco Spuri, Krithi Ramamritham, and Giorgio C. Buttazzo (1998). Deadline Scheduling for Real-Time Systems: EDF and Related Algorithms. Springer.

UNIT 5

DATABASE MODELS: RELATIONAL MODEL

CONTENTS

1.0 Introduction
2.0 Objectives
3.0 Main Content
3.1 The Model
3.2 Interpretation
3.3 Application to Databases
3.4 Alternatives to the Relational Model
3.5 History
3.6 SQL and the Relational Model
3.7 Implementation
3.8 Controversies
3.9 Design
3.10 Set-Theoretic Formulation
3.11 Key Constraints and Functional Dependencies
4.0 Conclusion
5.0 Summary
6.0 Tutor-Marked Assignment
7.0 References/Further Readings

1.0

INTRODUCTION

The relational model for database management is a database model based on first-order predicate logic, first formulated and proposed in 1969 by Edgar Codd. Its core idea is to describe a database as a collection of predicates over a finite set of predicate variables, describing constraints on the possible values and combinations of values. The content of the database at any given time is a finite model (in the logical sense) of the database, i.e. a set of relations, one per predicate variable, such that all predicates are satisfied. A request for information from the database (a database query) is also a predicate. The purpose of the relational model is to provide a declarative method for specifying data and queries: we directly state what information the database contains and what information we want from it, and let the database management system software take care of describing data structures for storing the data and retrieval procedures for answering queries. IBM implemented Codd's ideas with the DB2 database management system and introduced the SQL data definition and query language. Other relational database management systems followed, most of them using SQL as well. A table in an SQL database schema corresponds to a predicate variable; the contents of a table to a relation; and key constraints, other constraints, and SQL queries correspond to predicates. However, it must be noted that SQL databases, including DB2, deviate from the relational model in many details; Codd fiercely argued against deviations that compromise the original principles.

2.0

OBJECTIVES

At the end of this unit, you should be able to:

define the relational model of databases
understand and explain the concept behind relational models
answer the question of how to interpret a relational database model
know the various applications of relational databases
compare the relational model with the Structured Query Language (SQL)
know the constraints and controversies associated with the relational database model.

Figure 1: Relational Structure

Department Table
Deptno    Dname    Dloc    Dmgr
Dept A
Dept B
Dept C

Employee Table
Empno    Ename    Etitle    Esalary    Deptno
Emp 1                                  Dept A
Emp 2                                  Dept B
Emp 3                                  Dept C
Emp 4                                  Dept D
Emp 5                                  Dept E
Emp 6                                  Dept F

3.0 MAIN CONTENT

3.1 The Model

The fundamental assumption of the relational model is that all data is represented as mathematical n-ary relations, an n-ary relation being a subset of the Cartesian product of n domains. In the mathematical model, reasoning about such data is done in two-valued predicate logic, meaning there are two possible evaluations for each proposition: either true or false (and in particular no third value such as unknown or not applicable, either of which is often associated with the concept of NULL). Some think two-valued logic is an important part of the relational model, whereas others think a system that uses a form of three-valued logic can still be considered relational. Data are operated upon by means of a relational calculus or relational algebra, these being equivalent in expressive power.

The relational model of data permits the database designer to create a consistent, logical representation of information. Consistency is achieved by including declared constraints in the database design, which is usually referred to as the logical schema. The theory includes a process of database normalization whereby a design with certain desirable properties can be selected from a set of logically equivalent alternatives. The access plans and other implementation and operation details are handled by the DBMS engine, and are not reflected in the logical model. This contrasts with common practice for SQL DBMSs, in which performance tuning often requires changes to the logical model.

The basic relational building block is the domain or data type, usually abbreviated nowadays to type. A tuple is an unordered set of attribute values. An attribute is an ordered pair of attribute name and type name. An attribute value is a specific valid value for the type of the attribute; this can be either a scalar value or a more complex type. A relation consists of a heading and a body. A heading is a set of attributes. A body (of an n-ary relation) is a set of n-tuples. The heading of the relation is also the heading of each of its tuples.

A relation is defined as a set of n-tuples. In both mathematics and the relational database model, a set is an unordered collection of items, although some DBMSs impose an order on their data. In mathematics, a tuple has an order and allows for duplication. E.F. Codd originally defined tuples using this mathematical definition.
Later, it was one of E.F. Codd's great insights that using attribute names instead of an ordering would be much more convenient (in general) in a computer language based on relations. This insight is still used today. Though the concept has changed, the name "tuple" has not. An immediate and important consequence of this distinguishing feature is that in the relational model the Cartesian product becomes commutative. A table is an accepted visual representation of a relation; a tuple is similar to the concept of a row, but note that in the database language SQL the columns and the rows of a table are ordered.
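The practical effect of identifying attribute values by name rather than by position can be shown with a short Python sketch. It assumes, for illustration only, that a tuple is modelled as a frozenset of (attribute name, value) pairs and a relation body as a frozenset of such tuples; the helper names `tup`, `relation` and `cartesian` are hypothetical. Because neither the body nor a tuple has an order, the same tuple written in two attribute orders is stored once, and the Cartesian product of relations with disjoint headings is the same in either order.

```python
from itertools import product

def tup(**attrs):
    """A tuple as an unordered set of (attribute name, value) pairs."""
    return frozenset(attrs.items())

def relation(*tuples):
    """A relation body as a set of tuples, so duplicates collapse."""
    return frozenset(tuples)

def cartesian(r, s):
    """Cartesian product of relations whose attribute names are disjoint."""
    return frozenset(t1 | t2 for t1, t2 in product(r, s))

# The same tuple written with its attributes in two different orders.
emp = relation(tup(ename="Ada", deptno="A"), tup(deptno="A", ename="Ada"))
dept = relation(tup(dname="Sales"), tup(dname="Research"))

print(len(emp))                                      # 1
print(cartesian(emp, dept) == cartesian(dept, emp))  # True
```

In SQL, by contrast, a row is a positional sequence, so the corresponding product of two tables lists columns in a different order depending on which table comes first.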


A relvar is a named variable of some specific relation type, to which at all times some relation of that type is assigned, though the relation may contain zero tuples. The basic principle of the relational model is the Information Principle: all information is represented by data values in relations. In accordance with this principle, a relational database is a set of relvars, and the result of every query is presented as a relation. The consistency of a relational database is enforced not by rules built into the applications that use it, but rather by constraints, declared as part of the logical schema and enforced by the DBMS for all applications. In general, constraints are expressed using relational comparison operators, of which just one, "is subset of" (⊆), is theoretically sufficient. In practice, several useful shorthands are expected to be available, of which the most important are candidate key (really, superkey) and foreign key constraints.
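To illustrate the claim that "is subset of" is sufficient, a foreign key constraint can be written as a subset test between two projections: every deptno value appearing in Employee must appear among the deptno values of Department. The sketch below models rows as Python dicts; the helper names and data are hypothetical, loosely following Figure 1.

```python
def project(rows, attrs):
    """Projection: the set of value tuples for the given attribute names."""
    return {tuple(row[a] for a in attrs) for row in rows}

def foreign_key_holds(child, child_attrs, parent, parent_attrs):
    """Foreign key as a subset test: the projection of child on child_attrs
    must be a subset of the projection of parent on parent_attrs."""
    return project(child, child_attrs) <= project(parent, parent_attrs)

department = [{"deptno": "Dept A", "dname": "Sales"},
              {"deptno": "Dept B", "dname": "Research"}]
employee = [{"empno": "Emp 1", "deptno": "Dept A"},
            {"empno": "Emp 2", "deptno": "Dept B"}]

print(foreign_key_holds(employee, ["deptno"], department, ["deptno"]))  # True
bad = employee + [{"empno": "Emp 3", "deptno": "Dept Z"}]   # unknown department
print(foreign_key_holds(bad, ["deptno"], department, ["deptno"]))       # False
```

A DBMS would enforce such a constraint on every update rather than checking it after the fact, but the logical content of the constraint is the same subset condition.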

3.2

Interpretation

To fully appreciate the relational model of data, it is essential to understand the intended interpretation of a relation. The body of a relation is sometimes called its extension. This is because it is to be interpreted as a representation of the extension of some predicate, this being the set of true propositions that can be formed by replacing each free variable in that predicate by a name (a term that designates something). There is a one-to-one correspondence between the free variables of the predicate and the attribute names of the relation heading. Each tuple of the relation body provides attribute values to instantiate the predicate by substituting each of its free variables. The result is a proposition that is deemed, on account of the appearance of the tuple in the relation body, to be true. Contrariwise, every tuple whose heading conforms to that of the relation but which does not appear in the body is deemed to be false. This assumption is known as the closed world assumption. For a formal exposition of these ideas, see section 3.10, Set-Theoretic Formulation, below.
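The interpretation can be made concrete with a tiny example. Assume, hypothetically, the predicate "Employee ENAME works in department DEPTNO", whose free variables correspond to the attributes ename and deptno. Each tuple in the body instantiates the predicate into a proposition deemed true; under the closed world assumption, a conforming tuple that is absent from the body corresponds to a proposition deemed false. The predicate wording, data and helper names are illustrative only.

```python
PREDICATE = "Employee {ename} works in department {deptno}"
HEADING = {"ename", "deptno"}          # free variables of the predicate

body = [{"ename": "Ada", "deptno": "A"},
        {"ename": "Ben", "deptno": "B"}]

def conforms(t):
    """A tuple conforms if its attribute names equal the heading."""
    return set(t) == HEADING

def proposition(t):
    """Instantiate the predicate by substituting each free variable."""
    if not conforms(t):
        raise ValueError("tuple heading must match the relation heading")
    return PREDICATE.format(**t)

def deemed_true(t):
    """Closed world assumption: a conforming tuple is true iff it is in the body."""
    if not conforms(t):
        raise ValueError("tuple heading must match the relation heading")
    return t in body

for t in body:
    print(proposition(t))                              # true propositions
print(deemed_true({"ename": "Ada", "deptno": "B"}))    # False: not in the body
```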

3.3

Application to Databases

A type as used in a typical relational database might be the set of integers, the set of character strings, the set of dates, or the two boolean values true and false, and so on. The corresponding type names for these types might be the strings "int", "char", "date", "boolean", etc. It is important to understand, though, that relational theory does not dictate what types are to be supported; indeed, nowadays provisions are expected to be available for user-defined types in addition to the built-in ones provided by the system.

Attribute is the term used in the theory for what is commonly referred to as a column. Similarly, table is commonly used in place of the theoretical term relation (though in SQL the term is by no means synonymous with relation). A table data structure is specified as a list of column definitions, each of which specifies a unique column name and the type of the values that are permitted for that column. An attribute value is the entry in a specific column and row, such as "John Doe" or "35". A tuple is basically the same thing as a row, except in an SQL DBMS, where the column values in a row are ordered. (Tuples are not ordered; instead, each attribute value is identified solely by the attribute name and never by its ordinal position within the tuple.) An attribute name might be "name" or "age".

A relation is a table structure definition (a set of column definitions) along with the data appearing in that structure. The structure definition is the heading and the data appearing in it is the body, a set of rows. A database relvar (relation variable) is commonly known as a base table. The heading of its assigned value at any time is as specified in the table declaration, and its body is that most recently assigned to it by invoking some update operator (typically INSERT, UPDATE, or DELETE). The heading and body of the table resulting from evaluation of some query are determined by the definitions of the operators used in the expression of that query. (Note that in SQL the heading is not always a set of column definitions as described above, because it is possible for a column to have no name and also for two or more columns to have the same name. Also, the body is not always a set of rows, because in SQL it is possible for the same row to appear more than once in the same body.)
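The difference between an SQL table body and a relation body can be observed with Python's built-in `sqlite3` module. SQLite is used here only as a convenient SQL engine; most SQL DBMSs behave similarly on these points, but this is an illustration rather than a survey of products, and the table and data are hypothetical. Inserting the same row twice leaves two rows in the table, `SELECT DISTINCT` restores set semantics for a query result, and a query heading may repeat a column name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, age INTEGER)")
# Insert the very same row twice; SQL accepts it.
conn.executemany("INSERT INTO employee VALUES (?, ?)",
                 [("John Doe", 35), ("John Doe", 35)])

bag = conn.execute("SELECT name, age FROM employee").fetchall()
as_set = conn.execute("SELECT DISTINCT name, age FROM employee").fetchall()

# A query heading may also repeat a column name.
cur = conn.execute("SELECT name, name FROM employee")
heading = [col[0] for col in cur.description]
conn.close()

print(bag)       # [('John Doe', 35), ('John Doe', 35)]
print(as_set)    # [('John Doe', 35)]
print(heading)   # ['name', 'name']
```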

3.4

Alternatives to the Relational Model

Other models are the hierarchical model and the network model. Some systems using these older architectures are still in use today in data centers with high data volume needs, or where existing systems are so complex and abstract that it would be cost-prohibitive to migrate to systems employing the relational model. Also of note are newer object-oriented databases, even though many of them are DBMS-construction kits rather than proper DBMSs. A recent development is the Object-Relation type-Object model, which is based on the assumption that any fact can be expressed in the form of one or more binary relationships. The model is used in Object Role Modeling (ORM), RDF/Notation 3 (N3) and Gellish English. The relational model was the first formal database model. After it was defined, informal models were made to describe hierarchical databases (the hierarchical model) and network databases (the network model). Hierarchical and network databases existed before relational databases, but were only described as models after the relational model was defined, in order to establish a basis for comparison.

3.5

History

The relational model was invented by E.F. (Ted) Codd as a general model of data, and subsequently maintained and developed by Chris Date and Hugh Darwen, among others. In The Third Manifesto (first published in 1995), Date and Darwen show how the relational model can accommodate certain desired object-oriented features.

3.6

SQL and the Relational Model

SQL, initially pushed as the standard language for relational databases, deviates from the relational model in several places. The current ISO SQL standard does not mention the relational model or use relational terms or concepts. However, it is possible to create a database conforming to the relational model using SQL if one avoids certain SQL features. The following deviations from the relational model have been noted in SQL. Note that few database servers implement the entire SQL standard, and in particular some do not allow some of these deviations. Whereas NULL is nearly ubiquitous, for example, allowing duplicate column names within a table, or anonymous columns, is uncommon.

Duplicate Rows
The same row can appear more than once in an SQL table. The same tuple cannot appear more than once in a relation.

Anonymous Columns


A column in an SQL table can be unnamed and thus impossible to reference in expressions. The relational model requires every attribute to be named and referenceable.

Duplicate Column Names
Two or more columns of the same SQL table can have the same name and therefore cannot be referenced, on account of the obvious ambiguity. The relational model requires every attribute to be referenceable.

Column Order Significance
The order of columns in an SQL table is defined and significant, one consequence being that SQL's implementations of Cartesian product and union are both noncommutative. The relational model requires that no significance be attached to any ordering of the attributes of a relation.

Views without CHECK OPTION
Updates to a view defined without CHECK OPTION can be accepted, but the resulting update to the database does not necessarily have the expressed effect on its target. For example, an invocation of INSERT can be accepted but the inserted rows might not all appear in the view, or an invocation of UPDATE can result in rows disappearing from the view. The relational model requires updates to a view to have the same effect as if the view were a base relvar.

Columnless Tables Unrecognized
SQL requires every table to have at least one column, but there are two relations of degree zero (of cardinality one and zero), and they are needed to represent extensions of predicates that contain no free variables.

NULL
This special mark can appear instead of a value wherever a value can appear in SQL, in particular in place of a column value in some row. The deviation from the relational model arises from the fact that the implementation of this ad hoc concept in SQL involves the use of three-valued logic, under which the comparison of NULL with itself does not yield true but instead yields the third truth value, unknown; similarly, the comparison of NULL with something other than itself does not yield false but instead yields unknown.
It is because of this behaviour in comparisons that NULL is described as a mark rather than a value. The relational model depends on the law of excluded middle, under which anything that is not true is false and anything that is not false is true; it also requires every tuple in a relation body to have a value for every attribute of that relation. This particular deviation is disputed by some, if only because E.F. Codd himself eventually advocated the use of special marks and a 4-valued logic. However, this was based on his observation that there are two distinct reasons why one might want to use a special mark in place of a value, which led opponents of such logics to discover more distinct reasons; at least 19 have been noted, which would require a 21-valued logic. SQL itself uses NULL for several purposes other than to represent "value unknown". For example, the sum of the empty set is NULL, meaning zero; the average of the empty set is NULL, meaning undefined; and NULL appearing in the result of a LEFT JOIN can mean "no value because there is no matching row in the right-hand operand".

Concepts
SQL uses the concepts "table", "column" and "row" instead of "relvar", "attribute" and "tuple". These are not merely differences in terminology. For example, a "table" may contain duplicate rows, whereas the same tuple cannot appear more than once in a relation.
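Several of the NULL behaviours just described can be checked directly, again using SQLite through Python's `sqlite3` module as a convenient engine (an assumption for illustration; the table names and data are hypothetical). A comparison involving NULL yields unknown, which SQLite reports as NULL (Python `None`); SUM and AVG over an empty set return NULL; a LEFT JOIN fills unmatched columns with NULL; and a WHERE clause discards rows whose condition is unknown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (deptno TEXT, dname TEXT);
    CREATE TABLE emp  (ename TEXT, deptno TEXT, salary INTEGER);
    INSERT INTO dept VALUES ('A', 'Sales'), ('B', 'Research');
    INSERT INTO emp  VALUES ('Ada', 'A', 100);
""")

# Comparisons with NULL yield unknown, which SQLite reports as NULL (None).
comparisons = conn.execute("SELECT NULL = NULL, NULL <> 1").fetchone()
# SUM and AVG over an empty set both return NULL.
empty_aggs = conn.execute(
    "SELECT SUM(salary), AVG(salary) FROM emp WHERE 1 = 0").fetchone()
# LEFT JOIN: NULL marks "no matching row in the right-hand operand".
joined = conn.execute("""
    SELECT d.dname, e.ename FROM dept AS d
    LEFT JOIN emp AS e ON e.deptno = d.deptno
    ORDER BY d.deptno""").fetchall()
# WHERE keeps only rows whose condition is true, so "unknown" rows vanish.
count = conn.execute(
    "SELECT COUNT(*) FROM emp WHERE salary <> NULL").fetchone()
conn.close()

print(comparisons)  # (None, None)
print(empty_aggs)   # (None, None)
print(joined)       # [('Sales', 'Ada'), ('Research', None)]
print(count)        # (0,)
```

The last query is a common pitfall: `salary <> NULL` is never true, so the test for a missing value has to be written `salary IS NULL`.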

3.7

Implementation

There have been several attempts to produce a true implementation of the relational database model as originally defined by Codd and explained by Date, Darwen and others, but none has been a popular success so far. Rel is one of the more recent attempts to do this.

3.8

Controversies

Some years after the publication of his 1970 model, Codd himself proposed a three-valued logic version of it (True, False, Missing or NULL) in order to deal with missing information, and in his The Relational Model for Database Management Version 2 (1990) he went a step further with a four-valued logic version (True, False, Missing but Applicable, Missing but Inapplicable). But these have never been implemented, presumably because of the attendant complexity. SQL's NULL construct was intended to be part of a three-valued logic system, but fell short of that due to logical errors in the standard and in its implementations.
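The three-valued logic at the centre of this controversy can be written out as truth tables. The sketch below assumes the Kleene-style connectives that SQL uses for AND, OR and NOT, with Python's `None` standing for the third value, unknown; the function names are hypothetical. It also shows why the law of excluded middle, on which the relational model depends, fails in such a logic. It does not model Codd's four-valued proposal.

```python
from itertools import product

T, F, U = True, False, None   # U stands for the third truth value, unknown

def and3(a, b):
    """Three-valued AND: false dominates, then unknown."""
    if a is F or b is F:
        return F
    if a is U or b is U:
        return U
    return T

def or3(a, b):
    """Three-valued OR: true dominates, then unknown."""
    if a is T or b is T:
        return T
    if a is U or b is U:
        return U
    return F

def not3(a):
    """Three-valued NOT: unknown stays unknown."""
    return U if a is U else (not a)

NAMES = {T: "true", F: "false", U: "unknown"}

# Truth table for AND over the three values.
for a, b in product([T, U, F], repeat=2):
    print(f"{NAMES[a]:>7} AND {NAMES[b]:<7} = {NAMES[and3(a, b)]}")

# Excluded middle fails: "p OR NOT p" is unknown when p is unknown.
print(NAMES[or3(U, not3(U))])   # unknown
```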

3.9

Design

Database normalization is usually performed when designing a relational database, to improve the logical consistency of the database design. This trades off transactional performance for space efficiency. There are two commonly used systems of diagramming to aid in the visual representation of the relational model: the entity-relationship diagram (ERD), and the related IDEF diagram used in the IDEF1X method, created by the U.S. Air Force based on ERDs. The tree structure of data may enforce hierarchical model organization, with a parent-child relationship table.

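As a minimal illustration of what normalization achieves, the hypothetical flat table below repeats a department's name for every employee in it. Projecting it into Employee and Department relations (in the spirit of Figure 1) removes that repetition, and joining the two projections back on deptno reproduces the original rows, i.e. the decomposition loses no information for this data. This is a hand-made example with hypothetical helper names, not a normalization algorithm.

```python
from operator import itemgetter

flat = [  # hypothetical unnormalized table: dname repeats per employee
    {"empno": "Emp 1", "ename": "Ada", "deptno": "Dept A", "dname": "Sales"},
    {"empno": "Emp 2", "ename": "Ben", "deptno": "Dept A", "dname": "Sales"},
    {"empno": "Emp 3", "ename": "Chi", "deptno": "Dept B", "dname": "Research"},
]

def project(rows, attrs):
    """Projection with duplicate elimination, since relations are sets."""
    distinct = {tuple((a, row[a]) for a in attrs) for row in rows}
    return [dict(t) for t in sorted(distinct)]

def natural_join(r, s, on):
    """Join rows of r and s that agree on the attribute `on`."""
    return [{**x, **y} for x in r for y in s if x[on] == y[on]]

employee = project(flat, ["empno", "ename", "deptno"])
department = project(flat, ["deptno", "dname"])
print(len(department))   # 2: "Sales" is now stored once instead of twice

by_empno = itemgetter("empno")
rejoined = natural_join(employee, department, "deptno")
print(sorted(rejoined, key=by_empno) == sorted(flat, key=by_empno))  # True
```

The join needed to rebuild the flat view is the price of the smaller, less redundant design, which is the trade-off the paragraph above describes.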
3.10 Set-Theoretic Formulation


Basic notions in the relational model are relation names and attribute names. We will represent these as strings such as "Person" and "name", and we will usually use variables such as a, b, c to range over them. Another basic notion is the set of atomic values, which contains values such as numbers and strings. Our first definition concerns the notion of a tuple, which formalizes the notion of a row or record in a table:

Tuple
A tuple is a partial function from attribute names to atomic values.

Header
A header is a finite set of attribute names.

Projection
The projection of a tuple t on a finite set of attributes A is $t[A] = \{(a, v) : (a, v) \in t,\ a \in A\}$.

The next definition defines a relation, which formalizes the contents of a table as it is defined in the relational model.

Relation
A relation is a tuple (H, B), with H, the header, and B, the body, a set of tuples that all have the domain H.

Such a relation closely corresponds to what is usually called the extension of a predicate in first-order logic, except that here we identify the places in the predicate with attribute names. Usually in the relational model a database schema is said to consist of a set of relation names, the headers that are associated with these names and the constraints that should hold for every instance of the database schema.
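The definitions of this section translate almost directly into Python. In this sketch (an illustration; the type aliases and helper names are hypothetical), a tuple is a dict from attribute names to atomic values, a header is a frozenset of attribute names, projection keeps only the pairs whose attribute belongs to the given set, and `make_relation` builds a pair (H, B) after checking that every tuple in B has domain H. The example reuses the "Person"/"name" style of naming from the text.

```python
from typing import Dict, FrozenSet, List, Tuple

RelTuple = Dict[str, object]   # partial function: attribute name -> value
Header = FrozenSet[str]        # finite set of attribute names

def projection(t: RelTuple, attrs: Header) -> RelTuple:
    """t[A]: keep only the (attribute, value) pairs whose attribute is in A."""
    return {a: v for a, v in t.items() if a in attrs}

def make_relation(header: Header, body: List[RelTuple]) -> Tuple[Header, List[RelTuple]]:
    """Build a relation (H, B); every tuple in B must have exactly domain H."""
    unique: List[RelTuple] = []
    for t in body:
        if frozenset(t) != header:
            raise ValueError(f"tuple {t} does not have domain {sorted(header)}")
        if t not in unique:          # B is a set: drop duplicate tuples
            unique.append(t)
    return header, unique

person = frozenset({"name", "age"})
H, B = make_relation(person, [{"name": "Ada", "age": 36},
                              {"age": 36, "name": "Ada"}])
print(len(B))                                   # 1
print(projection(B[0], frozenset({"name"})))    # {'name': 'Ada'}
```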

3.11 Key Constraints and Functional Dependencies


One of the simplest and most important types of relation constraint is the key constraint. It states that in every instance of a certain relational schema the tuples can be identified by their values for certain attributes.
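A key constraint can be checked mechanically on an instance: no two tuples may agree on the key attributes. The closely related notion of a functional dependency X → Y (tuples that agree on X must also agree on Y) is checked the same way. The sketch below is a minimal illustration over rows modelled as dicts, assuming the rows contain no duplicates (as in a relation); the attribute names and data are hypothetical. Note that checking one instance can reveal a violation but cannot prove that a constraint holds for every instance of the schema.

```python
def fd_holds(rows, lhs, rhs):
    """True if every pair of rows that agrees on `lhs` also agrees on `rhs`."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:   # same lhs value, different rhs value
            return False
    return True

def is_key(rows, key):
    """Key constraint: no two rows share the same values for `key`
    (assumes `rows` has no duplicate rows, as in a relation)."""
    values = [tuple(row[a] for a in key) for row in rows]
    return len(values) == len(set(values))

employees = [
    {"empno": 1, "ename": "Ada", "deptno": "A"},
    {"empno": 2, "ename": "Ben", "deptno": "A"},
    {"empno": 3, "ename": "Ada", "deptno": "B"},
]
print(is_key(employees, ["empno"]))                          # True
print(is_key(employees, ["ename"]))                          # False: two "Ada" rows
print(fd_holds(employees, ["empno"], ["ename", "deptno"]))   # True
print(fd_holds(employees, ["ename"], ["deptno"]))            # False
```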

4.0

CONCLUSION

The evolution of the relational model and of relational database management systems is a significant milestone in the history and development of databases and database management systems. This concept, pioneered by Edgar Codd, brought an entirely new and much more efficient way of storing and retrieving data, especially for large databases. It emphasized the use of tables and the linking of those tables through query commands. Most of today's database management systems implement the relational model.

5.0

SUMMARY

@/e relational model or data&ase management is a data&ase model &ased on irst4order predicate logic' irst ormulated and proposed in +:.: &y Edgar Codd @/e undamental assumption o t/e relational model is t/at all data is represented as mat/ematical n4ary r"('t on!' an n4ary relation &eing a su&set o t/e Cartesian product o n domains. @o ully appreciate t/e relational model o data it is essential to understand t/e intended interpretation o a relation. ( t)<" as used in a typical relational data&ase mig/t &e t/e set o integers' t/e set o c/aracter strings' t/e set o dates' or t/e t$o &oolean values true and false' and so on Ot/er models are t/e /ierarc/ical model and net$ork model. 2ome systems using t/ese older arc/itectures are still in use today in data centers @/e relational model $as invented &y E.?. B@edC Codd as a general model o data' and su&se*uently maintained and developed &y C/ris "ate and )ug/ "ar$en among ot/ers. 2FL' initially pus/ed as t/e standard language or relational data&ases' deviates rom t/e relational model in several places. @/ere /ave &een several attempts to produce a true implementation o t/e relational data&ase model as originally de ined &y Codd and eDplained &y "ate' "ar$en and ot/ers' &ut none /ave &een popular successes so ar

Database normalization is usually performed when designing a relational database, to improve the logical consistency of the database design. Basic notions in the relational model are relation names and attribute names. One of the simplest and most important types of relation constraints is the key constraint.

6.0 TUTOR-MARKED ASSIGNMENT

1. Briefly discuss the interpretation of a relation in the relational model.
2. Mention five ways in which the relational model differs from SQL.

7.0 REFERENCES/FURTHER READINGS

Codd, E. F. (1969). "Derivability, Redundancy, and Consistency of Relations Stored in Large Data Banks". IBM Research Report.
Codd, E. F. (1970). "A Relational Model of Data for Large Shared Data Banks". Communications of the ACM 13 (6): 377-387. doi:10.1145/362384.362685.
White, Colin (2004). "In the Beginning: An RDBMS History". Teradata Magazine Online, September 2004 edition. URL: http://www.teradata.com/t/page/127057/.
Date, C. J., Darwen, H. (2000). Foundation for Future Database Systems: The Third Manifesto, 2nd edition. Addison-Wesley Professional. ISBN 0-201-70928-7.
Date, C. J. (2003). Introduction to Database Systems, 8th edition. Addison-Wesley. ISBN 0-321-19784-4.

UNIT 6

BASIC COMPONENTS OF DBMS

CONTENTS


1.0 Introduction
2.0 Objectives
3.0 Main Content
3.1 Concurrency Controls
3.2 Java Database Connectivity
3.3 Query Optimizer
3.4 Open Database Connectivity
3.5 Data Dictionary
4.0 Conclusion
5.0 Summary
6.0 Tutor-Marked Assignment
7.0 References/Further Readings

1.0 INTRODUCTION

This unit discusses the basic components of a database management system. These components ensure proper control of data, access to data and querying of data, and they provide standard methods of accessing database management systems.

2.0 OBJECTIVES

At the end of this unit, you should be able to:

- state the ACID rules that govern transactions
- explain what concurrency control is in databases
- mention the different methods of concurrency control
- define and interpret the acronym JDBC
- describe the types of JDBC drivers
- define the query optimizer and explain its applications and cost estimation.

3.0 MAIN CONTENT

3.1 Concurrency Controls

In databases, concurrency control ensures that correct results for concurrent operations are generated, while getting those results as quickly as possible.

Concurrency Control in Databases
Concurrency control in database management systems (DBMS) ensures that database transactions are performed concurrently without the concurrency violating the data integrity of the database. Executed transactions should follow the ACID rules, as described below. The DBMS must guarantee that only serializable (unless serializability is intentionally relaxed) and recoverable schedules are generated. It also guarantees that no effect of committed transactions is lost, and that no effect of aborted (rolled back) transactions remains in the related database.

Transaction ACID Rules
Atomicity - Either the effects of all or none of its operations remain when a transaction is completed; in other words, to the outside world the transaction appears to be indivisible (atomic).
Consistency - Every transaction must leave the database in a consistent state.
Isolation - Transactions cannot interfere with each other. Providing isolation is the main goal of concurrency control.
Durability - The effects of successful (committed) transactions must persist through crashes.

Concurrency Control Mechanisms
The main categories of concurrency control mechanisms are:
Optimistic - Delay the synchronization for a transaction until its end without blocking (read, write) operations, and then abort transactions that violate desired synchronization rules.
Pessimistic - Block operations of a transaction that would cause violation of synchronization rules.

There are several methods for concurrency control. Among them:
- Two-phase locking
- Strict two-phase locking
- Conservative two-phase locking
- Index locking
- Multiple granularity locking

A lock is a database system object associated with a database object (typically a data item) that prevents undesired (typically synchronization-rule-violating) operations of other transactions by blocking them. Database system operations check for lock existence, and halt when they encounter a lock type that is intended to block them.
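To show how two-phase locking uses such locks, here is a single-threaded Python sketch. The `LockManager` and `Transaction` classes are hypothetical teaching aids, not a DBMS API: a conflicting request simply returns False instead of making the transaction wait, and a shared lock is upgraded to exclusive only when no other transaction holds the item. Strict two-phase locking would additionally keep exclusive locks until the transaction commits, which is not modelled here.

```python
class TwoPhaseLockingError(Exception):
    """Raised when a transaction requests a lock after it has released one."""


class LockManager:
    """Toy lock table supporting shared ("S") and exclusive ("X") locks.

    A conflicting request returns False instead of blocking, so the
    behaviour can be followed step by step in a single thread.
    """

    def __init__(self):
        self.holders = {}  # data item -> {transaction id: lock mode}

    def request(self, txn_id, item, mode):
        current = self.holders.setdefault(item, {})
        others = [m for t, m in current.items() if t != txn_id]
        if mode == "S" and all(m == "S" for m in others):
            current.setdefault(txn_id, "S")  # an existing X lock is kept
            return True
        if mode == "X" and not others:
            current[txn_id] = "X"  # grant, or upgrade S to X
            return True
        return False  # incompatible with a lock held by another transaction

    def release(self, txn_id, item):
        self.holders.get(item, {}).pop(txn_id, None)


class Transaction:
    """Enforces the two-phase rule: all lock requests precede the first unlock."""

    def __init__(self, txn_id, manager):
        self.txn_id = txn_id
        self.manager = manager
        self.shrinking = False  # False = growing phase, True = shrinking phase

    def lock(self, item, mode):
        if self.shrinking:
            raise TwoPhaseLockingError(
                f"{self.txn_id} cannot lock {item} after releasing a lock")
        return self.manager.request(self.txn_id, item, mode)

    def unlock(self, item):
        self.shrinking = True  # the first release ends the growing phase
        self.manager.release(self.txn_id, item)


lm = LockManager()
t1, t2 = Transaction("T1", lm), Transaction("T2", lm)
print(t1.lock("A", "S"))  # True
print(t2.lock("A", "S"))  # True: shared locks are compatible
print(t2.lock("A", "X"))  # False: T1 still holds a shared lock on A
t1.unlock("A")            # T1 enters its shrinking phase
print(t2.lock("A", "X"))  # True: T2 upgrades to an exclusive lock
try:
    t1.lock("B", "S")     # breaks the two-phase rule
except TwoPhaseLockingError as err:
    print(err)
```

The growing/shrinking flag is the whole of the two-phase rule; the compatibility test in `request` is what makes the scheme pessimistic, since conflicting operations are refused up front rather than checked at the end.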

There are also non-lock concurrency control methods, among them:
- Conflict graph (serializability or precedence graph) checking
- Timestamp ordering
- Commitment ordering

Optimistic concurrency control methods also typically do not use locks.

Almost all currently implemented lock-based and non-lock-based concurrency control mechanisms guarantee schedules that are conflict serializable (unless relaxed forms of serializability are needed). However, many research texts encourage view-serializable schedules for possible gains in performance, especially when not too many conflicts exist (and not too many aborts of completely executed transactions occur), because they reduce the considerable overhead of blocking mechanisms.

Concurrency Control in Operating Systems
Operating systems, especially real-time operating systems, need to maintain the illusion that many tasks are all running at the same time. Such multitasking is fairly simple when all tasks are independent of each other. However, when several tasks try to use the same resource, or when tasks try to share information, it can lead to confusion and inconsistency. The task of concurrent computing is to solve that problem. Some solutions involve "locks" similar to the locks used in databases, but they risk causing problems of their own, such as deadlock. Other solutions are lock-free and wait-free algorithms.
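The Atomicity rule listed earlier in this section can be observed with Python's built-in sqlite3 module. This is a minimal sketch, assuming an in-memory SQLite database and a hypothetical `account` table; it is not an example from the course text. A CHECK constraint makes the second update of an over-sized transfer fail, and the transaction is rolled back as a whole.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account ("
             "id TEXT PRIMARY KEY, "
             "balance INTEGER NOT NULL CHECK (balance >= 0))")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Credit dst and debit src as one transaction.

    Used as a context manager, a sqlite3 connection commits when the block
    finishes normally and rolls back when it raises an exception.
    """
    with conn:
        conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
        conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))


transfer(conn, "A", "B", 30)       # succeeds: A = 70, B = 80
try:
    transfer(conn, "A", "B", 500)  # the credit runs, then the debit violates the CHECK
except sqlite3.IntegrityError:
    pass                           # the credit to B was rolled back as well

print(dict(conn.execute("SELECT id, balance FROM account")))  # {'A': 70, 'B': 80}
```

Without the rollback, B would keep the 500 credit even though A was never debited, which is exactly the partial effect that atomicity forbids.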

3.2 Java Database Connectivity

Java Database Connectivity (JDBC) is an API for the Java programming language that defines how a client may access a database. It provides methods for querying and updating data in a database. JDBC is oriented towards relational databases.

Overview
JDBC has been part of the Java Standard Edition since the release of JDK 1.1. The JDBC classes are contained in the Java package java.sql. Starting with version 3.0, JDBC has been developed under the Java Community Process. JSR 54 specifies JDBC 3.0 (included in J2SE 1.4), JSR 114 specifies the JDBC Rowset additions, and JSR 221 is the specification of JDBC 4.0 (included in Java SE 6).

JDBC allows multiple implementations to exist and be used by the same application. The API provides a mechanism for dynamically loading the correct Java packages and registering them with the JDBC Driver Manager. The Driver Manager is used as a connection factory for creating JDBC connections.

JDBC connections support creating and executing statements. These may be update statements such as SQL's CREATE, INSERT, UPDATE and DELETE, or query statements such as SELECT. Additionally, stored procedures may be invoked through a JDBC connection. JDBC represents statements using one of the following interfaces:
- Statement - the statement is sent to the database server each and every time it is executed.
- PreparedStatement - the statement is cached and its execution path is predetermined on the database server, allowing it to be executed multiple times efficiently.
- CallableStatement - used for executing stored procedures on the database.

Update statements such as INSERT, UPDATE and DELETE return an update count that indicates how many rows were affected in the database. These statements do not return any other information.

Query statements return a JDBC row result set, which the application walks over one row at a time. Individual columns in a row are retrieved either by name or by column number. There may be any number of rows in the result set. The result set has metadata that describes the names of the columns and their types.

There is an extension to the basic JDBC API in the javax.sql package that allows for scrollable result sets and cursor support, among other things.

JDBC Drivers
JDBC drivers are client-side adaptors (they are installed on the client machine, not on the server) that convert requests from Java programs to a protocol that the DBMS can understand.

Types
There are commercial and free drivers available for most relational database servers. These drivers fall into one of the following types:
- Type 1, the JDBC-ODBC bridge
- Type 2, the Native-API driver
- Type 3, the network-protocol driver
- Type 4, the native-protocol driver

Internal JDBC driver: a driver embedded with the JRE in Java-enabled SQL databases, used for Java stored procedures. This does not belong to the above classification, although it would likely be either a type 2 or a type 4 driver (depending on whether the database itself is implemented in Java or not). An example is the KPRB driver supplied with Oracle RDBMS. "jdbc:default:connection" is a relatively standard way of referring to such a connection (at least Oracle and Apache Derby support it). The distinction here is that the JDBC client actually runs as part of the database being accessed, so access can be made directly rather than through network protocols.

Sources
- SQLSummit.com publishes a list of drivers, including JDBC drivers and vendors.
- Sun Microsystems provides a list of some JDBC drivers and vendors.
- Simba Technologies ships an SDK for building custom JDBC drivers for any custom/proprietary relational data source.
- DataDirect Technologies provides a comprehensive suite of fast Type 4 JDBC drivers for all major databases.
- IDS Software provides a Type 3 JDBC driver for concurrent access to all major databases. Supported features include result set caching, SSL encryption, custom data sources and dbShield.
- i-net software provides fast Type 4 JDBC drivers for all major databases.
- OpenLink Software ships JDBC drivers for a variety of databases, including bridges to other data access mechanisms (e.g., ODBC, JDBC) which can provide more functionality than the targeted mechanism.
- JDBaccess is a Java persistence library for MySQL and Oracle which defines major database access operations in an easy-to-use API above JDBC.
- JNetDirect provides a suite of fully Sun J2EE certified high-performance JDBC drivers.
- HSQL is an RDBMS with a JDBC driver and is available under a BSD license.
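The statement, update-count and result-set ideas described for JDBC can be tried without a Java toolchain through Python's DB-API 2.0 (PEP 249), which plays a similar role for Python. This is an analogy only: it is not JDBC, the built-in sqlite3 module stands in for a driver, and the `course` table is hypothetical. Comments name the roughly corresponding JDBC concept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows can be read by column name or by position
cur = conn.cursor()
cur.execute("CREATE TABLE course (code TEXT PRIMARY KEY, title TEXT, units INTEGER)")

# Parameterised SQL with ? placeholders, reused for several rows
# (roughly the role of a JDBC PreparedStatement).
cur.executemany("INSERT INTO course VALUES (?, ?, ?)",
                [("C1", "Databases", 3), ("C2", "Statistics", 2)])

# An update statement reports how many rows it changed
# (roughly the JDBC update count).
cur.execute("UPDATE course SET units = units + 1 WHERE units < ?", (3,))
updated = cur.rowcount
print("rows updated:", updated)  # rows updated: 1

# A query yields rows to walk over (roughly a JDBC ResultSet).
cur.execute("SELECT code, title, units FROM course ORDER BY code")
print([column[0] for column in cur.description])  # result metadata: column names
for row in cur:
    # By name, then by position; positions are 0-based here, 1-based in JDBC.
    print(row["code"], row[2])
```

Unlike JDBC result-set metadata, sqlite3's `cursor.description` supplies only column names; its type fields are None.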

3.3 Query Optimizer

The query optimizer is the component of a database management system that attempts to determine the most efficient way to execute a query. The optimizer considers the possible query plans for a given input query, and attempts to determine which of those plans will be the most efficient. Cost-based query optimizers assign an estimated "cost" to each possible query plan, and choose the plan with the smallest cost.

The cost of a plan is an estimate of the runtime cost of evaluating the query, in terms of the number of I/O operations required, the CPU requirements, and other factors determined from the data dictionary. The set of query plans examined is formed by examining the possible access paths (e.g. index scan, sequential scan) and join algorithms (e.g. sort-merge join, hash join, nested loops). The search space can become quite large depending on the complexity of the SQL query.

The query optimizer cannot be accessed directly by users. Instead, once queries are submitted to the database server and parsed by the parser, they are passed to the query optimizer, where optimization occurs.

Implementation
Most query optimizers represent query plans as a tree of "plan nodes". A plan node encapsulates a single operation that is required to execute the query. The nodes are arranged as a tree, in which intermediate results flow from the bottom of the tree to the top. Each node has zero or more child nodes; these are nodes whose output is fed as input to the parent node. For example, a join node will have two child nodes, which represent the two join operands, whereas a sort node would have a single child node (the input to be sorted). The leaves of the tree are nodes which produce results by scanning the disk, for example by performing an index scan or a sequential scan.

Cost Estimation
One of the hardest problems in query optimization is to accurately estimate the costs of alternative query plans. Optimizers cost query plans using a mathematical model of query execution costs that relies heavily on estimates of the cardinality, or number of tuples, flowing through each edge in a query plan. Cardinality estimation in turn depends on estimates of the selection factor (selectivity) of predicates in the query.
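The plan-tree and cost ideas above can be sketched in Python. The node names, the 10,000-row/100-page table and the cost formulas (a sequential scan reads every page once; an index scan pays the index height plus one page read per matching row) are hypothetical simplifications, far cruder than a real optimizer's model. Two candidate plans (sequential scan versus index scan, each under a sort node) are costed and the cheaper one is kept.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PlanNode:
    """One operation in a query plan; each child's output feeds this node."""
    op: str
    rows: float        # estimated output cardinality
    own_cost: float    # estimated cost of this operation alone (page I/Os)
    children: List["PlanNode"] = field(default_factory=list)

    def total_cost(self) -> float:
        # A subtree costs this node's work plus the work of everything below it.
        return self.own_cost + sum(child.total_cost() for child in self.children)

    def describe(self, depth: int = 0) -> str:
        line = f"{'  ' * depth}{self.op} (rows={self.rows:.0f}, cost={self.total_cost():.1f})"
        return "\n".join([line] + [child.describe(depth + 1) for child in self.children])


# Hypothetical cost model, in page I/Os.
def seq_scan(pages, rows, selectivity):
    return PlanNode("SeqScan", rows * selectivity, pages)


def index_scan(rows, selectivity, index_height=3):
    matching = rows * selectivity
    return PlanNode("IndexScan", matching, index_height + matching)


def sort(child):
    # A sort node has a single child (its input); cost assumed proportional to rows.
    return PlanNode("Sort", child.rows, child.rows / 100, [child])


def choose_plan(pages, rows, selectivity):
    """Cost-based choice: build candidate plan trees and keep the cheapest."""
    candidates = [sort(seq_scan(pages, rows, selectivity)),
                  sort(index_scan(rows, selectivity))]
    return min(candidates, key=lambda plan: plan.total_cost())


for sel in (0.001, 0.5):
    print(f"selectivity {sel}:")
    print(choose_plan(pages=100, rows=10_000, selectivity=sel).describe())
```

With these made-up numbers the index scan wins for the highly selective predicate (total cost about 13 versus about 100) and the sequential scan wins when half the rows match (150 versus about 5,053), which is why selectivity estimates matter so much.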
Traditionally, database systems estimate selectivities through fairly detailed statistics on the distribution of values in each column, such as histograms. This technique works well for estimating the selectivities of individual predicates. However, many queries have conjunctions of predicates, such as select count(*) from R where R.make='Honda' and R.model='Accord'. Query predicates are often highly correlated (for example, model='Accord' implies make='Honda'), and it is very hard to estimate the selectivity of the conjunct in general. Poor cardinality estimates and uncaught correlation are one of the main reasons why query optimizers pick poor query plans. This is one reason why a DBA should regularly update the database statistics, especially after major data loads/unloads.
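A tiny Python illustration of why correlated predicates mislead an optimizer that assumes independence. The CAR data are hypothetical, and exact fractions computed from the data stand in for the histograms a real DBMS would keep.

```python
# Hypothetical CAR data: every Accord is a Honda, so the predicates are correlated.
cars = ([("Honda", "Accord")] * 20
        + [("Honda", "Civic")] * 30
        + [("Toyota", "Corolla")] * 50)


def is_honda(car):
    return car[0] == "Honda"


def is_accord(car):
    return car[1] == "Accord"


def selectivity(rows, predicate):
    """Fraction of rows satisfying one predicate (what per-column statistics approximate)."""
    return sum(1 for r in rows if predicate(r)) / len(rows)


s_make = selectivity(cars, is_honda)    # 50 / 100 = 0.5
s_model = selectivity(cars, is_accord)  # 20 / 100 = 0.2

# Independence assumption: multiply the single-predicate selectivities.
estimated = s_make * s_model * len(cars)
actual = sum(1 for r in cars if is_honda(r) and is_accord(r))
print(f"estimated rows: {estimated:.0f}, actual rows: {actual}")  # estimated rows: 10, actual rows: 20
```

The estimate is off by a factor of two because the make predicate adds no filtering once the model predicate holds; an optimizer working from the estimate could choose a plan sized for half the real number of rows.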


3.4 Open Database Connectivity

In computing, Open Database Connectivity (ODBC) provides a standard software API method for using database management systems (DBMS). The designers of ODBC aimed to make it independent of programming languages, database systems, and operating systems.

Overview
The ODBC specification offers a procedural API for using SQL queries to access data. An implementation of ODBC will contain one or more applications, a core ODBC "Driver Manager" library, and one or more "database drivers". The Driver Manager, independent of the applications and DBMS, acts as an "interpreter" between the applications and the database drivers, whereas the database drivers contain the DBMS-specific details. Thus a programmer can write applications that use standard types and features without concern for the specifics of each DBMS that the applications may encounter. Likewise, database driver implementors need only know how to attach to the core library. This makes ODBC modular.

To write ODBC code that exploits DBMS-specific features requires more advanced programming: an application must use introspection, calling ODBC metadata functions that return information about supported features, available types, syntax, limits, isolation levels, driver capabilities and more. Even when programmers use such adaptive techniques, however, ODBC may not provide some advanced DBMS features. The ODBC 3.x API operates well with traditional SQL applications such as OLTP, but it has not evolved to support the richer types introduced by SQL:1999 and SQL:2003.

ODBC provides ubiquitous data access because hundreds of ODBC drivers exist for a large variety of data sources. ODBC operates with a variety of operating systems, and drivers exist for non-relational data such as spreadsheets, text and XML files. Because ODBC dates back to 1992, it offers connectivity to a wider variety of data sources than other data-access APIs. More drivers exist for ODBC than drivers or providers exist for newer APIs such as OLE DB, JDBC, and ADO.NET.

Despite the benefits of ubiquitous connectivity and platform independence, systems designers may perceive ODBC as having certain drawbacks. Administering a large number of client machines can involve a diversity of drivers and DLLs. This complexity can increase system-administration overhead. Large organizations with thousands of PCs have often turned to ODBC server technology (also known as "Multi-Tier ODBC Drivers") to simplify the administration problems.

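The division of labour between application, Driver Manager and driver described in the ODBC overview can be sketched in Python. Everything here is hypothetical: the real ODBC Driver Manager is a native library used through a C API (for example SQLDriverConnect), while the class name `DriverManager`, its `register` method and the simplified connection-string parsing are inventions for illustration. The single "driver" wraps Python's built-in sqlite3 module.

```python
import sqlite3
from typing import Callable, Dict


class DriverManager:
    """Hypothetical layer between applications and database drivers.

    Applications ask for a connection with a connection string; the manager
    looks up the named driver and delegates to it, so application code never
    calls DBMS-specific functions directly.
    """

    def __init__(self) -> None:
        self._drivers: Dict[str, Callable[..., object]] = {}

    def register(self, name: str, connect: Callable[..., object]) -> None:
        self._drivers[name.lower()] = connect

    def connect(self, connection_string: str):
        # Simplified ODBC-style "KEY=value;KEY=value" parsing (no braces or escaping).
        pairs = (part.split("=", 1) for part in connection_string.split(";") if part.strip())
        params = {key.strip().lower(): value.strip() for key, value in pairs}
        driver_name = params.pop("driver").lower()
        if driver_name not in self._drivers:
            raise LookupError(f"no driver registered under {driver_name!r}")
        return self._drivers[driver_name](**params)


def sqlite_driver(database: str = ":memory:"):
    """The only code that knows SQLite-specific details."""
    return sqlite3.connect(database)


manager = DriverManager()
manager.register("SQLite", sqlite_driver)

# Application code is written against the manager, not against SQLite.
conn = manager.connect("DRIVER=SQLite;DATABASE=:memory:")
print(conn.execute("SELECT 1 + 1").fetchone()[0])  # 2
```

Supporting another DBMS in this sketch would mean registering one more driver function; the application line that calls `manager.connect` would not change, which is the modularity the text attributes to ODBC.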
Differences between drivers and driver maturity can also raise important issues. Newer ODBC drivers do not always have the stability of drivers that have already been deployed for years; years of testing and deployment mean a driver is likely to contain fewer bugs.

Developers needing features or types not accessible with ODBC can use other SQL APIs. When not aiming for platform independence, developers can use proprietary APIs, whether DBMS-specific (such as Transact-SQL) or language-specific (for example, JDBC for Java applications).

Bridging Configurations

JDBC-ODBC Bridges
A JDBC-ODBC bridge consists of a JDBC driver which employs an ODBC driver to connect to a target database. This driver translates JDBC method calls into ODBC function calls. Programmers usually use such a bridge when a particular database lacks a JDBC driver. Sun Microsystems included one such bridge in the JVM, but viewed it as a stop-gap measure while few JDBC drivers existed. Sun never intended its bridge for production environments, and generally recommended against its use. Independent data-access vendors now deliver JDBC-ODBC bridges which support current standards for both mechanisms, and which far outperform the bridge built into the JVM.

ODBC-JDBC Bridges
An ODBC-JDBC bridge consists of an ODBC driver which uses the services of a JDBC driver to connect to a database. This driver translates ODBC function calls into JDBC method calls. Programmers usually use such a bridge when they lack an ODBC driver for a particular database but have access to a JDBC driver.

Implementations
ODBC implementations run on many operating systems, including Microsoft Windows, Unix, Linux, OS/2, OS/400, IBM i5/OS and Mac OS X. Hundreds of ODBC drivers exist, including drivers for Oracle, DB2, Microsoft SQL Server, Sybase, Pervasive SQL, IBM Lotus Domino, MySQL, PostgreSQL, and desktop database products such as FileMaker and Microsoft Access.

3.5 Data Dictionary

A data dictionary, as defined in the IBM Dictionary of Computing, is a "centralized repository of information about data such as meaning, relationships to other data, origin, usage, and format". The term may have one of several closely related meanings pertaining to databases and database management systems (DBMS):
- a document describing a database or collection of databases
- an integral component of a DBMS that is required to determine its structure
- a piece of middleware that extends or supplants the native data dictionary of a DBMS

Data Dictionary Documentation
Database users and application developers can benefit from an authoritative data dictionary document that catalogs the organization, contents, and conventions of one or more databases. This typically includes the names and descriptions of the various tables and fields in each database, plus additional details such as the type and length of each data element. There is no universal standard as to the level of detail in such a document, but it is primarily a distillation of metadata about database structure, not the data itself. A data dictionary document may also include further information describing how data elements are encoded. One of the advantages of well-designed data dictionary documentation is that it helps to establish consistency throughout a complex database, or across a large collection of federated databases.

Data Dictionary Middleware
In the construction of database applications, it can be useful to introduce an additional layer of data dictionary software, i.e. middleware, which communicates with the underlying DBMS data dictionary. Such a "high-level" data dictionary may offer additional features and a degree of flexibility that goes beyond the limitations of the native "low-level" data dictionary, whose primary purpose is to support the basic functions of the DBMS, not the requirements of a typical application. For example, a high-level data dictionary can provide alternative entity-relationship models tailored to suit different applications that share a common database.
Extensions to the data dictionary can also assist in query optimization against distributed databases.

Software frameworks aimed at rapid application development sometimes include high-level data dictionary facilities, which can substantially reduce the amount of programming required to build menus, forms, reports, and other components of a database application, including the database itself. For example, PHPLens includes a PHP class library to automate the creation of tables, indexes, and foreign key constraints portably for multiple databases. Another PHP-based data dictionary, part of the RADICORE toolkit, automatically generates program objects, scripts, and SQL code for menus and forms with data validation and complex JOINs. For the ASP.NET environment, Base One's data dictionary provides cross-DBMS facilities for automated database creation, data validation, performance enhancement (caching and index utilization), application security, and extended data types.
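As a concrete instance of the data dictionary as "an integral component of a DBMS", SQLite keeps its own catalog, which Python's built-in sqlite3 module can query. The `department` and `employee` tables are hypothetical; `sqlite_master`, `PRAGMA table_info` and `PRAGMA foreign_key_list` are real SQLite features, though other DBMSs expose their catalogs differently (for example through INFORMATION_SCHEMA views).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        dept_id INTEGER REFERENCES department(dept_id)
    );
""")

# sqlite_master is SQLite's built-in catalog: one row per table, index, view or trigger.
tables = [name for (name,) in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)  # ['department', 'employee']

# PRAGMA table_info gives one row per column: (cid, name, type, notnull, default, pk).
for cid, col, col_type, notnull, default, pk in conn.execute("PRAGMA table_info(employee)"):
    print(f"{col}: {col_type}, not null={bool(notnull)}, primary key={bool(pk)}")

# PRAGMA foreign_key_list records relationships to other tables.
for fk in conn.execute("PRAGMA foreign_key_list(employee)"):
    print(f"{fk[3]} references {fk[2]}({fk[4]})")  # from-column, table, to-column
```

The DBMS maintains this metadata itself whenever the schema changes, which is what distinguishes the integral data dictionary from a separately maintained data dictionary document.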

4.0 CONCLUSION

The basic components of a database management system ensure the availability of data as well as efficient access to it. They mainly include concurrency control, a data dictionary, a query optimizer, and connectivity interfaces such as JDBC and ODBC.

5.0 SUMMARY

In databases, concurrency control ensures that correct results for concurrent operations are generated, while getting those results as quickly as possible. Java Database Connectivity (JDBC) is an API for the Java programming language that defines how a client may access a database; it provides methods for querying and updating data and is oriented towards relational databases. The query optimizer is the component of a database management system that attempts to determine the most efficient way to execute a query: it considers the possible query plans for a given input query and attempts to determine which of those plans will be the most efficient. Open Database Connectivity (ODBC) provides a standard software API for using database management systems (DBMS); its designers aimed to make it independent of programming languages, database systems, and operating systems. A data dictionary, as defined in the IBM Dictionary of Computing, is a "centralized repository of information about data such as meaning, relationships to other data, origin, usage, and format". In the construction of database applications, it can be useful to introduce an additional layer of data dictionary software, i.e. middleware, which communicates with the underlying DBMS data dictionary.

6.0 TUTOR-MARKED ASSIGNMENT

1. Define the transaction ACID rules.
2. List and define the types of JDBC drivers.

7.0 REFERENCES/FURTHER READINGS

ACM, IBM Dictionary of Computing, 10th edition, 1993.
TechTarget, SearchSOA, "What is a Data Dictionary?"
AHIMA Practice Brief, "Guidelines for Developing a Data Dictionary", Journal of AHIMA 77, no. 2 (February 2006): 64A-D.
U.S. Patent 4774661, "Database management system with active data dictionary", 11/19/1985, AT&T.
U.S. Patent 4769772, "Automated Query Optimization Method using both Global and Parallel Local Optimizations for Materialization Access Planning for Distributed Databases", 02/28/1985, Honeywell Bull.
PHPLens, "ADOdb Data Dictionary Library for PHP".
RADICORE, "What is a Data Dictionary?"
Base One International Corp., "Base One Data Dictionary".
Chaudhuri, Surajit (1998). "An Overview of Query Optimization in Relational Systems". Proceedings of the ACM Symposium on Principles of Database Systems: pages 34-43. doi:10.1145/275487.275492.
Ioannidis, Yannis (March 1996). "Query optimization". ACM Computing Surveys 12 (1): 121-123. doi:10.1145/234313.234367.
Selinger, Patricia, et al. (1979). "Access Path Selection in a Relational Database Management System". Proceedings of the 1979 ACM SIGMOD International Conference on Management of Data: 23-34. doi:10.1145/582095.582099.
Parkes, Clara H. (April 1996). "Power to the People". DBMS Magazine, Miller Freeman, Inc.

MODULE 2

Unit 1 Development and Design of Database
Unit 2 Structured Query Languages (SQL)
Unit 3 Database and Information Relational Systems
Unit 4 Database Administrator and Administration

UNIT 1

DEVELOPMENT AND DESIGN OF DATABASE

CONTENTS

1.0 Introduction
2.0 Objectives
3.0 Main Content
3.1 Database Development
3.1.1 Data Planning and Database Design
3.2 Design of Database
3.2.1 Database Normalization
3.2.2 History
3.3 Normal Forms
3.4 Denormalization
3.5 Non-first normal form (NF² or N1NF)
4.0 Conclusion
5.0 Summary
6.0 Tutor-Marked Assignment
7.0 References/Further Readings

1.0 INTRODUCTION

Database design is the process of deciding how to organize data into record types and how the record types will relate to each other. The DBMS should mirror the organization's data structure and process transactions efficiently. Developing small, personal databases is relatively easy using microcomputer DBMS packages or wizards. However, developing a large database of complex data types can be a complex task. In many companies, developing and managing large corporate databases is the primary responsibility of the database administrator and database design analysts. They work with end users and systems analysts to model business processes and the data required. Then they determine:
1. What data definitions should be included in the databases?
2. What structures or relationships should exist among the data elements?

2.0 OBJECTIVES

At the end of this unit, you should be able to:

- understand the concept of data planning and database design
- know the steps in the development of databases
- identify the functions of each step of the design process
- define database normalization
- know the problems addressed by normalization
- define the normal forms from the first to the sixth
- define and understand the term denormalization.

3.0 MAIN CONTENT

3.1 Database Development

3.1.1 Data Planning and Database Design


As figure 1 illustrates, database development may start with a top-down data planning process. Database administrators and designers work with corporate and end-user management to develop an enterprise model that defines the basic business process of the enterprise. Then they define the information needs of end users in a business process, such as the purchasing/receiving process that all businesses have. Next, end users must identify the key data elements that are needed to perform the specific business activities. This frequently involves developing entity relationship diagrams (ERDs) that model the relationships among the many entities involved in the business processes. End users and database designers could use the available ERDs to identify what supplier and product data are required to support their purchasing/receiving and other business processes using enterprise resource planning (ERP) or supply chain management (SCM) software. Such users' views are a major part of a data modeling process, in which the relationships between data elements are identified. Each data model defines the logical relationships among the data elements needed to support a basic business process. For example, can a supplier provide more than one type of product to us? Can a customer have more than one type of product? Can a customer have more than one type of account with us? Can an employee have several pay rates or be assigned to several projects or workgroups? Answering such questions will identify the data relationships that have to be represented in a data model that supports a business process. These data models then serve as logical frameworks (called schemas and subschemas) on which to base the physical design of databases and the development of application programs to support the business processes of the organization.
A schema is an overall logical view of the relationships among the data elements in a database, while a subschema is a logical view of the data relationships needed to support specific end-user application programs that will access that database. Remember that data models represent logical views of the data and relationships of the database. Physical database design takes a physical view of the data (also called the internal view) that describes how data are to be physically stored and accessed on the storage devices of a computer system. For example, figure 2 illustrates these different views and the software interface of a bank database processing system. In this example, checking, savings and installment lending are the business processes whose data models are part of a banking services data model that serves as a logical data framework for all bank services.
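The schema/subschema distinction can be sketched with Python's built-in sqlite3 module, assuming (as is common in SQL systems, though the text does not say so) that a subschema is implemented as a view. The tables, the `checking_view` name and the customer data are hypothetical, loosely modelled on the banking example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Part of a hypothetical banking schema: the overall logical view.
    CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE account (
        acct_no INTEGER PRIMARY KEY,
        cust_id INTEGER NOT NULL REFERENCES customer(cust_id),
        kind    TEXT NOT NULL CHECK (kind IN ('checking', 'savings', 'loan')),
        balance INTEGER NOT NULL
    );
    -- A subschema for the checking application: only the data it needs.
    CREATE VIEW checking_view AS
        SELECT a.acct_no, c.name, a.balance
        FROM account AS a JOIN customer AS c ON c.cust_id = a.cust_id
        WHERE a.kind = 'checking';
""")
conn.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Ngozi"), (2, "Tunde")])
conn.executemany("INSERT INTO account VALUES (?, ?, ?, ?)",
                 [(101, 1, "checking", 500), (102, 1, "savings", 900),
                  (201, 2, "loan", 300)])

# The checking application queries its subschema and never sees savings or loan rows.
print(conn.execute("SELECT * FROM checking_view").fetchall())  # [(101, 'Ngozi', 500)]
```

How the rows are laid out on disk (the physical view) is left entirely to SQLite; neither the schema nor the view says anything about it.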

3.2 Design of Database

3.2.1 Database Normalization


Database normalization, sometimes referred to as canonical synthesis, is a technique for designing relational database tables to minimize duplication of information and, in so doing, to safeguard the database against certain types of logical or structural problems, namely data anomalies. For example, when multiple instances of a given piece of information occur in a table, the possibility exists that these instances will not be kept consistent when the data within the table is updated, leading to a loss of data integrity. A table that is sufficiently normalized is less vulnerable to problems of this kind, because its structure reflects the basic assumptions for when multiple instances of the same information should be represented by a single instance only.

Higher degrees of normalization typically involve more tables and create the need for a larger number of joins, which can reduce performance. Accordingly, more highly normalized tables are typically used in database applications involving many isolated transactions (e.g. an automated teller machine), while less normalized tables tend to be used in database applications that need to map complex relationships between data entities and data attributes (e.g. a reporting application, or a full-text search application).

Database theory describes a table's degree of normalization in terms of normal forms of successively higher degrees of strictness. A table in Third Normal Form (3NF), for example, is consequently in Second Normal Form (2NF) as well; but the reverse is not necessarily the case.

[Figure 1: Database Development. 1. Data Planning: develops a model of the basic business process (enterprise models, business process documentation). 2. Requirement Specification: defines the information needs of end users in a business process (a description of user needs, which may be represented in natural language or using the tools of a particular design methodology). 3. Conceptual Design: expresses all information requirements in the form of a high-level model (a conceptual data model, often expressed as entity relationship models). 4. Logical Design: translates the conceptual models into the data model of a DBMS (logical data models, e.g. relational, network, hierarchical, multidimensional or object-oriented models). 5. Physical Design: determines the data structures and storage methods (physical data models of storage representation and access methods).]

Note: Database development involves data planning and database design activities. Data models that support business processes are used to develop databases that meet the information needs of users.

[Figure 2: Examples of the logical and physical database views and the software interface of a banking service information system. Logical user views: the data elements and relations (the subschemas) needed for checking, savings or installment loan processing, drawn from the Checking and Savings Data Model and the Installment Loan Data Model used by the checking and savings and installment loan applications. Banking Service Data Model: the data elements and relationships (the schema) needed to support all banking services. Software interface: the database management system provides access to the bank's databases. Physical data views: the organization and location of data on the storage media.]

Although the normal forms are often defined informally in terms of the characteristics of tables, rigorous definitions of the normal forms are concerned with the characteristics of mathematical constructs known as relations. Whenever information is represented relationally, it is meaningful to consider the extent to which the representation is normalized.

Problems addressed by normalization

An update anomaly: employee 519 is shown as having different addresses on different records.
An insertion anomaly: until the new faculty member is assigned to teach at least one course, his details cannot be recorded.
A deletion anomaly: all information about Dr. Giddens is lost when he temporarily ceases to be assigned to any courses.

A table that is not sufficiently normalized can suffer from logical inconsistencies of various types, and from anomalies involving data operations. In such a table:

The same information can be expressed on multiple records; therefore updates to the table may result in logical inconsistencies. For example, each record in an "Employees' Skills" table might contain an Employee ID, Employee Address, and Skill; thus a change of address for a particular employee will potentially need to be applied to multiple records (one for each of his skills). If the update is not carried through successfully (if, that is, the employee's address is updated on some records but not others), then the table is left in an inconsistent state. Specifically, the table provides conflicting answers to the question of what this particular employee's address is. This phenomenon is known as an update anomaly.

There are circumstances in which certain facts cannot be recorded at all. For example, each record in a "Faculty and Their Courses" table might contain a Faculty ID, Faculty Name, Faculty Hire Date, and Course Code; thus we can record the details of any faculty member who teaches at least one course, but we cannot record the details of a newly hired faculty member who has not yet been assigned to teach any courses. This phenomenon is known as an insertion anomaly.

There are circumstances in which the deletion of data representing certain facts necessitates the deletion of data representing completely different facts. The "Faculty and Their Courses" table described in the previous example suffers from this type of anomaly, for if a faculty member temporarily ceases to be assigned to any courses, we must delete the last of the records on which that faculty member appears. This phenomenon is known as a deletion anomaly.
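The update anomaly can be reproduced in a few lines. The sketch below uses Python's built-in sqlite3 module with an in-memory database; the table name employees_skills, its columns and the sample addresses are hypothetical, chosen only to mirror the "Employees' Skills" example. An update that reaches only one of employee 519's records leaves the table with two answers to the question of where that employee lives.

```python
import sqlite3

# Hypothetical unnormalized "Employees' Skills" table: the address is
# repeated on every row belonging to the same employee.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees_skills ("
    " employee_id INTEGER, employee_address TEXT, skill TEXT)"
)
conn.executemany(
    "INSERT INTO employees_skills VALUES (?, ?, ?)",
    [
        (519, "94 Chestnut Street", "Public Speaking"),
        (519, "94 Chestnut Street", "Carpentry"),
        (267, "10 Park Road", "Typing"),
    ],
)

# An incomplete update: only the Carpentry row receives the new address.
conn.execute(
    "UPDATE employees_skills SET employee_address = ? "
    "WHERE employee_id = ? AND skill = ?",
    ("87 Grant Street", 519, "Carpentry"),
)

# The table now gives conflicting answers about employee 519's address.
addresses = sorted(
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT employee_address FROM employees_skills "
        "WHERE employee_id = 519"
    )
)
print(addresses)  # ['87 Grant Street', '94 Chestnut Street']
```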

Ideally, a relational database table should be designed in such a way as to exclude the possibility of update, insertion, and deletion anomalies. The normal forms of relational database theory provide guidelines for deciding whether a particular design will be vulnerable to such anomalies. It is possible to correct an unnormalized design so as to make it adhere to the demands of the normal forms: this is called normalization. Removing redundancies from the tables will lead to several tables, with referential integrity restrictions between them.
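A minimal sketch of the corrective decomposition, again using sqlite3 with hypothetical table names: the address moves to an employees table keyed by employee_id, and employee_skills refers to it through a foreign key, which is the referential integrity restriction mentioned above. Joining the two tables gives back the original rows. Note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default

# Each fact is stored once: one address per employee.
conn.execute(
    "CREATE TABLE employees ("
    " employee_id INTEGER PRIMARY KEY, employee_address TEXT NOT NULL)"
)
# Referential integrity: every skill row must point at an existing employee.
conn.execute(
    "CREATE TABLE employee_skills ("
    " employee_id INTEGER NOT NULL REFERENCES employees(employee_id),"
    " skill TEXT NOT NULL,"
    " PRIMARY KEY (employee_id, skill))"
)
conn.execute("INSERT INTO employees VALUES (519, '94 Chestnut Street')")
conn.executemany(
    "INSERT INTO employee_skills VALUES (?, ?)",
    [(519, "Public Speaking"), (519, "Carpentry")],
)

# Joining the two tables reproduces the information of the original table.
rows = conn.execute(
    "SELECT e.employee_id, e.employee_address, s.skill "
    "FROM employees e JOIN employee_skills s ON e.employee_id = s.employee_id "
    "ORDER BY s.skill"
).fetchall()
print(rows)

# A skill for a non-existent employee is rejected by the foreign key.
try:
    conn.execute("INSERT INTO employee_skills VALUES (999, 'Typing')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```

Because the address now appears on exactly one row, the inconsistent state from the update anomaly cannot arise.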


Normalization typically involves decomposing an unnormalized table into two or more tables that, were they to be combined (joined), would convey exactly the same information as the original table.

Background to normalization: definitions

Functional Dependency: Attribute B has a functional dependency on attribute A, written A → B, if, for each value of attribute A, there is exactly one value of attribute B. If a value of A is repeated in several tuples, then the corresponding value of B will also be repeated. In our example, Employee Address has a functional dependency on Employee ID, because a particular Employee ID value corresponds to one and only one Employee Address value. (Note that the reverse need not be true: several employees could live at the same address and therefore one Employee Address value could correspond to more than one Employee ID. Employee ID is therefore not functionally dependent on Employee Address.) An attribute may be functionally dependent either on a single attribute or on a combination of attributes. It is not possible to determine the extent to which a design is normalized without understanding what functional dependencies apply to the attributes within its tables; understanding this, in turn, requires knowledge of the problem domain. For example, an employer may require certain employees to split their time between two locations, such as New York City and London, and therefore want to allow employees to have more than one Employee Address. In this case, Employee Address would no longer be functionally dependent on Employee ID.

Trivial Functional Dependency: A trivial functional dependency is a functional dependency of an attribute on a superset of itself. {Employee ID, Employee Address} → {Employee Address} is trivial, as is {Employee Address} → {Employee Address}.

Full Functional Dependency: An attribute is fully functionally dependent on a set of attributes X if it is functionally dependent on X, and not functionally dependent on any proper subset of X. {Employee Address} has a functional dependency on {Employee ID, Skill}, but not a full functional dependency, because it is also dependent on {Employee ID} alone.

Transitive Dependency: A transitive dependency is an indirect functional dependency, one in which X → Z holds only by virtue of X → Y and Y → Z.
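Whether a functional dependency holds is a fact about the problem domain, so sample data can refute a dependency but never prove it. With that caveat, the sketch below checks A → B against a list of rows and tests the "full" condition by trying every proper subset of the left-hand side. The helper names fd_holds and is_full_fd and the sample rows are hypothetical.

```python
from itertools import combinations

def fd_holds(rows, lhs, rhs):
    """True if, in these rows, each combination of lhs values has exactly one rhs value."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True

def is_full_fd(rows, lhs, rhs):
    """True if lhs -> rhs holds but rhs does not depend on any proper subset of lhs."""
    if not fd_holds(rows, lhs, rhs):
        return False
    for size in range(len(lhs)):  # sizes 0 .. len(lhs) - 1 give the proper subsets
        for subset in combinations(lhs, size):
            if fd_holds(rows, subset, rhs):
                return False
    return True

rows = [
    {"Employee ID": 519, "Employee Address": "94 Chestnut Street", "Skill": "Carpentry"},
    {"Employee ID": 519, "Employee Address": "94 Chestnut Street", "Skill": "Typing"},
    {"Employee ID": 267, "Employee Address": "94 Chestnut Street", "Skill": "Typing"},
    {"Employee ID": 301, "Employee Address": "10 Park Road", "Skill": "Carpentry"},
]

print(fd_holds(rows, ["Employee ID"], ["Employee Address"]))  # True
print(fd_holds(rows, ["Employee Address"], ["Employee ID"]))  # False: 519 and 267 share an address
print(is_full_fd(rows, ["Employee ID", "Skill"], ["Employee Address"]))  # False: Employee ID alone suffices
print(is_full_fd(rows, ["Employee ID"], ["Employee Address"]))  # True
```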


Multivalued Dependency: A multivalued dependency is a constraint according to which the presence of certain rows in a table implies the presence of certain other rows; see Fagin (1977) for a rigorous definition.

Join Dependency: A table T is subject to a join dependency if T can always be recreated by joining multiple tables, each having a subset of the attributes of T.

Superkey: A superkey is an attribute or set of attributes that uniquely identifies rows within a table; in other words, two distinct rows are always guaranteed to have distinct superkeys. {Employee ID, Employee Address, Skill} would be a superkey for the "Employees' Skills" table; {Employee ID, Skill} would also be a superkey.

Candidate Key: A candidate key is a minimal superkey, that is, a superkey for which we can say that no proper subset of it is also a superkey. {Employee ID, Skill} would be a candidate key for the "Employees' Skills" table.

Non-Prime Attribute: A non-prime attribute is an attribute that does not occur in any candidate key. Employee Address would be a non-prime attribute in the "Employees' Skills" table.

Primary Key: Most DBMSs require a table to be defined as having a single unique key, rather than a number of possible unique keys. A primary key is a key which the database designer has designated for this purpose.
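Superkeys and candidate keys can be explored the same way. The sketch below, with hypothetical helpers is_superkey and candidate_keys, treats a set of attributes as a superkey of a sample table when no two rows agree on it, and finds candidate keys as minimal superkeys by trying attribute combinations in order of size. As with dependencies, the keys a designer declares come from business rules; a sample instance can only rule candidates out.

```python
from itertools import combinations

attributes = ["Employee ID", "Employee Address", "Skill"]
rows = [
    dict(zip(attributes, values))
    for values in [
        (519, "94 Chestnut Street", "Carpentry"),
        (519, "94 Chestnut Street", "Typing"),
        (267, "94 Chestnut Street", "Typing"),
        (301, "10 Park Road", "Carpentry"),
    ]
]

def is_superkey(rows, attrs):
    """True if no two rows share the same values on attrs (checked on this instance only)."""
    values = [tuple(row[a] for a in attrs) for row in rows]
    return len(values) == len(set(values))

def candidate_keys(rows, attributes):
    """Minimal superkeys: superkeys with no proper subset that is also a superkey."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            if any(set(k) <= set(combo) for k in keys):
                continue  # a superset of a key found earlier cannot be minimal
            if is_superkey(rows, combo):
                keys.append(combo)
    return keys

keys = candidate_keys(rows, attributes)
prime = {a for k in keys for a in k}
non_prime = [a for a in attributes if a not in prime]
print(keys)       # [('Employee ID', 'Skill')]
print(non_prime)  # ['Employee Address']
```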

3.2.2 History
Edgar F. Codd first proposed the process of normalization and what came to be known as the first normal form (1NF):

"There is, in fact, a very simple elimination procedure which we shall call normalization. Through decomposition non-simple domains are replaced by 'domains whose elements are atomic (non-decomposable) values.'" (Edgar F. Codd, A Relational Model of Data for Large Shared Data Banks)

In his paper, Edgar F. Codd used the term "non-simple" domains to describe a heterogeneous data structure, but later researchers would refer to such a structure as an abstract data type.

3.3 Normal Forms

The normal forms (abbrev. NF) of relational database theory provide criteria for determining a table's degree of vulnerability to logical inconsistencies and anomalies. The higher the normal form applicable to a table, the less vulnerable it is to inconsistencies and anomalies. Each table has a "highest normal form" (HNF): by definition, a table always meets the requirements of its HNF and of all normal forms lower than its HNF; also by definition, a table fails to meet the requirements of any normal form higher than its HNF.

First normal form: A table is in first normal form (1NF) if and only if it represents a relation. Given that database tables embody a relation-like form, the defining characteristic of one in first normal form is that it does not allow duplicate rows or nulls. Simply put, a table with a unique key (which, by definition, prevents duplicate rows) and without any nullable columns is in 1NF.

Second normal form: The criteria for second normal form (2NF) are:
The table must be in 1NF.
None of the non-prime attributes of the table are functionally dependent on a part (proper subset) of a candidate key; in other words, all functional dependencies of non-prime attributes on candidate keys are full functional dependencies.
For example, consider an "Employees' Skills" table whose attributes are Employee ID, Employee Name, and Skill; and suppose that the combination of Employee ID and Skill uniquely identifies records within the table. Given that Employee Name depends on only one of those attributes (namely, Employee ID), the table is not in 2NF. Put simply, a table is in 2NF if it is in 1NF and all fields are dependent on the whole of the primary key; more precisely, a relation is in 2NF if it is in 1NF and every non-key attribute is fully dependent on each candidate key of the relation. Note that if none of a 1NF table's candidate keys are composite (i.e. every candidate key consists of just one attribute), then we can say immediately that the table is in 2NF. All columns must be a fact about the entire key, and not a subset of the key.
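When the functional dependencies are known (rather than inferred from sample rows), a 2NF violation can be found with the standard attribute-closure calculation: compute everything each proper subset of a candidate key determines, and report any non-prime attribute that turns up. In this sketch the single dependency Employee ID → Employee Name is an assumed business rule, and closure and partial_dependencies are hypothetical names.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return every attribute determined by attrs; each FD is a (lhs set, rhs set) pair."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already determined, so is the right side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def partial_dependencies(candidate_key, non_prime, fds):
    """List (subset, attribute) pairs where a proper subset of the key determines a non-prime attribute."""
    found = []
    key_attrs = sorted(candidate_key)
    for size in range(1, len(key_attrs)):  # non-empty proper subsets only
        for subset in combinations(key_attrs, size):
            for attr in sorted(non_prime & closure(subset, fds)):
                found.append((subset, attr))
    return found

# Assumed business rule: each Employee ID has exactly one Employee Name.
fds = [({"Employee ID"}, {"Employee Name"})]
key = {"Employee ID", "Skill"}
non_prime = {"Employee Name"}

print(partial_dependencies(key, non_prime, fds))
# [(('Employee ID',), 'Employee Name')]  -> the table is not in 2NF
```

Any non-empty result means the table should be split so that the dependent attribute lives with the part of the key it depends on.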

Third Normal Form: The criteria for third normal form (3NF) are:
The table must be in 2NF.
Transitive dependencies must be eliminated.
All attributes must rely only on the primary key.
So, if a database has a table with columns Student ID, Student, Company, and Company Phone Number, it is not in 3NF. This is because the Company Phone Number relies on the Company, which in turn relies on the Student ID: a transitive dependency. So, for it to be in 3NF, there must be a second table with Company and Company Phone Number columns; the Company Phone Number column in the first table would be removed.
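A minimal sketch of the 3NF decomposition for the Student/Company example, using sqlite3 with hypothetical table names, column names and phone numbers. Because the phone number now lives only in the companies table, changing it is a single-row update, and every student sees the same value through a join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Company Phone Number depends on Company, not on the key Student ID,
# so it moves to its own table.
conn.executescript("""
CREATE TABLE companies (
    company TEXT PRIMARY KEY,
    company_phone_number TEXT NOT NULL
);
CREATE TABLE students (
    student_id INTEGER PRIMARY KEY,
    student TEXT NOT NULL,
    company TEXT REFERENCES companies(company)
);
INSERT INTO companies VALUES ('Acme Ltd', '01-555-0100');
INSERT INTO students VALUES (1, 'Ada', 'Acme Ltd'), (2, 'Bola', 'Acme Ltd');
""")

# Changing the phone number is a single-row update...
conn.execute(
    "UPDATE companies SET company_phone_number = '01-555-0199' "
    "WHERE company = 'Acme Ltd'"
)

# ...and every student sees the same, consistent value through a join.
phones = conn.execute(
    "SELECT DISTINCT c.company_phone_number "
    "FROM students s JOIN companies c ON s.company = c.company"
).fetchall()
print(phones)  # [('01-555-0199',)]
```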


Fourth normal form: A table is in fourth normal form (4NF) if and only if, for every one of its non-trivial multivalued dependencies X →→ Y, X is a superkey, that is, X is either a candidate key or a superset thereof. For example, if a person can have two phone number values and two email address values that are independent of each other, then the phone numbers and the email addresses should not be stored in the same table.
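The phone/email example can be made concrete. Assuming a person's phone numbers and email addresses are independent of each other (the multivalued dependencies Person →→ Phone and Person →→ Email), a single table must hold every phone-email combination, while the 4NF design holds each fact once. The data below is hypothetical.

```python
from itertools import product

# Hypothetical data: one person with independent sets of phones and emails.
person = "Bob"
phones = ["0801", "0802", "0803"]
emails = ["bob@home.example", "bob@work.example", "bob@school.example"]

# Single table (Person, Phone, Email): because the two facts are independent,
# every phone must appear with every email (Person ->-> Phone, Person ->-> Email).
combined = {(person, p, e) for p, e in product(phones, emails)}

# 4NF decomposition: one table per independent multivalued fact.
person_phones = {(person, p) for p in phones}
person_emails = {(person, e) for e in emails}

print(len(combined))                            # 9
print(len(person_phones) + len(person_emails))  # 6

# Joining the two tables on Person gives back the combined table exactly.
rejoined = {(a, p, e) for (a, p) in person_phones for (b, e) in person_emails if a == b}
print(rejoined == combined)  # True
```

Adding a fourth phone would cost three new rows in the combined table, but only one row in person_phones.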

Fifth normal form: The criteria for fifth normal form (5NF, also known as PJ/NF) are:
The table must be in 4NF.
There must be no non-trivial join dependencies that do not follow from the key constraints.
A 4NF table is said to be in 5NF if and only if every join dependency in it is implied by the candidate keys.
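A join dependency can be tested on a table instance by projecting the table onto the given attribute subsets and joining the projections back together: the dependency holds on that instance exactly when the join returns the original rows and nothing more. The sketch below uses hypothetical helper names (project, natural_join, satisfies_jd) and an agent/company/product table in the spirit of Kent's "Simple Guide to Five Normal Forms"; it checks one instance only and says nothing about whether the dependency is implied by the candidate keys.

```python
def project(rows, attrs):
    """Relational projection: keep only attrs and drop duplicate rows."""
    unique = {tuple(row[a] for a in attrs) for row in rows}
    return [dict(zip(attrs, values)) for values in unique]

def natural_join(left, right):
    """Relational natural join on all attribute names the two relations share."""
    result = []
    for l in left:
        for r in right:
            common = l.keys() & r.keys()
            if all(l[a] == r[a] for a in common):
                result.append({**l, **r})
    return result

def satisfies_jd(rows, components):
    """True if joining the projections onto components gives back exactly rows."""
    joined = project(rows, components[0])
    for attrs in components[1:]:
        joined = natural_join(joined, project(rows, attrs))
    as_set = lambda rs: {tuple(sorted(r.items())) for r in rs}
    return as_set(joined) == as_set(rows)

components = [["agent", "company"], ["company", "product"], ["agent", "product"]]

def rows_from(triples):
    return [dict(zip(["agent", "company", "product"], t)) for t in triples]

before = rows_from([("Smith", "Ford", "car"), ("Smith", "GM", "truck"), ("Jones", "Ford", "truck")])
after = before + rows_from([("Smith", "Ford", "truck")])

print(satisfies_jd(before, components))  # False: the join adds (Smith, Ford, truck)
print(satisfies_jd(after, components))   # True
```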

Domain/key normal form (or DKNF) requires that a table not be subject to any constraints other than domain constraints and key constraints.

Sixth normal form: According to the definition by Christopher J. Date and others, who extended database theory to take account of temporal and other interval data, a table is in sixth normal form (6NF) if and only if it satisfies no non-trivial (in the formal sense) join dependencies at all, meaning that the fifth normal form is also satisfied. When referring to "join" in this context, it should be noted that Date et al. additionally use generalized definitions of relational operators that also take account of interval data (e.g. from-date to-date) by conceptually breaking them down ("unpacking" them) into atomic units (e.g. individual days), with defined rules for joining interval data, for instance.
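The "unpacking" of interval data can be sketched in Python. The unpack function below turns hypothetical (supplier, from-date, to-date) rows into one row per day, so overlapping intervals are counted once; pack merges consecutive days back into maximal intervals, loosely following the PACK and UNPACK operators of Date, Darwen and Lorentzos. Both are simplified sketches with hypothetical names, not those operators' full definitions.

```python
from datetime import date, timedelta

def unpack(rows):
    """Break each (key, from_date, to_date) row into one (key, day) row per day, inclusive."""
    days = set()
    for key, start, end in rows:
        day = start
        while day <= end:
            days.add((key, day))
            day += timedelta(days=1)
    return days

def pack(days):
    """Merge (key, day) rows back into maximal intervals of consecutive days."""
    intervals = []
    for key, day in sorted(days):
        if intervals and intervals[-1][0] == key and intervals[-1][2] + timedelta(days=1) == day:
            intervals[-1] = (key, intervals[-1][1], day)  # extend the current interval
        else:
            intervals.append((key, day, day))  # start a new interval
    return intervals

# Hypothetical contract periods for supplier S1; the two intervals overlap on 3 January.
contracts = [
    ("S1", date(2003, 1, 1), date(2003, 1, 3)),
    ("S1", date(2003, 1, 3), date(2003, 1, 5)),
]
days = unpack(contracts)
print(len(days))   # 5 distinct supplier-days, the overlap counted once
print(pack(days))  # [('S1', datetime.date(2003, 1, 1), datetime.date(2003, 1, 5))]
```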

3.4 Denormalization

Databases intended for Online Transaction Processing (OLTP) are typically more normalized than databases intended for Online Analytical Processing (OLAP). OLTP applications are characterized by a high volume of small transactions such as updating a sales record at a supermarket checkout counter. The expectation is that each transaction will leave the database in a consistent state. By contrast, databases intended for OLAP operations are primarily "read mostly" databases. OLAP applications tend to extract historical data that has accumulated over a long period of time. For such databases, redundant or "denormalized" data may facilitate Business Intelligence applications. Specifically, dimensional tables in a star schema often contain denormalized data. The denormalized or redundant data must be carefully controlled during ETL processing, and users should not be permitted to see the data until it is in a consistent state. The normalized alternative to the star schema is the snowflake schema. It has never been proven whether this denormalization itself provides any increase in performance, or whether the concurrent removal of data constraints is what increases performance. In many cases, the need for denormalization has waned as computers and RDBMS software have become more powerful, but since data volumes have generally increased along with hardware and software performance, OLAP databases often still use denormalized schemas. Denormalization is also used to improve performance on smaller computers, as in computerized cash registers and mobile devices, since these may use the data for look-up only (e.g. price lookups). Denormalization may also be used when no RDBMS exists for a platform (such as Palm), or when no changes are to be made to the data and a swift response is crucial.
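A sketch of the price-lookup case, using sqlite3 with hypothetical tables: normalized products and categories tables are kept for maintenance, and a denormalized price_lookup table is derived from them so that a lookup reads a single row without a join. The derived copy must be rebuilt whenever the source tables change, which is the controlled redundancy described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE categories (category_id INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE products (
    barcode TEXT PRIMARY KEY,
    product_name TEXT,
    price REAL,
    category_id INTEGER REFERENCES categories(category_id)
);
INSERT INTO categories VALUES (1, 'Beverages');
INSERT INTO products VALUES ('0001', 'Orange juice', 450.0, 1);

-- Denormalized copy: the category name is repeated on every product row.
CREATE TABLE price_lookup AS
SELECT p.barcode, p.product_name, p.price, c.category_name
FROM products p JOIN categories c ON p.category_id = c.category_id;
""")

# A look-up reads one row from one table, with no join at query time.
row = conn.execute(
    "SELECT product_name, price, category_name FROM price_lookup WHERE barcode = ?",
    ("0001",),
).fetchone()
print(row)  # ('Orange juice', 450.0, 'Beverages')
```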

3.5 Non-first normal form (NF² or N1NF)

In recognition that denormalization can be deliberate and useful, the non-first normal form is a definition of database designs which do not conform to the first normal form, by allowing "sets and sets of sets to be attribute domains" (Schek 1982). This extension is a (non-optimal) way of implementing hierarchies in relations. Some academics have dubbed this practitioner-developed method "First Ab-normal Form", since Codd defined a relational database as using relations, so any table not in 1NF could not be considered to be relational. Consider the following table:

Non-First Normal Form
Person    Favorite Colors
Bob       blue, red
Jane      green, yellow, red

Assume a person has several favorite colors. Obviously, favorite colors consist of a set of colors modeled by the given table. To transform this NF² table into 1NF, an "unnest" operator is required, which extends the relational algebra of the higher normal forms. The reverse operator is called "nest", which is not always the mathematical inverse of "unnest", although "unnest" is the mathematical inverse to "nest". Another constraint required is for the operators to be bijective, which is covered by the Partitioned Normal Form (PNF).
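The two operators can be sketched in Python on the Person/Favorite Colors table, with each NF² row held as a (person, frozenset of colors) pair. The names unnest and nest are used informally here. The second example, split, is a hypothetical table that violates PNF because Person does not identify a single row, and it shows why nest does not always undo unnest.

```python
from collections import defaultdict

def unnest(nf2_rows):
    """NF² to 1NF: one flat (person, color) row per element of each color set."""
    return {(person, color) for person, colors in nf2_rows for color in colors}

def nest(flat_rows):
    """1NF to NF²: group the flat rows into one color set per person."""
    groups = defaultdict(set)
    for person, color in flat_rows:
        groups[person].add(color)
    return [(person, frozenset(colors)) for person, colors in sorted(groups.items())]

nf2 = [("Bob", frozenset({"blue", "red"})), ("Jane", frozenset({"green", "yellow", "red"}))]
flat = unnest(nf2)
print(sorted(flat))
# [('Bob', 'blue'), ('Bob', 'red'), ('Jane', 'green'), ('Jane', 'red'), ('Jane', 'yellow')]

# unnest undoes nest: nesting and then unnesting a flat table gives it back.
print(unnest(nest(flat)) == flat)  # True

# nest does not always undo unnest: two rows for Bob collapse into one.
split = [("Bob", frozenset({"blue"})), ("Bob", frozenset({"red"}))]
print(nest(unnest(split)) == split)  # False
print(len(nest(unnest(split))))      # 1
```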

4.0 CONCLUSION

In the design and development of database management systems, organizations may use one kind of DBMS for daily transactions, and then move the detail onto another computer that uses another DBMS better suited for inquiries and analysis. Overall systems design decisions are made by database administrators. The three most common database organizations are the hierarchical, network and relational models. A DBMS may provide one, two or all three models in designing database management systems.

5.0 SUMMARY
Database design is the process of deciding how to organize data into record types and how the record types will relate to each other. Database development may start with a top-down data planning process. Database administrators and designers work with corporate and end-user management to develop an enterprise model that defines the basic business process of the enterprise. Database normalization, sometimes referred to as canonical synthesis, is a technique for designing relational database tables to minimize duplication of information and, in so doing, to safeguard the database against certain types of logical or structural problems, namely data anomalies. Edgar F. Codd first proposed the process of normalization and what came to be known as the first normal form. The normal forms (abbrev. NF) of relational database theory provide criteria for determining a table's degree of vulnerability to logical inconsistencies and anomalies. Databases intended for Online Transaction Processing (OLTP) are typically more normalized than databases intended for Online Analytical Processing (OLAP). OLTP applications are characterized by a high volume of small transactions such as updating a sales record at a supermarket checkout counter. In recognition that denormalization can be deliberate and useful, the non-first normal form is a definition of database designs which do not conform to the first normal form, by allowing "sets and sets of sets to be attribute domains".

6.0 TUTOR-MARKED ASSIGNMENT
1. Mention the five phases in the development of a database.
2. Identify the criteria for second normal form (2NF).

7.0 REFERENCES/FURTHER READINGS


Codd, E.F. (June 1970). "A Relational Model of Data for Large Shared Data Banks". Communications of the ACM 13 (6): 377-387.
Date, C.J. (2006). "What First Normal Form Really Means". In Date on Database: Writings 2000-2006. Springer-Verlag, p. 128.
Codd, E.F. (October 14, 1985). "Is Your DBMS Really Relational?". Computerworld.
Coles, M. (2007). "Sic Semper NULL". SQL Server Central. Redgate Software.
Kent, W. (February 1983). "A Simple Guide to Five Normal Forms in Relational Database Theory". Communications of the ACM 26 (2): 120-125.
Codd, E.F. (August 31st, 1971). "Further Normalization of the Data Base Relational Model". IBM Research Report RJ909. (Presented at Courant Computer Science Symposia Series 6, "Data Base Systems", New York City, May 24th-25th, 1971.) Republished in Randall J. Rustin (ed.), Data Base Systems: Courant Computer Science Symposia Series 6. Prentice-Hall, 1972.
Codd, E.F. (April 23rd, 1974). "Recent Investigations into Relational Data Base Systems". IBM Research Report RJ1385. Republished in Proc. 1974 Congress (Stockholm, Sweden, 1974). New York, N.Y.: North-Holland, 1974.
Fagin, R. (September 1977). "Multivalued Dependencies and a New Normal Form for Relational Databases". ACM Transactions on Database Systems 2 (3): 262-278. doi:10.1145/320557.320571.
Date, C.J.; Darwen, H.; Lorentzos, N.A. (January 2003). "Chapter 10 Database Design, Section 10.4: Sixth Normal Form". In Temporal Data and the Relational Model: A Detailed Investigation into the Application of Interval and Relation Theory to the Problem of Temporal Database Management. Oxford: Elsevier LTD, p. 176. ISBN 1558608559.
O'Brien, J.A. (2003). Introduction to Information Systems (11th ed.). McGraw-Hill.
Zimányi, E. (June 2006). "Temporal Aggregates and Temporal Universal Quantification in Standard SQL". ACM SIGMOD Record 35 (2). ACM.
<;

M%( ;1<

"(@(%(2E M(N(GEMEN@ 2A2@EM

UNIT +
STRUCTURED QUERY LANGUAGE (SQL)

CONTENTS
1.0 Introduction
2.0 Objectives
3.0 Main Content
3.1 History
3.2 Standardization
3.3 Scope and Extensions
3.4 Language Elements
3.5 Criticisms of SQL
3.6 Alternatives to SQL
4.0 Conclusion
5.0 Summary
6.0 Tutor-Marked Assignment
7.0 References/Further Readings

1.0 INTRODUCTION

SQL (Structured Query Language) is a database computer language designed for the retrieval and management of data in relational database management systems (RDBMS), database schema creation and modification, and database object access control management. SQL is a standard interactive and programming language for querying and modifying data and managing databases. Although SQL is both an ANSI and an ISO standard, many database products support SQL with proprietary extensions to the standard language. The core of SQL is formed by a command language that allows the retrieval, insertion, updating, and deletion of data, and the performance of management and administrative functions. SQL also includes a Call Level Interface (SQL/CLI) for accessing and managing data and databases remotely. The first version of SQL was developed at IBM by Donald D. Chamberlin and Raymond F. Boyce in the early 1970s. This version, initially called SEQUEL, was designed to manipulate and retrieve data stored in IBM's original relational database product, System R. The SQL language was later formally standardized by the American National Standards Institute (ANSI) in 1986. Subsequent versions of the SQL standard have been released as International Organization for Standardization (ISO) standards. Originally designed as a declarative query and data manipulation language, variations of SQL have been created by SQL database management system (DBMS) vendors that add procedural constructs,

control-of-flow statements, user-defined data types, and various other language extensions. With the release of the SQL:1999 standard, many such extensions were formally adopted as part of the SQL language via the SQL Persistent Stored Modules (SQL/PSM) portion of the standard. Common criticisms of SQL include a perceived lack of cross-platform portability between vendors, inappropriate handling of missing data (nulls), and unnecessarily complex and occasionally ambiguous language grammar and semantics.

SQL
Paradigm: multi-paradigm
Appeared in: 1974
Designed by: Donald D. Chamberlin and Raymond F. Boyce
Developer: IBM
Latest release: SQL:2006 (2006)
Typing discipline: static, strong
Major implementations: many
Dialects: SQL-86, SQL-89, SQL-92, SQL:1999, SQL:2003, SQL:2006
Influenced by: Datalog
Influenced: CQL, LINQ, Windows PowerShell
OS: cross-platform
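The four core operations named above (retrieval, insertion, updating and deletion) correspond to the SELECT, INSERT, UPDATE and DELETE statements. A minimal sketch, run through Python's built-in sqlite3 module against an in-memory database; the courses table and its rows are hypothetical, and SQLite's dialect differs from other products in some details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE courses (code TEXT PRIMARY KEY, title TEXT NOT NULL)")

# Insertion
conn.executemany(
    "INSERT INTO courses (code, title) VALUES (?, ?)",
    [("C101", "Introduction to Databases"), ("C102", "Data Modelling")],
)
# Updating
conn.execute("UPDATE courses SET title = ? WHERE code = ?", ("Data Modeling", "C102"))
# Deletion
conn.execute("DELETE FROM courses WHERE code = ?", ("C101",))
# Retrieval
result = conn.execute("SELECT code, title FROM courses ORDER BY code").fetchall()
conn.commit()
print(result)  # [('C102', 'Data Modeling')]
```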


2.0 OBJECTIVES

At the end of this unit, you should be able to:

define structured query language (SQL)
trace the history and development process of SQL
know the scope and extensions of SQL
identify the vital indices of SQL
know what the language elements are
know some of the criticisms of SQL
answer the question of alternatives to SQL.

3.0 MAIN CONTENT

3.1 History

During the 1970s, a group at the IBM San Jose Research Laboratory developed the System R relational database management system, based on the model introduced by Edgar F. Codd in his influential paper, A Relational Model of Data for Large Shared Data Banks. Donald D. Chamberlin and Raymond F. Boyce of IBM subsequently created the Structured English Query Language (SEQUEL) to manipulate and manage data stored in System R. The acronym SEQUEL was later changed to SQL because "SEQUEL" was a trademark of the UK-based Hawker Siddeley aircraft company. The first non-commercial non-SQL RDBMS, Ingres, was developed in 1974 at U.C. Berkeley. Ingres implemented a query language known as QUEL, which was later supplanted in the marketplace by SQL. In the late 1970s, Relational Software, Inc. (now Oracle Corporation) saw the potential of the concepts described by Codd, Chamberlin, and Boyce and developed its own SQL-based RDBMS with aspirations of selling it to the U.S. Navy, CIA, and other government agencies. In the summer of 1979, Relational Software, Inc. introduced the first commercially available implementation of SQL, Oracle V2 (Version 2) for VAX computers. Oracle V2 beat IBM's release of the System/38 RDBMS to market by a few weeks. After testing SQL at customer test sites to determine the usefulness and practicality of the system, IBM began developing commercial products based on its System R prototype, including System/38, SQL/DS, and DB2, which were commercially available in 1979, 1981, and 1983, respectively.


3.2 Standardization

SQL was adopted as a standard by ANSI in 1986 and by ISO in 1987. In the original SQL standard, ANSI declared that the official pronunciation for SQL is "es queue el". However, many English-speaking database professionals still use the nonstandard pronunciation /ˈsiːkwəl/ (like the word "sequel"). SEQUEL was an earlier IBM database language, a predecessor to the SQL language. Until 1996, the National Institute of Standards and Technology (NIST) data management standards program was tasked with certifying SQL DBMS compliance with the SQL standard. In 1996, however, the NIST data management standards program was dissolved, and vendors are now relied upon to self-certify their products for compliance. The SQL standard has gone through a number of revisions, as shown below:

Year | Name | Alias | Comments
1986 | SQL-86 | SQL-87 | First published by ANSI. Ratified by ISO in 1987.
1989 | SQL-89 | FIPS 127-1 | Minor revision, adopted as FIPS 127-1.
1992 | SQL-92 | SQL2, FIPS 127-2 | Major revision (ISO 9075); Entry Level SQL-92 adopted as FIPS 127-2.
1999 | SQL:1999 | SQL3 | Added regular expression matching, recursive queries, triggers, support for procedural and control-of-flow statements, non-scalar types, and some object-oriented features.
2003 | SQL:2003 | | Introduced XML-related features, window functions, standardized sequences, and columns with auto-generated values (including identity columns).
2006 | SQL:2006 | | ISO/IEC 9075-14:2006 defines ways in which SQL can be used in conjunction with XML. It defines ways of importing and storing XML data in an SQL database, manipulating it within the database and publishing both XML and conventional SQL data in XML form. In addition, it provides facilities that permit applications to integrate into their SQL code the use of XQuery, the XML Query Language published by the World Wide Web Consortium (W3C), to concurrently access ordinary SQL data and XML documents.


The SQL standard is not freely available. SQL:2003 and SQL:2006 may be purchased from ISO or ANSI. A late draft of SQL:2003 is, however, freely available as a zip archive from Whitemarsh Information Systems Corporation. The zip archive contains a number of PDF files that define the parts of the SQL:2003 specification.

3.3 Scope and Extensions

Procedural Extensions
SQL is designed for a specific purpose: to query data contained in a relational database. SQL is a set-based, declarative query language, not an imperative language such as C or BASIC. However, there are extensions to Standard SQL which add procedural programming language functionality, such as control-of-flow constructs. These are:

Source | Common Name | Full Name
ANSI/ISO Standard | SQL/PSM | SQL/Persistent Stored Modules
IBM | SQL PL | SQL Procedural Language (implements SQL/PSM)
Microsoft/Sybase | T-SQL | Transact-SQL
MySQL | SQL/PSM | SQL/Persistent Stored Module (as in ISO SQL:2003)
Oracle | PL/SQL | Procedural Language/SQL (based on Ada)
PostgreSQL | PL/pgSQL | Procedural Language/PostgreSQL Structured Query Language (based on Oracle PL/SQL)
PostgreSQL | PL/PSM | Procedural Language/Persistent Stored Modules (implements SQL/PSM)

In addition to the standard SQL/PSM extensions and proprietary SQL extensions, procedural and object-oriented programmability is available on many SQL platforms via DBMS integration with other languages. The SQL standard defines SQL/JRT extensions (SQL Routines and Types for the Java Programming Language) to support Java code in SQL databases. SQL Server 2005 uses the SQLCLR (SQL Server Common Language Runtime) to host managed .NET assemblies in the database, while prior versions of SQL Server were restricted to using unmanaged extended stored procedures which were primarily written in C. Other database platforms, like MySQL and Postgres, allow functions to be written in a wide variety of languages including Perl, Python, Tcl, and C.

Additional Extensions
SQL:2003 also defines several additional extensions to the standard to increase SQL functionality overall. These extensions include:

The SQL/CLI, or Call-Level Interface, extension is defined in ISO/IEC 9075-3:2003. This extension defines common interfacing components (structures and procedures) that can be used to execute SQL statements from applications written in other programming languages. The SQL/CLI extension is defined in such a way that SQL statements and SQL/CLI procedure calls are treated as separate from the calling application's source code.

The SQL/MED, or Management of External Data, extension is defined by ISO/IEC 9075-9:2003. SQL/MED provides extensions to SQL that define foreign-data wrappers and datalink types to allow SQL to manage external data. External data is data that is accessible to, but not managed by, an SQL-based DBMS.

The SQL/OLB, or Object Language Bindings, extension is defined by ISO/IEC 9075-10:2003. SQL/OLB defines the syntax and semantics of SQLJ, which is SQL embedded in Java. The standard also describes mechanisms to ensure binary portability of SQLJ applications, and specifies various Java packages and their contained classes.

The SQL/Schemata, or Information and Definition Schemas, extension is defined by ISO/IEC 9075-11:2003. SQL/Schemata defines the Information Schema and Definition Schema, providing a common set of tools to make SQL databases and objects self-describing. These tools include the SQL object identifier, structure and integrity constraints, security and authorization specifications, features and packages of ISO/IEC 9075, support of features provided by SQL-based DBMS implementations, SQL-based DBMS implementation information and sizing items, and the values supported by the DBMS implementations.

The SQL/JRT, or SQL Routines and Types for the Java Programming Language, extension is defined by ISO/IEC 9075-13:2003. SQL/JRT specifies the ability to invoke static Java methods as routines from within SQL applications. It also calls for the ability to use Java classes as SQL structured user-defined types.

The SQL/XML, or XML-Related Specifications, extension is defined by ISO/IEC 9075-14:2003. SQL/XML specifies SQL-based extensions for using XML in conjunction with SQL. The XML data type is introduced, as well as several routines, functions, and XML-to-SQL data type mappings to support manipulation and storage of XML in an SQL database.

The SQL/PSM, or Persistent Stored Modules, extension is defined by ISO/IEC 9075-4:2003. SQL/PSM standardizes procedural extensions for SQL, including flow of control, condition handling, statement condition signals and resignals, cursors and local variables, and assignment of expressions to variables and parameters. In addition, SQL/PSM formalizes declaration and maintenance of persistent database language routines (e.g., "stored procedures").
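Python's DB-API (PEP 249), used here through the sqlite3 module, is not an implementation of the ISO SQL/CLI, but it follows the same call-level idea described above: the host program passes the SQL text and the parameter values separately, and the driver binds the values to placeholders instead of pasting them into the statement. A minimal sketch with a hypothetical users table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE users (name TEXT, city TEXT)")

# Values are bound to ? placeholders by the driver, never concatenated
# into the SQL string, so the statement text stays separate from the data.
cur.executemany(
    "INSERT INTO users (name, city) VALUES (?, ?)",
    [("Ngozi", "Lagos"), ("Musa", "Abuja"), ("O'Brien", "Lagos")],
)
cur.execute("SELECT name FROM users WHERE city = ? ORDER BY name", ("Lagos",))
names = [row[0] for row in cur.fetchall()]
print(names)  # ['Ngozi', "O'Brien"]
```

The apostrophe in "O'Brien" needs no escaping, because the value never becomes part of the SQL text.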

3.4 Language Elements

This chart shows several of the SQL language elements that compose a single statement. The SQL language is sub-divided into several language elements, including:

Statements, which may have a persistent effect on schemas and data, or which may control transactions, program flow, connections, sessions, or diagnostics.
Queries, which retrieve data based on specific criteria.
Expressions, which can produce either scalar values or tables consisting of columns and rows of data.
Predicates, which specify conditions that can be evaluated to SQL three-valued logic (3VL) Boolean truth values and which are used to limit the effects of statements and queries, or to change program flow.
Clauses, which are constituent components of statements and queries (in some cases optional).
Whitespace is generally ignored in SQL statements and queries, making it easier to format SQL code for readability.
SQL statements also include the semicolon (";") statement terminator. Though not required on every platform, it is defined as a standard part of the SQL grammar.
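Three-valued logic can be observed directly. In the sketch below (sqlite3, hypothetical staff table), a comparison with NULL evaluates to unknown, which SQLite returns as NULL and Python shows as None; a WHERE clause then keeps only rows whose predicate is true, so a row with a missing manager is dropped by manager <> 'Ade' even though its manager is not Ade.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (name TEXT, manager TEXT)")
conn.executemany(
    "INSERT INTO staff VALUES (?, ?)",
    [("Ade", "Bisi"), ("Bisi", None), ("Chi", "Ade")],
)

# The predicates 1 = 1, 1 = 2 and NULL = NULL evaluate to true, false and unknown.
truth = conn.execute("SELECT 1 = 1, 1 = 2, NULL = NULL").fetchone()
print(truth)  # (1, 0, None)

# For Bisi, manager <> 'Ade' is unknown, so the row is filtered out.
rows = conn.execute("SELECT name FROM staff WHERE manager <> 'Ade' ORDER BY name").fetchall()
print(rows)  # [('Ade',)]

# IS NULL is the predicate that tests for missing data explicitly.
nulls = conn.execute("SELECT name FROM staff WHERE manager IS NULL").fetchall()
print(nulls)  # [('Bisi',)]
```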

Queries
The most common operation in SQL databases is the query, which is performed with the declarative SELECT keyword. SELECT retrieves data from a specified table, or multiple related tables, in a database. While often grouped with Data Manipulation Language (DML) statements, the standard SELECT query is considered separate from SQL DML, as it has no persistent effects on the data stored in a database. Note that there are some platform-specific variations of SELECT that can persist their effects in a database, such as the SELECT INTO syntax that exists in some databases. SQL queries allow the user to specify a description of the desired result set, but it is left to the database management system (DBMS) to plan, optimize, and perform the physical operations necessary to produce that result set as efficiently as possible. An SQL query includes a list of columns to be included in the final result immediately following the SELECT keyword. An asterisk ("*") can also be used as a "wildcard" indicator to specify that all available columns of a table (or multiple tables) are to be returned. SELECT is the most complex statement in SQL, with several optional keywords and clauses, including:

The FROM clause, which indicates the source table or tables from which the data is to be retrieved. The FROM clause can include optional JOIN clauses to join related tables to one another based on user-specified criteria.
The WHERE clause, which includes a comparison predicate used to restrict the number of rows returned by the query. The WHERE clause is applied before the GROUP BY clause, and it eliminates all rows from the result set where the comparison predicate does not evaluate to True.
The GROUP BY clause, which is used to combine, or group, rows with related values into elements of a smaller set of rows. GROUP BY is often used in conjunction with SQL aggregate functions or to eliminate duplicate rows from a result set.
The HAVING clause, which includes a comparison predicate used to eliminate rows after the GROUP BY clause is applied to the result set. Because it acts on the results of the GROUP BY clause, aggregate functions can be used in the HAVING clause predicate.
The ORDER BY clause, which is used to identify which columns are used to sort the resulting data, and in which order they should be sorted (options are ascending or descending). The order of rows returned by an SQL query is never guaranteed unless an ORDER BY clause is specified.
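The clauses above can be combined in a single query. This is a minimal sketch, assuming Python's sqlite3 module as the SQL engine and a made-up `orders` table; the comments beside each line of the query mark the job of each clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical orders table: one row per order.
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("alice", 50), ("alice", 70), ("bob", 200),
     ("carol", 30), ("carol", -5), ("dave", 40)],
)

# "*" returns every column of the table.
star_columns = [d[0] for d in conn.execute("SELECT * FROM orders").description]

query = """
    SELECT customer, SUM(amount) AS total  -- columns to return
    FROM orders                            -- source table
    WHERE amount > 0                       -- filter rows before grouping
    GROUP BY customer                      -- one output row per customer
    HAVING SUM(amount) >= 100              -- filter groups after aggregation
    ORDER BY total DESC                    -- the only guarantee of row order
"""
result = conn.execute(query).fetchall()
print(star_columns)  # ['customer', 'amount']
print(result)        # [('bob', 200), ('alice', 120)]
conn.close()
```

Carol's negative amount is removed by WHERE before grouping, and her remaining total (30) is then removed by HAVING.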

Data Definition
The second group of keywords is the Data Definition Language (DDL). DDL allows the user to define new tables and associated elements. Most commercial SQL databases have proprietary extensions in their DDL, which allow control over nonstandard features of the database system. The most basic items of DDL are the CREATE, ALTER, RENAME, TRUNCATE and DROP statements:


CREATE causes an object (a table, for example) to be created within the database.
DROP causes an existing object within the database to be deleted, usually irretrievably.
TRUNCATE deletes all data from a table. It was long a non-standard (though common) SQL statement and was added to the standard in SQL:2008.
ALTER permits the user to modify an existing object in various ways, for example by adding a column to an existing table.
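The DDL statements can be tried out in SQLite through Python's sqlite3 module, as in the sketch below (the `staff` table is hypothetical). SQLite is used only for convenience and differs from the list above in two ways: it spells RENAME as `ALTER TABLE ... RENAME TO`, and it has no TRUNCATE statement, so an unqualified DELETE stands in for it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def columns(table):
    # PRAGMA table_info is SQLite-specific; other systems expose catalogue views instead.
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]

def tables():
    return [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]

# CREATE: a new (hypothetical) table.
conn.execute("CREATE TABLE staff (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO staff (name) VALUES (?)", [("Ada",), ("Grace",)])

# ALTER: add a column, then rename the table.
conn.execute("ALTER TABLE staff ADD COLUMN dept TEXT")
conn.execute("ALTER TABLE staff RENAME TO employees")
after_alter = (tables(), columns("employees"))

# No TRUNCATE in SQLite: an unqualified DELETE empties the table but keeps its definition.
conn.execute("DELETE FROM employees")
rows_left = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]

# DROP: remove the table definition itself.
conn.execute("DROP TABLE employees")
after_drop = tables()

print(after_alter)            # (['employees'], ['id', 'name', 'dept'])
print(rows_left, after_drop)  # 0 []
conn.close()
```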

Data Control
The third group of SQL keywords is the Data Control Language (DCL). DCL handles the authorization aspects of data and permits the user to control who has access to see or manipulate data within the database. Its two main keywords are:

GRANT authorizes one or more users to perform an operation or a set of operations on an object.
REVOKE removes or restricts the capability of a user to perform an operation or a set of operations.
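In standard SQL these keywords take forms such as `GRANT SELECT, UPDATE ON payroll TO alice, bob;` and `REVOKE UPDATE ON payroll FROM bob;` (the `payroll` table and the user names are hypothetical). Embedded engines such as SQLite have no user accounts, so the sketch below models the effect of those two statements with a small in-memory catalogue instead. It is a toy and ignores features of real systems such as grant options and roles.

```python
from collections import defaultdict

class PrivilegeCatalog:
    """Toy stand-in for a DBMS privilege catalogue (hypothetical, for illustration)."""

    def __init__(self):
        # (user, object) -> set of privilege names such as "SELECT" or "UPDATE"
        self._grants = defaultdict(set)

    def grant(self, privileges, obj, users):
        # Models: GRANT <privileges> ON <obj> TO <users>
        for user in users:
            self._grants[(user, obj)].update(privileges)

    def revoke(self, privileges, obj, users):
        # Models: REVOKE <privileges> ON <obj> FROM <users>
        for user in users:
            self._grants[(user, obj)].difference_update(privileges)

    def allowed(self, user, privilege, obj):
        # .get() avoids creating empty entries for unknown (user, object) pairs
        return privilege in self._grants.get((user, obj), set())

catalog = PrivilegeCatalog()
catalog.grant({"SELECT", "UPDATE"}, "payroll", ["alice", "bob"])
catalog.revoke({"UPDATE"}, "payroll", ["bob"])
print(catalog.allowed("alice", "UPDATE", "payroll"))  # True
print(catalog.allowed("bob", "UPDATE", "payroll"))    # False
print(catalog.allowed("bob", "SELECT", "payroll"))    # True
```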

3.5 Criticisms of SQL

Technically, SQL is a declarative computer language for use with "SQL databases". Theorists and some practitioners note that many of the original SQL features were inspired by, but violated, the relational model for database management and its tuple calculus realization. Recent extensions to SQL achieved relational completeness, but have worsened the violations, as documented in The Third Manifesto. There are also some criticisms of the practical use of SQL:
Implementations are inconsistent and, usually, incompatible between vendors. In particular, date and time syntax, string concatenation, nulls, and comparison case sensitivity often vary from vendor to vendor.
The language makes it too easy to do a Cartesian join (joining all possible combinations), which results in "run-away" result sets when WHERE clauses are mistyped. Cartesian joins are so rarely used in practice that requiring an explicit CARTESIAN keyword may be warranted.

SQL-92 introduced the CROSS JOIN keyword, which allows the user to make clear that a Cartesian join is intended, but the shorthand "comma-join" with no predicate is still acceptable syntax.
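The run-away effect is easy to demonstrate. In this sketch (Python's sqlite3 module, with hypothetical `customers` and `orders` tables), forgetting the join predicate turns 4 matching rows into 3 × 4 = 12, and the SQL-92 CROSS JOIN produces the same 12 rows while stating the intent openly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (cid INTEGER, name TEXT)")
conn.execute("CREATE TABLE orders (oid INTEGER, cid INTEGER)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "Ada"), (2, "Grace"), (3, "Alan")])
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(10, 1), (11, 1), (12, 2), (13, 3)])

def row_count(sql):
    return len(conn.execute(sql).fetchall())

# Intended comma-join: the WHERE predicate pairs each order with its customer.
joined = row_count("SELECT c.name, o.oid FROM customers c, orders o WHERE c.cid = o.cid")
# The same query with the predicate left out: every customer meets every order.
runaway = row_count("SELECT c.name, o.oid FROM customers c, orders o")
# SQL-92 CROSS JOIN: the same Cartesian product, but visibly intentional.
explicit = row_count("SELECT c.name, o.oid FROM customers c CROSS JOIN orders o")

print(joined, runaway, explicit)  # 4 12 12
conn.close()
```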


It is also possible to misconstruct a WHERE clause on an UPDATE or DELETE, thereby affecting more rows in a table than desired.
The grammar of SQL is perhaps unnecessarily complex, borrowing a COBOL-like keyword approach, when a function-influenced syntax could result in more re-use of fewer grammar and syntax rules. This is perhaps due to IBM's early goal of making the language more English-like so that it is more approachable to those without a mathematical or programming background. (Predecessors to SQL were more mathematical.)
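One defensive habit against a misconstructed WHERE clause is to run the UPDATE or DELETE inside a transaction and check how many rows it touched before committing. The sketch below assumes Python's sqlite3 module, an invented `accounts` table and a hypothetical helper `guarded_delete`; it illustrates the habit and is not a feature of SQL itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER, status TEXT)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [(1, "closed"), (2, "open"), (3, "open")])
conn.commit()  # make the sample rows permanent before experimenting

def guarded_delete(where_clause, params, max_rows):
    """Hypothetical helper: delete inside a transaction, roll back if too many rows match."""
    # sqlite3 implicitly opens a transaction before a DELETE.
    cur = conn.execute("DELETE FROM accounts " + where_clause, params)
    if cur.rowcount > max_rows:
        conn.rollback()  # undo the over-broad delete
        return 0
    conn.commit()
    return cur.rowcount

deleted_ok = guarded_delete("WHERE status = ?", ("closed",), max_rows=1)
# The WHERE clause has been forgotten: this would delete every remaining row.
deleted_typo = guarded_delete("", (), max_rows=1)
left = conn.execute("SELECT COUNT(*) FROM accounts").fetchone()[0]
print(deleted_ok, deleted_typo, left)  # 1 0 2
conn.close()
```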

Reasons for lack of portability
Popular implementations of SQL commonly omit support for basic features of Standard SQL, such as the DATE or TIME data types, preferring variations of their own. As a result, SQL code can rarely be ported between database systems without modifications. There are several reasons for this lack of portability between database systems:
The complexity and size of the SQL standard means that most databases do not implement the entire standard.
The standard does not specify database behavior in several important areas (e.g. indexes, file storage...), leaving it up to implementations of the database to decide how to behave.
The SQL standard precisely specifies the syntax that a conforming database system must implement. However, the standard's specification of the semantics of language constructs is less well-defined, leading to areas of ambiguity.
Many database vendors have large existing customer bases; where the SQL standard conflicts with the prior behavior of the vendor's database, the vendor may be unwilling to break backward compatibility.

3.6 Alternatives to SQL

A distinction should be made between alternatives to relational query languages and alternatives to SQL. The languages listed below are proposed alternatives to SQL, but are still (nominally) relational; for alternatives to the relational approach itself, see navigational databases.
IBM Business System 12 (IBM BS12)
Tutorial D
Hibernate Query Language (HQL): a Java-based tool that uses modified SQL
Quel, introduced in 1974 by the U.C. Berkeley Ingres project
Object Query Language
Datalog
.QL: object-oriented Datalog
LINQ
QLC: Query Interface to Mnesia, ETS, Dets, etc. (Erlang programming language)
4D Query Language (4D QL)
QBE (Query By Example), created by Moshé Zloof, IBM, 1977
Aldat Relational Algebra and Domain algebra

4.0 CONCLUSION

Structured Query Language (SQL) has become the standard and dominant language for working with relational database management systems. It differs from conventional programming languages because it is declarative rather than procedural: an SQL statement is not really a command to the computer but rather a description of some of the data contained in a database. SQL does not give step-by-step instructions to the computer or database; it describes data and sometimes instructs the database to do something with that data. Despite its widespread adoption, SQL has attracted its own criticisms.

5.0 SUMMARY
SQL (Structured Query Language) is a database computer language designed for the retrieval and management of data in relational database management systems (RDBMS), database schema creation and modification, and database object access control management. During the 1970s, a group at IBM San Jose Research Laboratory developed the System R relational database management system, based on the model introduced by Edgar F. Codd in his influential paper, A Relational Model of Data for Large Shared Data Banks. SQL was adopted as a standard by ANSI in 1986 and ISO in 1987. In the original SQL standard, ANSI declared that the official pronunciation for SQL is "es queue el".


SQL is designed for a specific purpose: to query data contained in a relational database. It is a set-based, declarative query language, not an imperative language such as C or BASIC. A single SQL statement is composed of several language elements (statements, queries, expressions, predicates and clauses). Theorists and some practitioners note that many of the original SQL features were inspired by, but violated, the relational model for database management and its tuple calculus realization. Finally, a distinction should be made between alternatives to relational query languages and alternatives to SQL.

6.0 TUTOR-MARKED ASSIGNMENT

List and discuss the language elements into which Structured Query Language (SQL) is sub-divided.

7.0 REFERENCES/FURTHER READINGS

Chapple, Mike. "SQL Fundamentals" (HTML). About.com: Databases. About.com.
"Structured Query Language (SQL)" (HTML). International Business Machines (October 27, 2006).
Codd, E.F. (June 1970). "A Relational Model of Data for Large Shared Data Banks". Communications of the ACM 13 (No. 6): pp. 377-387. Association for Computing Machinery. doi:10.1145/362384.362685.
Chamberlin, Donald D.; Boyce, Raymond F. (1974). "SEQUEL: A Structured English Query Language". Proceedings of the 1974 ACM SIGFIDET Workshop on Data Description, Access and Control: pp. 249-264. Association for Computing Machinery.
Oppel, Andy (March 1, 2004). Databases Demystified. San Francisco, CA: McGraw-Hill Osborne Media, pp. 90-91. ISBN 0-07-225364-9.
"History of IBM, 1978" (HTML). IBM Archives. IBM.
Chapple, Mike (n.d.). "SQL Fundamentals" (HTML). About.com. About.com, A New York Times Company. Retrieved on 2007-08-30.
Melton, Jim; Simon, Alan R. (1993). Understanding the New SQL: A Complete Guide. Morgan Kaufmann, 536. ISBN 1558602453. Chapter 1.2, "What is SQL?": "SQL (correctly pronounced "ess cue ell," instead of the somewhat common "sequel"), is a..."
"Understand SQL". www.faqs.org/docs/.
Doll, Shelley (June 19, 2002). "Is SQL a Standard Anymore?" (HTML). TechRepublic's Builder.com. TechRepublic. Retrieved on 2007-06-09.
ISO/IEC 9075-11:2003: Information and Definition Schemas (SQL/Schemata). 2003, p. 1.
ANSI/ISO/IEC International Standard (IS). Database Language SQL, Part 2: Foundation (SQL/Foundation). 1999.
"INTO Clause (Transact-SQL)" (HTML). SQL Server 2005 Books Online. Microsoft (2007). Retrieved on 2007-06-17.
Negri, M.; Pelagatti, G.; Sbattella, L. (1989). Semantics and problems of universal quantification in SQL.
Fratarcangeli, Claudio (1991). Technique for universal quantification in SQL.
Kawash, Jalal (2004). Complex quantification in Structured Query Language (SQL): a Tutorial Using Relational Calculus. Journal of Computers in Mathematics and Science Teaching, ISSN 0731-9258, Volume 23, Issue 2. AACE, Norfolk, VA.


UNIT 3 DATABASE AND INFORMATION SYSTEMS SECURITY

CONTENTS

1.0 Introduction
2.0 Objectives
3.0 Main Content
3.1 Basic Principles
3.2 Database Security
3.3 Relational DBMS Security
3.4 Proposed OODBMS Security Models
3.5 Security Classification for Information
3.6 Cryptography
3.7 Disaster Recovery Planning
4.0 Conclusion
5.0 Summary
6.0 Tutor-Marked Assignment
7.0 References/Further Readings

1.0 INTRODUCTION

Data security is the means of ensuring that data is kept safe from corruption and that access to it is suitably controlled. Data security thus helps to ensure privacy and to protect personal data.

Information security means protecting information and information systems from unauthorized access, use, disclosure, disruption, modification, or destruction. The terms information security, computer security and information assurance are frequently used interchangeably. These fields are interrelated and share the common goals of protecting the confidentiality, integrity and availability of information; however, there are some subtle differences between them. These differences lie primarily in the approach to the subject, the methodologies used, and the areas of concentration. Information security is concerned with the confidentiality, integrity and availability of data regardless of the form the data may take: electronic, print, or other forms.

Governments, the military, financial institutions, hospitals, and private businesses amass a great deal of confidential information about their employees, customers, products, research, and financial status. Most of this information is now collected, processed and stored on electronic computers and transmitted across networks to other computers. Should confidential information about a business's customers, finances or new product line fall into the hands of a competitor, such a breach of security could lead to lost business, lawsuits or even bankruptcy of the business. Protecting confidential information is a business requirement, and in many cases also an ethical and legal requirement. For the individual, information security has a significant effect on privacy, which is viewed very differently in different cultures.

The field of information security has grown and evolved significantly in recent years. As a career choice, there are many ways of gaining entry into the field. It offers many areas for specialization, including Information Systems Auditing, Business Continuity Planning and Digital Forensics Science, to name a few.

2.0 OBJECTIVES

At the end of the unit, you should be able to:
understand the concepts of the CIA Triad in respect of information systems security
know the components of the Donn Parker model for the classic Triad
identify the different types of information access control and how they differ from each other
differentiate Discretionary and Mandatory Access Control Policies
know the Proposed OODBMS Security Models
differentiate between the OODBMS models
define appropriate procedures and protection requirements for information security
define cryptography and know its applications in data security.

3.0 MAIN CONTENT

3.1 Basic Principles

3.1.1 Key Concepts


For over twenty years, information security has held that confidentiality, integrity and availability (known as the CIA Triad) are the core principles of information system security.

Confidentiality
Confidentiality is the property of preventing disclosure of information to unauthorized individuals or systems. For example, a credit card transaction on the Internet requires the credit card number to be transmitted from the buyer to the merchant and from the merchant to a transaction processing network. The system attempts to enforce confidentiality by encrypting the card number during transmission, by limiting the places where it might appear (in databases, log files, backups, printed receipts, and so on), and by restricting access to the places where it is stored. If an unauthorized party obtains the card number in any way, a breach of confidentiality has occurred. Breaches of confidentiality take many forms. Permitting someone to look over your shoulder at your computer screen while you have confidential data displayed on it could be a breach of confidentiality. If a laptop computer containing sensitive information about a company's employees is stolen or sold, it could result in a breach of confidentiality. Giving out confidential information over the telephone is a breach of confidentiality if the caller is not authorized to have the information. Confidentiality is necessary (but not sufficient) for maintaining the privacy of the people whose personal information a system holds.

Integrity
In information security, integrity means that data cannot be modified without authorization. (This is not the same thing as referential integrity in databases.) Integrity is violated when an employee (accidentally or with malicious intent) deletes important data files, when a computer virus infects a computer, when an employee is able to modify his own salary in a payroll database, when an unauthorized user vandalizes a web site, when someone is able to cast a very large number of votes in an online poll, and so on.

Availability
For any information system to serve its purpose, the information must be available when it is needed. This means that the computing systems used to store and process the information, the security controls used to protect it, and the communication channels used to access it must be functioning correctly. High availability systems aim to remain available at all times, preventing service disruptions due to power outages, hardware failures, and system upgrades. Ensuring availability also involves preventing denial-of-service attacks.

In 2002, Donn Parker proposed an alternative model for the classic CIA triad that he called the six atomic elements of information. The elements are confidentiality, possession, integrity, authenticity, availability, and utility. The merits of the Parkerian hexad are a subject of debate amongst security professionals.

3.1.2 Authenticity


In computing, e-Business and information security, it is necessary to ensure that the data, transactions, communications or documents (electronic or physical) are genuine (i.e. they have not been forged or fabricated).
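As one illustration of checking integrity and authenticity (a technique chosen for this example; the text does not prescribe one), the sketch below attaches an HMAC tag to a record using Python's standard hmac and hashlib modules. The key and the record are made up. Any change to the record makes verification fail, and only holders of the key can produce a valid tag. Because both parties share the same key, an HMAC cannot provide non-repudiation; that requires digital signatures.

```python
import hashlib
import hmac

SECRET_KEY = b"example-shared-key"  # hypothetical key; real keys must be random and kept secret

def make_tag(message: bytes) -> str:
    # HMAC-SHA256: only someone holding SECRET_KEY can compute a matching tag.
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()

def is_genuine(message: bytes, tag: str) -> bool:
    # compare_digest compares in constant time, avoiding timing leaks.
    return hmac.compare_digest(make_tag(message), tag)

record = b"account=1234;amount=100.00"
tag = make_tag(record)
print(is_genuine(record, tag))                         # True: unchanged
print(is_genuine(b"account=1234;amount=900.00", tag))  # False: modified
```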

3.1.3 Non-Repudiation


In law, non-repudiation implies one's intention to fulfill one's obligations to a contract. It also implies that one party of a transaction cannot deny having received a transaction, nor can the other party deny having sent a transaction. Electronic commerce uses technology such as digital signatures and encryption to establish authenticity and non-repudiation.

3.1.4 Risk Management
(Figure: security awareness poster, "Security is everyone's responsibility", U.S. Department of Commerce/Office of Security.)

A comprehensive treatment of risk management is beyond the scope of this unit. We will, however, provide a useful definition of risk management, outline a commonly used process for risk management, and define some basic terminology. The CISA Review Manual 2006 provides the following definition of risk management: "Risk management is the process of identifying vulnerabilities and threats to the information resources used by an organization in achieving business objectives, and deciding what countermeasures, if any, to take in reducing risk to an acceptable level, based on the value of the information resource to the organization."

There are two things in this definition that may need some clarification. First, the process of risk management is an ongoing, iterative process. It must be repeated indefinitely, because the business environment is constantly changing and new threats and vulnerabilities emerge every day. Second, the choice of countermeasures (controls) used to manage risks must strike a balance between productivity, cost, effectiveness of the countermeasure, and the value of the informational asset being protected.

Risk is the likelihood that something bad will happen that causes harm to an informational asset (or the loss of the asset). A vulnerability is a weakness that could be used to endanger or cause harm to an informational asset. A threat is anything (man-made or an act of nature) that has the potential to cause harm.


The likelihood that a threat will use a vulnerability to cause harm creates a risk. When a threat does use a vulnerability to inflict harm, it has an impact. In the context of information security, the impact is a loss of availability, integrity, and confidentiality, and possibly other losses (lost income, loss of life, loss of real property). It should be pointed out that it is not possible to identify all risks, nor is it possible to eliminate all risk. The remaining risk is called residual risk. A risk assessment is carried out by a team of people who have knowledge of specific areas of the business. Membership of the team may vary over time as different parts of the business are assessed. The assessment may use a subjective qualitative analysis based on informed opinion, or, where reliable dollar figures and historical information are available, the analysis may use quantitative analysis.
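A common way to carry out a quantitative analysis (one widely used formulation, not one given in the text) is to estimate the single loss expectancy, SLE = asset value × exposure factor, and the annualized loss expectancy, ALE = SLE × annual rate of occurrence. The sketch below applies these formulas to invented figures and compares the reduction in ALE with the yearly cost of a countermeasure, echoing the balance between cost and value described above.

```python
def single_loss_expectancy(asset_value: float, exposure_factor: float) -> float:
    # Exposure factor: fraction of the asset's value lost in one incident (0 to 1).
    return asset_value * exposure_factor

def annualized_loss_expectancy(sle: float, annual_rate: float) -> float:
    # Annual rate of occurrence: expected number of incidents per year.
    return sle * annual_rate

# Invented figures: a customer database worth 200,000, 25% of its value lost
# per breach, one breach expected every four years (rate 0.25 per year).
sle = single_loss_expectancy(200_000, 0.25)
ale = annualized_loss_expectancy(sle, 0.25)

# A hypothetical control costing 8,000 a year that cuts the rate to 0.05 per year.
control_cost = 8_000
ale_with_control = annualized_loss_expectancy(sle, 0.05)
worthwhile = (ale - ale_with_control) > control_cost

print(sle, ale)    # 50000.0 12500.0
print(worthwhile)  # True
```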

3.1.5 Controls
When management chooses to mitigate a risk, it will do so by implementing one or more of three different types of controls.

Administrative
Administrative controls (also called procedural controls) consist of approved written policies, procedures, standards and guidelines. Administrative controls form the framework for running the business and managing people. They inform people on how the business is to be run and how day-to-day operations are to be conducted. Laws and regulations created by government bodies are also a type of administrative control because they inform the business. Some industry sectors have policies, procedures, standards and guidelines that must be followed; the Payment Card Industry (PCI) Data Security Standard required by Visa and MasterCard is such an example. Other examples of administrative controls include the corporate security policy, password policy, hiring policies, and disciplinary policies. Administrative controls form the basis for the selection and implementation of logical and physical controls. Logical and physical controls are manifestations of administrative controls. Administrative controls are of paramount importance.

Logical
Logical controls (also called technical controls) use software and data to monitor and control access to information and computing systems. For example, passwords, network and host-based firewalls, network intrusion detection systems, access control lists, and data encryption are logical controls.

An important logical control that is frequently overlooked is the principle of least privilege. The principle of least privilege requires that an individual, program or system process is not granted any more access privileges than are necessary to perform the task. A blatant example of the failure to adhere to this principle is logging into Windows as user Administrator to read e-mail and surf the Web. Violations of this principle can also occur when an individual collects additional access privileges over time. This happens when employees' job duties change, when they are promoted to a new position, or when they transfer to another department. The access privileges required by their new duties are frequently added onto their already existing access privileges, which may no longer be necessary or appropriate.

Physical
Physical controls monitor and control the environment of the work place and computing facilities. They also monitor and control access to and from such facilities. Examples include doors, locks, heating and air conditioning, smoke and fire alarms, fire suppression systems, cameras, barricades, fencing, security guards and cable locks. Separating the network and work place into functional areas is also a physical control. An important physical control that is frequently overlooked is the separation of duties. Separation of duties ensures that an individual cannot complete a critical task by himself or herself. For example, an employee who submits a request for reimbursement should not also be able to authorize payment or print the check. An applications programmer should not also be the server administrator or the database administrator; these roles and responsibilities must be separated from one another.
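Separation of duties is often enforced in software as well as by physical and organizational means. The minimal sketch below encodes the reimbursement example from the text as a rule checked at approval time; the `Reimbursement` class and the `approve` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Reimbursement:
    # Hypothetical record for the reimbursement example in the text.
    request_id: int
    submitted_by: str
    amount: float
    approved_by: Optional[str] = None

def approve(request: Reimbursement, approver: str) -> None:
    # Separation of duties: whoever submitted a request may not approve it.
    if approver == request.submitted_by:
        raise PermissionError(
            f"{approver} cannot approve their own request {request.request_id}")
    request.approved_by = approver

req = Reimbursement(request_id=1, submitted_by="alice", amount=250.0)
try:
    approve(req, "alice")
except PermissionError as err:
    print(err)  # alice cannot approve their own request 1
approve(req, "bob")
print(req.approved_by)  # bob
```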

3.2 Database Security

Database security is primarily concerned with the secrecy of data. Secrecy means protecting a database from unauthorized access by users and software applications. Secrecy, in the context of database security, includes a variety of threats incurred through unauthorized access. These threats range from the intentional theft or destruction of data to the acquisition of information through more subtle measures, such as inference. There are three generally accepted categories of secrecy-related problems in database systems:

1. The improper release of information from reading data that was intentionally or accidentally accessed by unauthorized users. Securing databases from unauthorized access is more difficult than controlling access to files managed by operating systems. This problem arises from the finer granularity that is used by databases when handling files, attributes, and values. This type of problem also includes the violations of secrecy that result from the problem of inference, which is the deduction of unauthorized information from the observation of authorized information. Inference is one of the most difficult factors to control in any attempt to secure data. Because the information in a database is semantically related, it is possible to determine the value of an attribute without accessing it directly. Inference problems are most serious in statistical databases, where users can trace back information on individual entities from the statistical aggregated data.
2. The improper modification of data. This threat includes violations of the security of data through mishandling and modifications by unauthorized users. These violations can result from errors, viruses, sabotage, or failures in the data that arise from access by unauthorized users.
3. Denial-of-service threats. Actions that could prevent users from using system resources or accessing data are among the most serious. This threat has been demonstrated to a significant degree recently with the SYN flooding attacks against network service providers.
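The inference problem in statistical databases can be shown with two innocent-looking aggregate queries. In this sketch (Python's sqlite3 module, an invented `staff` table, and a hypothetical `total_salary` interface that only ever returns totals), subtracting one total from another reveals a single employee's salary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (name TEXT, dept TEXT, salary INTEGER)")
conn.executemany("INSERT INTO staff VALUES (?, ?, ?)", [
    ("Ada", "R&D", 90000), ("Grace", "R&D", 85000),
    ("Alan", "Ops", 60000), ("Edsger", "Ops", 65000),
])

def total_salary(condition="1 = 1", params=()):
    # Hypothetical statistical interface: users may only ask for aggregate totals.
    sql = "SELECT SUM(salary), COUNT(*) FROM staff WHERE " + condition
    return conn.execute(sql, params).fetchone()

everyone, n_all = total_salary()
everyone_but_ada, n_rest = total_salary("name <> ?", ("Ada",))

# Neither answer names an individual, yet their difference is Ada's exact salary.
inferred = everyone - everyone_but_ada
print(n_all, n_rest, inferred)  # 4 3 90000
```

Both queries aggregate over at least three rows, so a simple rule that refuses queries over very small sets of records would not block this particular inference.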

Discretionary vs. Mandatory Access Control Policies

Both traditional relational database management system (RDBMS) security models and OO database models make use of two general types of access control policies to protect the information in multilevel systems. The first of these policies is the discretionary policy. In the discretionary access control (DAC) policy, access is restricted based on the authorizations granted to the user. The mandatory access control (MAC) policy secures information by assigning sensitivity levels, or labels, to data entities. MAC policies are generally more secure than DAC policies, and they are used in systems in which security is critical, such as military applications. However, the price that is usually paid for this tightened security is reduced performance of the database management system. Most MAC policies also incorporate DAC measures.
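The sketch below contrasts the two policies. It assumes the well-known Bell-LaPadula rules ("no read up", "no write down") as the MAC policy, which the text does not name, and represents DAC as a set of explicit grants; all users, objects and labels are hypothetical. As the text notes, the MAC check is combined with a DAC check.

```python
from enum import IntEnum

class Level(IntEnum):
    # Ordered sensitivity labels; a higher value means more sensitive.
    UNCLASSIFIED = 0
    CONFIDENTIAL = 1
    SECRET = 2
    TOP_SECRET = 3

def mac_read_ok(clearance: Level, label: Level) -> bool:
    # "No read up": a subject may read only objects at or below its clearance.
    return clearance >= label

def mac_write_ok(clearance: Level, label: Level) -> bool:
    # "No write down": a subject may not copy data into a less sensitive object.
    return clearance <= label

def can_read(user, clearance, obj, label, dac_grants):
    # MAC rule combined with a DAC check: both must allow the access.
    return mac_read_ok(clearance, label) and (user, obj, "read") in dac_grants

grants = {("alice", "budget", "read")}  # hypothetical discretionary grants
print(can_read("alice", Level.SECRET, "budget", Level.CONFIDENTIAL, grants))    # True
print(can_read("alice", Level.CONFIDENTIAL, "budget", Level.SECRET, grants))    # False (read up)
print(can_read("bob", Level.TOP_SECRET, "budget", Level.CONFIDENTIAL, grants))  # False (no DAC grant)
print(mac_write_ok(Level.SECRET, Level.UNCLASSIFIED))                           # False (write down)
```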

3.3 Relational DBMS Security

The principal methods of security in traditional RDBMSs are the appropriate use and manipulation of views and the Structured Query Language (SQL) GRANT and REVOKE statements. These measures are reasonably effective because of their mathematical foundation in relational algebra and relational calculus.

3.3.1 View-Based Access Control


Views allow the database to be conceptually divided into pieces in ways that allow sensitive data to be hidden from unauthorized users. In the relational model, views provide a powerful mechanism for specifying data-dependent authorizations for data retrieval.

Although the individual user who creates a view is the owner and is entitled to drop the view, he or she may not be authorized to exercise all privileges on it. The authorizations that the owner may exercise depend on the view semantics and on the authorizations that the owner holds on the tables directly accessed by the view. For the owner to exercise a specific authorization on a view that he or she creates, the owner must possess the same authorization on all tables that the view uses. The privileges the owner possesses on the view are determined at the time of view definition: each privilege the owner possesses on the underlying tables is defined for the view. If, later on, the owner receives additional privileges on the tables used by the view, these additional privileges will not be passed on to the view. In order to use the new privileges within a view, the owner will need to create a new view.

The biggest problem with view-based mandatory access controls is that it is impractical to verify that the software performs the view interpretation and processing correctly. If the correct authorizations are to be assured, the system must contain some type of mechanism to verify the classification of the sensitivity of the information in the database. The classification must be done automatically, and the software that handles the classification must be trusted. However, any trusted software for the automatic classification process would be extremely complex. Furthermore, attempting to use a query language such as SQL to specify classifications quickly becomes convoluted and complex.
Even when the complexity of the classification scheme is overcome, the view can do nothing more than limit what the user sees; it cannot restrict the operations that may be performed on the views.
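A minimal sketch of the view idea, assuming Python's sqlite3 module and a hypothetical `employees` table: the view exposes only non-sensitive columns and one department's rows. In a multi-user DBMS the administrator would then grant SELECT on the view (for example `GRANT SELECT ON rnd_directory TO clerk;`) without granting any privilege on the base table. SQLite has no user accounts, so only the data-hiding half is shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER, name TEXT, dept TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?, ?)",
                 [(1, "Ada", "R&D", 90000), (2, "Alan", "Ops", 60000)])

# The view hides the salary column and every row outside the R&D department.
conn.execute("""
    CREATE VIEW rnd_directory AS
    SELECT id, name, dept FROM employees WHERE dept = 'R&D'
""")

cur = conn.execute("SELECT * FROM rnd_directory")
view_columns = [d[0] for d in cur.description]
view_rows = cur.fetchall()
print(view_columns)  # ['id', 'name', 'dept']
print(view_rows)     # [(1, 'Ada', 'R&D')]
conn.close()
```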

3.4 Proposed OODBMS Security Models

Currently only a few models use discretionary access control measures in secure object-oriented database management systems.

Explicit Authorizations
The ORION authorization model permits access to data on the basis of explicit authorizations provided to each group of users. These authorizations are classified as positive authorizations because they specifically allow a user access to an object. Similarly, a negative authorization is used to specifically deny a user access to an object. The placement of an individual into one or more groups is based on the role that the individual plays in the organization. In addition to the positive authorizations that are provided to users within each group, there are a variety of implicit authorizations that may be granted based on the relationships between subjects and access modes.

Data-Hiding Model
A similar discretionary access control secure model is the data-hiding model proposed by Dr. Elisa Bertino of the Università di Genova. This model distinguishes between public methods and private methods. The data-hiding model is based on authorizations for users to execute methods on objects. The authorizations specify which methods the user is authorized to invoke. Authorizations can only be granted to users on public methods. However, the fact that a user can access a method does not automatically mean that the user can execute all actions associated with the method. As a result, several access controls may need to be performed during the execution, and all of the authorizations for the different accesses must exist if the user is to complete the processing. Similar to the use of GRANT statements in traditional relational database management systems, the creator of an object is able to grant authorizations to the object to different users. The creator is also able to revoke the authorizations from users in a manner similar to REVOKE statements. However, unlike traditional RDBMS GRANT statements, the data-hiding model includes the notion of protection mode. When authorizations are provided to users in the protection mode, the authorizations actually checked by the system are those of the creator and not those of the individual executing the method.
As a result, the creator is able to grant a user access to a method without granting the user the authorizations for the methods called by the original method. In other words, the creator can provide a user access to specific data without being forced to give the user complete access to all related information in the object.
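The effect of protection mode can be sketched as follows. All names (`AUTHORIZATIONS`, `average_salary`, `read_salary` and the users) are hypothetical, and this is a simplified reading of the data-hiding model rather than its actual implementation. The analyst may call the public method `average_salary`, which internally calls the private `read_salary`. Only in protection mode, where the creator's authorizations are checked for the nested call, does the computation succeed.

```python
from typing import Dict, Set

# Hypothetical authorizations: which methods each user may execute.
AUTHORIZATIONS: Dict[str, Set[str]] = {
    "creator": {"average_salary", "read_salary"},
    "analyst": {"average_salary"},  # granted only the public method
}
SALARIES = {"Ada": 90000, "Alan": 60000}

def check(user: str, method: str) -> None:
    if method not in AUTHORIZATIONS.get(user, set()):
        raise PermissionError(f"{user} may not execute {method}")

def read_salary(effective_user: str, name: str) -> int:
    # Private method: checked against whoever is the effective user.
    check(effective_user, "read_salary")
    return SALARIES[name]

def average_salary(caller: str, protection_mode: bool) -> float:
    # Public method: the caller must be authorized to invoke it.
    check(caller, "average_salary")
    # In protection mode the nested call is checked against the creator's
    # authorizations; otherwise against the caller's own.
    effective = "creator" if protection_mode else caller
    values = [read_salary(effective, name) for name in SALARIES]
    return sum(values) / len(values)

print(average_salary("analyst", protection_mode=True))  # 75000.0
try:
    average_salary("analyst", protection_mode=False)
except PermissionError as err:
    print(err)  # analyst may not execute read_salary
```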

3.5 Security Classification for Information

An important aspect of information security and risk management is recognizing the value of information and defining appropriate procedures and protection requirements for the information. Not all information is equal, and so not all information requires the same degree of protection. This requires information to be assigned a security classification.

Some factors that influence which classification information should be assigned include how much value that information has to the organization, how old the information is, and whether or not the information has become obsolete. Laws and other regulatory requirements are also important considerations when classifying information. Common information security classification labels used by the business sector are: public, sensitive, private, confidential. Common information security classification labels used by government are: Unclassified, Sensitive But Unclassified, Restricted, Confidential, Secret, Top Secret, and their non-English equivalents. All employees in the organization, as well as business partners, must be trained on the classification schema and understand the required security controls and handling procedures for each classification. The classification a particular information asset has been assigned should be reviewed periodically to ensure the classification is still appropriate for the information and to ensure the security controls required by the classification are in place.

Access control
Access to protected information must be restricted to people who are authorized to access the information. The computer programs, and in many cases the computers that process the information, must also be authorized. This requires that mechanisms be in place to control the access to protected information. The sophistication of the access control mechanisms should be in proportion to the value of the information being protected: the more sensitive or valuable the information, the stronger the control mechanisms need to be. The foundation on which access control mechanisms are built starts with identification and authentication.
Identification is an assertion of who someone is or what something is. If a person makes the statement "Hello, my name is John Doe," they are making a claim of who they are. However, their claim may or may not be true. Before John Doe can be granted access to protected information, it will be necessary to verify that the person claiming to be John Doe really is John Doe.

Authentication is the act of verifying a claim of identity. When John Doe goes into a bank to make a withdrawal, he tells the bank teller he is John Doe (a claim of identity). The bank teller asks to see a photo ID, so he hands the teller his driver's license. The bank teller checks the license to make sure it has John Doe printed on it and compares the photograph on the license against the person claiming to be John Doe. If the photo and name match the person, then the teller has authenticated that John Doe is who he claimed to be. On computer systems in use today, the username is the most common form of identification and the password is the most common form of authentication. Usernames and passwords have served their purpose, but on their own they are no longer considered adequate, and they are slowly being replaced with more sophisticated authentication mechanisms.

After a person, program or computer has been successfully identified and authenticated, it must be determined which informational resources they are permitted to access and which actions they will be allowed to perform (run, view, create, delete, or change). This is called authorization. Authorization to access information and other computing services begins with administrative policies and procedures. The policies prescribe what information and computing services can be accessed, by whom, and under what conditions. The access control mechanisms are then configured to enforce these policies. Different computing systems are equipped with different kinds of access control mechanisms, and some may offer a choice of several. The access control mechanism a system offers will be based upon one of three approaches to access control, or it may be derived from a combination of the three approaches. The non-discretionary approach consolidates all access control under a centralized administration. Access to information and other resources is usually based on the individual's function (role) in the organization or the tasks the individual must perform. The discretionary approach gives the creator or owner of the information resource the ability to control access to that resource.
In the mandatory access control approach, access is granted or denied based upon the security classification assigned to the information resource.
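The steps described above (identification, authentication and authorization) can be sketched in a few lines of Python. The sketch combines the role-based (non-discretionary) approach with the mandatory approach, which the text notes a system may do. It is a minimal illustration under stated assumptions, not a production design. The role names, the permission sets and the four-level subset of the government labels are hypothetical. PBKDF2 password hashing from Python's standard library stands in for whatever authentication mechanism a real DBMS uses.

```python
import hashlib
import hmac
import os

# Simplified, ordered subset of the government labels (lowest to highest).
# Hypothetical choice for illustration.
LEVELS = ["Unclassified", "Confidential", "Secret", "Top Secret"]

# Non-discretionary (role-based) policy: permissions attach to roles, not people.
# Role names and permission sets are hypothetical.
ROLE_PERMISSIONS = {
    "clerk": {"view"},
    "manager": {"view", "create", "change"},
}


def hash_password(password: str, salt: bytes) -> bytes:
    """Salted, deliberately slow hash so passwords are never stored in clear."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


class User:
    def __init__(self, username: str, password: str, role: str, clearance: str):
        self.username = username  # identification: the claimed identity
        self.salt = os.urandom(16)
        self.password_hash = hash_password(password, self.salt)
        self.role = role
        self.clearance = clearance


def authenticate(user: User, password: str) -> bool:
    """Authentication: verify the identity claim against the stored hash."""
    candidate = hash_password(password, user.salt)
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(candidate, user.password_hash)


def authorize(user: User, action: str, resource_label: str) -> bool:
    """Authorization combining two approaches: the user's role must permit the
    action (non-discretionary) AND the user's clearance must be at least the
    resource's classification (mandatory)."""
    role_allows = action in ROLE_PERMISSIONS.get(user.role, set())
    clearance_allows = LEVELS.index(user.clearance) >= LEVELS.index(resource_label)
    return role_allows and clearance_allows


if __name__ == "__main__":
    alice = User("alice", "correct horse", role="clerk", clearance="Secret")
    print(authenticate(alice, "correct horse"))        # True
    print(authenticate(alice, "wrong guess"))          # False
    print(authorize(alice, "view", "Confidential"))    # True
    print(authorize(alice, "change", "Confidential"))  # False: role lacks "change"
    print(authorize(alice, "view", "Top Secret"))      # False: clearance too low
```

In a real system, authorize would only be called after authenticate has succeeded. The discretionary approach could be added as a per-resource access list maintained by the resource's owner.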

3.6 Cryptography

Information security uses cryptography to transform usable information into a form that renders it unusable by anyone other than an authorized user; this process is called encryption. Information that has been encrypted (rendered unusable) can be transformed back into its original usable form by an authorized user who possesses the cryptographic key, through the process of decryption. Cryptography is used in information security to protect information from unauthorized or accidental disclosure while the information is in transit (either electronically or physically) and while it is in storage. Cryptography also provides information security with other useful applications, including improved authentication methods, message digests, digital signatures, non-repudiation, and encrypted network communications. Cryptography can introduce security problems when it is not implemented correctly. Cryptographic solutions need to be implemented using industry-accepted solutions that have undergone rigorous peer review by independent experts in cryptography. The length and strength of the encryption key is also an important consideration. A key that is weak or too short will produce weak encryption. The keys used for encryption and decryption must be protected with the same degree of rigor as any other confidential information. They must be protected from unauthorized disclosure and destruction, and they must be available when needed.

Process
The terms reasonable and prudent person, due care and due diligence have been used in the fields of finance, securities, and law for many years. In recent years these terms have found their way into the fields of computing and information security. U.S.A. Federal Sentencing Guidelines now make it possible to hold corporate officers liable for failing to exercise due care and due diligence in the management of their information systems.
In the business world, stockholders, customers, business partners and governments expect corporate officers to run the business in accordance with accepted business practices and in compliance with laws and other regulatory requirements. This is often described as the "reasonable and prudent person" rule.

3.7 Disaster Recovery Planning

What is Disaster Recovery Planning?
Disaster Recovery Planning (DRP) is about continuing an IT service when a disaster strikes the site that normally provides it.

You need two or more sites. One of them is the primary site, whose service is the one planned to be recovered. The alternate site may be:
- online, meaning production data is transferred to both sites simultaneously (sometimes called a HOT site);
- offline, meaning data is transferred after a certain delay through other means (sometimes called a WARM site); or
- not receiving any transferred data at all, but holding a replica of the original site's IT system that is started whenever the primary site faces a disaster (sometimes called a COLD site).

How are DRP and BCP different?
Though DRP is part of the Business Continuity Planning (BCP) process, DRP focuses on IT systems recovery and BCP on the entire business.

How are DRP and BCP related?

DRP is one of the recovery activities carried out during the execution of a Business Continuity Plan.

4.0 CONCLUSION

Data and information systems security is the ongoing process of exercising due care and due diligence to protect information, and information systems, from unauthorized access, use, disclosure, destruction, modification, disruption, or distribution. The never-ending process of information security involves ongoing training, assessment, protection, monitoring and detection, incident response and repair, documentation, and review.

5.0 SUMMARY

This unit can be summarized as follows. Data security is the means of ensuring that data is kept safe from corruption and that access to it is suitably controlled.

Information security means protecting information and information systems from unauthorized access, use, disclosure, disruption, modification, or destruction. The terms information security, computer security and information assurance are frequently used interchangeably.

For over twenty years, information security has held that confidentiality, integrity and availability (known as the CIA Triad) are the core principles of information system security. In traditional RDBMSs, security is provided principally through the appropriate use and manipulation of views and the Structured Query Language (SQL) GRANT and REVOKE statements. Authentication is the act of verifying a claim of identity. Currently only a few models use discretionary access control measures in secure object-oriented database management systems. An important aspect of information security and risk management is recognizing the value of information and defining appropriate procedures and protection requirements for the information. Information security uses cryptography to transform usable information into a form that renders it unusable by anyone other than an authorized user; this process is called encryption. Disaster Recovery Planning is concerned with continuing an IT service. It requires two or more sites, one of which is the primary site whose service is planned to be recovered.

6.0 TUTOR-MARKED ASSIGNMENT

1. List the six atomic elements of information security proposed by Donn Parker as an extension of the CIA Triad.
2. Briefly discuss Disaster Recovery Planning in the security of a DBMS.

7.0 REFERENCES/FURTHER READINGS

44 U.S.C. § 3542 (b)(1) (2006).
Blackwell Encyclopedia of Management Information System, Vol. III, edited by Gordon B. Davis.
Harris, Shon (2003). All-in-One CISSP Certification Exam Guide, 2nd Ed. Emeryville, CA: McGraw-Hill/Osborne.
ISACA (2006). CISA Review Manual 2006. Information Systems Audit and Control Association, p. 85. ISBN 1-933284-15-3.
Quist, Arvin S. (2002). "Security Classification of Information". Volume 1. Introduction, History, and Adverse Impacts. Oak Ridge Classification Associates, LLC. Retrieved on 2007-01-11.

UNIT 4 DATABASE ADMINISTRATOR AND ADMINISTRATION

CONTENTS
1.0 Introduction
2.0 Objectives
3.0 Main Content
3.1 Duties of Database Administrator
3.2 Typical Work Activities
3.3 Database Administration and Automation
3.3.1 Types of Database Administration
3.3.2 Nature of Database Administration
3.3.3 Database Administration Tools
3.3.4 The Impact of IT Automation on Database Administration
3.3.5 Learning Database Administration
4.0 Conclusion
5.0 Summary
6.0 Tutor-Marked Assignment
7.0 References/Further Readings

1.0 INTRODUCTION

A database administrator (DBA) is a person who is responsible for the environmental aspects of a database. In general, these include:
- Recoverability: creating and testing backups
- Integrity: verifying or helping to verify data integrity
- Security: defining and/or implementing access controls to the data
- Availability: ensuring maximum uptime
- Performance: ensuring maximum performance
- Development and testing support: helping programmers and engineers to use the database efficiently

The role of a database administrator has changed according to the technology of database management systems (DBMSs) as well as the needs of the owners of the databases. For example, although logical and physical database designs are traditionally the duties of a database analyst or database designer, a DBA may be tasked to perform those duties.

2.0 OBJECTIVES

At the end of this unit, you should be able to:
- explain who a database administrator is
- identify the various functions of a database administrator
- know the different types of database administration
- understand the nature of database administration
- know the tools used in database administration.

3.0 MAIN CONTENT

3.1 Duties of Database Administrator

The duties of a database administrator vary and depend on the job description, corporate and Information Technology (IT) policies, and the technical features and capabilities of the DBMS being administered. They nearly always include disaster recovery (backups and testing of backups), performance analysis and tuning, data dictionary maintenance, and some database design. Some of the roles of the DBA may include:

- Installation of new software: It is primarily the job of the DBA to install new versions of DBMS software, application software, and other software related to DBMS administration. It is important that the DBA or other IS staff members test this new software before it is moved into a production environment.
- Configuration of hardware and software with the system administrator: In many cases the system software can only be accessed by the system administrator. In this case, the DBA must work closely with the system administrator to perform software installations, and to configure hardware and software so that they function optimally with the DBMS.
- Security administration: One of the main duties of the DBA is to monitor and administer DBMS security. This involves adding and removing users, administering quotas, auditing, and checking for security problems.
- Data analysis: The DBA will frequently be called on to analyze the data stored in the database and to make recommendations relating to the performance and efficiency of that data storage. This might relate to the more effective use of indexes, enabling "Parallel Query" execution, or other DBMS-specific features.
- Database design (preliminary): The DBA is often involved at the preliminary database-design stages. Through the involvement of the DBA, many problems that might otherwise occur can be eliminated. The DBA knows the DBMS and system, can point out potential problems, and can help the development team with special performance considerations.
- Data modeling and optimization: By modeling the data, it is possible to optimize the system layout to take the most advantage of the I/O subsystem.
- Administration of existing and new databases: The DBA is responsible for the administration of existing enterprise databases and for the analysis, design, and creation of new databases. This includes:
  - data modeling, database optimization, understanding and implementation of schemas, and the ability to interpret and write complex SQL queries
  - proactively monitoring systems for optimum performance and capacity constraints
  - establishing standards and best practices for SQL
  - interacting with and coaching developers in SQL scripting
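The duties above mention more effective use of indexes and the ability to interpret SQL queries. As a minimal sketch, assuming SQLite (through Python's built-in sqlite3 module) as the DBMS, the query plan below shows the same query switching from a full table scan to an index search once an index is created. The table, the data and the index name are hypothetical, and the exact plan wording differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [(f"cust{i % 100}", i * 1.5) for i in range(1000)],
)

QUERY = "SELECT amount FROM orders WHERE customer = ?"


def plan(sql: str, params: tuple = ()) -> list:
    """Return the human-readable 'detail' field (last column) of each
    EXPLAIN QUERY PLAN row."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]


before = plan(QUERY, ("cust7",))
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer)")
after = plan(QUERY, ("cust7",))

print(before)  # a full table scan, e.g. ['SCAN orders'] on recent SQLite versions
print(after)   # a search that uses idx_orders_customer
# The result is the same either way; only the access path changes.
print(len(conn.execute(QUERY, ("cust7",)).fetchall()))  # 10
```

On a table of a thousand rows the difference is negligible. On a large production table, however, replacing a full scan with an index search is exactly the kind of recommendation the data analysis duty describes.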

Recoverability
Recoverability means that, if a data entry error, program bug or hardware failure occurs, the DBA can bring the database backward in time to its state at an instant of logical consistency before the damage was done. Recoverability activities include making database backups and storing them in ways that minimize the risk that they will be damaged or lost, such as placing multiple copies on removable media and storing them outside the area likely to be affected by an anticipated disaster. Recoverability is the DBA's most important concern.

The backup of the database consists of data with timestamps, combined with database logs that are used to bring the data to a consistent state at a particular moment in time. It is possible to make a backup of the database containing only data, without timestamps or logs, but the DBA must take the database offline to do such a backup.

The recovery tests of the database consist of restoring the data, then applying logs against that data to bring the database backup to consistency at a particular point in time, up to the last transaction in the logs. Alternatively, an offline database backup can be restored simply by placing the data in place on another copy of the database.

If a DBA (or any administrator) attempts to implement a recoverability plan without the recovery tests, there is no guarantee that the backups are valid at all. In practice, in all but the most mature RDBMS packages, backups cannot be assumed valid without extensive testing to be sure that no bugs or human errors have corrupted them.

Security

Security means that users' ability to access and change data conforms to the policies of the business and the delegation decisions of its managers. Like other metadata, a relational DBMS manages security information in the form of tables. These tables are the "keys to the kingdom", and so it is important to protect them from intruders. This is why security has become increasingly important for databases.

Performance
Performance means that the database does not cause unreasonable online response times, and that it does not cause unattended programs to run for an unworkable period of time. In complex client/server and three-tier systems, the database is just one of many elements that determine the performance that online users and unattended programs experience. Performance is a major motivation for the DBA to become a generalist and to coordinate with specialists in other parts of the system, outside of traditional bureaucratic reporting lines.

Techniques for database performance tuning have changed as DBAs have become more sophisticated in their understanding of what causes performance problems and in their ability to diagnose them. In the 1990s, DBAs often focused on the database as a whole and looked at database-wide statistics for clues that might help them find out why the system was slow. Also, the actions DBAs took in their attempts to solve performance problems were often at the global, database level, such as changing the amount of computer memory available to the database, or changing the amount of memory available to any database program that needed to sort data. DBAs now understand that performance problems must first be diagnosed, and that this is best done by examining individual SQL statements, table processes and system architecture, not the database as a whole. Various tools, some included with the database and some available from third parties, provide a behind-the-scenes look at how the database is handling the SQL statements, shedding light on what is taking so long.
Having identified the problem, the individual SQL statement can be tuned.

Development/Testing Support
Development and testing support is typically what the database administrator regards as his or her least important duty, while results-oriented managers consider it the DBA's most important duty. Support activities include collecting sample production data for testing new and changed programs and loading it into test databases; consulting with programmers about performance tuning; and making table design changes to provide new kinds of storage for new program functions.

Here are some IT roles that are related to the role of database administrator:
- Application programmer or software engineer
- System administrator
- Data administrator
- Data architect
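Two activities described in this section can both be sketched with the online backup API that Python's sqlite3 module exposes as Connection.backup. The first is loading a copy of production data into a test database. The second is testing that a backup can actually be restored, as discussed under Recoverability. This is a minimal illustration assuming SQLite; other DBMSs have their own backup utilities. The table name, and the integrity-check-plus-row-count used as the "recovery test", are hypothetical choices.

```python
import os
import sqlite3
import tempfile


def backup_database(source: sqlite3.Connection, target_path: str) -> None:
    """Copy a live SQLite database to target_path with the online backup API."""
    target = sqlite3.connect(target_path)
    try:
        source.backup(target)
    finally:
        target.close()


def verify_restore(backup_path: str, table: str, expected_rows: int) -> bool:
    """A minimal recovery test: open the copy, run SQLite's integrity check
    and compare a row count against what the source held.
    `table` is interpolated into the SQL, so it must be a trusted name."""
    conn = sqlite3.connect(backup_path)
    try:
        integrity_ok = conn.execute("PRAGMA integrity_check").fetchone()[0] == "ok"
        count = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
        return integrity_ok and count == expected_rows
    finally:
        conn.close()


if __name__ == "__main__":
    production = sqlite3.connect(":memory:")
    production.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    production.executemany(
        "INSERT INTO customers (name) VALUES (?)", [("Ada",), ("Bola",), ("Chidi",)]
    )
    production.commit()

    with tempfile.TemporaryDirectory() as tmp:
        copy_path = os.path.join(tmp, "test_copy.db")
        backup_database(production, copy_path)
        print(verify_restore(copy_path, "customers", expected_rows=3))  # True
```

A real recovery test would go further, for example by replaying logs to a chosen point in time. Even this small check, though, catches a backup that cannot be opened or that is missing data.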

3.2 Typical Work Activities

The work of a database administrator (DBA) varies according to the nature of the employing organization and the level of responsibility associated with the post. The work may be pure maintenance, or it may also involve specializing in database development. Typical responsibilities include some or all of the following:
- establishing the needs of users and monitoring user access and security
- monitoring performance and managing parameters to provide fast query responses to 'front-end' users
- mapping out the conceptual design for a planned database in outline
- considering both back-end organization of data and front-end accessibility for the end user
- refining the logical design so that it can be translated into a specific data model
- further refining the physical design to meet system storage requirements
- installing and testing new versions of the database management system
- maintaining data standards, including adherence to the Data Protection Act
- writing database documentation, including data standards, procedures and definitions for the data dictionary (metadata)
- controlling access permissions and privileges
- developing, managing and testing backup and recovery plans
- ensuring that storage, archiving, and backup procedures are functioning properly
- capacity planning
- working closely with IT project managers, database programmers, and web developers
- communicating regularly with technical, applications and operational staff to ensure database integrity and security
- commissioning and installing new applications

Because of the increasing level of hacking and the sensitive nature of the data stored, security and recoverability (disaster recovery) have become increasingly important aspects of the work.

3.3 Database Administration and Automation

Database administration is the function of managing and maintaining database management systems (DBMS) software. Mainstream DBMS software such as Oracle, IBM DB2 and Microsoft SQL Server needs ongoing management. As such, corporations that use DBMS software often hire specialized IT (Information Technology) personnel called Database Administrators, or DBAs.

3.3.1 Types of Database Administration


There are three types of DBAs:
1. Systems DBAs (sometimes also referred to as Physical DBAs, Operations DBAs or Production Support DBAs)
2. Development DBAs
3. Application DBAs

The functions of a DBA usually vary with the DBA type. Below is a brief description of what the different types of DBAs do.

Systems DBAs usually focus on the physical aspects of database administration, such as DBMS installation, configuration, patching, upgrades, backups, restores, refreshes, performance optimization, maintenance and disaster recovery.

Development DBAs usually focus on the logical and development aspects of database administration, such as data model design and maintenance, DDL (data definition language) generation, SQL writing and tuning, coding stored procedures, collaborating with developers to help choose the most appropriate DBMS features and functionality, and other pre-production activities.

Application DBAs are usually found in organizations that have purchased third-party application software such as ERP (enterprise resource planning) and CRM (customer relationship management) systems. Examples of such application software include Oracle Applications, Siebel and PeopleSoft (both now part of Oracle Corp.), and SAP. Application DBAs straddle the fence between the DBMS and the application software and are responsible for ensuring that the application is fully optimized for the database and vice versa. They usually manage all the application components that interact with the database and carry out activities such as application installation and patching, application upgrades, database cloning, building and running data cleanup routines, data load process management, etc.

While individuals usually specialize in one type of database administration, in smaller organizations it is not uncommon to find a single individual or group performing more than one type.

3.3.2 Nature of Database Administration


The degree to which the administration of a database is automated dictates the skills and personnel required to manage databases. At one end of the spectrum, a system with minimal automation will require significant experienced resources to manage, perhaps 5-10 databases per DBA. Alternatively, an organization might choose to automate a significant amount of the work that could be done manually, thereby reducing the skills required to perform tasks. As automation increases, the personnel needs of the organization split into highly skilled workers who create and manage the automation and a group of lower-skilled "line" DBAs who simply execute the automation. Database administration work is complex, repetitive and time-consuming, and it requires significant training. Since databases hold valuable and mission-critical data, companies usually look for candidates with multiple years of experience. Database administration often requires DBAs to work during off-hours (for example, for planned after-hours downtime, in the event of a database-related outage, or if performance has been severely degraded). DBAs are commonly well compensated for the long hours.

3.3.3 Database Administration Tools


Often, the DBMS software comes with certain tools to help DBAs manage the DBMS. Such tools are called native tools. For example, Microsoft SQL Server comes with SQL Server Enterprise Manager, and Oracle has tools such as SQL*Plus and Oracle Enterprise Manager/Grid Control. In addition, third parties such as BMC, Quest Software, Embarcadero and SQL Maestro Group offer GUI tools to monitor the DBMS and help DBAs carry out certain functions inside the database more easily.

Another kind of database software exists to manage the provisioning of new databases and the management of existing databases and their related resources. The process of creating a new database can consist of hundreds or thousands of unique steps, from satisfying prerequisites to configuring backups, where each step must be successful before the next can start. A human cannot be expected to complete this procedure in exactly the same way time after time, yet that consistency is exactly the goal when multiple databases exist. As the number of DBAs grows, without automation the number of unique configurations frequently grows until it becomes costly and difficult to support. All of these complicated procedures can be modeled by the best DBAs in database automation software and then executed by the standard DBAs. Software such as Stratavia's Data Palette and GridApp Systems' Clarity has been created specifically to improve the reliability and repeatability of these procedures.
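The core idea behind such automation software can be sketched as a simple step runner. A procedure runs in a fixed order, each step must succeed before the next can start, and every run is logged. This is a hypothetical illustration, not how Data Palette or Clarity work internally. The step names and functions are placeholders for real provisioning tasks.

```python
from typing import Callable, List, Tuple

# A step is a descriptive name plus a function that returns True on success.
Step = Tuple[str, Callable[[], bool]]


def run_procedure(steps: List[Step]) -> Tuple[bool, List[str]]:
    """Run steps strictly in order and stop at the first failure, so that no
    step ever starts on top of an incomplete configuration."""
    log: List[str] = []
    for name, action in steps:
        label = name
        try:
            ok = bool(action())
        except Exception as exc:  # an unexpected error counts as a failed step
            ok = False
            label = f"{name} ({exc})"
        log.append(f"{label}: {'ok' if ok else 'FAILED'}")
        if not ok:
            return False, log
    return True, log


# Hypothetical placeholder steps; real ones would call provisioning tools.
def check_prerequisites() -> bool:
    return True


def create_database() -> bool:
    return True


def configure_backups() -> bool:
    return False  # simulate a failure to show that the run stops here


def register_monitoring() -> bool:
    return True


if __name__ == "__main__":
    succeeded, log = run_procedure([
        ("check prerequisites", check_prerequisites),
        ("create database", create_database),
        ("configure backups", configure_backups),
        ("register monitoring", register_monitoring),
    ])
    print(succeeded)  # False
    for line in log:
        print(line)
    # check prerequisites: ok
    # create database: ok
    # configure backups: FAILED
```

Because the same list of steps runs identically every time, every database built this way ends up with the same configuration. That is the repeatability the text describes.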

3.3.4 The Impact of IT Automation on Database Administration


Recently, automation has begun to have a significant impact on this area. Newer technologies such as HP/Opsware's SAS (Server Automation System) and Stratavia's Data Palette suite have begun to increase the automation of servers and databases respectively, reducing the number of database-related tasks. However, at best this only reduces the amount of mundane, repetitive activity and does not eliminate the need for DBAs. The intention of DBA automation is to enable DBAs to focus on more proactive activities around database architecture and deployment.

3.3.5 Learning Database Administration


Several educational institutes offer professional courses, including late-night programs, that allow candidates to learn database administration. In addition, DBMS vendors such as Oracle, Microsoft and IBM offer certification programs to help companies hire qualified DBA practitioners.

4.0 CONCLUSION

A database management system (DBMS) is so important in an organization that a special manager is often appointed to oversee its activities. The database administrator is responsible for the installation and coordination of the DBMS, and for managing one of the most valuable resources of any organization: its data. The database administrator must have a sound knowledge of the structure of the database and of the DBMS. The DBA must be thoroughly conversant with the organization, its systems and the information needs of its managers.

5.0 SUMMARY

A database administrator (DBA) is a person who is responsible for the environmental aspects of a database. The duties of a database administrator vary and depend on the job description, corporate and Information Technology (IT) policies, and the technical features and capabilities of the DBMS being administered. They nearly always include disaster recovery (backups and testing of backups), performance analysis and tuning, data dictionary maintenance, and some database design. Techniques for database performance tuning have changed as DBAs have become more sophisticated in their understanding of what causes performance problems and in their ability to diagnose them. The work of a database administrator varies according to the nature of the employing organization and the level of responsibility associated with the post. Database administration is the function of managing and maintaining database management systems (DBMS) software. The degree to which the administration of a database is automated dictates the skills and personnel required to manage databases.

6.0 TUTOR-MARKED ASSIGNMENT

1. Mention five roles of a database administrator.
2. Mention the types of database administration.

7.0 REFERENCES/FURTHER READINGS

Association for Computing Machinery SIGIR Forum archive, Volume 7, Issue 4.
The Origins of the Data Base Concept, Early DBMS Systems including DS and IMS, the Data Base Task Group, and the Hierarchical, Network and Relational Data Models are discussed in Thomas Haigh, "A Veritable Bucket of Facts: Origins of the Data Base Management System," ACM SIGMOD Record 35:2 (June 2006).
How Database Systems Share Storage.

MODULE 3
Unit 1 Relational Database Management Systems
Unit 2 Data Warehouse
Unit 3 Document Management System

UNIT 1 RELATIONAL DATABASE MANAGEMENT SYSTEMS

CONTENTS
1.0 Introduction
2.0 Objectives
3.0 Main Content
3.1 History of the Term
3.2 Market Structure
3.3 Features and Responsibilities of an RDBMS
3.4 Comparison of Relational Database Management Systems
3.4.1 General Information
3.4.2 Operating System Support
3.4.3 Fundamental Features
4.0 Conclusion
5.0 Summary
6.0 Tutor-Marked Assignment
7.0 References/Further Readings

1.0 INTRODUCTION

A relational database management system (RDBMS) is a database management system (DBMS) that is based on the relational model as introduced by E. F. Codd. Most popular commercial and open source databases currently in use are based on the relational model. A short definition of an RDBMS is a DBMS in which data is stored in the form of tables, and the relationships among the data are also stored in the form of tables.

2.0 OBJECTIVES

At the end of this unit, you should be able to:
- define relational database management system
- trace the origin and development of RDBMS
- identify the market structure of RDBMS
- identify the major types of relational database management systems
- compare and contrast the types of RDBMS based on several criteria.

3.0 MAIN CONTENT

3.1 History of the Term

E. F. Codd introduced the term in his seminal paper "A Relational Model of Data for Large Shared Data Banks", published in 1970. In this paper and later papers he defined what he meant by relational. One well-known definition of what constitutes a relational database system is Codd's 12 rules. However, many of the early implementations of the relational model did not conform to all of Codd's rules, so the term gradually came to describe a broader class of database systems. At a minimum, these systems:

- presented the data to the user as relations (a presentation in tabular form, i.e. as a collection of tables with each table consisting of a set of rows and columns, can satisfy this property)
- provided relational operators to manipulate the data in tabular form
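The two properties above can be made concrete with a minimal sketch using SQLite through Python's sqlite3 module. SQLite is an assumption here; any SQL RDBMS would do. Two hypothetical tables are presented as relations of rows and columns. Three relational operators are then applied: selection (restrict), projection, and a join on a common column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT NOT NULL);
CREATE TABLE employee (
    emp_id  INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    salary  INTEGER NOT NULL,
    dept_id INTEGER REFERENCES department(dept_id)
);
INSERT INTO department VALUES (1, 'Finance'), (2, 'Marketing');
INSERT INTO employee VALUES
    (10, 'Ada',   300, 1),
    (11, 'Bola',  250, 2),
    (12, 'Chidi', 400, 1);
""")

# Selection (restrict): keep only the rows that satisfy a condition.
selection = conn.execute(
    "SELECT * FROM employee WHERE salary > 260 ORDER BY emp_id"
).fetchall()

# Projection: keep only some of the columns.
projection = conn.execute("SELECT name FROM employee ORDER BY emp_id").fetchall()

# Join: combine two relations on a common column (dept_id).
joined = conn.execute("""
    SELECT e.name, d.dept_name
    FROM employee AS e JOIN department AS d ON e.dept_id = d.dept_id
    ORDER BY e.emp_id
""").fetchall()

print(selection)   # [(10, 'Ada', 300, 1), (12, 'Chidi', 400, 1)]
print(projection)  # [('Ada',), ('Bola',), ('Chidi',)]
print(joined)      # [('Ada', 'Finance'), ('Bola', 'Marketing'), ('Chidi', 'Finance')]
```

SQL does not remove duplicate rows from a projection unless DISTINCT is specified. This is one of the points on which SQL departs from the strict relational model.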

The first systems that were relatively faithful implementations of the relational model came from the University of Michigan (Micro DBMS, 1969) and from the IBM UK Scientific Centre at Peterlee (IS1, 1970-72, and its follow-on PRTV, 1973-79). The first system sold as an RDBMS was Multics Relational Data Store, first sold in 1978. Others have been Berkeley Ingres QUEL and IBM BS12.

The most popular definition of an RDBMS is a product that presents a view of data as a collection of rows and columns, even if it is not based strictly upon relational theory. By this definition, RDBMS products typically implement some but not all of Codd's 12 rules.

A second, theory-based school of thought argues that if a database does not implement all of Codd's rules (or the current understanding of the relational model, as expressed by Christopher J. Date, Hugh Darwen and others), it is not relational. This view, shared by many theorists and other strict adherents of Codd's principles, would disqualify most DBMSs as not relational. For clarification, they often refer to some RDBMSs as Truly-Relational Database Management Systems (TRDBMS), naming others Pseudo-Relational Database Management Systems (PRDBMS).

Almost all commercial relational DBMSs employ SQL as their query language. Alternative query languages have been proposed and implemented, but very few have become commercial products.

3.2 Market Structure

Given below is a list of the top RDBMS vendors in 2006, with figures in millions of United States dollars, as published in an IDC study.

Vendor        Global Revenue (US$ millions)
Oracle        7,912
IBM           3,483
Microsoft     3,052
Sybase        524
Teradata      457
Others        1,624
Total         16,452

Low adoption costs associated with open-source RDBMS products such as MySQL and PostgreSQL have begun influencing vendor pricing and licensing strategies.

3.3 Features and Responsibilities of an RDBMS

As mentioned earlier, an RDBMS is software that is used for creating and maintaining a database. Maintaining a database involves several tasks that an RDBMS takes care of. These tasks are as follows.

Control Data Redundancy
Since data in an RDBMS is spread across several tables, repetition or redundancy is reduced. Redundant data can be extracted and stored in another table, along with a field that is common to both tables. Data can then be retrieved from the two tables by using the common field.

Data Abstraction
This means that the RDBMS hides the actual way in which data is stored, while providing the user with a conceptual representation of the data.

Support for Multiple Users
A true RDBMS allows effective sharing of data. That is, it ensures that several users can access the data in the database concurrently without significantly affecting the speed of data access.

In a database application that can be used by several users concurrently, there is the possibility that two users may try to modify a particular record at the same time. This could lead to one person's changes being made while the other's are overwritten. To avoid such confusion, most RDBMSs provide a record-locking mechanism, which ensures that no two users can modify a particular record at the same time. The record is, as it were, "locked" while one user makes changes to it, so another user is not allowed to modify it until the changes are complete and the record is saved. The lock is then released, and the record is available for editing again.

Multiple Ways of Interfacing to the System
The database should be accessible through different query languages as well as programming languages, and a variety of front-end tools should be able to use the database as a back-end. For example, data stored in Microsoft Access can be displayed and manipulated using forms created in software such as Visual Basic or FrontPage 2000.

Restricting Unauthorized Access
An RDBMS provides a security mechanism that protects the data in the database from unauthorized access and malicious use. The security implemented in most RDBMSs is referred to as "user-level security", wherein the various users of the database are assigned usernames and passwords; only when a user enters the correct username and password can he or she access the data in the database. In addition, a particular user could be restricted to only viewing the data, while another could have the right to modify the data. A third user could have the right to change the structure of a table itself, in addition to the rights that the other two have. When security is implemented properly, the data is secure and protected from tampering.

Enforcing Integrity Constraints
An RDBMS provides a set of rules that ensure that data entered into a table is valid.
These rules must remain true for a database to preserve integrity. Integrity constraints are specified at the time of creating the database, and are enforced by the RDBMS. For example, in a "Marks" table, a constraint can be added to ensure that the marks in each subject are between 0 and 100. Such a constraint is called a "Check" constraint. It is a rule that can be set by the user to ensure that only data that meets the criteria specified there is allowed to enter the database. The given example ensures that only a number between 0 and 100 can be entered into the marks column.

Backup and Recovery
In spite of ensuring that the database is secure from unauthorized users as well as invalid entries, there is always a danger that the data in the database could get lost. This could happen due to hardware problems or a system crash, and could result in the loss of all data. To guard against this, most RDBMSs have inbuilt backup and recovery techniques that protect the database from these kinds of failures too.
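The Marks example and the backup idea can be sketched together with Python's sqlite3 module: the CHECK constraint makes the RDBMS itself reject an out-of-range mark, and Connection.backup (available since Python 3.7) copies the whole database into a second one. The table layout and sample data are hypothetical, and SQLite stands in for a general RDBMS; production systems typically back up to files or dedicated backup media rather than to another in-memory database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# CHECK constraint: the RDBMS refuses any mark outside the range 0..100.
conn.execute("""
    CREATE TABLE marks (
        student_id INTEGER NOT NULL,
        subject    TEXT    NOT NULL,
        mark       INTEGER NOT NULL CHECK (mark BETWEEN 0 AND 100),
        PRIMARY KEY (student_id, subject)
    )
""")
conn.execute("INSERT INTO marks VALUES (1, 'Mathematics', 78)")
conn.commit()

try:
    conn.execute("INSERT INTO marks VALUES (1, 'English', 140)")
    conn.commit()
    rejected = False
except sqlite3.IntegrityError as exc:
    conn.rollback()  # end the transaction opened for the failed statement
    rejected = True
    print("rejected:", exc)

# Backup: copy the live database into a second database.
backup = sqlite3.connect(":memory:")
conn.backup(backup)
restored = backup.execute("SELECT * FROM marks").fetchall()
print(restored)
```

Only the valid row survives in both the original and the backup copy; the exact wording of the error message depends on the SQLite version.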

3.4 Comparison of Relational Database Management Systems

The following tables compare general and technical information for a number of relational database management systems. Comparisons are based on the stable versions without any add-ons, extensions or external programs.

3.4.1 General information

RDBMS | Maintainer | First public release date | Latest stable version | Software license
4th Dimension | 4D s.a.s | 1984 | v11 SQL | Proprietary
ADABAS | Software AG | 1970 | ? | ?
Adaptive Server Enterprise | Sybase | 1987 | 15.0 | Proprietary
Advantage Database Server | Sybase | 1992 | 8.1 | Proprietary
Apache Derby | Apache | 2004 | 10.4.1.3 | Apache License
Datacom | CA | ? | 11.2 | Proprietary
DB2 | IBM | 1982 | 9.5 | Proprietary
DBISAM | Elevate Software | ? | 4.25 | Proprietary
Datawasp | Significant Data Systems | April 2008 | 1.0.1 | Proprietary
ElevateDB | Elevate Software | ? | 1.01 | Proprietary
FileMaker | FileMaker | 1984 | 9 | Proprietary
Firebird | Firebird project | July 25, 2000 | 2.1.0 | IPL and IDPL
Informix | IBM | 1985 | 11.10 | Proprietary
HSQLDB | HSQL Development Group | 2001 | 1.8.0 | BSD
H2 | H2 Software | 2005 | 1.0 | EPL and modified MPL
Ingres | Ingres Corp. | 1974 | Ingres 2006 r2 9.1.0 | GPL and proprietary
InterBase | CodeGear | 1984 | 2007 | Proprietary
MaxDB | SAP AG | ? | 7.6 | GPL or proprietary
Microsoft Access | Microsoft | 1992 | 12 (2007) | Proprietary
Microsoft Visual FoxPro | Microsoft | ? | 9 (2005) | Proprietary
Microsoft SQL Server | Microsoft | 1989 | 9.00.3042 (2005 SP2) | Proprietary
MonetDB | The MonetDB Developer Team | 2004 | 4.16 (Feb. 2007) | MonetDB Public License v1.1
MySQL | Sun Microsystems | November 1996 | 5.0.67 | GPL or proprietary
HP NonStop SQL | Hewlett-Packard | 1987 | SQL/MX 2.0 | Proprietary
Omnis Studio | TigerLogic Inc | July 1982 | 4.3.1 Release 1 (May 2008) | Proprietary
Oracle | Oracle Corporation | November 1979 | 11g Release 1 (September 2007) | Proprietary
Oracle Rdb | Oracle Corporation | 1984 | 7.2 | Proprietary
OpenEdge | Progress Software Corporation | 1984 | 10.1C | Proprietary
OpenLink Virtuoso | OpenLink Software | 1998 | 5.0.5 (January 2008) | GPL or proprietary
Pervasive PSQL | Pervasive Software | ? | 9 | Proprietary
Polyhedra DBMS | ENEA AB | 1993 | 8.0 (July 2008) | Proprietary
PostgreSQL | PostgreSQL Global Development Group | June 1989 | 8.3.3 (12 June 2008) | BSD
Pyrrho DBMS | University of Paisley | November 2005 | 0.5 | Proprietary
RBase | RBase | ? | 7.6 | Proprietary
RDM Embedded | Birdstep Technology | 1984 | 8.1 | Proprietary
RDM Server | Birdstep Technology | 1990 | 8.0 | Proprietary
ScimoreDB | Scimore | 2005 | 2.5 | Freeware
SmallSQL | SmallSQL | April 2005 | 0.19 | LGPL
SQL Anywhere | Sybase | 1992 | 10.0 | Proprietary
SQLite | D. Richard Hipp | August 2000 | 3.5.7 (17 March 2008) | Public domain
Teradata | Teradata | 1984 | V12 | Proprietary
Valentina | Paradigma Software | February 1998 | 3.0.1 | Proprietary

3.4.2 Operating system support

The operating systems on which the RDBMSs can run:

[Table: operating system support for each RDBMS compared in this section. Columns: Windows, Mac OS X, Linux, BSD, UNIX, z/OS (1).]

Note (1): Open source databases listed as UNIX-compatible will likely compile and run under z/OS's built-in UNIX System Services (USS) subsystem. Most databases listed as Linux-compatible can run alongside z/OS on the same server using Linux on zSeries.
Note (2): The database's availability depends on the Java Virtual Machine, not on the operating system.
Note (3): Oracle Rdb was originally developed by DEC, and runs on OpenVMS.
Note (4): Oracle Database 11g also runs on OpenVMS, HP-UX and AIX. 10g also supported BS2000/OSD and z/OS (31-bit), but that support has been discontinued in 11g. Versions earlier than 10g were available on a wide variety of platforms.
Note (5): DB2 is also available for i5/OS, z/VM and z/VSE. Previous versions were also available for OS/2.

3.4.3 Fundamental features

Information about which fundamental RDBMS features are implemented natively:

[Table: fundamental features of each RDBMS compared in this section. Columns: ACID, Referential integrity, Transactions, Unicode, Interface (GUI, SQL, API, 4GL and others).]

Note (6): In MySQL, the InnoDB table type must be used for transactions and referential integrity. The Windows installer sets this as the default if support for transactions is selected; on other operating systems the default table type is MyISAM. However, even the InnoDB table type permits storage of values that exceed the data range; some view this as violating the integrity constraint of ACID.


Note (7): FOREIGN KEY constraints are parsed but are not enforced. Triggers can be used instead. Nested transactions are not supported.
Note (8): Available via triggers.

4.0 CONCLUSION

The dominant model in use today is the relational model, implemented by relational database management systems and usually queried with the Structured Query Language (SQL). Many DBMSs also support Open Database Connectivity (ODBC), which provides a standard way for programmers to access database management systems.

5.0 SUMMARY

A relational database management system (RDBMS) is a database management system (DBMS) that is based on the relational model as introduced by E. F. Codd. Most popular commercial and open source databases currently in use are based on the relational model. E. F. Codd introduced the term in his seminal paper "A Relational Model of Data for Large Shared Data Banks", published in 1970. In this paper and later papers he defined what he meant by relational. One well-known definition of what constitutes a relational database system is Codd's 12 rules. The most popular definition of an RDBMS, however, is a product that presents a view of data as a collection of rows and columns, even if it is not based strictly upon relational theory. An RDBMS is software that is used for creating and maintaining a database, and maintaining a database involves several tasks that the RDBMS takes care of. The comparisons of RDBMS products in this unit are based on the stable versions, without any add-ons, extensions or external programs.

6.0 TUTOR-MARKED ASSIGNMENT

1. List 5 features of Relational Database Management Systems.
2. Mention 5 criteria you can use to differentiate types of RDBMSs.

7.0 REFERENCES/FURTHER READINGS

Comparison of different SQL implementations against SQL standards. Includes Oracle, DB2, Microsoft SQL Server, MySQL and PostgreSQL. (08/Jun/2007).
Comparison of Oracle 8/9i, MySQL 4.x and PostgreSQL 7.x DBMS against SQL standards. (14/Mar/2005).


Comparison of Oracle and SQL Server. (2004).
Comparison of geometrical data handling in PostgreSQL, MySQL and DB2. (29/Sep/2003).
Open Source Database Software Comparison. (Mar/2005).
PostgreSQL vs. MySQL vs. Commercial Databases: It's All About What You Need. (12/Apr/2004).


UNIT 2 DATA WAREHOUSE

CONTENTS

1.0 Introduction
2.0 Objectives
3.0 Main Content
3.1 History
3.2 Benefits of Data Warehousing
3.3 Data Warehouse Architecture
3.4 Normalized versus Dimensional Approach to Storage of Data
3.5 Conforming Information
3.6 Top-Down versus Bottom-Up Design Methodologies
3.7 Data Warehouses versus Operational Systems
3.8 Evolution in Organization Use of Data Warehouses
3.9 Disadvantages of Data Warehouses
3.10 Data Warehouse Appliance
3.11 The Future of Data Warehousing
4.0 Conclusion
5.0 Summary
6.0 Tutor-Marked Assignment
7.0 References/Further Readings

1.0 INTRODUCTION

A data warehouse is a repository of an organization's electronically stored data. Data warehouses are designed to facilitate reporting and analysis. This classic definition of the data warehouse focuses on data storage. However, the means to retrieve and analyze data, to extract, transform and load data, and to manage the dictionary data are also considered essential components of a data warehousing system. Many references to data warehousing use this broader context. Thus, an expanded definition of data warehousing includes business intelligence tools, tools to extract, transform and load data into the repository, and tools to manage and retrieve metadata. In contrast to data warehouses are operational systems, which perform day-to-day transaction processing.


2.0 OBJECTIVES

At the end of this unit, you should be able to:

• define data warehouse
• trace the history and development process of data warehouse
• list various benefits of data warehouse
• define the architecture of a data warehouse
• compare and contrast data warehouses and operational systems
• know what a data warehouse appliance is, and the disadvantages of data warehouse
• have an idea of what the future holds for the data warehouse concept.

3.0 MAIN CONTENT

3.1 History

The concept of data warehousing dates back to the late 1980s, when IBM researchers Barry Devlin and Paul Murphy developed the "business data warehouse". In essence, the data warehousing concept was intended to provide an architectural model for the flow of data from operational systems to decision support environments. The concept attempted to address the various problems associated with this flow, mainly the high costs associated with it. In the absence of a data warehousing architecture, an enormous amount of redundant information was required to support the multiple decision support environments that usually existed. In larger corporations it was typical for multiple decision support environments to operate independently. Each environment served different users but often required much of the same data. The process of gathering, cleaning and integrating data from various sources, usually long-existing operational systems (usually referred to as legacy systems), was typically in part replicated for each environment. Moreover, the operational systems were frequently re-examined as new decision support requirements emerged. Often new requirements necessitated gathering, cleaning and integrating new data from the operational systems that was logically related to previously gathered data. Based on analogies with real-life warehouses, data warehouses were intended as large-scale collection/storage/staging areas for corporate data. Data could be retrieved from one central point, or data could be distributed to "retail stores" or "data marts" which were tailored for ready access by users.


3.2 Benefits of Data Warehousing

Some of the benefits that a data warehouse provides are as follows:

• A data warehouse provides a common data model for all data of interest regardless of the data's source. This makes it easier to report and analyze information than it would be if multiple data models were used to retrieve information such as sales invoices, order receipts, general ledger charges, etc.
• Prior to loading data into the data warehouse, inconsistencies are identified and resolved. This greatly simplifies reporting and analysis.
• Information in the data warehouse is under the control of data warehouse users so that, even if the source system data is purged over time, the information in the warehouse can be stored safely for extended periods of time.
• Because they are separate from operational systems, data warehouses provide retrieval of data without slowing down operational systems.
• Data warehouses facilitate decision support system applications such as trend reports (e.g., the items with the most sales in a particular area within the last two years), exception reports, and reports that show actual performance versus goals.
• Data warehouses can work in conjunction with and, hence, enhance the value of operational business applications, notably customer relationship management (CRM) systems.

3.3 Data Warehouse Architecture

Architecture, in the context of an organization's data warehousing efforts, is a conceptualization of how the data warehouse is built. There is no right or wrong architecture; its worth can be judged by how well the conceptualization aids in the building, maintenance and usage of the data warehouse. One possible simple conceptualization of a data warehouse architecture consists of the following interconnected layers:

Operational Database Layer
The source data for the data warehouse. An organization's ERP systems fall into this layer.


Informational Access Layer
The data accessed for reporting and analysis, and the tools for reporting and analyzing data. Business intelligence tools fall into this layer, and the Inmon-Kimball differences about design methodology, discussed later in this unit, have to do with this layer.

Data Access Layer
The interface between the operational and informational access layers. Tools to extract, transform and load data into the warehouse fall into this layer.

Metadata Layer
The data directory. This is usually more detailed than an operational system's data directory. There are dictionaries for the entire warehouse and sometimes dictionaries for the data that can be accessed by a particular reporting and analysis tool.

3.4 Normalized versus Dimensional Approach to Storage of Data

There are two leading approaches to storing data in a data warehouse: the dimensional approach and the normalized approach. In the dimensional approach, transaction data are partitioned into either "facts", which are generally numeric transaction data, or "dimensions", which are the reference information that gives context to the facts. For example, a sales transaction can be broken up into facts such as the number of products ordered and the price paid for the products, and into dimensions such as order date, customer name, product number, order ship-to and bill-to locations, and the salesperson responsible for receiving the order. A key advantage of a dimensional approach is that the data warehouse is easier for the user to understand and to use. Also, the retrieval of data from the data warehouse tends to operate very quickly. The main disadvantages of the dimensional approach are: 1) in order to maintain the integrity of facts and dimensions, loading the data warehouse with data from different operational systems is complicated, and 2) it is difficult to modify the data warehouse structure if the organization adopting the dimensional approach changes the way in which it does business.

In the normalized approach, the data in the data warehouse are stored following, to a degree, Codd's normalization rules. Tables are grouped together by subject areas that reflect general data categories (e.g., data on customers, products, finance, etc.). The main advantage of this approach is that it is straightforward to add information into the database. A disadvantage of this approach is that, because of the number of tables involved, it can be difficult for users both to 1) join data from different sources into meaningful information and then 2) access the information without a precise understanding of the sources of data and of the data structure of the data warehouse. These approaches are not exact opposites of each other: dimensional approaches can involve normalizing data to a degree.
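To make the dimensional approach concrete, here is a minimal star schema sketched in Python with SQLite: one fact table of numeric measures with foreign keys to two dimension tables, and a query that summarizes the facts by dimension attributes. The table names (fact_sales, dim_date, dim_product) and the data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: the reference information that gives context to facts.
    CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
    -- Fact table: numeric measures plus a key into each dimension.
    CREATE TABLE fact_sales (
        date_key    INTEGER REFERENCES dim_date(date_key),
        product_key INTEGER REFERENCES dim_product(product_key),
        quantity    INTEGER,
        amount      REAL
    );
""")
conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                 [(20080101, 2008, 1), (20080201, 2008, 2)])
conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                 [(1, "Laptop", "Computers"), (2, "Printer", "Peripherals")])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                 [(20080101, 1, 2, 1800.0), (20080101, 2, 1, 150.0),
                  (20080201, 1, 1, 900.0)])

# A typical dimensional query: aggregate the facts, sliced by dimension attributes.
report = conn.execute("""
    SELECT d.month, p.category, SUM(f.quantity), SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_date    AS d ON d.date_key = f.date_key
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY d.month, p.category
    ORDER BY d.month, p.category
""").fetchall()
print(report)
```

Every report follows the same shape (join the fact table to the dimensions it needs, then group), which is a large part of why users find dimensional designs easy to query.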

3.5 Conforming Information

Another important decision in designing a data warehouse is which data to conform and how to conform the data. For example, one operational system feeding data into the data warehouse may use "M" and "F" to denote the sex of an employee, while another operational system may use "Male" and "Female". Though this is a simple example, much of the work in implementing a data warehouse is devoted to making data with similar meanings consistent when they are stored in the data warehouse. Typically, extract, transform, load (ETL) tools are used in this work.
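A minimal sketch of the transform step for the "M"/"F" versus "Male"/"Female" example, in plain Python. The mapping table SEX_CODES and the functions conform_sex and transform are hypothetical names; real ETL tools usually keep such mappings in metadata rather than hard-coding them.

```python
# Hypothetical mapping from each source system's codes to one conformed code.
SEX_CODES = {"m": "M", "male": "M", "f": "F", "female": "F"}


def conform_sex(raw):
    """Map a source-system value to the single representation used in the warehouse."""
    key = raw.strip().lower()
    if key not in SEX_CODES:
        raise ValueError(f"unrecognised sex code: {raw!r}")
    return SEX_CODES[key]


def transform(records):
    """The 'transform' step of ETL: conform each record before it is loaded."""
    return [{**rec, "sex": conform_sex(rec["sex"])} for rec in records]


system_a = [{"employee_id": 1, "sex": "M"}, {"employee_id": 2, "sex": "F"}]
system_b = [{"employee_id": 3, "sex": "Male"}, {"employee_id": 4, "sex": "Female"}]

loaded = transform(system_a) + transform(system_b)
print([r["sex"] for r in loaded])  # ['M', 'F', 'M', 'F']
```

Rejecting unknown codes rather than passing them through means inconsistencies surface at load time instead of in reports.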

3.6 Top-Down versus Bottom-Up Design Methodologies

Bottom-Up Design
Ralph Kimball, a well-known author on data warehousing, is a proponent of the bottom-up approach to data warehouse design. In the bottom-up approach, data marts are first created to provide reporting and analytical capabilities for specific business processes. Data marts contain atomic data and, if necessary, summarized data. These data marts can eventually be unioned together to create a comprehensive data warehouse. The combination of data marts is managed through the implementation of what Kimball calls "a data warehouse bus architecture". Business value can be returned as quickly as the first data marts can be created. Maintaining tight management over the data warehouse bus architecture is fundamental to maintaining the integrity of the data warehouse. The most important management task is making sure dimensions among data marts are consistent. In Kimball's words, this means that the dimensions "conform".

Top-Down Design
Bill Inmon, one of the first authors on the subject of data warehousing, has defined a data warehouse as a centralized repository for the entire enterprise. Inmon is one of the leading proponents of the top-down approach to data warehouse design, in which the data warehouse is designed using a normalized enterprise data model. "Atomic" data, that is, data at the lowest level of detail, are stored in the data warehouse. Dimensional data marts containing data needed for specific business processes or specific departments are created from the data warehouse. In the Inmon vision, the data warehouse is at the center of the "Corporate Information Factory" (CIF), which provides a logical framework for delivering business intelligence (BI) and business management capabilities. The CIF is driven by data provided from business operations. Inmon states that the data warehouse is:

Subject-Oriented
The data in the data warehouse is organized so that all the data elements relating to the same real-world event or object are linked together.

Time-Variant
The changes to the data in the data warehouse are tracked and recorded so that reports can be produced showing changes over time.

Non-Volatile
Data in the data warehouse is never overwritten or deleted; once committed, the data is static, read-only, and retained for future reporting.

Integrated
The data warehouse contains data from most or all of an organization's operational systems, and this data is made consistent.

The top-down design methodology generates highly consistent dimensional views of data across data marts, since all data marts are loaded from the centralized repository. Top-down design has also proven to be robust against business changes. Generating new dimensional data marts against the data stored in the data warehouse is a relatively simple task. The main disadvantage of the top-down methodology is that it represents a very large project with a very broad scope.
The up-front cost of implementing a data warehouse using the top-down methodology is significant, and the time from the start of the project to the point at which end users experience initial benefits can be substantial. In addition, the top-down methodology can be inflexible and unresponsive to changing departmental needs during the implementation phases.
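Inmon's time-variant and non-volatile properties can be illustrated with an append-only history table: each load adds rows stamped with a load date and never updates or deletes earlier rows, so a report can ask what a value was as of any date. This is a minimal sketch in Python and SQLite; the table customer_history and the helper functions load_snapshot and city_as_of are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Append-only history: each load adds rows; nothing is updated or deleted.
conn.execute("""
    CREATE TABLE customer_history (
        customer_id INTEGER NOT NULL,
        city        TEXT    NOT NULL,
        load_date   TEXT    NOT NULL,  -- ISO-8601 date of the warehouse load
        PRIMARY KEY (customer_id, load_date)
    )
""")


def load_snapshot(rows, load_date):
    """Append the current operational values, stamped with the load date."""
    conn.executemany("INSERT INTO customer_history VALUES (?, ?, ?)",
                     [(cid, city, load_date) for cid, city in rows])
    conn.commit()


def city_as_of(customer_id, as_of):
    """Return the latest value loaded on or before `as_of`, or None."""
    row = conn.execute("""
        SELECT city FROM customer_history
        WHERE customer_id = ? AND load_date <= ?
        ORDER BY load_date DESC
        LIMIT 1
    """, (customer_id, as_of)).fetchone()
    return row[0] if row else None


load_snapshot([(1, "Lagos")], "2008-01-31")
load_snapshot([(1, "Abuja")], "2008-02-29")  # the customer has since moved
print(city_as_of(1, "2008-02-15"), city_as_of(1, "2008-03-01"))  # Lagos Abuja
```

Because ISO-8601 date strings sort in date order, a plain text comparison is enough for the "as of" lookup here.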

Hybrid Design
Over time it has become apparent to proponents of bottom-up and top-down data warehouse design that both methodologies have benefits and risks. Hybrid methodologies have evolved to take advantage of the fast turn-around time of bottom-up design and the enterprise-wide data consistency of top-down design.

3.7 Data Warehouses versus Operational Systems

Operational systems are optimized for preservation of data integrity and speed of recording of business transactions through use of database normalization and an entity-relationship model. Operational system designers generally follow Codd's rules of data normalization in order to ensure data integrity. Codd and later researchers defined a series of increasingly stringent normal forms, commonly taught as first through fifth normal form. Fully normalized database designs (that is, those satisfying all five normal forms) often result in information from a business transaction being stored in dozens to hundreds of tables. Relational databases are efficient at managing the relationships between these tables. The databases have very fast insert/update performance because only a small amount of data in those tables is affected each time a transaction is processed. Finally, in order to improve performance, older data are usually periodically purged from operational systems.

Data warehouses are optimized for speed of data retrieval. Frequently, data in data warehouses are denormalized via a dimension-based model. Also, to speed data retrieval, data warehouse data are often stored multiple times: in their most granular form and in summarized forms called aggregates. Data warehouse data are gathered from the operational systems and held in the data warehouse even after the data has been purged from the operational systems.
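The idea of storing warehouse data both in its most granular form and as precomputed aggregates can be sketched as follows, in Python with SQLite. The table names sales_detail and sales_by_month_region and the sample rows are hypothetical. The aggregate is built once at load time, so a monthly report reads a handful of summary rows instead of scanning every sale.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Most granular form: one row per individual sale.
conn.execute("CREATE TABLE sales_detail (sale_date TEXT, region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales_detail VALUES (?, ?, ?)", [
    ("2008-01-03", "North", 100.0),
    ("2008-01-17", "North", 50.0),
    ("2008-01-09", "South", 75.0),
    ("2008-02-02", "North", 20.0),
])
# Summarized copy (an "aggregate"), precomputed during the load so that
# monthly reports need not rescan every detail row.
conn.execute("""
    CREATE TABLE sales_by_month_region AS
    SELECT substr(sale_date, 1, 7) AS month, region,
           SUM(amount) AS total, COUNT(*) AS num_sales
    FROM sales_detail
    GROUP BY month, region
""")
summary = conn.execute(
    "SELECT month, region, total, num_sales FROM sales_by_month_region "
    "ORDER BY month, region"
).fetchall()
print(summary)
```

The trade-off is the one described above: extra storage and load work in exchange for faster retrieval.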

3.8 Evolution in Organization Use of Data Warehouses

Organizations generally start off with relatively simple use of data warehousing. Over time, more sophisticated use of data warehousing evolves. The following general stages of use of the data warehouse can be distinguished:

Offline Operational Databases
Data warehouses in this initial stage are developed by simply copying the data of an operational system to another server, where the processing load of reporting against the copied data does not impact the operational system's performance.


Offline Data Warehouse
Data warehouses at this stage are updated from data in the operational systems on a regular basis, and the data warehouse data is stored in a data structure designed to facilitate reporting.

Real-Time Data Warehouse
Data warehouses at this stage are updated every time an operational system performs a transaction (e.g., an order, a delivery or a booking).

Integrated Data Warehouse
Data warehouses at this stage are updated every time an operational system performs a transaction. The data warehouses then generate transactions that are passed back into the operational systems.

3.9 Disadvantages of Data Warehouses

There are also disadvantages to using a data warehouse. Some of them are:

• Over their life, data warehouses can have high costs. The data warehouse is usually not static, and maintenance costs are high.
• Data warehouses can get outdated relatively quickly. There is a cost of delivering suboptimal information to the organization.
• There is often a fine line between data warehouses and operational systems. Duplicate, expensive functionality may be developed. Or, functionality may be developed in the data warehouse that, in retrospect, should have been developed in the operational systems, and vice versa.

3.10 Data Warehouse Appliance

A data warehouse appliance is an integrated set of servers, storage, OS, DBMS and software specifically pre-installed and pre-optimized for data warehousing. Alternatively, the term is also used for similar software-only systems that purportedly are very easy to install on specific recommended hardware configurations. DW appliances provide solutions for the mid-to-large volume data warehouse market, offering low-cost performance most commonly on data volumes in the terabyte to petabyte range.


Technology Primer
Most DW appliance vendors use massively parallel processing (MPP) architectures to provide high query performance and platform scalability. MPP architectures consist of independent processors or servers executing in parallel. Most MPP architectures implement a "shared nothing" architecture, where each server is self-sufficient and controls its own memory and disk. Shared-nothing architectures have a proven record of high scalability and little contention. DW appliances distribute data onto dedicated disk storage units connected to each server in the appliance. This distribution allows DW appliances to resolve a relational query by scanning data on each server in parallel. This divide-and-conquer approach delivers high performance and scales close to linearly as new servers are added to the architecture.

MPP database architectures are not new: Teradata, Tandem, Britton Lee and Sequent offered MPP SQL-based architectures in the 1980s. The re-emergence of MPP data warehouses has been aided by open source and commodity components. Advances in technology have reduced costs and improved performance in storage devices, multi-core CPUs and networking components. Open source RDBMS products, such as Ingres and PostgreSQL, reduce software license costs and allow DW appliance vendors to focus on optimization rather than on providing basic database functionality. Open source Linux provides a stable, well-implemented OS for DW appliances.

History
Many consider Teradata's initial product to be the first DW appliance (or Britton Lee's, but Britton Lee, renamed ShareBase, was acquired by Teradata in June 1990). Some regard Teradata's current offerings as still being appliances, while others argue that they fall short in ease of installation or administration. Interest in the data warehouse appliance category is generally dated to the emergence of Netezza in the early 2000s.
More recently, a second generation of modern DW appliances has emerged, marking the move to mainstream vendor integration. IBM integrated its InfoSphere Warehouse (formerly DB2 Warehouse) with its own servers and storage to create the IBM InfoSphere Balanced Warehouse. Other DW appliance vendors have partnered with major hardware vendors to help bring their appliances to market. DATAllegro partners with EMC and Dell and implements open source Ingres on Linux. Greenplum has a partnership with Sun Microsystems and implements Bizgres (a form of PostgreSQL) on Solaris using the ZFS file system. HP Neoview has a wholly-owned solution and uses HP NonStop SQL.


Kognitio offers a row-based "virtual" data warehouse appliance, while Vertica and ParAccel offer column-based "virtual" data warehouse appliances. Like Greenplum, ParAccel partners with Sun Microsystems. These are software-only solutions deployed on clusters of commodity hardware. Kognitio's homegrown WX2 database runs on several blade configurations. Other players in the DW appliance space include Calpont and Dataupia. Recently, the market has seen the emergence of data warehouse bundles, where vendors combine their hardware and database software as a data warehouse platform. The Oracle Optimized Warehouse Initiative combines the Oracle Database with hardware from the industry's leading computer manufacturers: Dell, EMC, HP, IBM, SGI and Sun Microsystems. Oracle's Optimized Warehouses are pre-validated configurations and the database software comes pre-installed, though analysts differ as to whether these should be regarded as appliances.

Benefits

Reduction in Costs
The total cost of ownership (TCO) of a data warehouse consists of initial entry costs, ongoing maintenance costs and the cost of increasing capacity as the data warehouse grows. DW appliances offer low entry and maintenance costs. Initial costs range from $10,000 to $150,000 per terabyte, depending on the size of the DW appliance installed. The resource cost for monitoring and tuning the data warehouse makes up a large part of the TCO, often as much as 80%. DW appliances reduce administration for day-to-day operations, setup and integration. Many also offer low costs for expanding processing power and capacity. With the increased focus on controlling costs, combined with tight IT budgets, data warehouse managers need to reduce and manage expenses while leveraging their technology as much as possible, which makes DW appliances an attractive option.

Parallel Performance
DW appliances provide a compelling price/performance ratio.
Many support mixed workloads, where a broad range of ad-hoc queries and reports run simultaneously with loading. DW appliance vendors use several distribution and partitioning methods to provide parallel performance. Some DW appliances scan data using partitioning and sequential I/O instead of indexes; other DW appliances use standard database indexing. With high performance on highly granular data, DW appliances are able to address analytics that previously could not meet performance requirements.

Reduced Administration

DW appliances provide a single-vendor solution, and the vendor takes ownership for optimizing the parts and software within the appliance. This eliminates the customer's costs for integration and regression testing of the DBMS, storage and OS on a terabyte scale, and avoids some of the compatibility issues that arise from multi-vendor solutions. A single support point also provides a single source for problem resolution and a simplified upgrade path for software and hardware. The care and feeding of DW appliances is less than that of many alternative data warehouse solutions. DW appliances reduce administration through automated space allocation, reduced index maintenance and, in most cases, reduced tuning and performance analysis.

Built-in High Availability

DW appliance vendors provide built-in high availability through redundancy of components within the appliance. Many offer warm-standby servers, dual networks, dual power supplies, disk mirroring with robust failover, and solutions for server failure.

Scalability

DW appliances scale for both capacity and performance. Many DW appliances implement a modular design that database administrators can add to incrementally, eliminating up-front costs for over-provisioning. In contrast, architectures that do not support incremental expansion result in hours of production downtime, during which database administrators export and re-load terabytes of data. In MPP (massively parallel processing) architectures, adding servers increases performance as well as capacity. This is not always the case with alternative solutions.

Rapid Time-to-Value

Companies increasingly expect to use business analytics to improve the current cycle. DW appliances provide fast implementations without the need for regression and integration testing. Rapid prototyping is possible because of reduced tuning and index creation, fast loading and, in some cases, a reduced need for aggregation.
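To make the Parallel Performance discussion above more concrete, here is a minimal sketch of hash distribution followed by parallel partition scans. It is an illustration under stated assumptions, not how any appliance named in this unit actually works: the node count `NUM_NODES`, the helper names and the sample sales rows are hypothetical, and Python threads merely stand in for the separate servers an appliance would use.

```python
from concurrent.futures import ThreadPoolExecutor
from zlib import crc32

NUM_NODES = 4  # hypothetical number of processing nodes in the appliance


def node_for(key: str, num_nodes: int = NUM_NODES) -> int:
    """Choose a node by hashing the distribution key (CRC32 is stable across runs)."""
    return crc32(key.encode("utf-8")) % num_nodes


def distribute(rows: list, key_field: str, num_nodes: int = NUM_NODES) -> list:
    """Hash-distribute rows so that each node holds one partition."""
    partitions = [[] for _ in range(num_nodes)]
    for row in rows:
        partitions[node_for(row[key_field], num_nodes)].append(row)
    return partitions


def scan_partition(partition: list, region: str) -> int:
    """Sequentially scan one partition (no index) and return a partial sum."""
    return sum(row["amount"] for row in partition if row["region"] == region)


def parallel_total(partitions: list, region: str) -> int:
    """Scan every partition concurrently, then combine the partial sums."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = pool.map(scan_partition, partitions, [region] * len(partitions))
        return sum(partials)


if __name__ == "__main__":
    # Hypothetical sales rows: even-numbered customers are in Lagos, odd in Abuja.
    sales = [
        {"customer": f"C{i:03d}", "region": "Lagos" if i % 2 == 0 else "Abuja", "amount": i}
        for i in range(1, 101)
    ]
    parts = distribute(sales, "customer")
    print([len(p) for p in parts])         # how evenly the rows were spread
    print(parallel_total(parts, "Lagos"))  # 2550
```

Each partition is scanned without an index, as in the partition-scanning appliances described above; adding nodes shortens each scan because every node holds fewer rows.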

Application Uses

DW appliances provide solutions for many analytic application uses, including:
- Enterprise data warehousing
- Super-sized sandboxes that isolate power users with resource-intensive queries
- Pilot projects, or projects requiring rapid prototyping and rapid time-to-value
- Off-loading projects from the enterprise data warehouse, i.e. large analytical query projects that affect the overall workload of the enterprise data warehouse
- Applications with specific performance or loading requirements
- Data marts that have outgrown their present environment
- Turnkey data warehouses or data marts
- Solutions for applications with high data growth and high performance requirements
- Applications requiring data warehouse encryption

Trends

The DW appliance market is shifting trends in many areas as it evolves:
- Vendors are moving toward using commodity technologies rather than proprietary assembly of commodity components.
- Implemented applications show usage expanding from tactical and data mart solutions to strategic and enterprise data warehouse use.
- Mainstream vendor participation is now apparent.

With a lower total cost of ownership, reduced maintenance and high performance to address business analytics on growing data volumes, most analysts believe that DW appliances will gain market share.
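The total cost of ownership that recurs in this section (see Reduction in Costs) is the sum of entry, maintenance and expansion costs over a period. The sketch below illustrates that arithmetic only; the class and function names are hypothetical, and the example figures are invented, except that the $20,000-per-terabyte entry price was chosen to fall inside the $10,000 to $150,000 range quoted earlier.

```python
from dataclasses import dataclass


@dataclass
class WarehouseCosts:
    """Hypothetical cost inputs for one data warehouse, in dollars."""
    entry_cost_per_tb: float      # initial purchase price per terabyte
    initial_tb: float             # capacity installed at the start
    annual_admin_cost: float      # monitoring, tuning and support per year
    expansion_cost_per_tb: float  # price of each terabyte added later
    added_tb_per_year: float      # expected growth in capacity per year


def total_cost_of_ownership(c: WarehouseCosts, years: int) -> float:
    """TCO = initial entry cost + on-going maintenance + cost of added capacity."""
    entry = c.entry_cost_per_tb * c.initial_tb
    maintenance = c.annual_admin_cost * years
    expansion = c.expansion_cost_per_tb * c.added_tb_per_year * years
    return entry + maintenance + expansion


def admin_share(c: WarehouseCosts, years: int) -> float:
    """Fraction of the TCO spent on monitoring and tuning."""
    return c.annual_admin_cost * years / total_cost_of_ownership(c, years)


if __name__ == "__main__":
    example = WarehouseCosts(
        entry_cost_per_tb=20_000, initial_tb=10,
        annual_admin_cost=150_000,
        expansion_cost_per_tb=20_000, added_tb_per_year=2,
    )
    print(total_cost_of_ownership(example, years=5))  # 1150000
    print(round(admin_share(example, years=5), 2))    # 0.65
```

With these assumed numbers, monitoring and tuning account for about 65% of a five-year TCO, which illustrates why reduced administration weighs so heavily when DW appliances are compared with alternatives.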

3.11 The Future of Data Warehousing

Data warehousing, like any technology niche, has a history of innovations that did not receive market acceptance. A 2007 Gartner Group paper predicted that the following technologies could be disruptive to the business intelligence market:
- Service Oriented Architecture
- Search capabilities integrated into reporting and analysis technology
- Software as a Service
- Analytic tools that work in memory
- Visualization

Another prediction is that data warehouse performance will continue to be improved by the use of data warehouse appliances, many of which incorporate the developments in the aforementioned Gartner Group report. Finally, management consultant Thomas Davenport, among others, predicts that more organizations will seek to differentiate themselves by using analytics enabled by data warehouses.

4.0

CONCLUSION

Data warehousing is now emerging as a very important area of database management systems. This is a result of the growth in the databases of large corporations. A data warehouse makes it easier to hold data while it is in use. However, there are challenges and constraints in the acceptance and implementation of data warehouses, which is normal in the development of any concept. The future of data warehousing is good, as some organizations will opt for it.

5.0

SUMMARY

A data warehouse is a repository of an organization's electronically stored data. Data warehouses are designed to facilitate reporting and analysis. The concept of data warehousing dates back to the late 1980s, when IBM researchers Barry Devlin and Paul Murphy developed the "business data warehouse". Architecture, in the context of an organization's data warehousing efforts, is a conceptualization of how the data warehouse is built. There are two leading approaches to storing data in a data warehouse: the dimensional approach and the normalized approach. Another important decision in designing a data warehouse is which data to conform and how to conform the data. Ralph Kimball, a well-known author on data warehousing, is a proponent of the bottom-up approach to data warehouse design. Operational systems are optimized for preservation of data integrity and speed of recording of business transactions through use of database normalization and an entity-relationship model. Organizations generally start off with relatively simple use of data warehousing. Over time, more sophisticated use of data warehousing evolves.

A data warehouse appliance is an integrated set of servers, storage, OS, DBMS and software specifically pre-installed and pre-optimized for data warehousing. Data warehousing, like any technology niche, has a history of innovations that did not receive market acceptance.

6.0

TUTOR-MARKED ASSIGNMENT

1. Discuss the benefits associated with the use of data warehouses.

2. Mention five applications of data warehouse appliances.

7.0

REFERENCES/FURTHER READINGS

Inmon, W.H. "Tech Topic: What is a Data Warehouse?" Prism Solutions. Volume 1. 1995.
Yang, Jun. "WareHouse Information Prototype at Stanford (WHIPS)". Stanford University. July 7, 1998.
Caldeira, C. "Data Warehousing - Conceitos e Modelos". Edições Sílabo. 2008. ISBN 978-972-618-479-9.
Kimball, R. and Ross, M. "The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling". pp. 310. Wiley. 2nd Ed. 2002. ISBN 0-471-20024-7.
Ericsson, R. "Building Business Intelligence Applications with .NET". 1st Ed. Charles River Media. February 2004. pp. 28-29.
Pendse, Nigel and Bange, Carsten. "The Missing Next Big Things".
Schlegel, Kurt. "Emerging Technologies Could Prove Disruptive to the Business Intelligence Market". Gartner Group. July 6, 2007.
Davenport, Thomas and Harris, Jeanne. "Competing on Analytics: The New Science of Winning". Harvard Business School Press. 2007. ISBN 1-4221-0332-3.
Queries from Hell blog. "When is an appliance not an appliance?"
DBMS2 - DataBase Management System Services, Blog Archive. "Data warehouse appliances - fact and fiction".
White, Todd (November 5, 1990). "Teradata Corp. suffers first quarterly loss in four years". Los Angeles Business Journal.

UNIT 3

DOCUMENT MANAGEMENT SYSTEM

CONTENTS

1.0 Introduction
2.0 Objectives
3.0 Main Content
3.1 History
3.2 Document Management and Content Management
3.3 Components
3.4 Issues Addressed in Document Management
3.5 Using XML in Document and Information Management
3.6 Types of Document Management Systems
4.0 Conclusion
5.0 Summary
6.0 Tutor-Marked Assignment
7.0 References/Further Readings

1.0

INTRODUCTION

A document management system (DMS) is a computer system (or set of computer programs) used to track and store electronic documents and/or images of paper documents. The term has some overlap with the concept of Content Management Systems, and is often viewed as a component of Enterprise Content Management (ECM) Systems and as related to Digital Asset Management, document imaging, workflow systems and Records Management systems. Contract Management and Contract Lifecycle Management (CLM) can be viewed as either components or implementations of ECM.

2.0

OBJECTIVES

At the end of this unit, you should be able to:
- define a document management system
- trace the history and development of document management systems
- compare and contrast document management systems and content management systems
- identify the basic components of document management systems
- explain the issues addressed by document management systems
- identify the types of document management systems available off the shelf.

3.0

MAIN CONTENT

3.1

History

Beginning in the 1980s, a number of vendors began developing systems to manage paper-based documents. These systems managed paper documents, which included not only printed and published documents, but also photos, prints, etc. Later, a second type of system was developed to manage electronic documents, i.e., all those documents, or files, created on computers and often stored on local user file systems. The earliest electronic document management (EDM) systems were developed to manage either proprietary file types or a limited number of file formats. Many of these systems were later referred to as document imaging systems, because their main capabilities were capture, storage, indexing and retrieval of image file formats. These systems enabled an organization to capture faxes and forms, save copies of the documents as images, and store the image files in the repository for security and quick retrieval (retrieval was possible because the system handled the extraction of the text from the document as it was captured, and the text indexer provided text retrieval capabilities). EDM systems evolved to the point where the system was able to manage any type of file format that could be stored on the network. The applications grew to encompass electronic documents, collaboration tools, security, and auditing capabilities.

3.2

Document Management and Content Management

There is considerable confusion in the market between document management systems (DMS) and content management systems (CMS). This has not been helped by the vendors, who are keen to market their products as widely as possible. These two types of systems are very different, and serve complementary needs. While there is an ongoing move to merge the two (a positive step), it is important to understand when each system is appropriate.

Document Management Systems (DMS)

Document management is certainly the older discipline, born out of the need to manage huge numbers of documents in organisations.

Mature and well-tested, document management systems can be characterised as follows:
- focused on managing documents in the traditional sense (like Word files)
- each unit of information (document) is fairly large and self-contained
- there are few (if any) links between documents
- provides limited integration with the repository (check-in, check-out, etc.)
- focused primarily on storage and archiving
- includes powerful workflow
- targeted at storing and presenting documents in their native format
- limited web publishing engine, which typically produces one page for each document

Note that this is just a generalised description of a DMS; most systems offer a range of unique features and capabilities. Nonetheless, it does provide a representative outline of common DMS functionality.

A typical document management scenario:

A large legal firm purchases a DMS to track the huge number of advice documents, contracts and briefs. It allows lawyers to easily retrieve earlier advice, and to use "precedent" templates to quickly create new documents.

You can't build a website with just a DM system.

Content Management Systems (CMS)

Content management is more recent, and is primarily designed to meet the growing needs of the website and intranet markets. A content management system can be summarised as follows:
- manages small, interconnected units of information (e.g. web pages)
- each unit (page) is defined by its location on the site
- extensive cross-linking between pages
- focused primarily on page creation and editing
- provides tight integration between authoring and the repository (metadata, etc.)
- provides a very powerful publishing engine (templates, scripting, etc.)

A typical content management scenario:

A CMS is purchased to manage the 5000-page corporate website. Template-based authoring allows business groups to easily create content, while the publishing system dynamically generates richly formatted pages.

Content management and document management are complementary, not competing, technologies. You must choose an appropriate system if business needs are to be met.
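The template-based authoring and dynamic page generation in the CMS scenario above can be sketched with Python's standard `string.Template`. This is a minimal illustration under assumed names (`PAGE_TEMPLATE`, `render_page`, `publish` and the sample pages are hypothetical): authors supply only a title, body text and related pages, and the publishing step merges them into one shared layout and generates the cross-links.

```python
from html import escape
from string import Template

# Hypothetical site-wide template: every page shares the same layout.
PAGE_TEMPLATE = Template(
    "<html><head><title>$title</title></head>"
    "<body><h1>$title</h1><p>$body</p><ul>$links</ul></body></html>"
)


def render_page(page: dict, site: dict) -> str:
    """Merge one page's content into the template and add its cross-links."""
    links = "".join(
        f'<li><a href="{escape(path)}">{escape(site[path]["title"])}</a></li>'
        for path in page["related"]
    )
    return PAGE_TEMPLATE.substitute(
        title=escape(page["title"]), body=escape(page["body"]), links=links
    )


def publish(site: dict) -> dict:
    """Generate HTML for every page, keyed by the page's location on the site."""
    return {path: render_page(page, site) for path, page in site.items()}


if __name__ == "__main__":
    site = {
        "/about": {"title": "About Us", "body": "Who we are.", "related": ["/contact"]},
        "/contact": {"title": "Contact", "body": "How to reach us.", "related": ["/about"]},
    }
    pages = publish(site)
    print(pages["/about"])
```

Because the layout lives in one template, changing it restyles every generated page, which is the property that makes CMS publishing engines suit large, heavily cross-linked sites.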

3.3

Components

Document management systems commonly provide storage, versioning, metadata, security, as well as indexing and retrieval capabilities. Here is a description of these components:

Metadata

Metadata is typically stored for each document. Metadata may, for example, include the date the document was stored and the identity of the user storing it. The DMS may also extract metadata from the document automatically or prompt the user to add metadata. Some systems also use optical character recognition on scanned images, or perform text extraction on electronic documents. The resulting extracted text can be used to assist users in locating documents by identifying probable keywords or providing full-text search capability, or can be used on its own. Extracted text can also be stored as a component of metadata, stored with the image, or stored separately as a source for searching document collections.

Integration

Many document management systems attempt to integrate document management directly into other applications, so that users may retrieve existing documents directly from the document management system repository, make changes, and save the changed document back to the repository as a new version, all without leaving the application. Such integration is commonly available for office suites and e-mail or collaboration/groupware software.

Capture

Capture images of paper documents using scanners or multifunction printers. Optical Character Recognition (OCR) software is often used, whether integrated into the hardware or as stand-alone software, to convert digital images into machine-readable text.

Indexing

Track electronic documents. Indexing may be as simple as keeping track of unique document identifiers, but often it takes a more complex form, providing classification through the documents' metadata or even through word indexes extracted from the documents' contents. Indexing exists mainly to support retrieval. One area of critical importance for rapid retrieval is the creation of an index topology.

Storage

Store electronic documents. Storage of the documents often includes management of those same documents: where they are stored, for how long, migration of the documents from one storage medium to another (hierarchical storage management) and eventual document destruction.

Retrieval

Retrieve the electronic documents from storage. Although the notion of retrieving a particular document is simple, retrieval in the electronic context can be quite complex and powerful. Simple retrieval of individual documents can be supported by allowing the user to specify the unique document identifier, and having the system use the basic index (or a non-indexed query on its data store) to retrieve the document. More flexible retrieval allows the user to specify partial search terms involving the document identifier and/or parts of the expected metadata. This would typically return a list of documents which match the user's search terms. Some systems provide the capability to specify a Boolean expression containing multiple keywords or example phrases expected to exist within the documents' contents. The retrieval for this kind of query may be supported by previously built indexes, or may perform more time-consuming searches through the documents' contents to return a list of the potentially relevant documents.

Distribution

Security

Document security is vital in many document management applications. Compliance requirements for certain documents can be quite complex depending on the type of documents.
For instance, the Health Insurance Portability and Accountability Act (HIPAA) requirements dictate that medical documents have certain security requirements. Some document management systems have a rights management module that allows an administrator to give access to documents, based on type, to only certain people or groups of people.

Workflow

Workflow is a complex problem, and some document management systems have a built-in workflow module. There are different types of workflow, and usage depends on the environment the electronic document management system (EDMS) is applied to. Manual workflow requires a user to view the document and decide whom to send it to. Rules-based workflow allows an administrator to create a rule that dictates the flow of the document through an organization: for instance, an invoice passes through an approval process and then is routed to the accounts payable department. Dynamic rules allow for branches to be created in a workflow process. A simple example would be to enter an invoice amount; if the amount is lower than a certain set amount, the invoice follows a different route through the organization.

Collaboration

Collaboration should be inherent in an EDMS. Documents should be capable of being retrieved by an authorized user and worked on. Access should be blocked to other users while work is being performed on the document.

Versioning

Versioning is a process by which documents are checked in or out of the document management system, allowing users to retrieve previous versions and to continue work from a selected point. Versioning is useful for documents that change over time and require updating, but where it may be necessary to go back to a previous copy.
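Collaboration and Versioning work together: while one user has a document checked out, other users are blocked, and each check-in creates a new version that can be retrieved later. The class below is a hypothetical, in-memory sketch of that behaviour, not the API of any product; a real DMS would persist both the versions and the lock in its repository.

```python
class VersionedDocument:
    """Hypothetical sketch of check-out/check-in with a version history."""

    def __init__(self, doc_id: str, content: str, author: str) -> None:
        self.doc_id = doc_id
        self.versions = [(1, author, content)]  # (version number, author, content)
        self.checked_out_by = None              # user currently holding the lock

    def check_out(self, user: str) -> str:
        """Lock the document for one user and return the latest content."""
        if self.checked_out_by is not None and self.checked_out_by != user:
            raise PermissionError(f"{self.doc_id} is checked out by {self.checked_out_by}")
        self.checked_out_by = user
        return self.versions[-1][2]

    def check_in(self, user: str, new_content: str) -> int:
        """Save the edited content as a new version and release the lock."""
        if self.checked_out_by != user:
            raise PermissionError(f"{user} has not checked out {self.doc_id}")
        number = self.versions[-1][0] + 1
        self.versions.append((number, user, new_content))
        self.checked_out_by = None
        return number

    def get_version(self, number: int) -> str:
        """Retrieve the content of an earlier version."""
        for num, _author, content in self.versions:
            if num == number:
                return content
        raise KeyError(number)


if __name__ == "__main__":
    doc = VersionedDocument("POL-7", "Draft policy", "ada")
    text = doc.check_out("ada")
    try:
        doc.check_out("tunde")  # blocked while ada is working on it
    except PermissionError as err:
        print(err)  # POL-7 is checked out by ada
    print(doc.check_in("ada", text + " (approved)"))  # 2
    print(doc.get_version(1))  # Draft policy
```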

3.4

Issues Addressed in Document Management

There are several common issues involved in managing documents, whether the system is an informal, ad-hoc, paper-based method for one person or a formal, structured, computer-enhanced system for many people across multiple offices. Most methods for managing documents address the following areas:

- Location: Where will documents be stored? Where will people need to go to access documents? Physical journeys to filing cabinets and file rooms are analogous to the onscreen navigation required to use a document management system.
- Filing: How will documents be filed? What methods will be used to organize or index the documents to assist in later retrieval? Document management systems will typically use a database to store filing information.
- Retrieval: How will documents be found? Typically, retrieval encompasses both browsing through documents and searching for specific information.
- Security: How will documents be kept secure? How will unauthorized personnel be prevented from reading, modifying or destroying documents?
- Disaster Recovery: How can documents be recovered in case of destruction from fires, floods or natural disasters?
- Retention Period: How long should documents be kept, i.e. retained? As organizations grow and regulations increase, informal guidelines for keeping various types of documents give way to more formal Records Management practices.
- Archiving: How can documents be preserved for future readability?
- Distribution: How can documents be made available to the people that need them?
- Workflow: If documents need to pass from one person to another, what are the rules for how their work should flow?
- Creation: How are documents created? This question becomes important when multiple people need to collaborate, and the logistics of version control and authoring arise.
- Authentication: Is there a way to vouch for the authenticity of a document?

3.5

Using XML in Document and Information Management

The attention paid to XML (Extensible Markup Language), whose 1.0 standard was published on February 10, 1998, is impressive. XML has been heralded as the next important Internet technology, the next step following HTML, and the natural and worthy companion to the Java programming language itself. Enterprises of all stripes have rapturously embraced XML. An important role for XML is in managing not only documents but also the information components on which documents are based.

Document Management: Organizing Files

Document management as a technology and a discipline has traditionally augmented the capabilities of a computer's file system. By enabling users to characterize their documents, which are usually stored in files, document management systems enable users to store, retrieve, and use their documents more easily and powerfully than they can within the file system itself. Long before anyone thought of XML, document management systems were originally developed to help law offices maintain better control over, and access to, the many documents that legal professionals generate. The basic mechanisms of the first document management systems performed, among others, these simple but powerful tasks:
- Add information about a document to the file that contains the document
- Organize the user-supplied information in a database
- Create information about the relationships between different documents

In essence, document management systems created libraries of documents in a computer system or a network. The document library contained a "card catalog" where the user-supplied information was stored and through which users could find out about the documents and access them. The card catalog was a database that captured information about a document, such as the following:
- Author: who wrote or contributed to the document
- Main topics: what subjects are covered in the document
- Origination date: when it was started
- Completion date: when it was finished
- Related documents: what other documents are relevant to this document
- Associated applications: what programs are used to process the document
- Case: to which legal case (or other business process) the document is related
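The card catalog described above is, in modern terms, a small relational database that sits alongside the files themselves. As a sketch only, assuming a hypothetical schema and invented sample records, Python's built-in `sqlite3` module can model it, including a separate table for the relationships between documents.

```python
import sqlite3

# In-memory database standing in for the DMS "card catalog"; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE catalog (
        doc_id       TEXT PRIMARY KEY,
        file_path    TEXT NOT NULL,  -- where the document itself is stored
        author       TEXT,
        main_topics  TEXT,
        origination  TEXT,           -- ISO date when it was started
        completion   TEXT,           -- ISO date when it was finished (NULL if open)
        application  TEXT,           -- program used to process the document
        case_name    TEXT            -- legal case or other business process
    );
    CREATE TABLE related (
        doc_id     TEXT REFERENCES catalog(doc_id),
        related_id TEXT REFERENCES catalog(doc_id)
    );
    """
)
conn.executemany(
    "INSERT INTO catalog VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        ("D1", "briefs/brief_01.doc", "A. Bello", "land dispute",
         "2008-01-10", "2008-02-01", "Word", "Case 101"),
        ("D2", "advice/advice_17.doc", "C. Okafor", "land dispute; appeal",
         "2008-03-05", None, "Word", "Case 101"),
        ("D3", "contracts/lease_02.doc", "A. Bello", "lease",
         "2008-04-12", "2008-04-20", "Word", "Case 102"),
    ],
)
conn.execute("INSERT INTO related VALUES ('D2', 'D1')")

# Find documents by case and topic instead of guessing from file names.
matches = conn.execute(
    "SELECT doc_id, file_path FROM catalog "
    "WHERE case_name = ? AND main_topics LIKE ? ORDER BY doc_id",
    ("Case 101", "%land dispute%"),
).fetchall()
print(matches)  # [('D1', 'briefs/brief_01.doc'), ('D2', 'advice/advice_17.doc')]

# Follow the recorded relationship from D2 to the documents it relates to.
related_docs = conn.execute(
    "SELECT c.doc_id, c.author FROM related AS r "
    "JOIN catalog AS c ON c.doc_id = r.related_id WHERE r.doc_id = ?",
    ("D2",),
).fetchall()
print(related_docs)  # [('D1', 'A. Bello')]
```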

Armed with a database of such information about documents, users could find information in more sensible and intuitive ways than scanning different directories' lists of contents, hoping that a file's name might reveal what the file contained. Many people consider the first achievement of document management systems to be the creation of "a file system within the file system." Soon, document management systems began to provide additional and valuable functionality. By enriching the databases of information about the documents (the metadata), these systems provided these capabilities:
- Version tracking: see how a document evolves over time
- Document sharing: see in which business processes the document is used and re-used
- Electronic review: enable users to add their comments to a document without actually changing the document itself
- Document security: refine the different types of access that different users need to the document
- Publishing management: control the delivery of documents to different publishing process queues
- Workflow integration: associate the different stages of a document's life-cycle with people and projects with schedules

These critical capabilities (among others) of document management systems have proven enormously successful, fueling a multi-billion dollar business.
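Workflow appears twice in this unit: as a component in Section 3.3 (manual, rules-based and dynamic workflow) and here as workflow integration. The function below sketches rules-based routing with one dynamic branch, following the invoice example from Section 3.3; the step names and the `APPROVAL_LIMIT` threshold are hypothetical, not taken from any product.

```python
from dataclasses import dataclass, field

APPROVAL_LIMIT = 100_000  # hypothetical threshold that triggers an extra approval


@dataclass
class Invoice:
    number: str
    amount: float
    route: list = field(default_factory=list)  # steps the invoice passed through


def route_invoice(invoice: Invoice, limit: float = APPROVAL_LIMIT) -> list:
    """Rules-based routing with one dynamic branch on the invoice amount."""
    steps = ["clerk review", "department head approval"]  # fixed rule for every invoice
    if invoice.amount >= limit:
        # Dynamic rule: large invoices branch to an extra approval step.
        steps.append("finance director approval")
    steps.append("accounts payable")  # every approved invoice ends here
    invoice.route.extend(steps)
    return steps


if __name__ == "__main__":
    print(route_invoice(Invoice("INV-1", 25_000)))
    # ['clerk review', 'department head approval', 'accounts payable']
    print(route_invoice(Invoice("INV-2", 450_000)))
    # ['clerk review', 'department head approval', 'finance director approval', 'accounts payable']
```

Keeping the rules in one function makes the route auditable: each invoice records the steps it passed through, which is the kind of life-cycle information workflow integration ties back to people.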

XML: Managing Document Components

XML and its parent technology, SGML (Standard Generalized Markup Language), provide the foundation for managing not only documents but also the information components of which the documents are composed. This is due to some notable characteristics of XML data.

Documents vs. Files

In XML, documents can be seen independently of files. One document can comprise many files, or one file can contain many documents. This is the distinction between the physical and logical structure of information. XML data is primarily described by its logical structure. In a logical structure, principal interest is placed on what the pieces of information are and how they relate to each other, and secondary interest is placed on the physical items that constitute the information. Rather than relying on file headers and other system-specific characteristics of a file as the primary means for understanding and managing information, XML relies on the markup in the data itself. A chapter in a document is not a chapter because it resides in a file called chapter1.doc, but because the chapter's content is contained in the <chapter> and </chapter> element tags. Because elements in XML can have attributes, the components of a document can be extensively self-descriptive. For example, in XML you can learn a lot about the chapter without actually reading it if the chapter's markup is rich in attributes, as in <chapter language="English" subject="colonial economics" revision_date="19980623" author="Joan X. Pringle" thesis_advisor="Ramona Winkelhof">. When the elements carry self-describing metadata with them, systems that understand XML syntax can operate on those elements in useful ways, just as a traditional document management system can. But there is a major difference.

Information vs. Documents

XML markup provides metadata for all components of a document, not merely the object that contains the document itself. This makes the pieces of information that constitute a document just as manageable as the fields of a record in a database. Because XML data follows syntactic rules for well-formedness and proper containment of elements, document management systems that can correctly read and parse XML data can apply document management functions, such as those mentioned above, to any and all information components inside the document. This focus on information rather than documents offers some important capabilities:
Reuse of Information

While standard document management systems do offer some measure of information reuse through file sharing, information management systems based on XML or SGML enable people to share pieces of common information without storing the piece of information in multiple places.
Information Harvesting

By enabling people to focus on the information components that make up documents rather than on the documents themselves, these systems can identify and capture useful information components that have ongoing value "buried" inside documents whose value as documents is limited. That is, a particular document may be useful only for a short time, but chunks of information inside that document may be reusable and valuable for a longer period.
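Because XML components carry their own markup and attributes, a program can read a component's metadata and harvest individual pieces without treating the file as an opaque whole. A minimal sketch using Python's standard `xml.etree.ElementTree`; the sample book, and in particular the `<definition>` element with its `term` attribute, are hypothetical and merely follow the style of the `<chapter>` example under Documents vs. Files.

```python
import xml.etree.ElementTree as ET

# A small marked-up book in the style of the <chapter> example in "Documents vs. Files";
# the <definition> element and its "term" attribute are hypothetical.
SOURCE = """
<book>
  <chapter language="English" subject="colonial economics"
           revision_date="19980623" author="Joan X. Pringle">
    <title>Trade Routes</title>
    <para>Opening discussion of the period.</para>
    <definition term="entrepot">A port where goods are re-exported.</definition>
  </chapter>
  <chapter language="English" subject="shipping" author="Joan X. Pringle">
    <title>Harbours</title>
    <definition term="tonnage">The carrying capacity of a ship.</definition>
  </chapter>
</book>
"""

root = ET.fromstring(SOURCE.strip())

# The metadata travels with each component, so it can be read without the content.
for chapter in root.iter("chapter"):
    print(chapter.get("subject"), chapter.get("revision_date"))
# colonial economics 19980623
# shipping None

# Harvest reusable components (here, definitions) from every chapter into a glossary.
glossary = {d.get("term"): d.text for d in root.iter("definition")}
print(glossary)
```

The harvested glossary outlives the chapters it came from, which is the point made above about information components keeping their value after the document that contained them has lost its own.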
Fine-Granularity Text-Management Applications

Because the information components in XML documents are identifiable, manipulable and manageable, XML information management technology can support real economies in applications such as the translation of technical manuals.

Evaluating Product Offerings

While the general world of document management and information management is moving toward adoption of structured information and the use of XML and SGML, some product offerings distinguish themselves by using underlying database management products with native support for object-oriented data. Object-oriented data matches the structure of XML data quite well, and database systems that comprehend object-oriented data adapt well to the tasks of managing XML information. By contrast, other information management products that comprehend XML or SGML data use relational database systems and provide their own object-oriented extensions to those database systems in order to handle object-oriented data such as XML or SGML data; products relying on such implementations have also garnered success and respect in the document management marketplace.
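Products built on relational databases must map XML's nested structure onto tables. One generic way to do this, often called "shredding", stores each element as a row that points to its parent, so individual components can be queried with SQL. The sketch below is an illustration of that general technique under assumed table and column names, not the method of any product mentioned in this unit, and it ignores mixed content (text that follows a child element) for brevity.

```python
import json
import sqlite3
import xml.etree.ElementTree as ET


def shred(xml_text: str, conn: sqlite3.Connection) -> None:
    """Store every element of an XML document as one row of a relational 'node' table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS node ("
        "id INTEGER PRIMARY KEY, parent_id INTEGER, tag TEXT, attrs TEXT, text TEXT)"
    )
    next_id = conn.execute("SELECT COALESCE(MAX(id), 0) FROM node").fetchone()[0] + 1

    def insert(elem: ET.Element, parent_id) -> None:
        nonlocal next_id
        node_id = next_id
        next_id += 1
        text = (elem.text or "").strip() or None
        conn.execute(
            "INSERT INTO node VALUES (?, ?, ?, ?, ?)",
            (node_id, parent_id, elem.tag, json.dumps(elem.attrib, sort_keys=True), text),
        )
        for child in elem:  # children keep a pointer to their parent row
            insert(child, node_id)

    insert(ET.fromstring(xml_text), None)


if __name__ == "__main__":
    manual = (
        "<manual><section id='s1'>"
        "<step lang='en'>Remove the cover.</step>"
        "<step lang='en'>Unplug the unit.</step>"
        "</section></manual>"
    )
    conn = sqlite3.connect(":memory:")
    shred(manual, conn)
    # Each step is now an individually addressable unit, e.g. for translation.
    steps = conn.execute("SELECT id, text FROM node WHERE tag = 'step' ORDER BY id").fetchall()
    print(steps)  # [(3, 'Remove the cover.'), (4, 'Unplug the unit.')]
```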

3.6

Types of Document Management Systems

Examples of document management system products include:
- Main//Pyrus DMS
- OpenKM
- Alfresco
- ColumbiaSoft
- Computhink's ViewWise
- Didgah
- Documentum
- DocPoint
- Hummingbird DM
- Interwoven's Worksite
- Infonic Document Management (UK)
- ISIS Papyrus
- KnowledgeTree
- Laserfiche
- Livelink
- O3spaces
- Oracle's Stellent
- Perceptive Software
- Questys Solutions
- Redmap
- Report2Web
- SharePoint
- Saperion
- SAP KMLC
- SAP Netweaver
- TRIM Context
- Xerox Docushare

4.0

CONCLUSION

Document management systems have added variety to the pool of options available for database management in corporations. Many off-the-shelf products are available for end users to choose from. The use of document management systems has encouraged the concept of, and the drive toward, the paperless office and paperless transactions. It is a concept that makes the future bright, as people move toward greater efficiency by eliminating the use of paper and hard copies of data and information.

5.0

SUMMARY

A document management system (DMS) is a computer system (or set of computer programs) used to track and store electronic documents and/or images of paper documents. Beginning in the 1980s, a number of vendors began developing systems to manage paper-based documents. These systems managed paper documents, which included not only printed and published documents, but also photos, prints, etc. There is considerable confusion in the market between document management systems (DMS) and content management systems (CMS). Document management systems commonly provide storage, versioning, metadata, security, as well as indexing and retrieval capabilities. There are several common issues involved in managing documents, whether the system is an informal, ad-hoc, paper-based method for one person or a formal, structured, computer-enhanced system for many people across multiple offices. The attention paid to XML (Extensible Markup Language), whose 1.0 standard was published on February 10, 1998, is impressive. XML has been heralded as the next important Internet technology, the next step following HTML, and the natural and worthy companion to the Java programming language itself. Enterprises of all stripes have rapturously embraced XML.

6.0

TUTOR-MARKED ASSIGNMENT

1. List five characteristics of a document management system.
2. Briefly discuss workflow as a component of a document management system.

7.0

REFERENCES/FURTHER READINGS

BBC - h2g2 guide: Shoebox Storage.
Robertson, James. Published on 14 February 2003.
Mathieu, Miles L. and Capozzoli, Ernest A. (2002). "The Paperless Office: Accepting Digitized Data". Troy State University.
Craine, Kevin. "Excerpts from Designing a Strategy". Craine Communications Group.
Document
