
CS1254 DATABASE MANAGEMENT SYSTEMS
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
Fifth Semester
Prepared by H.Pra ha, Lecturer, CSE
UNIT I FUNDAMENTALS 9
Purpose of database system - Views of data - Data models - Database languages - Database system
architecture - Database users and administrator - Entity Relationship model (E-R Model) - E-R diagrams -
Introduction to relational databases
UNIT II RELATIONAL MODEL 9
The relational model - The catalog - Types - Keys - Relational algebra - Domain relational calculus -
Tuple relational calculus - Fundamental operations - Additional operations - SQL fundamentals -
Integrity - Triggers - Security - Advanced SQL features - Embedded SQL - Dynamic SQL - Missing
information - Views - Introduction to distributed databases and client/server databases
UNIT III DATABASE DESIGN 9
Functional dependencies - Non-loss decomposition - Functional dependencies - First, Second, Third
normal forms - Dependency preservation - Boyce/Codd normal form - Multi-valued dependencies and
fourth normal form - Join dependencies and fifth normal form
UNIT IV TRANSACTIONS 9
Transaction concepts - Transaction recovery - ACID properties - System recovery - Media recovery -
Two phase commit - Save points - SQL facilities for recovery - Concurrency - Need for concurrency -
Locking protocols - Two phase locking - Intent locking - Deadlock - Serializability - Recovery
Isolation Levels - SQL Facilities for Concurrency
UNIT V IMPLEMENTATION TECHNIQUES 9
Overview of Physical Storage Media - Magnetic Disks - RAID - Tertiary Storage - File Organization -
Organization of Records in Files - Indexing and Hashing - Ordered Indices - B+ Tree Index Files -
B Tree index files - Static hashing - Dynamic hashing - Query processing overview - Catalog
information for cost estimation - Selection operation - Sorting - Join operation - Database Tuning
Total: 45
TEXT BOOKS
1. Silberschatz, A., Korth, H.F. and Sudarshan, S., Database System Concepts, 5th Edition, Tata
McGraw Hill, 2006.
2. Date, C. J., Kannan, A. and Swamynathan, S., An Introduction to Database Systems, 8th Edition,
Pearson Education, 2006.
REFERENCES
1. Elmasri, R. and Navathe, S.B., Fundamentals of Database Systems, 4th Edition, Pearson / Addison
Wesley, 2007.
2. Ramakrishnan, R., Database Management Systems, 3rd Edition, McGraw Hill, 2003.
3. Singh, S. K., Database Systems Concepts, Design and Applications, 1st Edition, Pearson
Education, 2006.
UNIT I
INTRODUCTION
Purpose of Database systems - Views of data - Data Models - Database Languages - Database system
Architecture - Database users and Administrator - Entity Relationship Model (ER model) - E-R
Diagram - Introduction to relational databases
1. INTRODUCTION
Data: Known facts that can be recorded and that have implicit meaning.
E.g.: Student roll numbers, names, addresses, etc.
Database: A collection of inter-related data organized meaningfully for a specific purpose.
DBMS: A DBMS is a collection of interrelated data and a set of programs to access those data. The
primary goal of a DBMS is to provide a way to store and retrieve database information that is both
convenient and efficient.
Database System: The database and the DBMS are collectively known as a database system.
INTRODUCTION TO FILE AND DATABASE SYSTEMS:
Database Applications
Banking: all transactions
Airlines: reservations, schedules
Universities: registration, grades
Sales: customers, products, purchases
Online retailers: order tracking, customized recommendations
Manufacturing: production, inventory, orders, supply chain
Human resources: employee records, salaries, tax deductions
Credit card transactions
Telecommunications & Finance
2. PURPOSE OF DATABASE SYSTEMS
Drawbacks of Conventional File Processing System
i. Data redundancy and inconsistency
Since the files and application programs are created by different programmers over a long
period of time, the files have different formats and the programs may be written in several
programming languages. The same piece of information may be duplicated in several files.
For Example: The address and phone number of a particular customer may appear in a file that
consists of personal information and also in the savings account records file. This redundancy
leads to data inconsistency; that is, the various copies of the same data may no longer agree.
For Example: A changed customer address may be reflected in the personal information file, but
not in the savings account records file.
ii. Difficulty in accessing data
Conventional file processing environments do not allow needed data to be retrieved in a
convenient and efficient manner.
For Example: Suppose that a bank officer needs to find out the names of all customers who
live within the city's 411027 zip code. The bank officer now has two choices: either get
the list of customers and extract the needed information manually, or ask the data processing
department to have a system programmer write the necessary application program. Both
alternatives are unsatisfactory.
iii. Data isolation
Since data is scattered in various files, and files may be in different formats, it is difficult to
write new application programs to retrieve the appropriate data.
iv. Concurrent access anomalies
In order to improve the overall performance of the system and obtain a faster response time,
many systems allow multiple users to update the data simultaneously. In such an environment,
the interaction of concurrent updates may result in inconsistent data.
For Example: Consider bank account A, with $500. If two customers withdraw funds (say
$50 and $100 respectively) from account A at the same time, the result of the concurrent
executions may be $400, rather than $350. In order to guard against this possibility, some form
of supervision must be maintained in the system.
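The $400 outcome can be traced step by step. The following is a minimal sketch in plain Python, not the behaviour of any particular DBMS; the function name `withdraw_unsupervised` and the variable names are hypothetical and chosen only for illustration. It assumes both withdrawals read the balance before either one writes it back.

```python
# Hypothetical illustration of the lost-update anomaly described above.
def withdraw_unsupervised(balance_read: int, amount: int) -> int:
    """Compute a new balance from a balance that was read earlier (possibly stale)."""
    return balance_read - amount


initial_balance = 500

# Assumption: both customers read the balance before either one writes it back.
read_by_first = initial_balance
read_by_second = initial_balance

written_by_first = withdraw_unsupervised(read_by_first, 50)     # writes 450
written_by_second = withdraw_unsupervised(read_by_second, 100)  # writes 400

# The second write overwrites the first, so the $50 withdrawal is lost.
interleaved_result = written_by_second

# Running the withdrawals one after the other gives the correct balance.
serial_result = withdraw_unsupervised(withdraw_unsupervised(initial_balance, 50), 100)
```

Here `interleaved_result` is 400 while `serial_result` is 350, which is the discrepancy the example describes.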
v. Atomicity Problem
A system failure in the middle of an operation can leave the data only partially updated; this
is the atomicity problem.
For Example: Suppose the system fails during a transfer of funds from account A to account B.
The amount will be debited from A but not credited to B, leaving an incorrect, half-completed
transaction.
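A database system avoids this by treating the transfer as a single all-or-nothing unit. The sketch below uses Python's standard `sqlite3` module as a stand-in DBMS; the `account` table, the `transfer` function and the `fail_midway` flag are assumptions made for illustration, and the "failure" is simulated with an exception rather than a real crash.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 500), ("B", 200)])
conn.commit()


def transfer(conn, source, target, amount, fail_midway=False):
    """Move `amount` from `source` to `target` as one all-or-nothing unit (hypothetical helper)."""
    try:
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, source)
        )
        if fail_midway:
            # Simulated failure after the debit but before the credit.
            raise RuntimeError("simulated system failure")
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, target)
        )
        conn.commit()
    except RuntimeError:
        conn.rollback()  # undo the debit so that no partial transfer remains


transfer(conn, "A", "B", 100, fail_midway=True)
balances = dict(conn.execute("SELECT name, balance FROM account"))
```

After the failed transfer, `balances` still shows the original amounts for both accounts, because the debit was rolled back together with the rest of the transaction.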
vi. Concurrent Access Anomalies
In order to improve the overall performance of the system and obtain a faster response time,
many systems allow multiple users to update the data simultaneously. In such an environment,
the interaction of concurrent updates may result in inconsistent data.
For Example: Consider bank account A, containing $500. If two customers withdraw funds
(say $50 and $100 respectively) from account A at about the same time, the result of the
concurrent executions may leave the account in an incorrect (or inconsistent) state: the balance
will be $400 instead of $350. To protect against this possibility, the system must maintain
some form of supervision.
vii. Security problems
Not every user of the database system should be able to access all the data. The system should
be protected using proper security controls.
For Example: In a banking system, payroll personnel should be given authority to see only
the part of the database that has information about the various bank employees. They do not
need access to information about customer accounts.
Since application programs are added to the system in an ad hoc manner, it is difficult to
enforce such security constraints.
viii. Integrity problems
The data values stored in the database must satisfy certain types of consistency constraints.
For Example: The balance of a bank account may never fall below a prescribed amount (say
$100). In a file processing system, these constraints are enforced by adding appropriate code
in the various application programs.
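By contrast, a DBMS lets such a rule be declared once, next to the data, instead of being repeated in every application program. The following sketch uses Python's `sqlite3` module; the `account` table and its column names are assumptions for illustration, and the $100 minimum is taken from the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The minimum balance is declared once, as part of the table definition.
conn.execute(
    "CREATE TABLE account ("
    " account_no TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 100))"
)
conn.execute("INSERT INTO account VALUES ('A-1', 500)")

try:
    # This update would take the balance below the prescribed amount.
    conn.execute("UPDATE account SET balance = 50 WHERE account_no = 'A-1'")
    update_rejected = False
except sqlite3.IntegrityError:
    update_rejected = True

balance = conn.execute("SELECT balance FROM account").fetchone()[0]
```

The violating update is rejected by the database itself, and the stored balance is left unchanged.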
Advantages of Database
A database is a way to consolidate and control the operational data centrally. The advantages of
having centralized control of data are:
i. Redundancy can be reduced
In non-database systems, each application or department has its own private files, resulting in a
considerable amount of redundancy in the stored data. Thus storage space is wasted. With a
centralized database, most of this can be avoided.
ii. Inconsistency can be avoided
When the same data is duplicated and a change made at one site is not propagated to the other
site, inconsistency arises: the two entries regarding the same data will no longer agree. So, if
the redundancy is removed, the chances of having inconsistent data are also removed.
iii. The data can be shared
The data stored for one application can be used by another application. Thus, the data in the
database can be shared with new applications.
iv. Standards can be enforced
With central control of the database, the DBA can ensure that all applicable standards are
observed in the representation of the data.
v. Security can be enforced
The DBA can define the access paths for accessing the data stored in the database and can define
authorization checks to be carried out whenever access to sensitive data is attempted.
vi. Integrity can be maintained
Integrity means that the data in the database is accurate. Centralized control of the data permits
the administrator to define integrity constraints on the data in the database.
3. VIEW OF DATA
A major purpose of a database system is to provide users with an abstract view of the data. That
is, the system hides certain details of how the data are stored and maintained.
Data abstraction
The complexity is hidden from the users through several levels of abstraction. There are three
levels of data abstraction:
i. Physical level: It is the lowest level of abstraction and describes how the data are actually
stored. The physical level describes complex low-level data structures in detail.
ii. Logical level: It is the next higher level of abstraction and describes what data are stored in
the database and what relationships exist among those data.
iii. View level: It is the highest level of abstraction and describes only part of the entire database.
Figure: The three levels of data abstraction
Data Independence
The ability to modify a schema definition at one level without affecting a schema definition at the
next higher level is called data independence. There are two levels of data independence:
1. Physical data independence is the ability to modify the physical schema without causing
application programs to be rewritten. Modifications at the physical level are occasionally necessary
in order to improve performance.
2. Logical data independence is the ability to modify the conceptual schema without causing
application programs to be rewritten. Modifications at the conceptual level are necessary whenever
the logical structure of the database is altered.
Logical data independence is more difficult to achieve than physical data independence, since
application programs are heavily dependent on the logical structure of the data they access.
Instances and schemas
Databases change over time as information is inserted and deleted. The collection of information
stored in the database at a particular moment is called an instance of the database.
The overall design of the database is called the database schema.
Types of database schemas
i. Physical schema: It describes the database design at the physical level.
ii. Logical schema: It describes the database design at the logical level.
iii. Subschema: A database may also have several schemas at the view level, called subschemas,
that describe different views of the database.
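A view-level schema and a simple case of logical data independence can be sketched with an SQL view. The example below uses Python's `sqlite3` module; the `employee` table, the `employee_directory` view and the helper `read_directory` are hypothetical names introduced only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Logical schema: the full employee table.
conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)")
conn.execute("INSERT INTO employee VALUES (1, 'John', 40000)")
# Subschema (view level): exposes only part of the database and hides the salary.
conn.execute("CREATE VIEW employee_directory AS SELECT emp_id, name FROM employee")


def read_directory(conn):
    """Return the column names and rows that an application sees through the view."""
    cursor = conn.execute("SELECT * FROM employee_directory")
    columns = [column[0] for column in cursor.description]
    return columns, cursor.fetchall()


before_change = read_directory(conn)
# The logical schema changes, but programs reading the view are unaffected.
conn.execute("ALTER TABLE employee ADD COLUMN department TEXT")
after_change = read_directory(conn)
```

In this sketch the application that reads `employee_directory` sees the same columns and rows before and after the table is altered.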
4. DATA MODELS
The underlying structure of a database is called its data model.
It is a collection of conceptual tools for describing data, data relationships, data semantics, and
consistency constraints.
It provides a way to describe the design of the database at the physical, logical and view levels.
Different types of data models are:
Entity relationship model
Relational model
Hierarchical model
Network model
Object Based model
Object Relational model
Semi Structured Data model
Entity relationship model
It is based on a collection of real-world things or objects called entities and the relationships
among these objects.
The entity relationship model is widely used in database design.
Relational Model
The relational model uses a collection of tables to represent both data and the relationships among
those data.
Each table has multiple columns and each column has a unique name.
Software such as Oracle, Microsoft SQL Server and Sybase is based on the relational model.
The relational model is an example of a record-based model, which is based on fixed-format records
of several types.
Hierarchical Model
Hierarchical databases organize data into a tree data structure such that each record type has only
one owner.
Hierarchical structures were widely used in the first mainframe database management systems.
Links are possible vertically but not horizontally or diagonally.
Advantages
High speed of access to large datasets
Ease of updates
Simplicity: the design of a hierarchical database is simple
Data security: The hierarchical model was the first database model to offer data security
that is provided and enforced by the DBMS
Efficiency: The hierarchical database model is very efficient when the database
contains a large number of transactions using data whose relationships are fixed
Disadvantages
Implementation complexity
Database management problems
Lack of structural independence
Network Model
The model is based on directed graph theory.
The network model replaces the hierarchical tree with a graph, thus allowing more general
connections among the nodes.
The main difference between the network model and the hierarchical model is its ability to handle
many-to-many (n:n) relationships; in other words, it allows a record to have more than one parent.
An example is an employee working for two departments.
Figure: Sample network model
Advantages:
Conceptual simplicity
Capability to handle more relationship types
Data independence
Disadvantages:
Detailed structural knowledge is required
Lack of structural independence
Object-Based Data Model
The object-oriented model is an extension of the E-R model.
The object-oriented model is based on a collection of objects.
An object contains values stored in instance variables within the object.
An object also contains bodies of code that operate on the object; these bodies of code are called
methods.
Objects that contain the same types of values and methods are grouped together into classes.
Advantages:
Applications require less code
Applications use a more natural data model
Code is easier to maintain
It provides higher-performance management of objects and of complex interrelationships
between objects
Object-oriented features improve productivity
Data access is easy
Object Relational Model
The object-relational data model combines the features of modern object-oriented programming
languages with relational database features.
Some of the object-relational systems available in the market are IBM DB2 Universal Server,
Oracle Corporation's Oracle 8, Microsoft Corporation's SQL Server 7 and so on.
Semi Structured Data Model
This data model allows individual data items of the same type to have different sets of attributes.
Other data models require every data item of a particular type to have the same set of attributes.
Extensible Markup Language (XML) is used to represent semi-structured data.
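As a small illustration of semi-structured data, the sketch below parses an XML fragment in which two items of the same type carry different sets of attributes. It uses Python's standard `xml.etree.ElementTree` module; the element names and values are invented for the example.

```python
import xml.etree.ElementTree as ET

# Two <customer> items of the same type with different attribute sets (invented data).
document = """
<customers>
  <customer><name>John</name><city>Chennai</city></customer>
  <customer><name>Mary</name><phone>555-0100</phone><phone>555-0101</phone></customer>
</customers>
"""

root = ET.fromstring(document)
# Collect the distinct child element names found in each customer.
attribute_sets = [sorted({child.tag for child in customer}) for customer in root]
```

The first customer has a name and a city, while the second has a name and two phone numbers; a relational table would instead require both rows to share one fixed set of columns.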
5. DATABASE LANGUAGES
A database system provides
A Data Definition Language (DDL) to specify the database schema
A Data Manipulation Language (DML) to express database queries and updates
Data definition and data manipulation languages are not two separate languages but parts of a
single database language such as SQL.
Data definition language
The DDL specifies the database schema and some additional properties of the data.
The storage structure and access methods are specified using a special type of DDL called a
data storage and definition language.
The data values stored in the database must satisfy certain consistency constraints. For example,
suppose the balance on an account should not fall below $100.
Database systems concentrate on constraints that can be tested with minimal overhead.
1. Domain Constraints:
A domain of possible values should be associated with every attribute.
E.g.: integer type, character type, date/time type
Declaring an attribute to be of a particular domain acts as a constraint on the values it can take.
Domain constraints are tested whenever values are entered into the database.
2. Referential Constraints:
In some cases, a value that appears in one relation for a given set of attributes must also
appear for a certain set of attributes in some other relation. Such a constraint is called a
referential constraint.
If a modification violates the constraint, the action that caused the violation should be
rejected.
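A referential constraint can be tried out with a foreign key declaration. This sketch uses Python's `sqlite3` module (SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`); the `branch` and `staff` tables and their columns are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.execute("CREATE TABLE branch (branch_no TEXT PRIMARY KEY)")
conn.execute(
    "CREATE TABLE staff ("
    " staff_no TEXT PRIMARY KEY,"
    " branch_no TEXT REFERENCES branch(branch_no))"
)
conn.execute("INSERT INTO branch VALUES ('B1')")
conn.execute("INSERT INTO staff VALUES ('S1', 'B1')")  # 'B1' exists in branch: accepted

try:
    # 'B9' does not appear in branch, so this insertion violates the constraint.
    conn.execute("INSERT INTO staff VALUES ('S2', 'B9')")
    violation_rejected = False
except sqlite3.IntegrityError:
    violation_rejected = True

staff_rows = conn.execute("SELECT staff_no, branch_no FROM staff").fetchall()
```

Only the row that refers to an existing branch is stored; the violating insertion is rejected, as the rule above requires.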
3. Assertions
An assertion is a condition that the database should always satisfy.
Domain constraints and referential integrity constraints are special forms of assertions.
E.g.: Every loan should have a customer whose account balance is a minimum of $100000.
Modifications to the database should not violate an assertion.
4. Authorization
Users are differentiated by the access permissions given to them on the different data in the
database. This is known as authorization.
The most common authorizations are
i. Read authorization
Allows reading but no modification of data
ii. Insert authorization
Allows insertion of new data but no modification of existing data
iii. Update authorization
Allows modification but not deletion
iv. Delete authorization
Allows deletion of data
A user may be assigned all, none, or a combination of these types.
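The idea that a user holds all, none, or a combination of these authorization types can be modelled with a set of flags. The sketch below is only an illustration in Python; the `Authorization` class and the `is_allowed` function are hypothetical and do not correspond to how any particular DBMS stores privileges.

```python
from enum import Flag


class Authorization(Flag):
    """The four authorization types listed above, as combinable flags (illustrative only)."""
    READ = 1
    INSERT = 2
    UPDATE = 4
    DELETE = 8


def is_allowed(granted: Authorization, requested: Authorization) -> bool:
    """Return True if every requested authorization is included in the granted set."""
    return requested in granted


clerk = Authorization.READ | Authorization.INSERT  # a combination of types
auditor = Authorization.READ                       # a single type
nobody = Authorization(0)                          # none of the types
```

For example, `is_allowed(clerk, Authorization.INSERT)` holds, while `is_allowed(clerk, Authorization.DELETE)` does not.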
The DDL takes some input (the data definition statements) and generates some output.
This output is placed in the data dictionary, which contains metadata.
Metadata is data about data.
The data dictionary is a special type of table that can be accessed and updated only by the
database system.
The database system consults the data dictionary before reading or modifying actual data.
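Most relational systems expose their data dictionary for reading. As one concrete case, SQLite keeps it in a table named `sqlite_master`, and the sketch below queries it through Python's `sqlite3` module; the `customer` table is an assumption made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A DDL statement: its definition is recorded in the data dictionary.
conn.execute(
    "CREATE TABLE customer (customer_id TEXT PRIMARY KEY, customer_name TEXT, city TEXT)"
)

# Metadata about the table, read from SQLite's catalog table.
catalog_entries = conn.execute(
    "SELECT type, name FROM sqlite_master WHERE name = 'customer'"
).fetchall()

# Metadata about the columns; field 1 of each table_info row is the column name.
column_names = [row[1] for row in conn.execute("PRAGMA table_info(customer)")]
```

Both results describe the table rather than its contents, which is what "data about data" means here.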
Data Manipulation Language
A data-manipulation language (DML) is a language that enables users to access or manipulate data.
A query is a statement requesting the retrieval of information.
The portion of a DML that involves information retrieval is called a query language.
There are basically two types of DML:
Procedural DMLs
The user specifies what data are needed and how to get those data.
Declarative DMLs (non-procedural DMLs)
The user specifies what data are needed without specifying how to get those data. These are
easier to learn and use than procedural DMLs.
The data manipulations that can be performed using a DML are
The retrieval of information stored in the database
The insertion of new information into the database
The deletion of information from the database
The modification of information stored in the database
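The four kinds of manipulation listed above map onto four SQL statements. The sketch below runs them through Python's `sqlite3` module; the `customer` table and its rows are invented for the example. SQL is a declarative DML: each statement says what data is wanted, not how to find it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (customer_id TEXT PRIMARY KEY, customer_name TEXT, city TEXT)"
)

# Insertion of new information
conn.execute("INSERT INTO customer VALUES ('C101', 'John', 'Chennai')")
conn.execute("INSERT INTO customer VALUES ('C102', 'Mary', 'Madurai')")

# Modification of stored information
conn.execute("UPDATE customer SET city = 'Salem' WHERE customer_id = 'C101'")

# Deletion of information
conn.execute("DELETE FROM customer WHERE customer_id = 'C102'")

# Retrieval of information (a query)
remaining = conn.execute("SELECT customer_id, city FROM customer").fetchall()
```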
6. DATABASE SYSTEM ARCHITECTURE
A database system is partitioned into modules that deal with each of the responsibilities of the
overall system. The functional components of a database system can be broadly divided into
Storage Manager
Query Processor
The database system architecture is influenced by the underlying computer architecture. The
database system can be centralized or client-server.
Database applications are partitioned into two or three parts.
In a two tier architecture, the application is partitioned into a component that resides at the
client machine and invokes database functionality at the server machine through query language
statements.
Application program interface standards like ODBC and JDBC are used for interaction between the
client and the server.
Figure: Two tier architecture
In a three tier architecture, the client machines act as a front end and do not contain any direct
database calls.
The client end communicates with the application server through an interface.
The application server interacts with the database system to access data.
The business logic of the application specifies what actions are to be carried out under what
conditions.
Three tier architecture is more appropriate for large applications.
Figure: Three tier architecture
Storage Manager
A storage manager is a program module that provides the interface between the low-level data stored
in the database and the application programs and queries submitted to the system.
The storage manager is responsible for the interaction with the file manager.
The storage manager translates the various DML statements into low-level file system commands.
Thus, the storage manager is responsible for storing, retrieving, and updating data in the database.
Components of the storage manager are:
1. Authorization and integrity manager: It tests for satisfaction of the various integrity
constraints and checks the authority of users accessing the data.
2. Transaction manager: It ensures that the database remains in a consistent state despite system
failures, and that concurrent transaction executions proceed without conflicting.
3. File manager: It manages the allocation of space on disk storage and the data structures used to
represent information stored on disk.
4. Buffer manager: It is responsible for fetching data from disk storage into main memory and for
deciding what data to cache in main memory. It enables the database to handle data sizes that are
much larger than the size of main memory.
The storage manager implements several data structures as part of the physical system
implementation:
i. Data files: which store the database itself.
ii. Data dictionary: which contains metadata, that is, data about data. The schema of a table is an
example of metadata. A database system consults the data dictionary before reading and
modifying actual data.
iii. Indices: which provide fast access to data items that hold particular values.
The Query Processor
The query processor is an important part of the database system. It helps the database system to
simplify and facilitate access to data. The query processor components include:
1. DDL interpreter, which interprets DDL statements and records the definitions in the data
dictionary.
2. DML compiler, which translates DML statements in a query language into an evaluation plan
consisting of low-level instructions that the query evaluation engine understands.
A query can usually be translated into any of a number of alternative evaluation plans that all
give the same result. The DML compiler also performs query optimization; that is, it picks the
lowest-cost evaluation plan from among the alternatives.
3. Query evaluation engine, which executes the low-level instructions generated by the DML
compiler.
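The choice between alternative evaluation plans can be observed in a real system. The sketch below asks SQLite, through Python's `sqlite3` module, for its plan for the same query before and after an index is created; the `customer` table, the index name `idx_customer_city` and the helper `plan_details` are assumptions for illustration, and the exact plan wording differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_id TEXT, customer_name TEXT, city TEXT)")
query = "SELECT customer_name FROM customer WHERE city = 'Chennai'"


def plan_details(conn, sql):
    """Return the textual plan SQLite reports for `sql` (the last field of each plan row)."""
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


# Without an index the only available plan is a scan of the whole table.
plan_without_index = plan_details(conn, query)

# Once an index on city exists, the optimizer can choose a plan that uses it.
conn.execute("CREATE INDEX idx_customer_city ON customer (city)")
plan_with_index = plan_details(conn, query)
```

Both plans return the same result for the query; the second plan names the index, showing that the optimizer picked a different plan once one became available.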
Figure: Database System Structure
7. DATABASE USERS AND ADMINISTRATOR
People who work with a database can be categorized as:
Database users
Database administrators
7.1. DATABASE USERS
There are four types of database users, differentiated by the way they interact with the system.
1. Naive users
Naive users interact with the system by invoking one of the application programs that have
been written previously.
The typical user interface for naive users is a forms interface, where the user can fill in the
appropriate fields of the form.
Naive users may also simply read reports generated from the database.
2. Application Programmers
Application programmers are computer professionals who write application programs.
Rapid application development (RAD) tools enable the application programmer to construct
forms and reports without writing a program.
There are also special types of programming languages that combine control structures with
data manipulation language statements. These languages are sometimes called fourth-generation
languages.
3. Sophisticated users
Sophisticated users interact with the system without writing programs. Instead, they form their
requests in a database query language.
They submit each such query to a query processor, which breaks it down into instructions that the
storage manager understands.
Online analytical processing (OLAP) tools simplify analysis, and data mining tools help find
certain kinds of patterns in data.
4. Specialized users
Specialized users are sophisticated users who write specialized database applications that do not
fit into the traditional data-processing framework.
Such applications include computer-aided design systems, knowledge-base and expert systems, and
systems that store data with complex data types.
7.2. DATABASE ADMINISTRATORS
A person who has central control over the system is called a database administrator (DBA).
The functions of the DBA include:
Schema definition: The DBA creates the original database schema by executing a set of data
definition statements in the DDL.
Storage structure and access-method definition.
Schema and physical-organization modification: The DBA carries out changes to the schema and
physical organization to reflect the changing needs of the organization.
Granting of authorization for data access: By granting different types of authorization, the
database administrator can regulate which parts of the database various users can access.
Authorization information is kept in a special system structure that the database system consults
whenever someone attempts to access the data in the system.
Routine maintenance: Examples of the database administrator's routine maintenance activities are:
1. Periodically backing up the database.
2. Ensuring that enough free disk space is available.
3. Monitoring jobs running on the database and ensuring that performance is not degraded by very
expensive tasks submitted by some users.
8. ENTITY RELATIONSHIP MODEL (ER MODEL)
The E-R data model considers the real world as consisting of a set of basic objects, called
entities, and relationships among these objects.
The E-R data model employs three basic notions:
1. Entity sets
2. Relationship sets
3. Attributes
1. Entity Sets
An entity is a "thing" or "object" in the real world that is distinguishable from all other
objects. For example, each person is an entity.
An entity has a set of properties, and the values for some set of properties may uniquely identify
an entity.
For example, a customer-id property with the value C101 uniquely identifies one customer.
An entity may be concrete, such as a person or a book, or it may be abstract, such as a loan or a
holiday.
An entity set is a set of entities of the same type that share the same properties, or attributes.
For example, all persons who are customers at a given bank can be defined as the entity set customer.
The properties that describe an entity are called attributes.
2. Relationships and Relationship sets
A relationship is an association among several entities.
A relationship set is a set of relationships of the same type.
The association between entity sets is referred to as participation. That is, the entity sets
E1, E2, ..., En participate in relationship set R.
Recursive relationship set: A relationship set in which the same entity set participates more than
once, in different roles, is called a recursive relationship set.
A relationship set may also have attributes of its own, which are called descriptive attributes.
Types of relationships
i) Unary relationship: A unary relationship exists when an association is maintained within a single
entity set.
Figure: Association between two objects of the same entity set (Employee, with the roles
Manager/Boss and Worker)
ii) Binary relationship: A binary relationship exists when two entities are associated.
Figure: Publishes relationship between Publisher and Book
iii) Ternary relationship: A ternary relationship exists when there are three entities associated.
Figure: Teaches relationship among Teacher, Subject and Student
iv) Quaternary relationship: A quaternary relationship exists when there are four entities associated.
Figure: Studies relationship among Teacher, Student, Course material and Subject
The number of entity sets participating in a relationship set is called the degree of the
relationship set.
A binary relationship set is of degree 2; a ternary relationship set is of degree 3.
Entity role: The function that an entity plays in a relationship is called that entity's role. A
role is one end of an association.
Figure: Works-for relationship between Person and Company, with the role label Employee
Here the entity role is Employee.
3. Attributes
The properties that describe an entity are called attributes.
The attributes of the customer entity set are customer_id, customer_name and city.
Each attribute has a set of permitted values, called the domain or value set.
Each entity will have a value for each of its attributes.
Example:
Customer Name: John
Customer Id: 321
Attributes are classified as
Simple
Composite
Single-valued
Multi-valued
Derived
1) Simple attribute:
This type of attribute cannot be divided into subparts.
Example: Age, sex, GPA
2) Composite attribute:
This type of attribute can be subdivided.
Example: Address: street, city, state, zip
3) Single-valued attribute:
This type of attribute can have only a single value.
Example: Social security number
4) Multi-valued attribute:
A multi-valued attribute can have many values.
Example: A person may have several college degrees or phone numbers
5) Derived attribute:
A derived attribute can be calculated or derived from other related attributes or entities.
Example: Age can be derived from DOB
6) Stored attributes:
The attributes stored in a database are called stored attributes.
An attribute takes a null value when an entity does not have a value for it.
A null value indicates that the value for the particular attribute does not exist or is unknown.
E.g.: 1. Middle name may not be present for a person (non-existence case)
2. Apartment number may be missing or unknown
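The attribute kinds above can be mirrored in a small data structure. The following Python sketch is only an analogy, not E-R notation; the `Person` and `Address` classes, their field names and the `age_on` method are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Address:
    """Composite attribute: it can be divided into subparts."""
    street: str
    city: str
    state: str
    zip_code: str


@dataclass
class Person:
    name: str                          # simple, single-valued attribute
    date_of_birth: date                # stored attribute
    address: Address                   # composite attribute
    phone_numbers: List[str] = field(default_factory=list)  # multi-valued attribute
    middle_name: Optional[str] = None  # None plays the role of a null value

    def age_on(self, today: date) -> int:
        """Derived attribute: computed from the stored date of birth, not stored itself."""
        birthday_passed = (today.month, today.day) >= (
            self.date_of_birth.month,
            self.date_of_birth.day,
        )
        return today.year - self.date_of_birth.year - (0 if birthday_passed else 1)


john = Person(
    name="John",
    date_of_birth=date(1990, 6, 15),
    address=Address("1 Main Street", "Chennai", "Tamil Nadu", "600001"),
    phone_numbers=["555-0100", "555-0101"],
)
```

Here `john.age_on(date(2020, 6, 15))` is computed on demand from the date of birth, which is the sense in which age is a derived attribute.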
CONSTRAINTS
An E-R enterprise schema may define certain constraints to which the contents of a database
must conform.
Three types of constraints are
1. Mapping cardinalities
2. Key constraints
3. Participation constraints
1. Mapping cardinalities
Mapping cardinalities express the number of entities to which another entity can be associated via
a relationship set.
Cardinality is represented in an E-R diagram in two ways:
i) Directed line (arrow) ii) Undirected line (plain line)
There are 4 categories of cardinality.
i) One-to-one: An entity in A is associated with at most one entity in B, and an entity in B is
associated with at most one entity in A.
Example: A customer with a single account at a given branch is shown by a one-to-one relationship
as given below.
Figure: Customer - Depositor - Account (one-to-one)
ii) One-to-many: An entity in A is associated with any number of entities (zero or more) in B. An
entity in B, however, can be associated with at most one entity in A.
Example: A customer having two accounts at a given branch is shown by a one-to-many relationship
as given below.
Figure: Customer - Depositor - Account (one-to-many)
iii) Many-to-one: An entity in A is associated with at most one entity in B. An entity in B,
however, can be associated with any number (zero or more) of entities in A.
Example: Many employees work for a company. This relationship is shown as many-to-one as given
below.
Figure: Employees - Works-for - Company (many-to-one)
iv) Many-to-many: An entity in A is associated with any number (zero or more) of entities in B, and
an entity in B is associated with any number (zero or more) of entities in A.
Example: An employee works on a number of projects and a project is handled by a number of
employees. Therefore, the relationship between employee and project is many-to-many as shown below.
Figure: Employee - Works-on - Project (many-to-many)
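The four categories can be checked mechanically on sample data. The sketch below is a plain-Python illustration with a hypothetical function `classify_cardinality`; it reports which category a given set of (A, B) pairs exhibits. Note the assumption: a mapping cardinality is a rule of the schema, whereas this code only inspects one instance, so it shows what the sample data looks like rather than what the design permits.

```python
from collections import defaultdict


def classify_cardinality(pairs):
    """Classify a relationship instance, given as (a, b) pairs, from set A to set B."""
    b_per_a = defaultdict(set)
    a_per_b = defaultdict(set)
    for a, b in pairs:
        b_per_a[a].add(b)
        a_per_b[b].add(a)
    # Does some entity in A relate to more than one entity in B, and vice versa?
    a_has_many = any(len(related) > 1 for related in b_per_a.values())
    b_has_many = any(len(related) > 1 for related in a_per_b.values())
    if a_has_many and b_has_many:
        return "many-to-many"
    if a_has_many:
        return "one-to-many"
    if b_has_many:
        return "many-to-one"
    return "one-to-one"


# Sample instances modelled on the examples above (names are invented).
single_account = [("John", "A-1")]
two_accounts = [("John", "A-1"), ("John", "A-2")]
works_for = [("E1", "Acme"), ("E2", "Acme")]
works_on = [("E1", "P1"), ("E1", "P2"), ("E2", "P1")]
```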
2. Keys
A key is a set of attributes whose values allow us to distinguish entities from each other.
Keys also help uniquely identify relationships, and thus distinguish relationships from each other.
Key Type: Definition
Superkey: Any attribute or combination of attributes that uniquely identifies a row in the table.
Example: The Roll_No attribute of the entity set "student" distinguishes one student entity
from another. Customer_name and Customer_id together form a superkey.
Candidate Key: A minimal superkey, that is, a superkey that does not contain a proper subset of
attributes that is itself a superkey.
Example: Student_name and Student_street together are sufficient to uniquely identify one
particular student.
Primary Key: The candidate key selected to uniquely identify all rows. It should rarely be changed
and cannot contain null values.
Example: Roll_No is the primary key of the "student" entity set.
Foreign Key: An attribute (or combination of attributes) in one table whose value must either match
the primary key of another table or be null.
Example: In the staff relation, the branch_no attribute exists to match staff to the branch office
they work in. In the staff relation, branch_no is a foreign key.
Secondary Key: An attribute or combination of attributes used to make data retrieval more efficient.
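The difference between a superkey and a candidate key can be tested on sample rows. The sketch below is plain Python with hypothetical functions `is_superkey` and `is_candidate_key`; the student rows are invented. As an assumption to keep in mind, a key is a property of the schema, so passing this check on one sample does not prove that the attributes form a key for all possible data.

```python
from itertools import combinations


def is_superkey(rows, attributes):
    """True if no two rows agree on all of the given attributes."""
    seen = set()
    for row in rows:
        value = tuple(row[attribute] for attribute in attributes)
        if value in seen:
            return False
        seen.add(value)
    return True


def is_candidate_key(rows, attributes):
    """True if the attributes form a superkey and no proper subset of them does."""
    if not is_superkey(rows, attributes):
        return False
    for size in range(1, len(attributes)):
        for subset in combinations(attributes, size):
            if is_superkey(rows, subset):
                return False
    return True


students = [
    {"roll_no": 1, "name": "Anu", "street": "North"},
    {"roll_no": 2, "name": "Anu", "street": "South"},
    {"roll_no": 3, "name": "Bala", "street": "North"},
]
```

On this sample, `["roll_no", "name"]` is a superkey but not a candidate key, because `roll_no` alone already identifies each row.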
3. Participation Constraint
Participation can be divided into two types:
1. Total 2. Partial
If every entity in the entity set E participates in at least one relationship in R, then the
participation is called total participation.
If only some entities in the entity set E participate in relationships in R, then the
participation is called partial participation.
9. ENTITY-RELATIONSHIP (E-R) DIAGRAMS
An E-R diagram can express the overall logical structure of a database graphically.
An E-R diagram consists of the following major components:
Component name      Description
Rectangles          represent entity sets
Ellipses            represent attributes
Diamonds            represent relationship sets
Lines               link attributes to entity sets and entity sets to relationship sets
Double ellipses     represent multivalued attributes
Dashed ellipses     represent derived attributes
Double lines        represent total participation of an entity in a relationship set
Double rectangles   represent weak entity sets
[Figure: E-R diagram with composite, multivalued, and derived attributes]
Double lines are used in an E-R diagram to indicate that the participation of an entity set in a
relationship set is total; that is, each entity in the entity set occurs in at least one relationship in that
relationship set.
The number of times an entity participates in a relationship can be specified using complex
cardinalities.
An edge between an entity set and a binary relationship set can have an associated minimum and
maximum cardinality, written in the form l..h
l - Minimum cardinality
h - Maximum cardinality
A minimum value of 1 indicates total participation of the entity set in the relationship set.
A maximum value of 1 indicates that the entity participates in at most one relationship.
A maximum value of * indicates no limit.
A label 1..* on an edge is equivalent to a double line.
Example: a label 0..* indicates that a customer can have 0 or more loans.
A label 1..1 indicates that a loan must have exactly one associated customer.
Strong and weak entity sets
An entity set may not have sufficient attributes to form a primary key. Such an entity set is termed a
weak entity set.
An entity set that has a primary key is termed a strong entity set.
A weak entity set is associated with another entity set called the identifying or owner entity set, i.e.,
the weak entity set is said to be existence dependent on the identifying entity set.
The identifying entity set is said to own the weak entity set.
The relationship between the weak entity set and the identifying entity set is called the identifying relationship.
The discriminator of a weak entity set is a set of attributes that distinguishes among the entities of
the weak entity set that depend on one particular strong entity. It is also called the partial key.
Extended E-R Features
An E-R model that is supported with additional semantic concepts is called the extended entity
relationship model or EER model.
The EER model deals with:
1. Specialization
2. Generalization
3. Aggregation
1. Specialization:
The process of designating subgroupings within an entity set is called specialization.
Specialization is a top-down process.
Consider an entity set person. A person may be further classified as one of the following:
Customer
Employee
Every person has a common set of attributes; customers and employees each have some additional attributes.
Specialization is depicted by a triangle component labeled ISA.
The label ISA stands for "is a"; for example, a customer "is a" person.
The ISA relationship may also be referred to as a superclass-subclass relationship.
2. Generalization:
Generalization is a simple inversion of specialization.
Generalization is the process of defining a more general entity type from a set of more specialized
entity types.
Generalization is a bottom-up approach.
Generalization results in the identification of a generalized superclass from the original subclasses.
Person is the higher-level entity set.
Customer and employee are lower-level entity sets.
The person entity set is the superclass of the customer and employee subclasses.
Attribute Inheritance
A property of the higher- and lower-level entities created by specialization and generalization is
attribute inheritance.
The attributes of the higher-level entity sets are said to be inherited by the lower-level entity
sets.
For example, customer and employee inherit the attributes of person.
The outcome of attribute inheritance is:
1. A higher-level entity set with attributes and relationships that apply to all of its lower-level
entity sets
2. Lower-level entity sets with distinctive features that apply only within a particular lower-level
entity set
If an entity set is involved as a lower-level entity set in only one ISA relationship, then the entity
set has single inheritance.
If an entity set is involved as a lower-level entity set in more than one ISA relationship, then the
entity set has multiple inheritance and the resulting structure is said to be a lattice.
Constraints on Generalizations
1. One type of constraint determines which entities can be members of a lower-level entity set. Such
membership may be one of the following:
Condition-defined: Membership in a lower-level entity set is evaluated on
the basis of whether or not an entity satisfies an explicit condition.
User-defined: Membership is not determined by a condition; the database user assigns entities to a lower-level entity set.
2. A second type of constraint relates to whether or not entities may belong to more than one lower-level
entity set within a single generalization. The lower-level entity sets may be one of the
following:
Disjoint: A disjointness constraint requires that an entity belong to no more than one lower-level
entity set.
Overlapping: The same entity may belong to more than one lower-level entity set within a single
generalization.
3. A final constraint, the completeness constraint, specifies whether or not an entity in the higher-level
entity set must belong to at least one of the lower-level entity sets. This constraint may be one of the
following:
Total generalization or specialization: Each higher-level entity must belong to a lower-level
entity set. It is represented by a double line.
Partial generalization or specialization: Some higher-level entities may not belong to any
lower-level entity set.
3. Aggregation
One limitation of the E-R model is that it cannot express relationships among relationships.
Consider the ternary relationship works-on between employee, branch, and job. Now suppose
we want to record managers for tasks performed by an employee at a branch. For this, another
entity set, manager, is created.
The best way to model such a situation is to use aggregation.
Aggregation is an abstraction through which relationships are treated as higher-level entities.
In our example, works-on acts as a higher-level entity.
[Figures: E-R diagram with redundant relationships; E-R diagram with aggregation]
Summary of E-R diagram
Account number   Balance
A-101            500
A-215            700
A-102            400
10. INTRODUCTION TO RELATIONAL DATABASES
A relational database is based on the relational model and uses a collection of tables to represent
both data and the relationships among those data.
It includes a DML and a DDL.
Tables:
Each table has multiple columns, and each column has a unique name.
The relational model is an example of a record-based model.
Record-based models are so named because the database is structured in fixed-format records of several types.
Each table contains records of a particular type. Each record type defines a fixed number of fields or
attributes.
The columns of the table correspond to the attributes of the record type.
Data Manipulation Language (DML)
DML includes the following commands:
1. INSERT: To insert one or more rows
2. SELECT: To display one or more rows
3. UPDATE: Used to alter the column values in a table
4. DELETE: Used to delete one or more rows
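The four DML commands can be tried out with Python's built-in sqlite3 module. This is a minimal sketch, assuming an in-memory SQLite database and a hypothetical two-column account table; other database systems accept the same basic statements with minor dialect differences.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE account (account_number TEXT PRIMARY KEY, balance INTEGER)")

# INSERT: add one or more rows
conn.executemany(
    "INSERT INTO account VALUES (?, ?)",
    [("A-101", 500), ("A-215", 700), ("A-102", 400)],
)
# UPDATE: alter column values in existing rows
conn.execute("UPDATE account SET balance = balance + 100 WHERE account_number = 'A-102'")
# DELETE: remove one or more rows
conn.execute("DELETE FROM account WHERE account_number = 'A-215'")
# SELECT: display the remaining rows
rows = conn.execute(
    "SELECT account_number, balance FROM account ORDER BY account_number"
).fetchall()
print(rows)
conn.close()
```

After the update and the delete, two accounts remain, each with a balance of 500.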
Data Definition Language (DDL)
DDL includes the following commands:
1. CREATE: Command used for creating tables
2. DESC: Command used to view the table structure
3. ALTER: Command used for modifying table structure
4. RENAME: Command used to change the name of the table
5. DROP: Command used for removing an existing table
2 Mark Questions
1. Define data, database, database management system, database system.
2. List any eight applications of DBMS.
3. What are the disadvantages of keeping organization information in a file processing system?
4. What are the advantages of using a DBMS (centralized control of data)?
5. With a block diagram, discuss briefly the various levels of data abstraction.
6. Define instance and schema.
7. Define the terms 1) physical schema 2) logical schema 3) subschema.
8. What is a conceptual schema?
9. Define data model.
10. What is a storage manager?
11. What are the components of the storage manager?
12. What is the purpose of the storage manager?
13. List the data structures implemented by the storage manager.
14. What is a data dictionary?
15. What is an entity relationship model?
16. What are attributes? Give examples.
17. What are the types of attributes?
18. What is a relationship? Give examples.
19. Define the terms
20. Define null values.
21. Define the terms
22. What is meant by the degree of a relationship set?
23. Define the terms
24. Define weak and strong entity sets.
25. What does the cardinality ratio specify?
26. Explain the two types of participation constraint.
27. Define the terms
28. Write short notes on the relational model.
29. Define the term Domain.
30. Specify, with suitable examples, the different types of keys used in database management systems.
31. Define Data model.
16 Mark Questions
1. Explain DBMS System Architecture.
2. Explain the E-R Model in detail with a suitable example.
3. Explain the various data models.
4. Draw an E-R Diagram for Banking, University, Company, Airlines, ATM, Hospital, Library, Super
market, Insurance Company.
5. Explain in detail the various database languages.
6. Discuss the various operations in Relational Databases.
7. Discuss database users and administrators.
UNIT II RELATIONAL MODEL
The relational model - The catalog - Types - Keys - Relational algebra - Domain relational calculus -
Tuple relational calculus - Fundamental operations - Additional operations - SQL fundamentals -
Integrity - Triggers - Security - Advanced SQL features - Embedded SQL - Dynamic SQL - Missing
information - Views - Introduction to distributed databases and client/server databases
1. THE RELATIONAL MODEL
STRUCTURE OF RELATIONAL DATABASES:
A relational database consists of a collection of tables, each of which is assigned a unique name.
A row in a table represents a relationship among a set of values.
BASIC STRUCTURE
Each column header is an attribute. Each attribute has a set of permitted values, called the domain of
that attribute.
A table of n attributes must be a subset of
D1 × D2 × ... × Dn-1 × Dn
That is, a relation is a subset of the Cartesian product of a list of domains.
Mathematically, a table is called a relation and the rows in a table are called tuples.
The tuples in a relation can be either sorted or unsorted.
Several attributes can have the same domain. E.g.: customer_name, employee_name
Attributes can also have distinct domains. E.g.: balance, branch_name
An attribute can take a null value if the value is unknown or does not exist.
By convention, a schema name begins with an upper-case letter and a relation name begins with a lower-case letter:
Account-schema = (account-number, branch-name, balance)
account (Account-schema)
Account Table
2. THE CATALOG
The catalog is the place where all the schemas and the corresponding mappings are kept.
The catalog contains detailed information, also called descriptor information or metadata.
Descriptor information is essential for the system to perform its job properly.
For example, the authorization subsystem uses catalog information about users and security
constraints to grant or deny access to a particular user.
The catalog should be self-describing.
3. RELATIONAL ALGEBRA
The relational algebra is a procedural query language.
It consists of a set of operations that take one or two relations as input and produce a new relation as
their result.
Formal Definition
A basic expression in the relational algebra consists of one of the following:
A relation in the database
A constant relation
Let E1 and E2 be relational-algebra expressions; the following are all relational-algebra
expressions:
E1 ∪ E2
E1 − E2
E1 × E2
σ_{P}(E1), where P is a predicate on attributes in E1
Π_{S}(E1), where S is a list consisting of some of the attributes in E1
ρ_{x}(E1), where x is the new name for the result of E1
Operations can be divided into:
Basic or fundamental operations: select, project, union, rename, set difference, and
Cartesian product
Additional operations, which can be expressed in terms of basic operations: set intersection, natural
join, division, and assignment
Extended operations: generalized projection, aggregate operations, and outer join
3.1. BASIC OPERATIONS OR FUNDAMENTAL OPERATIONS
The select, project, and rename operations are called unary operations, because they operate on one
relation.
The other three operations (union, set difference, Cartesian product) operate on pairs of relations and
are, therefore, called binary operations.
The basic or fundamental operations are as follows:
1. Select
2. Project
3. Union
4. Rename
5. Set difference
6. Cartesian product
1. Select Operation (σ)
The select operation selects tuples that satisfy a given predicate.
Syntax
σ_{<select condition>}(R)
The symbol σ (sigma) is used to denote the select operator.
The predicate appears as a subscript to σ and the argument relation appears in parentheses.
Example
Consider the loan relation.
Query: σ_{branch-name = "Perryridge"}(loan)
The output relation contains the loan tuples whose branch-name is Perryridge.
The select operation allows all comparisons using =, ≠, <, ≤, >, ≥.
It allows several predicates to be combined using the connectives and (∧), or (∨), and not (¬).
E.g.:
1. σ_{amount > 1200}(loan)
2. σ_{branch-name = "Perryridge" ∧ amount > 1200}(loan)
Other Examples
Consider the following Book relation:
Book_Id  Title     Author    Publisher    Year  Price
B001     DBMS      Korth     McGraw_Hill  2000  250
B002     Compiler  Ulman                  2004  350
B003     OOMD      Rambaugh               2003  450
B004     PPL       Sabista                2000  500
The following are some examples of the select operation.
Example 1: Display the books published in the year 2000.
Query 1: σ_{year = 2000}(Book)
The output of query 1 is shown below.
Book_Id  Title     Author    Publisher    Year  Price
B001     DBMS      Korth     McGraw_Hill  2000  250
B004     PPL       Sabista                2000  500
Example 2: Display all books having a price greater than 300.
Query 2: σ_{price > 300}(Book)
The output of query 2 is shown below.
Book_Id  Title     Author    Publisher    Year  Price
B002     Compiler  Ulman                  2004  350
B003     OOMD      Rambaugh               2003  450
B004     PPL       Sabista                2000  500
Example 3: Select the tuples for all books whose publishing year is 2000 or whose price is greater than 300.
Query 3: σ_{(year = 2000) OR (price > 300)}(Book)
The output of query 3 is shown below.
Book_Id  Title     Author    Publisher    Year  Price
B001     DBMS      Korth     McGraw_Hill  2000  250
B002     Compiler  Ulman                  2004  350
B003     OOMD      Rambaugh               2003  450
B004     PPL       Sabista                2000  500
Example 4: Select the tuples for all books whose publishing year is 2000 and whose price is greater than 300.
Query 4: σ_{(year = 2000) AND (price > 300)}(Book)
The output of query 4 is shown below.
Book_Id  Title     Author    Publisher    Year  Price
B004     PPL       Sabista                2000  500
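The select examples above can be reproduced with a few lines of Python. This is a minimal sketch, not a database engine: the relation is a list of dictionaries, the helper name select is an assumption, and the Publisher column is left out because the notes list it only for the first row.

```python
# Book relation from the examples above (Publisher omitted).
book = [
    {"book_id": "B001", "title": "DBMS", "author": "Korth", "year": 2000, "price": 250},
    {"book_id": "B002", "title": "Compiler", "author": "Ulman", "year": 2004, "price": 350},
    {"book_id": "B003", "title": "OOMD", "author": "Rambaugh", "year": 2003, "price": 450},
    {"book_id": "B004", "title": "PPL", "author": "Sabista", "year": 2000, "price": 500},
]

def select(relation, predicate):
    """sigma_predicate(relation): keep the tuples for which predicate is true."""
    return [t for t in relation if predicate(t)]

q1 = select(book, lambda t: t["year"] == 2000)                        # Query 1
q3 = select(book, lambda t: t["year"] == 2000 or t["price"] > 300)    # Query 3
q4 = select(book, lambda t: t["year"] == 2000 and t["price"] > 300)   # Query 4

print([t["book_id"] for t in q1])
print([t["book_id"] for t in q3])
print([t["book_id"] for t in q4])
```

The predicate is passed as a function, which mirrors the way the select condition appears as a subscript to σ.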
2. Project operation (Π)
The project operation selects certain columns from a table while discarding others. It removes any
duplicate tuples from the result relation.
Syntax
Π_{<attribute list>}(R)
The symbol Π (pi) is used to denote the project operation.
The attribute list to be projected is specified as a subscript of Π, and R denotes the relation.
E.g.: Π_{loan-number, amount}(loan)
Example: The following are examples of the project operation on the Book relation.
Example 1: Display all titles with the author name.
Query 1: Π_{Title, Author}(Book)
The output of query 1 is shown below.
Title     Author
DBMS      Korth
Compiler  Ulman
OOMD      Rambaugh
PPL       Sabista
Example 2: Display all book titles with authors and price.
Query 2: Π_{Title, Author, Price}(Book)
The output of query 2 is shown below.
Title     Author    Price
DBMS      Korth     250
Compiler  Ulman     350
OOMD      Rambaugh  450
PPL       Sabista   500
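Projection, including the removal of duplicate tuples, can be sketched as follows. The helper name project is an assumption made for illustration, and the Book data is repeated so that the block runs on its own.

```python
book = [
    {"title": "DBMS", "author": "Korth", "year": 2000, "price": 250},
    {"title": "Compiler", "author": "Ulman", "year": 2004, "price": 350},
    {"title": "OOMD", "author": "Rambaugh", "year": 2003, "price": 450},
    {"title": "PPL", "author": "Sabista", "year": 2000, "price": 500},
]

def project(relation, attrs):
    """pi_attrs(relation): keep only attrs and drop duplicate tuples."""
    result = []
    for t in relation:
        row = tuple(t[a] for a in attrs)
        if row not in result:  # a relation is a set, so duplicates are removed
            result.append(row)
    return result

titles_authors = project(book, ["title", "author"])
years = project(book, ["year"])
print(titles_authors)
print(years)
```

Projecting on year alone returns three tuples rather than four, because the year 2000 occurs twice in Book and the duplicate is eliminated.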
Composition of select and project operations
The relational operations select and project can be combined to form a more complex query:
Π_{customer-name}(σ_{customer-city = "Harrison"}(customer))
Input table: customer
Output:
Customer-name
Hayes
Example: Display the titles of books having a price greater than 300.
Query: Π_{Title}(σ_{price > 300}(Book))
The output of the query is shown below.
Title
Compiler
OOMD
PPL
3. Rename operation (ρ)
In relational algebra, you can rename either the relation or the attributes or both. The general rename
operation can take any of the following forms:
Syntax
ρ_{S(new attribute names)}(R) renames both the relation and its attributes
ρ_{S}(R) renames only the relation
ρ_{(new attribute names)}(R) renames only the attributes
The symbol ρ (rho) is used to denote the RENAME operator.
S is the new relation name.
R is the original relation.
Example 1: Both the relation and its attributes are renamed.
ρ_{Temp(Bname, Aname, Pyear, Bprice)}(Book)
Example 2: Only the relation name is renamed.
ρ_{Temp}(Book)
Example 3: Only the attribute names are renamed.
ρ_{(Bname, Aname, Pyear, Bprice)}(Book)
4. Union operation (∪)
For a union operation (r ∪ s) to be valid, two conditions must be satisfied:
1. Relations r and s must have the same number of attributes.
2. The domains of the i-th attribute of r and the i-th attribute of s must be the same for all i.
Syntax: Relation1 ∪ Relation2
Example: Consider the two relations borrower and depositor.
To find the names of all customers with a loan in the bank:
Π_{customer-name}(borrower)
To find the names of all customers with an account in the bank:
Π_{customer-name}(depositor)
The union of these two sets gives the customer names that appear in either or both of the two relations:
Π_{customer-name}(borrower) ∪ Π_{customer-name}(depositor)
5. Set Difference Operation (−)
The set difference operation finds tuples that are in one relation but not in another. The two conditions of the union operation also
apply to set difference.
Syntax: Relation1 − Relation2
Example: Find all customers of the bank who have an account but not a loan.
Π_{customer-name}(depositor) − Π_{customer-name}(borrower)
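Union and set difference map directly onto Python's set operators. The sketch below uses hypothetical customer names, since the borrower and depositor tables are not reproduced in these notes; the helper union is an assumption that also checks the first compatibility condition (equal number of attributes).

```python
# Hypothetical results of projecting customer-name from borrower and depositor.
borrower_names = {("Jones",), ("Smith",), ("Hayes",)}
depositor_names = {("Hayes",), ("Johnson",), ("Smith",)}

def union(r, s):
    """r ∪ s for relations stored as sets of tuples; arities must match."""
    arities = {len(t) for t in r | s}
    if len(arities) > 1:
        raise ValueError("relations are not union-compatible")
    return r | s

either = union(borrower_names, depositor_names)
account_but_no_loan = depositor_names - borrower_names  # set difference

print(sorted(either))
print(sorted(account_but_no_loan))
```

Because the relations are stored as sets, a customer who has both a loan and an account appears only once in the union.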
6. Cartesian-Product Operation (×)
The Cartesian product is also known as CROSS PRODUCT or CROSS JOIN.
The Cartesian product allows us to combine information from any two relations.
Syntax: Relation1 × Relation2
Example: Consider the following two relations, Publisher_Info and Book_Info.
Publisher_Info
Publisher_code  Name
P0001           McGraw_Hill
P0002           PHI
P0003           Pearson
Book_Info
Book_ID  Title
B0001    DBMS
B0002    Compiler
The Cartesian product of Publisher_Info and Book_Info is given below.
Publisher_Info × Book_Info
Publisher_code  Name         Book_ID  Title
P0001           McGraw_Hill  B0001    DBMS
P0002           PHI          B0001    DBMS
P0003           Pearson      B0001    DBMS
P0001           McGraw_Hill  B0002    Compiler
P0002           PHI          B0002    Compiler
P0003           Pearson      B0002    Compiler
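The same product can be generated with itertools.product from the Python standard library. This is a small sketch using the two relations above stored as lists of tuples; the variable names are chosen for illustration.

```python
from itertools import product

publisher_info = [("P0001", "McGraw_Hill"), ("P0002", "PHI"), ("P0003", "Pearson")]
book_info = [("B0001", "DBMS"), ("B0002", "Compiler")]

# Every publisher tuple is concatenated with every book tuple.
cross = [p + b for p, b in product(publisher_info, book_info)]

print(len(cross))  # 3 publishers x 2 books
for row in cross:
    print(row)
```

The rows come out in a different order from the table above, which does not matter because a relation is a set of tuples.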
3.2. ADDITIONAL OPERATIONS
1. Set intersection
2. Natural join
3. Division
4. Assignment
1. Set intersection Operation (∩)
The result of the intersection operation is a relation that includes all tuples that are in both Relation1 and
Relation2.
The intersection of depositor and borrower is written depositor ∩ borrower.
Syntax: Relation1 ∩ Relation2
Example: Find all customers who have both a loan and an account.
Π_{customer-name}(borrower) ∩ Π_{customer-name}(depositor)
2. Natural join (⋈)
The natural join operation forms the Cartesian product of its two arguments, performs a selection forcing
equality on those attributes that appear in both relation schemas, and finally removes duplicate attributes.
Syntax: Relation1 ⋈ Relation2
Example: Consider the two relations Employee and Salary.
Employee
Emp_code  Emp_name
E0001     Hari
E0002     Om
E0003     Smith
E0004     Jay
Salary
Emp_code  Salary
E0001     2000
E0002     5000
E0003     7000
E0004     10000
Query: Π_{emp_name, salary}(employee ⋈ salary)
The output of the query is:
Emp_name  Salary
Hari      2000
Om        5000
Smith     7000
Jay       10000
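A natural join can be sketched as a nested loop that keeps the pairs of tuples agreeing on every shared attribute. The function name natural_join is an assumption, and a real DBMS would use a far more efficient join algorithm; the data is the Employee and Salary example above.

```python
employee = [
    {"emp_code": "E0001", "emp_name": "Hari"},
    {"emp_code": "E0002", "emp_name": "Om"},
    {"emp_code": "E0003", "emp_name": "Smith"},
    {"emp_code": "E0004", "emp_name": "Jay"},
]
salary = [
    {"emp_code": "E0001", "salary": 2000},
    {"emp_code": "E0002", "salary": 5000},
    {"emp_code": "E0003", "salary": 7000},
    {"emp_code": "E0004", "salary": 10000},
]

def natural_join(r, s):
    """Combine tuples of r and s that agree on all shared attribute names."""
    if not r or not s:
        return []
    common = set(r[0]) & set(s[0])
    return [
        {**tr, **ts}  # merging the dicts keeps a single copy of each shared attribute
        for tr in r
        for ts in s
        if all(tr[a] == ts[a] for a in common)
    ]

joined = natural_join(employee, salary)
result = [(t["emp_name"], t["salary"]) for t in joined]  # projection on emp_name, salary
print(result)
```

Here the only shared attribute is emp_code, so each employee is matched with exactly one salary tuple.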
3. Division operation (÷)
The division operation is suited to queries that include the phrase "for all".
Syntax: Relation1 ÷ Relation2
Example: Consider three relations: account, branch, and depositor.
Suppose that we wish to find all customers who have an account at all the branches located in Brooklyn.
Step 1: We can obtain all branches in Brooklyn by the expression
r1 = Π_{branch-name}(σ_{branch-city = "Brooklyn"}(branch))
Step 2: We can find all (customer-name, branch-name) pairs for which the customer has an account at a
branch by writing
r2 = Π_{customer-name, branch-name}(depositor ⋈ account)
Now, we need to find the customers who appear in r2 with every branch name in r1. The operation that
provides exactly those customers is the divide operation.
Thus, the query is
Π_{customer-name, branch-name}(depositor ⋈ account) ÷ Π_{branch-name}(σ_{branch-city = "Brooklyn"}(branch))
The result of this expression is a relation that has the schema (customer-name) and that contains the tuple
(Johnson).
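The "for all" behaviour of division can be made concrete with a short sketch. The instances of r1 and r2 below are hypothetical, because the account, branch, and depositor tables are not reproduced in these notes; the function name divide is also an assumption.

```python
def divide(r, s):
    """r ÷ s, where r holds (customer, branch) pairs and s holds branch names."""
    customers = {c for c, _ in r}
    # Keep a customer only if it is paired in r with every branch in s.
    return {c for c in customers if all((c, b) in r for b in s)}

# Hypothetical r2: (customer-name, branch-name) pairs.
r2 = {
    ("Johnson", "Downtown"),
    ("Johnson", "Brighton"),
    ("Hayes", "Perryridge"),
    ("Jones", "Brighton"),
    ("Smith", "Mianus"),
}
# Hypothetical r1: names of the branches located in Brooklyn.
r1 = {"Brighton", "Downtown"}

print(divide(r2, r1))
```

Jones has an account at Brighton but not at Downtown, so only Johnson satisfies the condition for every branch in r1.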
4. The Assignment Operation (←)
The assignment operation works like assignment in a programming language.
The result of the expression to the right of the ← is assigned to the relation variable on the left of the ←.
With the assignment operation, a query can be written as a sequential program consisting of a series
of assignments followed by an expression whose value is displayed as the result of the query.
3.3. EXTENDED RELATIONAL-ALGEBRA OPERATIONS
1. Generalized projection
2. Aggregate operations
3. Outer join
1. Generalized projection
The generalized-projection operation extends the projection operation by allowing arithmetic
functions to be used in the projection list.
The generalized projection operation has the form
Π_{F1, F2, ..., Fn}(E)
where E is any relational-algebra expression, and each of F1, F2, ..., Fn is an arithmetic
expression involving constants and attributes in the schema of E.
Example: Suppose we have a relation credit-info with attributes customer-name, limit, and credit-balance.
Query: Π_{customer-name, (limit − credit-balance) as credit-available}(credit-info)
The result lists each customer name together with the credit still available to that customer.
2. Aggregate Functions
Aggregate functions take a collection of values and return a single value as a result. Some aggregate
functions are:
1. Avg
2. Min
3. Max
4. Sum
5. Count
1. Avg: The aggregate function avg returns the average of the values.
Example: Use the pt-works relation.
Suppose that we want to find the average of the salaries:
𝒢_{avg(salary)}(pt-works)
The symbol 𝒢 is the letter G in calligraphic font; read it as "calligraphic G".
Result:
Salary
2062.5
2. Min and Max
Min: Returns the minimum value in a collection.
Max: Returns the maximum value in a collection.
Example:
branch-name 𝒢_{sum(salary) as sum-salary, max(salary) as max-salary}(pt-works)
The attribute branch-name in the left-hand subscript of 𝒢 indicates that the input relation
pt-works must be divided into groups based on the value of branch-name.
The calculated sum is placed under the attribute name sum-salary and the maximum salary is
placed under the attribute max-salary.
3. Sum:
The aggregate function sum returns the total of the values.
Example: Suppose that we want to find the total sum of the salaries:
𝒢_{sum(salary)}(pt-works)
Result:
Salary
16500
4. Count:
Returns the number of elements in the collection.
Example: 𝒢_{count-distinct(branch-name)}(pt-works)
Result: The result of this query is a single row containing the value 3.
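The aggregate operations above can be imitated in a few lines of Python. The pt-works figure is not reproduced in these notes, so the rows below are an assumed instance, chosen so that the totals agree with the results quoted above (sum 16500, average 2062.5, three distinct branches).

```python
from collections import defaultdict

# Assumed pt-works instance: (employee-name, branch-name, salary).
pt_works = [
    ("Adams", "Perryridge", 1500),
    ("Brown", "Perryridge", 1300),
    ("Gopal", "Perryridge", 5300),
    ("Johnson", "Downtown", 1500),
    ("Loreena", "Downtown", 1300),
    ("Peterson", "Downtown", 2500),
    ("Rao", "Austin", 1500),
    ("Sato", "Austin", 1600),
]

salaries = [sal for _, _, sal in pt_works]
total = sum(salaries)                               # sum(salary)
average = total / len(salaries)                     # avg(salary)
branch_count = len({b for _, b, _ in pt_works})     # count-distinct(branch-name)

# Grouping by branch-name: form the groups first, then aggregate each group.
groups = defaultdict(list)
for _, branch, sal in pt_works:
    groups[branch].append(sal)
per_branch = {b: (sum(v), max(v)) for b, v in groups.items()}

print(total, average, branch_count)
print(per_branch)
```

The dictionary per_branch plays the role of the grouped result, with one (sum-salary, max-salary) pair per branch.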
3. Outer join
Joins are classified into three types, namely:
1. Inner Join
2. Outer Join
3. Natural Join
Inner Join (⋈)
An inner join returns only the matching rows from the tables that are being joined.
Outer Join
The outer-join operation is an extension of the join operation to deal with missing information.
Outer-join operations avoid loss of information.
Outer joins are classified into three types, namely:
1. Left Outer Join
2. Right Outer Join
3. Full Outer Join
1. Left Outer Join (⟕)
The left outer join (⟕) takes all tuples in the left relation that did not match with any tuple in the
right relation, pads the tuples with null values for all other attributes from the right relation, and adds them
to the result of the natural join.
2. Right Outer Join (⟖)
The right outer join (⟖) is symmetric with the left outer join: it pads tuples from the right relation
that did not match any from the left relation with nulls and adds them to the result of the natural join.
3. Full Outer Join (⟗)
The full outer join (⟗) does both of those operations, padding tuples from the left relation that did
not match any from the right relation, as well as tuples from the right relation that did not match any from
the left relation, and adding them to the result of the join.
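The padding behaviour of outer joins can be seen in a small sketch. The loan and borrower rows are hypothetical, None stands in for null, and left_outer_join is an illustrative helper that joins on a single named attribute rather than a general implementation.

```python
# Hypothetical instances; L-260 has no borrower and L-155 has no loan tuple.
loan = [
    {"loan_number": "L-170", "branch_name": "Downtown", "amount": 3000},
    {"loan_number": "L-230", "branch_name": "Redwood", "amount": 4000},
    {"loan_number": "L-260", "branch_name": "Perryridge", "amount": 1700},
]
borrower = [
    {"customer_name": "Jones", "loan_number": "L-170"},
    {"customer_name": "Smith", "loan_number": "L-230"},
    {"customer_name": "Hayes", "loan_number": "L-155"},
]

def left_outer_join(r, s, key, s_attrs):
    """Join r and s on key; unmatched r tuples are padded with None for s_attrs."""
    result = []
    for tr in r:
        matches = [ts for ts in s if ts[key] == tr[key]]
        if matches:
            result.extend({**tr, **ts} for ts in matches)
        else:
            result.append({**tr, **{a: None for a in s_attrs}})
    return result

# loan left-outer-join borrower
left = left_outer_join(loan, borrower, "loan_number", ["customer_name"])
# loan right-outer-join borrower, computed as borrower left-outer-join loan
right = left_outer_join(borrower, loan, "loan_number", ["branch_name", "amount"])

for row in left:
    print(row)
for row in right:
    print(row)
```

A full outer join would contain the rows of both results, with the matching rows counted once.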
4. RELATIONAL CALCULUS
Relational calculus is a formal query language in which a retrieval request is specified by one declarative
expression, with no description of how to retrieve the result.
A calculus expression specifies what is to be retrieved rather than how to retrieve it.
Relational calculus is therefore considered a nonprocedural language.
Relational calculus can be divided into:
1. Tuple Relational Calculus
2. Domain Relational Calculus
4.1. TUPLE RELATIONAL CALCULUS
Tuple relational calculus is a nonprocedural query language.
A query in the tuple relational calculus is expressed as follows:
{t | P(t)}
It is the set of all tuples t such that predicate P is true for t.
t ∈ r denotes that tuple t is in relation r.
P is a formula similar to that of the predicate calculus.
A tuple variable is said to be a free variable unless it is quantified by ∃ or ∀. In
t ∈ loan ∧ ∃ s ∈ customer (t[branch-name] = s[branch-name])
t is a free variable. The tuple variable s is said to be a bound variable.
A tuple-relational-calculus formula is built up out of atoms. An atom has one of the
following forms:
1. s ∈ r, where s is a tuple variable and r is a relation
2. s[x] Θ u[y], where s and u are tuple variables, x is an attribute on which s is defined, y is
an attribute on which u is defined, and Θ is a comparison operator
3. s[x] Θ c, where s is a tuple variable, x is an attribute on which s is defined, Θ is a
comparison operator, and c is a constant in the domain of attribute x
Rules to build formulae from atoms
An atom is a formula.
If P1 is a formula, then so are ¬P1 and (P1).
If P1 and P2 are formulae, then so are P1 ∨ P2, P1 ∧ P2, and P1 ⇒ P2.
If P1(s) is a formula containing a free tuple variable s, and r is a relation, then
∃ s ∈ r (P1(s)) and ∀ s ∈ r (P1(s)) are also
formulae.
Equivalence relations in tuple relational calculus
P1 ∧ P2 is equivalent to ¬(¬(P1) ∨ ¬(P2))
∀ t ∈ r (P1(t)) is equivalent to ¬∃ t ∈ r (¬P1(t))
P1 ⇒ P2 is equivalent to ¬(P1) ∨ P2
Banking Example
1. branch (branch_name, branch_city, assets)
2. customer (customer_name, customer_street, customer_city)
3. account (account_number, branch_name, balance)
4. loan (loan_number, branch_name, amount)
5. depositor (customer_name, account_number)
6. borrower (customer_name, loan_number)
Example Queries
1. Find the loan_number, branch_name, and amount for loans of over $1200.
{t | t ∈ loan ∧ t[amount] > 1200}
2. Find the loan number for each loan of an amount greater than $1200.
{t | ∃ s ∈ loan (t[loan_number] = s[loan_number] ∧ s[amount] > 1200)}
Notice that a relation on schema (loan_number) is implicitly defined by the query.
3. Find the names of all customers having a loan, an account, or both at the bank.
{t | ∃ s ∈ borrower (t[customer_name] = s[customer_name]) ∨ ∃ u ∈ depositor (t[customer_name]
= u[customer_name])}
4. Find the names of all customers who have a loan and an account at the bank.
{t | ∃ s ∈ borrower (t[customer_name] = s[customer_name]) ∧ ∃ u ∈ depositor (t[customer_name]
= u[customer_name])}
5. Find the names of all customers having a loan at the Perryridge branch.
{t | ∃ s ∈ borrower (t[customer_name] = s[customer_name] ∧ ∃ u ∈ loan (u[branch_name] =
"Perryridge" ∧ u[loan_number] = s[loan_number]))}
Safety of Expressions
A tuple relational calculus expression may generate an infinite relation.
For example, {t | ¬(t ∈ loan)} results in infinitely many tuples that are not in the loan relation.
To guard against this problem, a domain is defined for every tuple relational calculus formula P.
It is denoted by dom(P) and is the set of all values referenced by P, that is, the values that appear in P itself or in the relations mentioned in P.
An expression {t | P(t)} in the tuple relational calculus is safe if every component of t appears in
one of the relations, tuples, or constants that appear in P.
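Tuple relational calculus queries read much like set comprehensions, which gives a convenient way to experiment with them. The sketch below uses small hypothetical instances of loan and borrower; any() plays the role of the existential quantifier ∃.

```python
# Hypothetical instances of the loan and borrower relations.
loan = [
    {"loan_number": "L-15", "branch_name": "Perryridge", "amount": 1500},
    {"loan_number": "L-16", "branch_name": "Perryridge", "amount": 1300},
    {"loan_number": "L-23", "branch_name": "Redwood", "amount": 2000},
]
borrower = [
    {"customer_name": "Hayes", "loan_number": "L-15"},
    {"customer_name": "Adams", "loan_number": "L-16"},
    {"customer_name": "Smith", "loan_number": "L-23"},
]

# Query 2: loan numbers of loans with an amount greater than 1200.
q2 = {s["loan_number"] for s in loan if s["amount"] > 1200}

# Query 5: customers having a loan at the Perryridge branch.
q5 = {
    s["customer_name"]
    for s in borrower
    if any(
        u["branch_name"] == "Perryridge" and u["loan_number"] == s["loan_number"]
        for u in loan
    )
}

print(sorted(q2))
print(sorted(q5))
```

Each comprehension ranges only over tuples stored in finite relations, so its result is always finite; this is the practical effect of restricting attention to safe expressions.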
4.2. DOMAIN RELATIONAL CALCULUS
Domain relational calculus is also a nonprocedural query language, equivalent in power to the tuple
relational calculus.
It serves as the theoretical basis of the widely used Query By Example (QBE) language.
A domain relational calculus expression is of the form:
{<x1, x2, ..., xn> | P(x1, x2, ..., xn)}
x1, x2, ..., xn represent domain variables.
P represents a formula composed of atoms.
An atom in domain relational calculus has one of the following forms:
1. <x1, x2, ..., xn> ∈ r, where r is a relation on n attributes and x1, x2, ..., xn are domain
variables or domain constants
2. x Θ y, where x and y are domain variables and Θ is a comparison operator
3. x Θ c, where x is a domain variable, Θ is a comparison operator, and c is a constant
Rules to build formulae from atoms
An atom is a formula.
If P1 is a formula, then so are ¬P1 and (P1).
If P1 and P2 are formulae, then so are P1 ∨ P2, P1 ∧ P2, and P1 ⇒ P2.
If P1(x) is a formula in x, where x is a free domain variable, then
∃ x (P1(x)) and ∀ x (P1(x)) are also
formulae.
Example Queries
1. Find the loan_number, branch_name, and amount for loans of over $1200.
{<l, b, a> | <l, b, a> ∈ loan ∧ a > 1200}
2. Find the names of all customers who have a loan of over $1200.
{<c> | ∃ l, b, a (<c, l> ∈ borrower ∧ <l, b, a> ∈ loan ∧ a > 1200)}
3. Find the names of all customers who have a loan from the Perryridge branch, together with the loan
amount:
{<c, a> | ∃ l (<c, l> ∈ borrower ∧ ∃ b (<l, b, a> ∈ loan ∧ b = "Perryridge"))}
or, equivalently,
{<c, a> | ∃ l (<c, l> ∈ borrower ∧ <l, "Perryridge", a> ∈ loan)}
Safety of Expressions
The expression {<x1, x2, ..., xn> | P(x1, x2, ..., xn)} is safe if all of the following hold:
1. All values that appear in tuples of the expression are values from dom(P) (that is, the values appear
either in P or in a tuple of a relation mentioned in P).
2. For every "there exists" subformula of the form ∃ x (P1(x)), the subformula is true if and only if there
is a value of x in dom(P1) such that P1(x) is true.
3. For every "for all" subformula of the form ∀ x (P1(x)), the subformula is true if and only if P1(x) is
true for all values x from dom(P1).
5. SQL FUNDAMENTALS
5.1. Introduction
SQL is the standard language used to communicate with relational database
management systems.
All tasks related to relational data management, such as creating tables, querying, modifying data, and
granting access to users, can be performed with SQL.
5.2. Advantages of SQL
SQL is a high-level language that provides a greater degree of abstraction than procedural languages.
SQL enables end users and systems personnel to deal with a number of database management
systems where it is available.
Applications written in SQL can be easily ported across systems.
SQL specifies what is required and not how it should be done.
SQL is simple and easy to learn, yet it can handle complex situations.
All SQL operations are performed at a set level.
5.3. Parts of SQL
The SQL language has several parts:
Data-definition language (DDL): The SQL DDL provides commands for defining relation
schemas, deleting relations, and modifying relation schemas.
Interactive data-manipulation language (DML): The SQL DML includes a query language based
on both the relational algebra and the tuple relational calculus. It also includes commands to insert
tuples into, delete tuples from, and modify tuples in the database.
View definition: The SQL DDL includes commands for defining views.
Transaction control: SQL includes commands for specifying the beginning and ending of
transactions.
Embedded SQL and dynamic SQL: Embedded and dynamic SQL define how SQL statements can
be embedded within general-purpose programming languages, such as C, C++, Java, PL/I, COBOL,
Pascal, and FORTRAN.
Integrity: The SQL DDL includes commands for specifying integrity constraints that the data stored
in the database must satisfy. Updates that violate integrity constraints are disallowed.
Authorization: The SQL DDL includes commands for specifying access rights to relations and
views.
5.4. Domain Types in SQL
1. char(n): Fixed-length character string, with user-specified length n.
2. varchar(n): Variable-length character string, with user-specified maximum length n.
3. int: Integer (a finite subset of the integers that is machine-dependent).
4. smallint: Small integer (a machine-dependent subset of the integer domain type).
5. numeric(p,d): Fixed-point number, with user-specified precision of p digits, with d digits to the
right of the decimal point.
6. real, double precision: Floating-point and double-precision floating-point numbers, with
machine-dependent precision.
7. float(n): Floating-point number, with user-specified precision of at least n digits.
8. date: Dates, containing a (4 digit) year, month, and day.
Example: date '2005-7-27'
9. time: Time of day, in hours, minutes, and seconds.
Example: time '09:00:30', time '09:00:30.75'
10. timestamp: Date plus time of day.
Example: timestamp '2005-7-27 09:00:30.75'
11. interval: Period of time.
Example: interval '1' day
5.5. DATA DEFINITION LANGUAGE (DDL)
DDL is used to create a table, alter the structure of a table, and drop a table.
DDL Commands
1. CREATE: Command used for creating tables.
Syntax: create table <table name> (columnname1 datatype(size), columnname2
datatype(size), ..., columnnameN datatype(size));
Example: create table customer (cust_name varchar2(15), social_security_no
number(11), cust_street varchar2(7), cust_city varchar2(10));
2. DESC: Command used to view the table structure.
Syntax: desc <table name>;
Example: desc customer;
3. ALTER: Command used for modifying the table structure.
i) Syntax: alter table <table name> modify (columnname datatype(new size));
Example: alter table customer modify (cust_street varchar2(10));
ii) Syntax: alter table <table name> modify (columnname new_datatype(size));
Example: alter table customer modify (social_security_no varchar2(11));
iii) Syntax: alter table <table name> add (new_columnname datatype(size));
Example: alter table customer add (acc_no varchar2(5));
iv) Syntax: alter table <table name> drop (columnname);
Example: alter table customer drop (acc_no);
4. RENAME: Command used to change the name of a table.
Syntax: rename <old table name> to <new table name>;
Example: rename cust to cust1;
5. DROP: Command used for removing an existing table.
Syntax: drop table <table name>;
Example: drop table cust1;
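The commands above appear to follow Oracle conventions (varchar2, desc, rename). As a hedged, runnable sketch, the same create / alter / rename / drop sequence can be replayed with Python's built-in sqlite3 module. SQLite is an assumption made here purely for runnability; it has no desc or modify, so `pragma table_info` stands in for desc, and the helper `column_names` is a hypothetical name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
cur = conn.cursor()

# CREATE: SQLite records the declared types but does not enforce the sizes
cur.execute("""
    create table customer (
        cust_name          varchar(15),
        social_security_no numeric(11),
        cust_street        varchar(7),
        cust_city          varchar(10)
    )
""")

def column_names(table):
    # stand-in for `desc <table>`: table_info yields one row per column,
    # with the column name in position 1
    return [row[1] for row in cur.execute(f"pragma table_info({table})")]

cols_after_create = column_names("customer")

# ALTER ... ADD: append a new column
cur.execute("alter table customer add column acc_no varchar(5)")
cols_after_add = column_names("customer")

# RENAME: SQLite spells it `alter table ... rename to ...`
cur.execute("alter table customer rename to cust1")

# DROP: remove the table entirely
cur.execute("drop table cust1")
remaining_tables = cur.execute(
    "select name from sqlite_master where type = 'table'"
).fetchall()

print(cols_after_create, cols_after_add, remaining_tables)
```

The point of the sketch is the order of operations, not the dialect: each DDL statement changes the schema, and the catalog query at the end confirms that nothing is left after the drop.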
5.6. DATA MANIPULATION LANGUAGE (DML)
Data manipulation language commands let the user insert, modify and delete data in the
database.
DML Commands
1. INSERT: To insert one or more rows.
Syntax 1: insert into <table name> values (list of data values);
Syntax 2: insert into <table name> (column names) values (list of data values);
Insert command using user interaction:
Syntax 3: insert into <table name> values (&columnname1, &columnname2, ...);
2. SELECT: To display one or more rows.
Syntax 1: select * from <table name>;
Syntax 2: select columnname1, columnname2 from <table name>;
Syntax 3: select * from <table name> where <condition>;
Example:
a) Find the names of all branches in the loan table.
select branch_name from loan;
b) List all account numbers at the brighton branch.
select acc_no from account where branch_name = 'brighton';
c) List the customers who live in the city harrison.
select cust_name from customer where cust_city = 'harrison';
3. UPDATE: Used to alter the column values in a table.
Syntax: update <table name> set column_name = new_value where <condition>;
Example: Update the account table to replace the balance value 500 with 450 for account A-101.
Query: update account set balance = 450 where acc_no = 'A-101';
4. DELETE: Used to delete one or more rows.
Syntax: delete from <table name> where <condition>;
Example: delete from borrower where cust_name = 'Jackson';
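A minimal sketch of the four DML commands, run through Python's sqlite3 module (an assumption for runnability; the notes do not name a DBMS). The sample rows are invented for illustration, and `?` placeholders play the role that `&columnname` substitution plays in the interactive syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("create table account (acc_no text, branch_name text, balance integer)")

# INSERT: one row with positional values, then several rows with named columns
cur.execute("insert into account values ('A-101', 'brighton', 500)")
cur.executemany(
    "insert into account (acc_no, branch_name, balance) values (?, ?, ?)",
    [("A-102", "perryridge", 400), ("A-201", "brighton", 900)],
)

# SELECT with a where condition
brighton = cur.execute(
    "select acc_no from account where branch_name = 'brighton' order by acc_no"
).fetchall()

# UPDATE: change the balance of A-101 from 500 to 450
cur.execute("update account set balance = 450 where acc_no = 'A-101'")
new_balance = cur.execute(
    "select balance from account where acc_no = 'A-101'"
).fetchone()[0]

# DELETE: remove every row that satisfies the condition
cur.execute("delete from account where branch_name = 'perryridge'")
rows_left = cur.execute("select count(*) from account").fetchone()[0]

print(brighton, new_balance, rows_left)
```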
5.7. BASIC STRUCTURE OF SQL EXPRESSION
An SQL expression consists of three clauses:
Select: The select clause corresponds to the projection operation of the relational algebra. It is used to
list the attributes desired in the result of a query.
From: The from clause corresponds to the Cartesian product operation of the relational algebra. It
lists the relations to be scanned in the evaluation of the expression.
Where: The where clause corresponds to the selection predicate of the relational algebra. It consists of
a predicate involving attributes of the relations that appear in the from clause.
General form of SQL query
select A1, A2, ..., An
from r1, r2, ..., rm
where P
Here, Ai represents an attribute,
ri represents a relation, and
P is a predicate.
Example:
Find the names of all branches in the loan relation:
select branch_name from loan
select distinct branch_name from loan /* the distinct keyword eliminates duplicates */
select all branch_name from loan /* duplicates are not removed */
Find all loan numbers for loans made at the Perryridge branch with loan amounts greater than $1200:
select loan_number from loan where branch_name = 'Perryridge' and amount > 1200
Rename Operation
SQL allows renaming relations and attributes using the as clause:
old-name as new-name
Example: Find the name, loan number and loan amount of all customers; rename the column
loan_number as loan_id.
select customer_name, borrower.loan_number as loan_id, amount from borrower, loan where
borrower.loan_number = loan.loan_number
Tuple Variables
Tuple variables are defined in the from clause via the use of the as clause.
Example: Find the customer names, their loan numbers and loan amounts for all customers having a loan at some
branch.
select customer_name, T.loan_number, S.amount from borrower as T, loan as S
where T.loan_number = S.loan_number
String Operations
SQL includes a string-matching operator for comparisons on character strings. The operator
like uses patterns that are described using two special characters:
o percent (%): The % character matches any substring.
o underscore (_): The _ character matches any single character.
Example:
'Perry%' matches any string beginning with Perry.
'%idge%' matches any string containing idge as a substring, for example, 'Perryridge',
'Rock Ridge', 'Mianus Bridge', and 'Ridgeway'.
'___' matches any string of exactly three characters.
'___%' matches any string of at least three characters.
Example: select * from customer where customer_name like 'J%';
select * from customer where customer_street like '_a%';
SQL supports a variety of string operations such as
concatenation (using ||)
converting from upper to lower case (and vice versa)
finding string length, extracting substrings, etc.
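The pattern rules above can be checked with a short sketch using Python's sqlite3 module (an assumption for runnability). The branch names and the helper `matching` are illustrative only. Note that SQLite's like is case-insensitive for ASCII letters by default, which differs from some other systems but does not affect these particular patterns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("create table branch (name text)")
cur.executemany(
    "insert into branch values (?)",
    [("Perryridge",), ("Rock Ridge",), ("Ridgeway",), ("Bay",), ("Downtown",)],
)

def matching(pattern):
    # names that satisfy `name like <pattern>`, in sorted order
    rows = cur.execute(
        "select name from branch where name like ? order by name", (pattern,)
    )
    return [r[0] for r in rows]

starts_with_perry = matching("Perry%")   # % matches any substring
contains_idge = matching("%idge%")
exactly_three = matching("___")          # each _ matches one character

# concatenation with ||, case conversion, and string length
joined, shouted, size = cur.execute(
    "select 'Perry' || 'ridge', upper('bay'), length('Downtown')"
).fetchone()

print(starts_with_perry, contains_idge, exactly_three, joined, shouted, size)
```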
Ordering the Display of Tuples
Example: List in alphabetic order the names of all customers having a loan at the Perryridge branch.
select distinct customer_name from borrower, loan where borrower.loan_number =
loan.loan_number and branch_name = 'Perryridge' order by customer_name
We may specify desc for descending order or asc for ascending order, for each attribute; ascending
order is the default.
Example: order by customer_name desc
Set Operations
Set operators combine the results of two queries into a single result.
1. Union: returns all distinct rows selected by either query.
Example: Find all customers having a loan, an account or both at the bank.
Query: select cust_name from depositor union select cust_name from borrower;
2. Union all: returns all rows selected by either query, including duplicates.
Example: Find all customers having a loan, an account or both at the bank, retaining duplicates.
Query: select cust_name from depositor union all select cust_name from borrower;
3. Intersect: returns only the rows that are common to both queries.
Example: Find all customers who have both a loan and an account at the bank.
Query: select cust_name from depositor intersect select cust_name from borrower;
4. Minus: returns all distinct rows selected by the first query but not by the second.
Example: Find all customers who have an account but no loan at the bank.
Query: select cust_name from depositor minus select cust_name from borrower;
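The difference between the four operators is easiest to see on a tiny data set. The sketch below uses Python's sqlite3 module (an assumption for runnability); the customer names are invented, and SQLite spells Oracle's minus as except.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("create table depositor (cust_name text)")
cur.execute("create table borrower (cust_name text)")
cur.executemany("insert into depositor values (?)",
                [("Hayes",), ("Jones",), ("Smith",)])
cur.executemany("insert into borrower values (?)",
                [("Jones",), ("Smith",), ("Adams",)])

def names(op):
    # combine the two queries with the given set operator
    sql = f"select cust_name from depositor {op} select cust_name from borrower"
    return sorted(r[0] for r in cur.execute(sql))

union = names("union")            # duplicates removed
union_all = names("union all")    # duplicates kept
both = names("intersect")         # loan and account
only_account = names("except")    # account but no loan (minus in Oracle)

print(union, union_all, both, only_account)
```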
Aggregate Functions
These functions operate on the multiset of values of a column of a relation, and return a single value.
(a) AVG: Returns the average of the values.
Example: Find the average account balance in the account table.
Query: select avg(balance) from account;
(b) SUM: Returns the sum of the values.
Example: Find the sum of the account balances in the account table.
Query: select sum(balance) from account;
(c) MAX: Returns the maximum value.
Example: Find the maximum account balance in the account table.
Query: select max(balance) from account;
(d) MIN: Returns the minimum value.
Example: Find the minimum account balance in the account table.
Query: select min(balance) from account;
(e) COUNT: Returns the number of rows in a column or table.
Example 1: Find the number of rows in the customer table.
Query: select count(*) from customer;
Example 2: Find the number of (non-null) values in the balance column of the account table.
Query: select count(balance) from account;
(f) GROUP BY
Aggregate Functions - Group By
The group by clause forms groups of rows that share the same value; the aggregate is computed once per group.
Example 1: Find the average account balance at each branch.
Query: select branch_name, avg(balance) from account group by branch_name;
(g) HAVING CLAUSE
Aggregate Functions - Having Clause
The having clause states a condition that is applied to each group after the groups are formed.
Example 2: Find the average account balance at the brighton branch.
Query: select branch_name, avg(balance) from account group by branch_name having
branch_name = 'brighton';
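A small worked sketch of the aggregate functions, again through Python's sqlite3 module (an assumption for runnability) with invented rows. One balance is deliberately null to show that count(balance) skips nulls while count(*) does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("create table account (acc_no text, branch_name text, balance integer)")
cur.executemany(
    "insert into account values (?, ?, ?)",
    [
        ("A-101", "brighton", 500),
        ("A-102", "brighton", 900),
        ("A-201", "perryridge", 400),
        ("A-202", "perryridge", None),  # unknown balance
    ],
)

# avg, sum, max and min ignore the null balance
totals = cur.execute(
    "select avg(balance), sum(balance), max(balance), min(balance) from account"
).fetchone()

# count(*) counts rows; count(balance) counts non-null balances
counts = cur.execute("select count(*), count(balance) from account").fetchone()

# one average per branch
per_branch = cur.execute(
    "select branch_name, avg(balance) from account "
    "group by branch_name order by branch_name"
).fetchall()

# having filters the groups after they are formed
brighton_only = cur.execute(
    "select branch_name, avg(balance) from account "
    "group by branch_name having branch_name = 'brighton'"
).fetchall()

print(totals, counts, per_branch, brighton_only)
```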
Null Values
It is possible for tuples to have a null value, denoted by null, for some of their attributes.
Null signifies an unknown value or that a value does not exist.
The predicate is null can be used to check for null values.
Example: Find all loan numbers that appear in the loan relation with null values for
amount.
select loan_number from loan where amount is null
Any comparison with null returns unknown, so the Boolean operators are extended as follows:
and: The result of true and unknown is unknown, false and unknown is false, while unknown and unknown
is unknown.
or: The result of true or unknown is true, false or unknown is unknown, while unknown or unknown is
unknown.
not: The result of not unknown is unknown.
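The three-valued truth tables above can be reproduced directly. In this sketch (Python's sqlite3 module, an assumption for runnability), SQLite reports true as 1, false as 0 and unknown as NULL, which Python shows as None; `truth` is a hypothetical helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def truth(expr):
    # evaluate a Boolean expression: 1 = true, 0 = false, None = unknown
    return conn.execute(f"select {expr}").fetchone()[0]

unknown = "(null = 1)"  # any comparison with null is unknown

results = {
    "true and unknown": truth(f"1 and {unknown}"),
    "false and unknown": truth(f"0 and {unknown}"),
    "unknown and unknown": truth(f"{unknown} and {unknown}"),
    "true or unknown": truth(f"1 or {unknown}"),
    "false or unknown": truth(f"0 or {unknown}"),
    "unknown or unknown": truth(f"{unknown} or {unknown}"),
    "not unknown": truth(f"not {unknown}"),
}

for name, value in results.items():
    print(f"{name:22} -> {value}")
```

Because a where clause keeps only rows whose predicate is true, rows for which the predicate is unknown are silently dropped; this is why `amount is null` is needed instead of `amount = null`.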
Nested Subqueries
SQL provides a mechanism for the nesting of subqueries.
A subquery is a select-from-where expression that is nested within another query.
A common use of subqueries is to perform tests for set membership, set comparisons, and set
cardinality.
Example 1: Find all the information about the customer who holds account number A-101.
Query: select * from customer where cust_name = (select cust_name
from depositor where acc_no = 'A-101');
Example 2: Find the names and loan numbers of all customers who have a loan from the bank.
Query: select cust_name, loan_no from borrower where loan_no in
(select loan_no from loan);
1. Set memberships
IN Example: select * from customer where customer_name in ('Hays', 'Jones');
NOT IN Example: select * from customer where customer_name not in ('Hays', 'Jones');
2. Set comparisons
SQL uses various comparison operators such as <, <=, =, >, >=, <>, any, all,
some, >some, >any etc. to compare sets.
Example 1: select * from borrower where loan_number < any (select loan_number
from loan where branch_name = 'Perryridge');
Example 2: select loan_no from loan where amount <= 30000;
3. Test for Empty Relation
exists is a test for a non-empty set: it returns true if the subquery result contains at least one tuple.
Example: select title from book where exists (select * from order where book.book_id =
order.book_id);
Similarly, not exists tests whether the subquery result is empty.
Example: select title from book where not exists (select * from order where book.book_id =
order.book_id);
(order is a reserved word in SQL, so a real schema would need a quoted identifier or a different table name.)
4. Test for absence of duplicate tuples
The unique construct tests whether a subquery has any duplicate tuples in its result.
Example: Find all customers who have at most one account at the Perryridge branch.
select T.customer_name from depositor as T
where unique (select R.customer_name from account, depositor as R
where T.customer_name = R.customer_name and
R.account_number = account.account_number and
account.branch_name = 'Perryridge')
The not unique construct tests for the existence of duplicate tuples in the same manner.
Complex Queries
Complex queries are often hard or impossible to write as a single SQL block. There are two ways of
composing multiple SQL blocks to express a complex query:
1. Derived relations
2. with clause
1. Derived Relations
SQL allows a subquery expression to be used in the from clause. If we use such an
expression, then we must give the result relation a name, and we can rename the attributes. The as
clause is used for renaming.
For example: Find the average account balance of those branches where the average account
balance is greater than $1200.
select branch_name, avg_balance from (select branch_name, avg(balance) from account
group by branch_name) as branch_avg (branch_name, avg_balance) where avg_balance > 1200
Here the subquery result is named branch_avg, with attributes branch_name and avg_balance.
2. with clause
The with clause provides a way of defining a temporary view, whose definition is available
only to the query in which the with clause occurs.
Consider the following query, which selects accounts with the maximum balance; if there are
many accounts with the same maximum balance, all of them are selected.
with max_balance (value) as
(select max(balance)
from account)
select account_number
from account, max_balance
where account.balance = max_balance.value
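Both composition styles can be tried on sample data. This sketch uses Python's sqlite3 module (an assumption for runnability) with invented accounts. SQLite does not accept a column list after the derived-table alias, so the attribute is named inside the subquery instead; the with clause is written with the parenthesised subquery that standard SQL expects.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("create table account (account_number text, branch_name text, balance integer)")
cur.executemany(
    "insert into account values (?, ?, ?)",
    [
        ("A-101", "Downtown", 500),
        ("A-215", "Mianus", 700),
        ("A-102", "Perryridge", 2000),
        ("A-201", "Perryridge", 900),
        ("A-305", "Round Hill", 2000),
    ],
)

# 1. Derived relation: the subquery in the from clause is named branch_avg
high_avg = cur.execute("""
    select branch_name, avg_balance
    from (select branch_name, avg(balance) as avg_balance
          from account
          group by branch_name) as branch_avg
    where avg_balance > 1200
    order by branch_name
""").fetchall()

# 2. with clause: max_balance is a temporary view visible only to this query
richest = cur.execute("""
    with max_balance (value) as
        (select max(balance) from account)
    select account_number
    from account, max_balance
    where account.balance = max_balance.value
    order by account_number
""").fetchall()

print(high_avg, richest)
```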
6. INTEGRITY
Integrity constraints ensure that changes made to the database by authorized users do not result in a
loss of data consistency.
They are a mechanism used to prevent invalid data entry into a table.
They prevent accidental damage to the database.
Types
1. Domain integrity constraints
2. Entity integrity constraints
3. Referential integrity constraints
1. Domain Integrity Constraints
A domain is a set of values that may be assigned to an attribute. All values that appear in a column
of a relation (table) must be taken from the same domain.
Types
Not null constraints
Check constraints
a) NOT NULL: It will not allow null values.
Syntax: create table <table name> (columnname datatype(size) constraint constraint_name not
null);
Example: account_no char(10) not null;
b) CHECK: Use the CHECK constraint when you need to enforce integrity rules that can be evaluated
based on a condition (logical expression).
Syntax: create table <table name> (columnname datatype(size) constraint constraint_name check
(check_condition));
Example: create table student (name char(15) not null, student_id char(10), degree_level char(15),
primary key (student_id), check (degree_level in ('bachelors', 'master', 'doctorate')));
The create domain clause can be used to define new domains. For example, the statements
create domain Dollars numeric(12,2)
create domain Pounds numeric(12,2)
define the domains Dollars and Pounds to be decimal numbers with a total of 12 digits, two of which
are placed after the decimal point.
2. Entity Integrity Constraints
The entity integrity constraint states that no primary key value can be null. This is because the
primary key value is used to identify individual tuples in a relation.
Types
Unique constraint
Primary key constraint
a) UNIQUE: Avoids duplicate values.
unique ($A_{j_1}, A_{j_2}, \ldots, A_{j_m}$)
The unique specification says that the attributes $A_{j_1}, A_{j_2}, \ldots, A_{j_m}$ form a candidate key. These attributes
should have distinct values.
Syntax: create table <table name> (columnname datatype(size) constraint
constraint_name unique);
b) Composite UNIQUE: A multicolumn unique key is called a composite unique key.
Syntax: create table <table name> (columnname1 datatype(size), columnname2 datatype(size),
constraint constraint_name unique (columnname1, columnname2));
c) PRIMARY KEY: It will not allow null values and avoids duplicate values.
Syntax: create table <table name> (columnname datatype(size) constraint constraint_name
primary key);
d) Composite PRIMARY KEY: A multicolumn primary key is called a composite primary key.
Syntax: create table <table name> (columnname1 datatype(size), columnname2 datatype(size),
constraint constraint_name primary key (columnname1, columnname2));
3. REFERENTIAL INTEGRITY
Referential integrity ensures that a value that appears in one relation for a given set of attributes also
appears for a certain set of attributes in another relation.
Reference key (foreign key): It represents relationships between tables. A foreign key is a column whose
values are derived from the primary key of the same or some other table.
Syntax: create table <table name> (columnname datatype(size) constraint constraint_name
references parent_table_name);
Formal Definition
Let $r_1(R_1)$ and $r_2(R_2)$ be relations with primary keys $K_1$ and $K_2$ respectively.
The subset $\alpha$ of $R_2$ is a foreign key referencing $K_1$ in relation $r_1$ if, for every $t_2$ in $r_2$,
there must be a tuple $t_1$ in $r_1$ such that $t_1[K_1] = t_2[\alpha]$.
A referential integrity constraint is also called a subset dependency, since it can be written as
$\Pi_{\alpha}(r_2) \subseteq \Pi_{K_1}(r_1)$
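As a hedged illustration of the definition, the sketch below declares a foreign key and then checks the subset dependency by hand. It uses Python's sqlite3 module (an assumption for runnability); SQLite only enforces foreign keys after `pragma foreign_keys = on`, and the table contents are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("pragma foreign_keys = on")  # enforcement is off by default in SQLite

# branch plays the role of r1 with primary key K1 = branch_name
conn.execute("create table branch (branch_name text primary key)")
# account plays the role of r2; its branch_name is the foreign key alpha
conn.execute("""
    create table account (
        account_number text primary key,
        branch_name    text references branch (branch_name)
    )
""")

conn.execute("insert into branch values ('Perryridge')")
conn.execute("insert into account values ('A-102', 'Perryridge')")  # accepted

try:
    # no tuple t1 in branch has branch_name = 'Nowhere', so this must fail
    conn.execute("insert into account values ('A-999', 'Nowhere')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# the subset dependency: every account.branch_name appears in branch
dangling = conn.execute("""
    select count(*) from account
    where branch_name not in (select branch_name from branch)
""").fetchone()[0]

print(rejected, dangling)
```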
Assertions
An assertion is a predicate expressing a condition that we wish the database always to satisfy.
An assertion in SQL takes the form
create assertion <assertion-name> check <predicate>
When an assertion is made, the system tests it for validity, and tests it again on every update that
may violate the assertion.
Asserting "for all X, P(X)" is achieved in a roundabout fashion using "not exists X such that not
P(X)".
Assertion Example
The sum of all loan amounts for each branch must be less than the sum of all account balances at
the branch.
create assertion sum_constraint check (not exists (select * from branch where (select
sum(amount) from loan where loan.branch_name = branch.branch_name) >= (select
sum(balance) from account where account.branch_name = branch.branch_name)))
7. TRIGGERS
A trigger is a statement that is executed automatically by the system as a side effect of a
modification to the database.
To design a trigger mechanism, we must:
Specify the conditions under which the trigger is to be executed.
Specify the actions to be taken when the trigger executes.
Trigger Example
Suppose that instead of allowing negative account balances, the bank deals with overdrafts by
setting the account balance to zero
creating a loan in the amount of the overdraft
giving this loan a loan number identical to the account number of the overdrawn account
The condition for executing the trigger is an update to the account relation that results in a negative
balance value.
Syntax:
create or replace trigger <trigger_name> [before | after] [insert | delete | update] on <table_name>
[for each statement | for each row] [when <condition>];
Parts of a trigger:
a. Trigger statement (the DML statement, such as insert/delete/update, that fires the trigger body)
b. Trigger body
c. Trigger restriction (optional)
Types of triggers:
a. Before
b. After
c. For each row
d. For each statement (default)
a. Before/after:
It specifies when the trigger body should be fired.
In the case of before, the trigger is executed before the triggering statement.
In the case of after, it is executed after the triggering statement.
b. For each row/statement:
It decides whether the trigger body is fired once for each row affected by the triggering statement or
only once for the whole statement.
By default, the trigger fires once for each statement.
Disabling Triggers:
Syntax: alter trigger <trigger_name> disable;
Example: alter trigger sales_trigger disable;
A specific trigger on a table can be disabled as follows:
alter table purchase_details disable purchase;
All triggers on a table can be disabled as follows:
Syntax: alter table <table_name> disable all triggers;
Example: alter table sales_details disable all triggers;
Enabling triggers:
Syntax: alter table <table_name> enable trigger_name;
Example: alter table purchase_details enable purchase;
To enable all triggers:
Syntax: alter table <table_name> enable all triggers;
Dropping triggers:
Syntax: drop trigger <trigger_name>;
Example: drop trigger purchase;
Trigger Example:
create trigger overdraft_trigger after update on account
referencing new row as nrow for each row when nrow.balance < 0
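The overdraft rule described earlier can be sketched end to end in SQLite, driven from Python's sqlite3 module. This is an adaptation under stated assumptions, not the SQL:1999 syntax of the header above: SQLite has no referencing clause and uses the built-in alias new for the updated row, and the trigger body here (insert a loan, reset the balance) is filled in from the prose description of the overdraft policy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table account (account_number text primary key, branch_name text, balance integer);
create table loan (loan_number text primary key, branch_name text, amount integer);

-- fires once per updated row, only when the new balance is negative
create trigger overdraft_trigger after update of balance on account
for each row when new.balance < 0
begin
    -- create a loan for the overdraft, numbered like the account
    insert into loan values (new.account_number, new.branch_name, -new.balance);
    -- set the overdrawn balance back to zero
    update account set balance = 0 where account_number = new.account_number;
end;

insert into account values ('A-102', 'Perryridge', 400);
-- withdrawing 600 would leave -200, which fires the trigger
update account set balance = balance - 600 where account_number = 'A-102';
""")

balance = conn.execute(
    "select balance from account where account_number = 'A-102'"
).fetchone()[0]
loans = conn.execute("select * from loan").fetchall()

print(balance, loans)
```

The when clause is the trigger restriction, the update statement is the triggering statement, and the begin ... end block is the trigger body.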
Triggering Events and Actions in SQL
The triggering event can be insert, delete or update.
Triggers on update can be restricted to specific attributes.
E.g. create trigger overdraft_trigger after update of balance on account
Values of attributes before and after an update can be referenced:
referencing old row as: for deletes and updates
referencing new row as: for inserts and updates
Triggers can be activated before an event, which can serve as extra constraints. E.g. convert blanks
to null.
Statement Level Triggers
Instead of executing a separate action for each affected row, a single action can be executed for
all rows affected by a transaction.
o Use for each statement instead of for each row
o Use referencing old table or referencing new table to refer to temporary tables (called
transition tables) containing the affected rows
o Can be more efficient when dealing with SQL statements that update a large number of rows
When Not To Use Triggers
Triggers were used earlier for tasks such as
maintaining summary data (e.g. total salary of each department)
replicating databases by recording changes to special relations (called change or delta
relations) and having a separate process that applies the changes over to a replica
There are better ways of doing these now:
Databases today provide built-in materialized view facilities to maintain summary data
Databases provide built-in support for replication
Encapsulation facilities can be used instead of triggers in many cases:
Define methods to update fields
Carry out actions as part of the update methods instead of
through a trigger
8. SECURITY
Security of data is an important concept in a DBMS because it is essential to safeguard the data against
any unwanted users.
It is protection from malicious attempts to steal or modify data.
There are five different levels of security
1. Database system level
Authentication and authorization mechanisms to allow specific users access only to required data
2. Operating system level
Protection from invalid logins
File-level access protection
Protection from improper use of superuser authority
Protection from improper use of privileged machine instructions
3. Network level
Each site must ensure that it communicates with trusted sites
Links must be protected from theft or modification of messages
Mechanisms used
Identification protocol (password based)
Cryptography
4. Physical level
Protection of equipment from floods, power failure, etc.
Protection of disks from theft, erasure, physical damage, etc.
Protection of network and terminal cables from wiretaps, non-invasive electronic
eavesdropping, physical damage, etc.
Solution
Replicated hardware: mirrored disks, dual busses, etc.
Multiple access paths between every pair of devices
Physical security by locks, police, etc.
Software techniques to detect physical security breaches
5. Human level
Protection from stolen passwords, sabotage, etc.
Solution
1. Frequent change of passwords
2. Use of non-guessable passwords
3. Log all invalid access attempts
4. Data audits
5. Careful hiring practices
Authorization
Forms of authorization on parts of the database:
Read authorization - allows reading, but not modification, of data
Insert authorization - allows insertion of new data, but not modification of existing data
Update authorization - allows modification, but not deletion, of data
Delete authorization - allows deletion of data
Forms of authorization to modify the database schema:
Index authorization - allows creation and deletion of indices
Resources authorization - allows creation of new relations
Alteration authorization - allows addition or deletion of attributes in a relation
Drop authorization - allows deletion of relations
The grant statement is used to give authorization.
Syntax: grant <privilege list> on <relation name or view name> to <user/role list>
Example: grant select on account to john, mary; // this query grants the database users john and
mary select authorization on the account relation
grant update (amount) on loan to john, mary;
Revoke statement: It takes back a granted privilege.
Syntax: revoke <privilege list> on <relation name or view name> from <user/role
list> [restrict | cascade]
Example: revoke select on branch from john, mary;
revoke update (amount) on loan from john, mary;
Revocation of a privilege from a user may cause other users also to lose that privilege; this is referred to as
cascading of the revoke.
We can prevent cascading by specifying restrict:
revoke select on branch from U1, U2, U3 restrict
With restrict, the revoke command fails if cascading revokes are required.
Roles
Roles permit common privileges for a class of users to be specified just once, by creating a
corresponding role.
Privileges can be granted to or revoked from roles, just like users.
Roles can be assigned to users, and even to other roles.
o create role teller
create role manager
o grant select on branch to teller
grant update (balance) on account to teller
grant all privileges on account to manager
grant teller to manager
grant teller to alice, bob
grant manager to avi
Authorization and Views
Users can be given authorization on views, without being given any authorization on the
relations used in the view definition.
The ability of views to hide data serves both to simplify usage of the system and to enhance security
by allowing users access only to the data they need for their job.
A combination of relational-level security and view-level security can be used to limit a
user's access to precisely the data that user needs.
Granting of Privileges
The passage of authorization from one user to another may be represented by an authorization
grant graph.
The nodes of this graph are the users.
The root of the graph is the database administrator.
Consider the graph for update authorization on loan.
An edge Ui → Uj indicates that user Ui has granted update authorization on loan to Uj.
[Figure: Authorization Grant Graph]
Requirement: All edges in an authorization graph must be part of some path originating with
the database administrator.
If the DBA revokes the grant from U1:
o The grant must be revoked from U4, since U1 no longer has authorization
o The grant must not be revoked from U5, since U5 has another authorization path from the DBA
through U2
Must prevent cycles of grants with no path from the root:
o DBA grants authorization to U2
o U2 grants authorization to U3
o U3 grants authorization to U2
o DBA revokes authorization from U2
If the database administrator revokes authorization from U2, U2 retains authorization through U3.
If authorization is revoked subsequently from U3, U3 appears to retain authorization through U2.
However, once the database administrator has revoked authorization from both, the edges from U3 to U2 and from
U2 to U3 are no longer part of a path starting with the database administrator.
The edges between U2 and U3 are therefore deleted from the authorization graph.
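The requirement that every edge lie on a path from the DBA suggests a simple revocation procedure: remove the revoked edge, recompute which users are still reachable from the root, and delete every grant made by a user who is no longer reachable. The sketch below is one possible way to express that in Python; the function names and the concrete edge sets are assumptions chosen to match the two scenarios in the text, not a description of how any particular DBMS implements revoke.

```python
from collections import deque

def reachable_from(root, edges):
    """Users that still have a grant path starting at root (breadth-first search)."""
    seen, queue = {root}, deque([root])
    while queue:
        user = queue.popleft()
        for grantor, grantee in edges:
            if grantor == user and grantee not in seen:
                seen.add(grantee)
                queue.append(grantee)
    return seen

def revoke(edges, grantor, grantee, root="DBA"):
    """Remove one grant, then drop every edge whose grantor lost its path to root."""
    remaining = {e for e in edges if e != (grantor, grantee)}
    authorized = reachable_from(root, remaining)
    return {(g, u) for (g, u) in remaining if g in authorized}

# Scenario 1: U5 keeps its authorization through U2, U4 loses it
graph1 = {("DBA", "U1"), ("DBA", "U2"), ("DBA", "U3"),
          ("U1", "U4"), ("U1", "U5"), ("U2", "U5")}
after1 = revoke(graph1, "DBA", "U1")

# Scenario 2: a cycle U2 <-> U3 must not survive once the DBA edge is gone
graph2 = {("DBA", "U2"), ("U2", "U3"), ("U3", "U2")}
after2 = revoke(graph2, "DBA", "U2")

print(sorted(after1), sorted(after2))
```

Checking reachability from the root, rather than just checking whether a user has any incoming edge, is what defeats the mutual-grant cycle.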
Audit Trails
An audit trail is a log of all changes (inserts/deletes/updates) to the database, along with
information such as which user performed the change and when the change was performed.
Used to track erroneous/fraudulent updates.
Can be implemented using triggers, but many database systems provide direct support.
Limitations of SQL Authorization
SQL does not support authorization at a tuple level.
o E.g. we cannot restrict students to see only (the tuples storing) their own grades
With the growth in Web access to databases, database accesses come primarily from application
servers.
o End users don't have database user ids; all end users of an application (such as a web
application) may be mapped to a single database user id
The task of authorization in the above cases falls on the application program, with no support from
SQL.
o Benefit: fine-grained authorizations, such as to individual tuples, can be implemented by the
application
o Drawback: authorization must be done in application code, and may be dispersed all over
an application
o Checking for absence of authorization loopholes becomes very difficult, since it requires
reading large amounts of application code
Encryption
Data Encryption Standard (DES) substitutes characters and rearranges their order on the basis of an
encryption key, which is provided to authorized users via a secure mechanism. The scheme is no more secure
than the key transmission mechanism, since the key has to be shared.
Advanced Encryption Standard (AES) is a newer standard replacing DES. It is based on the
Rijndael algorithm, but is also dependent on shared secret keys.
Public-key encryption is based on each user having two keys:
o public key: a publicly published key used to encrypt data, but which cannot be used to decrypt data
o private key: a key known only to the individual user, and used to decrypt data.
It need not be transmitted to the site doing encryption.
The encryption scheme is such that it is impossible or extremely hard to decrypt data given only the
public key.
The RSA public-key encryption scheme is based on the hardness of factoring a very large number
(100's of digits) into its prime components.
Authentication (Challenge-response system)
Password-based authentication is widely used, but is susceptible to sniffing on a network.
Challenge-response systems avoid transmission of passwords:
o DB sends a (randomly generated) challenge string to the user
o User encrypts the string and returns the result
o DB verifies identity by decrypting the result
o Can use a public-key encryption system, by the DB sending a message encrypted using the
user's public key, and the user decrypting and sending the message back
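The exchange can be sketched with Python's standard library. Note the swapped-in technique: instead of encrypting and decrypting the challenge, this sketch has both sides compute a keyed hash (HMAC-SHA256) of the challenge with a shared secret, which serves the same purpose of proving knowledge of the secret without sending it. The secret value and function names are hypothetical.

```python
import hashlib
import hmac
import secrets

# hypothetical secret known to both the user and the DB; never sent on the network
shared_secret = b"correct horse battery staple"

def make_challenge():
    # DB side: a fresh random challenge for every login attempt
    return secrets.token_bytes(16)

def respond(secret, challenge):
    # user side: keyed hash of the challenge (stands in for "encrypt the string")
    return hmac.new(secret, challenge, hashlib.sha256).digest()

def verify(secret, challenge, response):
    # DB side: recompute the expected response and compare in constant time
    expected = hmac.new(secret, challenge, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)

challenge = make_challenge()
ok = verify(shared_secret, challenge, respond(shared_secret, challenge))
bad = verify(shared_secret, challenge, respond(b"wrong guess", challenge))

print(ok, bad)
```

Because the challenge is random and used once, a sniffed response cannot be replayed for a later login.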
Digital signatures are used to verify authenticity of data.
o The private key is used to sign data, and the signed data is made public
o Anyone can read the data with the public key, but cannot generate data without the private key
o Digital signatures also help ensure nonrepudiation: the sender
cannot later claim to have not created the data
Digital Certificates
Digital certificates are used to verify authenticity of public keys.
Problem: when you communicate with a web site, how do you know if you are talking with the
genuine web site or an imposter?
o Solution: use the public key of the web site
o Problem: how to verify if the public key itself is genuine?
Solution:
o Every client (e.g. browser) has the public keys of a few root-level certification authorities
o A site can get its name/URL and public key signed by a certification authority: the signed
document is called a certificate
o The client can use the public key of the certification authority to verify the certificate
o Multiple levels of certification authorities can exist. Each certification authority
presents its own public-key certificate signed by a higher-level authority, and
uses its private key to sign the certificates of other web sites/authorities
9. EMBEDDED SQL
Embedded SQL consists of SQL statements included in a program written in a general-purpose
programming language.
The SQL standard defines embeddings of SQL in a variety of programming languages such as
C, Java, and Cobol.
A language in which SQL queries are embedded is referred to as a host language, and the SQL
structures permitted in the host language comprise embedded SQL.
The embedded SQL program must be preprocessed prior to compilation.
The preprocessor replaces embedded SQL requests with host language declarations and procedure
calls.
The resulting program is compiled by the host language compiler.
The EXEC SQL statement is used to identify an embedded SQL request to the preprocessor:
o EXEC SQL <embedded SQL statement> END_EXEC
Note: this varies by language (for example, the Java embedding uses # SQL { ... }; and the C language
uses a semicolon instead of END_EXEC).
Example Query
From within a host language, find the names and cities of customers with more than the variable
amount dollars in some account.
Specify the query in SQL and declare a cursor for it:
EXEC SQL
declare c cursor for
select depositor.customer_name, customer_city
from depositor, customer, account
where depositor.customer_name = customer.customer_name
and depositor.account_number = account.account_number
and account.balance > :amount
END_EXEC
The open statement causes the query to be evaluated:
EXEC SQL open c END_EXEC
The fetch statement causes the values of one tuple in the query result to be placed in host
language variables:
EXEC SQL fetch c into :cn, :cc END_EXEC
Repeated calls to fetch get successive tuples in the query result.
A variable called SQLSTATE in the SQL communication area (SQLCA) gets set to '02000' to
indicate that no more data is available.
The close statement causes the database system to delete the temporary relation that holds the
result of the query:
EXEC SQL close c END_EXEC
Note: the above details vary with language. For example, the Java embedding defines Java iterators to step
through result tuples.
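The declare / open / fetch / close cycle has a close analogue in call-based database APIs. The following sketch uses Python's sqlite3 module, which is not embedded SQL (there is no preprocessor); it is offered only to make the cursor life cycle concrete. The sample rows are invented, and the named parameter `:amount` plays the role of the host variable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table customer (customer_name text, customer_city text);
create table depositor (customer_name text, account_number text);
create table account (account_number text, balance integer);
insert into customer values ('Hayes', 'Harrison'), ('Jones', 'Brooklyn');
insert into depositor values ('Hayes', 'A-102'), ('Jones', 'A-217');
insert into account values ('A-102', 400), ('A-217', 750);
""")

amount = 500  # host-language variable

# declare + open: the query is evaluated when execute is called
c = conn.cursor()
c.execute("""
    select depositor.customer_name, customer_city
    from depositor, customer, account
    where depositor.customer_name = customer.customer_name
      and depositor.account_number = account.account_number
      and account.balance > :amount
""", {"amount": amount})

found = []
while True:
    row = c.fetchone()      # fetch c into :cn, :cc
    if row is None:         # plays the role of SQLSTATE '02000' (no more data)
        break
    cn, cc = row
    found.append((cn, cc))

c.close()                   # close c
print(found)
```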
Updates Through Cursors
We can update tuples fetched by a cursor by declaring that the cursor is for update:
o declare c cursor for select * from account
where branch_name = 'Perryridge'
for update
To update the tuple at the current location of cursor c:
update account set balance = balance + 100 where current of c
10. DYNAMIC SQL
Dynamic SQL allows programs to construct and submit SQL queries at run time
A dynamic SQL statement can be executed immediately, or prepared once and executed later
The two principal dynamic SQL statements are
1 PREPARE
2 EXECUTE
    SQLSOURCE = 'Delete from account where amount < 10000';
    EXEC SQL PREPARE SQLPREPPED FROM :SQLSOURCE;
    EXEC SQL EXECUTE SQLPREPPED;
SQLSOURCE is the programming-language variable that holds the source form of the SQL statement
SQLPREPPED is an SQL variable. It holds the compiled version of the SQL statement whose source form is given in SQLSOURCE
The PREPARE statement takes the source statement and prepares it to produce an executable version, which is stored in SQLPREPPED
The EXECUTE statement executes the SQLPREPPED version
The EXECUTE IMMEDIATE statement combines the functions of PREPARE and EXECUTE in a single operation
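The idea of building the statement text at run time can be sketched as follows. This is an illustration in Python with the standard sqlite3 module, not PREPARE / EXECUTE themselves; the account table, the whitelist sets, and the helper delete_where are hypothetical names introduced for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_number TEXT, amount INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)",
                 [("A-101", 500), ("A-215", 20000)])

# Only these pieces may be spliced into the statement text.
ALLOWED_COLUMNS = {"amount"}
ALLOWED_OPERATORS = {"<", ">", "="}

def delete_where(conn, column, operator, value):
    """Build the DELETE statement at run time, then execute it."""
    if column not in ALLOWED_COLUMNS or operator not in ALLOWED_OPERATORS:
        raise ValueError("unsupported column or operator")
    # The statement text is only known now, as with SQLSOURCE.
    sql_source = f"DELETE FROM account WHERE {column} {operator} ?"
    cur = conn.execute(sql_source, (value,))   # compile and run in one step
    return sql_source, cur.rowcount

sql_source, deleted = delete_where(conn, "amount", "<", 10000)
remaining = conn.execute("SELECT account_number FROM account").fetchall()
```

The value itself is passed as a parameter rather than pasted into the string, which is the usual precaution when SQL text is assembled dynamically.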
Call Level Interface
The SQL Call Level Interface (SQL/CLI) is based on Microsoft's Open Database Connectivity (ODBC)
It allows applications to be written for which the exact SQL code is not known until run time
There are two principal reasons for using SQL/CLI
Dynamic SQL is a source-code standard: it requires some kind of SQL compiler to process operations like PREPARE and EXECUTE. SQL/CLI does not require any special compiler; instead it uses the host-language compiler, so the application is in object-code form
SQL/CLI is DBMS independent, i.e., it allows the creation of applications that work with several different DBMSs
Example for SQL/CLI
    strcpy(sqlsource, "Delete from account where amount > 10000");
    rc = SQLExecDirect(hstmt, (SQLCHAR *)sqlsource, SQL_NTS);
strcpy is used to copy the source form of the delete statement into the sqlsource variable
SQLExecDirect executes the SQL statement contained in sqlsource and assigns the return code to the variable rc
Two standards are used to connect to an SQL database and perform queries and updates
Open Database Connectivity (ODBC) was initially developed for the C language and extended to other languages like C++, C# and Visual Basic
Java Database Connectivity (JDBC) is an application program interface for the Java language
A user or application connects to an SQL server, establishing a session, executes a series of statements, and finally disconnects the session
In addition to normal SQL commands, a session can also contain commands to commit or roll back the work carried out in the session
11. VIEWS
A view is an object that gives the user a logical view of data from an underlying table or tables
It is not desirable for all users to see the entire logical model
Security considerations may require that certain data be hidden from users
Any relation that is not part of the logical model, but is made visible to a user as a virtual relation, is called a view
Views may be created for the following reasons:
1 To provide data security
2 Query simplicity
3 Structure simplicity
Creating Views
A view is an imaginary (virtual) table
Syntax: create view <view name> (column alias names) as <query with condition>
Query: create view custall as (select cust_name, city from customer);
Assigning Names to Columns
Example: create view custall (customername, city) as (select cust_name, city from customer);
Selecting Data from a View: display the view
Example: select * from custall;
Updating Views
Views can be used for data manipulation, i.e., the user can perform insert, update, and delete operations on the view
The views on which data manipulation can be done are called updatable views; the views that do not allow data manipulation are called read-only views
Destroying a View
A view can be dropped by using the drop view command
Syntax: drop view view_name;
Example: drop view custall;
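The create / select / drop sequence above can be tried out with an in-memory database. This is a small sketch using Python's sqlite3 module; the customer rows are invented sample data, and column aliases are given inside the select so that the example does not depend on column-list support in create view. Note that SQLite views are read-only, so this sketch does not cover updatable views.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (cust_name TEXT, street TEXT, city TEXT)")
conn.executemany("INSERT INTO customer VALUES (?, ?, ?)",
                 [("Jones", "Main", "Harrison"), ("Smith", "North", "Rye")])

# Create the view; the street column stays hidden from users of custall.
conn.execute("CREATE VIEW custall AS "
             "SELECT cust_name AS customername, city FROM customer")

# Select data from the view exactly as from a table.
cur = conn.execute("SELECT * FROM custall ORDER BY customername")
columns = [d[0] for d in cur.description]
rows = cur.fetchall()

# Destroy the view; the underlying table is untouched.
conn.execute("DROP VIEW custall")
views_left = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'view'").fetchall()
table_rows = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
```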
12. INTRODUCTION TO DISTRIBUTED DATABASES AND CLIENT/SERVER DATABASES
2 Mark Questions
1 Define relational algebra
2 What is a SELECT operation?
3 What is a PROJECT operation?
4 Write short notes on tuple relational calculus
5 Write short notes on domain relational calculus
6 Define query language
7 What are the two different categories of query languages?
8 Write short notes on schema diagram
9 Which condition is called referential integrity? Explain its basic concepts
10 What are the parts of the SQL language?
12 What are the categories of SQL commands?
13 What are the three classes of SQL expression? Or: explain the basic structure of an SQL expression
14 Give the general form of an SQL query
15 What is the use of the rename operation?
16 Define tuple variable
17 List the string operations supported by SQL
18 List the set operations of SQL
19 What is the use of the union and intersection operations?
20 What are aggregate functions? List the aggregate functions supported by SQL
21 What is the use of the group by clause?
22 What is the use of sub queries?
23 What is a view in SQL? How is it defined?
24 What is the use of the with clause in SQL?
25 List the table modification commands in SQL
26 List out the statements associated with a database transaction
27 What is a transaction?
28 List the SQL domain types
29 What is the use of integrity constraints?
30 Mention the 2 forms of integrity constraints in the ER model
31 What is a trigger?
32 What are domain constraints?
33 What are referential integrity constraints?
34 What is an assertion? Mention the forms available
35 Give the syntax of an assertion
36 What is the need for triggers?
37 List the requirements needed to design a trigger
38 Give the forms of triggers
39 What does database security refer to?
40 List some security violations (or) name any forms of malicious access
41 List the types of authorization
42 What is an authorization graph?
43 List out the various user authorizations to modify the database schema
44 What are audit trails?
45 Mention the various levels in security measures
46 Name the various privileges in SQL
47 Mention the various user privileges
48 Give the limitations of SQL authorization
49 Give some encryption techniques
50 What does authentication refer to?
51 List some authentication techniques
52 What is embedded SQL? What are its advantages?
16 Mark Questions
1 Discuss the various operations in relational algebra (fundamental operations, additional operations)
2 Discuss in detail integrity, triggers and security
3 Explain embedded and dynamic SQL
4 Explain the string operations and aggregate functions used in SQL
5 Explain in detail the domain relational calculus
6 Explain in detail the tuple relational calculus
7 Explain in detail distributed databases and client/server databases
UNIT III
DATABASE DESIGN
Functional Dependencies, Non-loss Decomposition, Functional Dependencies, First, Second, Third Normal Forms, Dependency Preservation, Boyce/Codd Normal Form, Multi-valued Dependencies and Fourth Normal Form, Join Dependencies and Fifth Normal Form
1. INTRODUCTION
Relational database design requires a good collection of relation schemas
Pitfalls in Relational Database Design
A bad design may lead to
Repetition of information
Inability to represent certain information
Design Goals
a) Avoid redundant data
b) Ensure that relationships among attributes are represented
c) Facilitate the checking of updates for violation of database integrity constraints
Example: Consider the relation schema:
Lending-schema = (branch_name, branch_city, assets, customer_name, loan_no, amount)
branch_name   branch_city   assets      customer_name   loan_no   amount
Downtown      Brooklyn      90,00,000   Jones           L-17      1000
Redwood       Palo Alto     21,00,000   Smith           L-23      2000
Perryridge    Horseneck     17,00,000   Hayes           L-15      1500
Downtown      Brooklyn      90,00,000   Jackson         L-14      3500
Here the details of the Downtown branch are represented 2 times. This leads to a redundancy problem
Redundancy leads to
(a) Wastage of space
(b) Complicated updating, which can introduce inconsistency
Null values
(a) Information about a branch cannot be stored if no loan exists
(b) Null values can be used, but they are difficult to handle
2. FUNCTIONAL DEPENDENCIES
Functional dependencies are constraints on the set of legal relations
Let α ⊆ R and β ⊆ R. The functional dependency α → β holds on R if and only if, for any legal relation r(R), whenever any two tuples t1 and t2 of r agree on the attributes α, they also agree on the attributes β. That is,
    t1[α] = t2[α]  ⇒  t1[β] = t2[β]
A functional dependency requires that the value for a certain set of attributes determines uniquely the value for another set of attributes
In a given relation R, X and Y are attributes. Attribute Y is functionally dependent on attribute X if each value of X determines exactly one value of Y, which is represented as
    X → Y
i.e., X determines Y, or Y is functionally dependent on X
X → Y does not imply Y → X
For example, in a student relation, if the value of the attribute Marks is known then the value of the attribute Grade is determined, since
    Marks → Grade
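The definition can be checked mechanically on one relation instance: group the tuples by their X values and confirm that each group has a single Y value. The sketch below is in Python; the student rows and the name fd_holds are assumptions for illustration. Note that an instance can only refute a dependency; satisfying it on sample data does not prove that it holds on every legal relation.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the instance `rows` satisfies lhs -> rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)     # value of t[X]
        val = tuple(row[a] for a in rhs)     # value of t[Y]
        # Two tuples that agree on X must also agree on Y.
        if seen.setdefault(key, val) != val:
            return False
    return True

# Hypothetical student relation.
students = [
    {"student_no": 1, "marks": 92, "grade": "A"},
    {"student_no": 2, "marks": 75, "grade": "B"},
    {"student_no": 3, "marks": 92, "grade": "A"},
    {"student_no": 4, "marks": 70, "grade": "B"},
]

marks_to_grade = fd_holds(students, ["marks"], ["grade"])
grade_to_marks = fd_holds(students, ["grade"], ["marks"])
```

On this data Marks → Grade is satisfied while Grade → Marks is not, which illustrates that X → Y does not imply Y → X.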
Types
(a) Full functional dependency
(b) Partial functional dependency
(c) Transitive functional dependency
(a) Full dependencies
In a relation R, X and Y are attributes. X functionally determines Y, and no proper subset of X functionally determines Y
For example, in a student relation, marks is fully functionally dependent on student_no and course_no together, and not on a proper subset of {student_no, course_no}
This means marks cannot be determined by either student_no or course_no alone. It can be determined only by using student_no and course_no together
Hence marks is fully functionally dependent on {student_no, course_no}
(b) Partial dependencies
Attribute Y is partially dependent on the attribute X if it is dependent on a proper subset of attribute X
For example, course_name and instructor_name are partially dependent on the composite attribute {student_no, course_no}, because course_no alone determines course_name and instructor_name
(c) Transitive dependencies
X, Y and Z are 3 attributes in the relation R
    X → Y
    Y → Z
    X → Z
For example, grade depends on marks, and in turn marks depends on {student_no, course_no}; hence grade depends transitively on {student_no, course_no}
Use of Functional Dependencies
We use functional dependencies to:
o Test relations to see if they are legal under a given set of functional dependencies
  If a relation r is legal under a set F of functional dependencies, we say that r satisfies F
o Specify constraints on the set of legal relations
  We say that F holds on R if all legal relations on R satisfy the set of functional dependencies F
2.1. CLOSURE OF A SET OF FUNCTIONAL DEPENDENCIES
Given a set of functional dependencies F, there are certain other functional dependencies that are logically implied by F
o For example: if A → B and B → C, then we can infer that A → C
The set of all functional dependencies logically implied by F is the closure of F
We denote the closure of F by F+
We can find all of F+ by applying Armstrong's Axioms:
o Reflexivity Rule
  If α is a set of attributes and β ⊆ α, then α → β holds
o Augmentation Rule
  If α → β holds and γ is a set of attributes, then γα → γβ holds
o Transitivity Rule
  If α → β holds and β → γ holds, then α → γ holds
These rules are
o Sound (generate only functional dependencies that actually hold) and
o Complete (generate all functional dependencies that hold)
In addition to these three basic rules there are three additional rules to simplify manual computation of F+
o Union Rule
  If α → β holds and α → γ holds, then α → βγ holds
o Decomposition Rule
  If α → βγ holds, then α → β holds and α → γ holds
o Pseudotransitivity Rule
  If α → β holds and γβ → δ holds, then αγ → δ holds
Example:
Consider the schema R = (A, B, C, G, H, I)
Set of functional dependencies F = {A → B
                                     A → C
                                     CG → H
                                     CG → I
                                     B → H}
Some members of F+
o A → H
  By transitivity from A → B and B → H, A → H holds
o AG → I
  By augmenting A → C with G we get AG → CG, and then by transitivity with CG → I we get AG → I
o CG → HI
  By the union rule on CG → H and CG → I, CG → HI holds
The left-hand and right-hand sides of a functional dependency are both subsets of R
Procedure for Computing F+: to compute the closure of a set of functional dependencies F:
    F+ = F
    repeat
        for each functional dependency f in F+
            apply reflexivity and augmentation rules on f
            add the resulting functional dependencies to F+
        for each pair of functional dependencies f1 and f2 in F+
            if f1 and f2 can be combined using transitivity
                then add the resulting functional dependency to F+
    until F+ does not change any further
Since a set of size n has 2^n subsets, and both sides of a functional dependency are subsets of R, there are a total of 2^n × 2^n = 2^(2n) possible functional dependencies, where n is the number of attributes in R
Each iteration of the loop, except the last iteration, adds at least one functional dependency to F+
2.2. CLOSURE OF ATTRIBUTE SETS
An attribute B is functionally determined by α if α → B holds
Given a set of attributes α, define the closure of α under F (denoted by α+) as the set of attributes that are functionally determined by α under F
Algorithm to compute α+, the closure of α under F
    result := α;
    while (changes to result) do
        for each β → γ in F do
            begin
                if β ⊆ result then result := result ∪ γ
            end
Example of Attribute Set Closure
R = (A, B, C, G, H, I)
F = {A → B
     A → C
     CG → H
     CG → I
     B → H}
To compute (AG)+, the closure of AG
The algorithm starts with result = AG
1 result = AG
2 result = ABG (A → B is in F and A ⊆ result, so result := result ∪ B)
3 result = ABCG (A → C and A ⊆ ABG)
4 result = ABCGH (CG → H and CG ⊆ ABCG)
5 result = ABCGHI (CG → I and CG ⊆ ABCGH)
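The attribute closure algorithm translates almost line for line into code. The following Python sketch represents each dependency as a (left side, right side) pair of attribute strings; that representation and the name attribute_closure are choices made for this illustration.

```python
def attribute_closure(attrs, fds):
    """Compute the closure of the attribute set `attrs` under `fds`."""
    result = set(attrs)                      # result := alpha
    changed = True
    while changed:                           # while (changes to result)
        changed = False
        for lhs, rhs in fds:                 # for each beta -> gamma in F
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)           # result := result U gamma
                changed = True
    return result

# The example from the text: R = (A, B, C, G, H, I).
F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]

closure_AG = attribute_closure("AG", F)
closure_A = attribute_closure("A", F)
```

With the dependencies of the example, closure_AG contains every attribute of R, matching the hand computation of (AG)+.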
Uses of Attribute Closure
There are several uses of the attribute closure algorithm:
Testing for a superkey:
o To test if α is a superkey, we compute α+ and check if α+ contains all attributes of R
Testing functional dependencies:
o To check if a functional dependency α → β holds (or, in other words, is in F+), just check if β ⊆ α+
o That is, we compute α+ by using attribute closure, and then check if it contains β
o This is a simple and cheap test, and very useful
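Both uses reduce to one closure computation followed by a subset test. A minimal Python sketch, reusing the schema R = (A, B, C, G, H, I) from the earlier example; the helper names is_superkey and fd_implied are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under the dependencies `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_superkey(attrs, schema, fds):
    # alpha is a superkey if alpha+ contains all attributes of R.
    return set(schema) <= closure(attrs, fds)

def fd_implied(lhs, rhs, fds):
    # alpha -> beta is in F+ exactly when beta is a subset of alpha+.
    return set(rhs) <= closure(lhs, fds)

R = "ABCGHI"
F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]

ag_is_superkey = is_superkey("AG", R, F)
a_is_superkey = is_superkey("A", R, F)
a_determines_h = fd_implied("A", "H", F)
cg_determines_a = fd_implied("CG", "A", F)
```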
2.3. CANONICAL COVER
Suppose a relational schema R has a set of functional dependencies F
Whenever a user performs an update on the relation, the database system must ensure that the update does not violate any functional dependency in F
The system must roll back the update if it violates any functional dependency in the set F
The violation can be checked by testing a simplified set of functional dependencies that has the same closure as F
Any relation that satisfies the simplified set also satisfies the original set, and vice versa
Sets of functional dependencies may have redundant dependencies that can be inferred from the others
A canonical cover of F is a minimal set of functional dependencies equivalent to F, having no redundant dependencies or redundant parts of dependencies
Extraneous Attributes
An attribute of a functional dependency is said to be extraneous if we can remove it without changing the closure of the set of functional dependencies
Consider a set F of functional dependencies and the functional dependency α → β in F
o Attribute A is extraneous in α if A ∈ α and F logically implies
  (F − {α → β}) ∪ {(α − A) → β}
o Attribute A is extraneous in β if A ∈ β and the set of functional dependencies
  (F − {α → β}) ∪ {α → (β − A)} logically implies F
Example: Given F = {A → C, AB → C}
o B is extraneous in AB → C because {A → C, AB → C} logically implies A → C
Example: Given F = {A → C, AB → CD}
o C is extraneous in AB → CD since AB → C can be inferred even after deleting C
Testing if an Attribute is Extraneous
Consider a set F of functional dependencies and the functional dependency α → β in F
To test if attribute A ∈ α is extraneous in α
1. compute (α − {A})+ using the dependencies in F
2. check that (α − {A})+ contains β; if it does, A is extraneous in α
To test if attribute A ∈ β is extraneous in β
1. compute α+ using only the dependencies in
   F' = (F − {α → β}) ∪ {α → (β − A)}
2. check that α+ contains A; if it does, A is extraneous in β
Example: F contains AB → CD, A → E and E → C. To check whether C is extraneous in AB → CD, we compute the attribute closure of AB under
o F' = {AB → D, A → E, E → C}
The closure is ABCDE, which includes CD. So C is extraneous
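The two tests can be written directly on top of attribute closure. The sketch below is Python; dependencies are (left, right) pairs of attribute strings and the function names are assumptions for illustration.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under the dependencies `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def extraneous_in_lhs(attr, fd, fds):
    """Test on the left side: closure of (alpha - {A}) under F must contain beta."""
    lhs, rhs = fd
    return set(rhs) <= closure(set(lhs) - {attr}, fds)

def extraneous_in_rhs(attr, fd, fds):
    """Test on the right side: closure of alpha under F' must contain A."""
    lhs, rhs = fd
    reduced = [g for g in fds if g != fd]                      # F - {alpha -> beta}
    reduced.append((lhs, "".join(sorted(set(rhs) - {attr}))))  # alpha -> (beta - A)
    return attr in closure(lhs, reduced)

F1 = [("AB", "CD"), ("A", "E"), ("E", "C")]
c_extraneous = extraneous_in_rhs("C", ("AB", "CD"), F1)
d_extraneous = extraneous_in_rhs("D", ("AB", "CD"), F1)

F2 = [("A", "C"), ("AB", "C")]
b_extraneous = extraneous_in_lhs("B", ("AB", "C"), F2)
a_extraneous = extraneous_in_lhs("A", ("AB", "C"), F2)
```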
Definition of Canonical Cover
A canonical cover for F is a set of dependencies Fc such that
o F logically implies all dependencies in Fc, and
o Fc logically implies all dependencies in F, and
o No functional dependency in Fc contains an extraneous attribute, and
o Each left side of a functional dependency in Fc is unique
To compute a canonical cover for F:
    Fc = F
    repeat
        Use the union rule to replace any dependencies in Fc of the form α1 → β1 and α1 → β2 with α1 → β1β2
        Find a functional dependency α → β in Fc with an extraneous attribute either in α or in β
        If an extraneous attribute is found, delete it from α → β
    until Fc does not change
Computing a Canonical Cover
R = (A, B, C)
F = {A → BC
     B → C
     A → B
     AB → C}
By the union rule, combine A → BC and A → B into A → BC
o The set is now {A → BC, B → C, AB → C}
A is extraneous in AB → C
o Check if the result of deleting A from AB → C is implied by the other dependencies
  After deleting A from AB → C the resultant set is {A → BC, B → C, B → C}
  B → C is already present in the set
o So the resultant set is now {A → BC, B → C}
C is extraneous in A → BC
o Removing C from A → BC we get {A → B, B → C}
The canonical cover is:
    A → B
    B → C
A canonical cover might not be unique
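The repeat / until procedure can be sketched as below. This is one possible Python rendering under stated assumptions: dependencies are (left, right) pairs of attribute strings, attributes are tried in alphabetical order, and only one extraneous attribute is removed per pass. Because a canonical cover need not be unique, a different order of removal could return a different but equivalent cover.

```python
def closure(attrs, fds):
    """Attribute closure; both sides of each dependency are frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def canonical_cover(fds):
    fc = [(frozenset(l), frozenset(r)) for l, r in fds]
    changed = True
    while changed:
        changed = False
        # Union rule: merge dependencies that share a left side.
        merged = {}
        for lhs, rhs in fc:
            merged[lhs] = merged.get(lhs, frozenset()) | rhs
        fc = [(lhs, rhs) for lhs, rhs in merged.items() if rhs]
        # Look for one extraneous attribute and delete it.
        for i, (lhs, rhs) in enumerate(fc):
            for a in sorted(lhs):            # extraneous on the left side?
                if len(lhs) > 1 and rhs <= closure(lhs - {a}, fc):
                    fc[i] = (lhs - {a}, rhs)
                    changed = True
                    break
            if changed:
                break
            for a in sorted(rhs):            # extraneous on the right side?
                rest = fc[:i] + [(lhs, rhs - {a})] + fc[i + 1:]
                if a in closure(lhs, rest):
                    fc[i] = (lhs, rhs - {a})
                    changed = True
                    break
            if changed:
                break
    return {("".join(sorted(l)), "".join(sorted(r))) for l, r in fc}

F = [("A", "BC"), ("B", "C"), ("A", "B"), ("AB", "C")]
cover = canonical_cover(F)
```

For the example set F this sketch arrives at the same cover as the hand computation, A → B and B → C.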
3. NORMALIZATION
Normalization of data is a process of analyzing the given relational schemas based on their functional dependencies and primary keys to achieve the desirable properties of
Minimizing redundancy
Minimizing insert, delete and update anomalies during database activities
Normalization is an essential part of database design
The concept of normalization helps the designer to build an efficient design
Purpose of Normalization:
Minimize redundancy in data
Remove insert, delete and update anomalies during database activities
Reduce the need to reorganize data when it is modified or enhanced
Normalization reduces a complex user view to a set of small and stable subgroups of fields/relations
This process helps to design a logical data model known as the conceptual data model
Normalization Forms: the different normal forms are:
1. First normal form (1NF):
A relation is said to be in first normal form if every attribute holds only atomic values and it has no repeating groups
2. Second normal form (2NF):
A relation is said to be in second normal form if it is already in first normal form and it has no partial dependency
3. Third normal form (3NF):
A relation is said to be in third normal form if it is already in second normal form and it has no transitive dependency
4. Boyce-Codd normal form (BCNF):
A relation is said to be in Boyce-Codd normal form if it is already in third normal form and every determinant is a candidate key. It is a stronger version of 3NF
5. Fourth normal form (4NF):
A relation is said to be in fourth normal form if it is already in BCNF and it has no nontrivial multivalued dependency whose left side is not a superkey
6. Fifth normal form (5NF):
A relation is said to be in 5NF if it is already in 4NF and has no nontrivial join dependency that is not implied by its candidate keys
4. FIRST NORMAL FORM (1NF)
1NF does not allow multivalued attributes, composite attributes, or their combinations
It states that the domain of an attribute must include only single, atomic (indivisible) values
1NF does not allow a relation within a relation
Example: Consider the following schema Department
Department
Dname            Dnumber   Dmgr_ssn    Dlocation
Research         5         333445555   {Bellaire, Sugarland, Houston}
Administration   4         987654321   Stafford
Headquarters     1         888665555   Houston
In our example the Department relation is not in 1NF because Dlocation is a multivalued attribute
There are 3 main techniques to achieve 1NF for such a relation
1 Remove the attribute Dlocation that violates 1NF and place it in a separate relation Dept_location along with the primary key Dnumber of Department. The primary key of this relation is the combination {Dnumber, Dlocation}
Dept_location
Dnumber   Dlocation
5         Bellaire
5         Sugarland
5         Houston
4         Stafford
1         Houston
2 Expand the key so that there is a separate tuple in the original Department relation for each location of a department. The primary key becomes {Dnumber, Dlocation}. This solution has the disadvantage of introducing redundancy in the relation
Dname            Dnumber   Dmgr_ssn    Dlocation
Research         5         333445555   Bellaire
Research         5         333445555   Sugarland
Research         5         333445555   Houston
Administration   4         987654321   Stafford
Headquarters     1         888665555   Houston
3 If a maximum number of values is known for the attribute, replace it with that many atomic attributes. For example, if it is known that at most three locations can exist for a department, then replace Dlocation by Dlocation1, Dlocation2 and Dlocation3. This solution has the disadvantage of introducing null values if most departments have fewer than three locations
Dname            Dnumber   Dmgr_ssn    Dlocation1   Dlocation2   Dlocation3
Research         5         333445555   Bellaire     Sugarland    Houston
Administration   4         987654321   Stafford     Null         Null
Headquarters     1         888665555   Houston      Null         Null
1NF does not allow nested relations
EMP_PROJ
                  PROJS
Eid   Ename   Pnumber   Hours
This schema can also be represented as
EMP_PROJ (Eid, Ename, {PROJS (Pnumber, Hours)})
To normalize this nested relation into 1NF, we remove the nested relation attributes into a new relation and propagate the primary key into it
The primary key of the new relation combines the partial key of the nested relation with the primary key of the original relation
EMP_PROJ (Eid, Ename, {PROJS (Pnumber, Hours)}) is decomposed into
EMP_PROJ1 (Eid, Ename)
EMP_PROJ2 (Eid, Pnumber, Hours)
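Unnesting EMP_PROJ amounts to copying the parent key Eid into every nested PROJS row. A small Python sketch with invented sample data (the employee names, project numbers and hours are assumptions) shows the two resulting flat relations.

```python
# Hypothetical nested EMP_PROJ instance: each employee carries a PROJS list.
emp_proj = [
    {"Eid": 1, "Ename": "Smith",
     "PROJS": [{"Pnumber": 10, "Hours": 20}, {"Pnumber": 20, "Hours": 15}]},
    {"Eid": 2, "Ename": "Wong",
     "PROJS": [{"Pnumber": 10, "Hours": 40}]},
]

def to_1nf(nested):
    """Split the nested relation into two flat (1NF) relations."""
    # EMP_PROJ1 keeps the atomic attributes of the outer relation.
    emp_proj1 = [(e["Eid"], e["Ename"]) for e in nested]
    # EMP_PROJ2 gets one row per nested tuple, with Eid propagated into it,
    # so its primary key is {Eid, Pnumber}.
    emp_proj2 = [(e["Eid"], p["Pnumber"], p["Hours"])
                 for e in nested for p in e["PROJS"]]
    return emp_proj1, emp_proj2

emp_proj1, emp_proj2 = to_1nf(emp_proj)
```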
5. SECOND NORMAL FORM (2NF)
2NF is based on the concept of full functional dependency
A functional dependency X → Y is a full functional dependency if removal of any attribute A from X means that the dependency does not hold any more
i.e., for any attribute A ∈ X, (X − {A}) does not functionally determine Y
A functional dependency X → Y is a partial functional dependency if some attribute A ∈ X can be removed from X and the dependency still holds
i.e., for some A ∈ X, (X − {A}) → Y holds
Example
{Eid, Pnumber} → Hours is a full functional dependency
{Eid, Pnumber} → Ename is a partial functional dependency because Eid → Ename holds
The test for 2NF involves testing for FDs whose LHS attributes are part of the PK. If the PK contains a single attribute, the test need not be applied at all
A relational schema R is in 2NF if every nonprime attribute A in R is fully functionally dependent on the PK of R
Prime attribute: An attribute of a relational schema R is called a prime attribute of R if it is a member of some candidate key of R
Nonprime attribute: An attribute is called a nonprime attribute if it is not a prime attribute, i.e., if it is not a member of any candidate key
Example
EMP_PROJ
Ssn   Pnumber   Hours   Ename   Pname   Plocation
FD1: {Ssn, Pnumber} → Hours
FD2: Ssn → Ename
FD3: Pnumber → {Pname, Plocation}
In the above example, Ssn and Pnumber together form the primary key of EMP_PROJ
The table is in 1NF
FD1 satisfies 2NF, but FD2 and FD3 violate 2NF
Ename in FD2, and Pname and Plocation in FD3, are partially dependent on the primary key {Ssn, Pnumber}
A relation that is not in 2NF can be brought into 2NF by decomposing it into a number of relations in each of which every nonprime attribute is fully functionally dependent on the primary key
So the above table can be decomposed into three tables
EP1 (Ssn, Pnumber, Hours), carrying FD1
EP2 (Ssn, Ename), carrying FD2
EP3 (Pnumber, Pname, Plocation), carrying FD3
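The 2NF test can be sketched as a scan for dependencies whose left side is a proper subset of the primary key and whose right side contains a nonprime attribute. The Python below assumes, for simplicity, that the primary key is the only candidate key; the function name partial_dependencies is hypothetical.

```python
def partial_dependencies(primary_key, fds):
    """Return the dependencies that violate 2NF for the given primary key."""
    key = set(primary_key)
    violations = []
    for lhs, rhs in fds:
        nonprime = set(rhs) - key            # attributes outside the key
        # A proper subset of the key determining a nonprime attribute
        # is a partial dependency.
        if set(lhs) < key and nonprime:
            violations.append((lhs, rhs))
    return violations

# FD1, FD2 and FD3 of EMP_PROJ.
fds = [
    (("Ssn", "Pnumber"), ("Hours",)),
    (("Ssn",), ("Ename",)),
    (("Pnumber",), ("Pname", "Plocation")),
]

violations = partial_dependencies(("Ssn", "Pnumber"), fds)
```

Each reported dependency suggests one new relation (here EP2 and EP3), while the attributes that depend on the whole key stay with it (EP1).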
6. THIRD NORMAL FORM (3NF)
Third Normal Form is based on the concept of transitive dependency
A relational schema R is in 3NF if it satisfies 2NF and no nonprime attribute of R is transitively dependent on the primary key
A functional dependency X → Y in a relational schema R is a transitive dependency if there is a set of attributes Z that is neither a candidate key nor a subset of any key of R, and both X → Z and Z → Y hold
Example:
EMP_DEPT (Ename, Eid, DOB, Address, Dnumber, Dname, DMGRid)
ED1 (Ename, Eid, DOB, Address, Dnumber)
ED2 (Dnumber, Dname, DMGRid)
The dependency Eid → DMGRid is transitive through Dnumber in EMP_DEPT, because both the dependencies Eid → Dnumber and Dnumber → DMGRid hold
Dnumber is neither a key itself nor a subset of the key of EMP_DEPT; therefore the EMP_DEPT relational schema is not in 3NF
The relation is in 2NF because there are no partial dependencies on the key
We can normalize EMP_DEPT by decomposing it into the two 3NF relational schemas ED1 and ED2
Summary of Normal Forms
Normal form: 1NF
  Test: The relation should have no multivalued attributes or nested relations
  Remedy: Form new relations for each multivalued attribute or nested relation
Normal form: 2NF
  Test: For relations where the primary key contains multiple attributes, no non-key attribute should be functionally dependent on a part of the primary key
  Remedy: Decompose and set up a new relation for each partial key with its dependent attributes. Make sure to keep a relation with the original primary key and any attributes that are fully functionally dependent on it
Normal form: 3NF
  Test: The relation should not have a non-key attribute functionally determined by another non-key attribute, i.e., there should be no transitive dependency of a non-key attribute on the primary key
  Remedy: Decompose and set up a relation that includes the non-key attributes that functionally determine other non-key attributes
7. LOSSLESS DECOMPOSITION
Let R be a relational schema and F be a set of functional dependencies on R
Let R1 and R2 form a decomposition of R
Let r(R) be a relation with schema R
The decomposition is a lossless decomposition if
    Π_R1(r) ⋈ Π_R2(r) = r
That is, if the natural join of the projections of r on R1 and R2 is computed, we get back the relation r
A decomposition that is not a lossless decomposition is called a lossy decomposition
A lossless decomposition is also called a lossless-join decomposition, and a lossy decomposition is also called a lossy-join decomposition
R1 and R2 form a lossless decomposition of R if at least one of the following functional dependencies is in F+
o R1 ∩ R2 → R1
o R1 ∩ R2 → R2
In other words, if R1 ∩ R2 forms a superkey of either R1 or R2, the decomposition of R is a lossless decomposition
Attribute closure can be used to test for a superkey
Example: Consider the following schema
bor_loan = (customer_id, loan_number, amount)
If it is decomposed into
borrower = (customer_id, loan_number)
loan = (loan_number, amount)
Here borrower ∩ loan = loan_number and loan_number → amount, so the decomposition satisfies the lossless decomposition rule
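The rule above can be checked with one attribute closure: compute (R1 ∩ R2)+ and see whether it covers R1 or R2. A minimal Python sketch, with schemas and dependency sides represented as sets of attribute names; the second decomposition is an invented counterexample for contrast.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under the dependencies `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_lossless(r1, r2, fds):
    """True if R1 ∩ R2 is a superkey of R1 or of R2."""
    common_closure = closure(set(r1) & set(r2), fds)
    return set(r1) <= common_closure or set(r2) <= common_closure

fds = [({"loan_number"}, {"amount"})]

borrower = {"customer_id", "loan_number"}
loan = {"loan_number", "amount"}
good = is_lossless(borrower, loan, fds)

# Hypothetical alternative split that shares only `amount`.
bad = is_lossless({"customer_id", "amount"}, {"loan_number", "amount"}, fds)
```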
8. DEPENDENCY PRESERVATION
Let F be a set of functional dependencies on a schema R, and let R1, R2, . . . , Rn be a decomposition of R
The restriction of F to Ri is the set Fi of all functional dependencies in F+ that include only attributes of Ri
Example
F = {A → B, B → C}
The restriction of F to AC is A → C, since A → C is in F+, even though it is not in F
Let F' = F1 ∪ F2 ∪ F3 ∪ … ∪ Fn. Even though in general F' ≠ F, it may be that F'+ = F+
A decomposition having the property F'+ = F+ is a dependency-preserving decomposition
Algorithm to test dependency preservation
    compute F+;
    for each schema Ri in D do
        begin
            Fi := the restriction of F+ to Ri;
        end
    F' := ∅
    for each restriction Fi do
        begin
            F' := F' ∪ Fi
        end
    compute F'+;
    if (F'+ = F+) then return (true)
    else return (false);
The input to the algorithm is a set of decomposed relational schemas D = {R1, R2, R3, …, Rn} and a set F of functional dependencies
This algorithm is expensive since it requires the computation of F+
A second, alternative method to test dependency preservation is as follows
The test is applied to each α → β in F
    result = α
    while (changes to result) do
        for each Ri in the decomposition
            t = (result ∩ Ri)+ ∩ Ri
            result = result ∪ t
Here the attribute closure (result ∩ Ri)+ is computed with respect to F
If result contains all attributes in β, then the functional dependency α → β is preserved
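The second test avoids computing F+ and needs only attribute closures. The Python sketch below follows the loop above; the schema R = (A, B, C) and the two decompositions are assumptions chosen to show one preserving and one non-preserving case.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under the dependencies `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_preserved(lhs, rhs, decomposition, fds):
    """Test whether lhs -> rhs is preserved by the decomposition."""
    result = set(lhs)                        # result = alpha
    changed = True
    while changed:                           # while (changes to result)
        changed = False
        for ri in decomposition:
            ri = set(ri)
            t = closure(result & ri, fds) & ri
            if not t <= result:
                result |= t
                changed = True
    return set(rhs) <= result

def preserves_all(decomposition, fds):
    return all(is_preserved(l, r, decomposition, fds) for l, r in fds)

F = [("A", "B"), ("B", "C")]
keeps_fds = preserves_all(["AB", "BC"], F)
loses_fds = preserves_all(["AB", "AC"], F)
```

With the split (A, B), (A, C) the dependency B → C cannot be checked inside either relation, so that decomposition is not dependency preserving.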
9. BOYCE-CODD NORMAL FORM (BCNF)
A relational schema R is in BCNF with respect to a set F of functional dependencies if, for all FDs in F+ of the form α → β, where α ⊆ R and β ⊆ R, at least one of the following holds
α → β is a trivial dependency (i.e., β ⊆ α)
α is a superkey for the schema R
BCNF Decomposition Algorithm
    result := {R};
    done := false;
    compute F+;
    while (not done) do
        if (there is a schema Ri in result that is not in BCNF)
            then begin
                let α → β be a nontrivial functional dependency that holds on Ri such that α → Ri is not in F+, and α ∩ β = ∅;
                result := (result − Ri) ∪ (Ri − β) ∪ (α, β);
            end
        else done := true;
Example:
Consider the following relational schemas
1. Customer-schema = (customer-name, customer-street, customer-city)
   customer-name → customer-street, customer-city
2. Branch-schema = (branch-name, assets, branch-city)
   branch-name → assets, branch-city
3. Loan-info-schema = (branch-name, customer-name, loan-number, amount)
   loan-number → amount, branch-name
Customer-schema is in BCNF. Since customer-name is a candidate key, functional dependencies with customer-name on the left side do not violate the definition of BCNF
Similarly, the relation schema Branch-schema is also in BCNF. The schema Loan-info-schema is not in BCNF. First, note that loan-number is not a superkey for Loan-info-schema, since we can have a pair of tuples with a single loan, for example,
(Downtown, John Bell, L-44, 1000)
(Downtown, Jane Bell, L-44, 1000)
Hence loan-number is not a candidate key
However, the functional dependency loan-number → amount is nontrivial. Therefore, Loan-info-schema does not satisfy the definition of BCNF
If several customer names are associated with a loan, then the branch name and the amount are repeated once for each customer
This redundancy can be eliminated by redesigning the database such that all schemas are in BCNF
One approach to this problem is to take the existing non-BCNF design as a starting point, and to decompose those schemas that are not in BCNF
Consider the decomposition of Loan-info-schema into two schemas:
Loan-schema = (loan-number, branch-name, amount)
Borrower-schema = (customer-name, loan-number)
The decomposition is based on the following expressions
    R1 = (α, β)
    R2 = (Ri − β)
This decomposition is a lossless-join decomposition. To determine whether these schemas are in BCNF, we need to determine what functional dependencies apply to them
In this example, it is easy to see that loan-number → amount, branch-name applies to Loan-schema, and that only trivial functional dependencies apply to Borrower-schema
Although loan-number is not a superkey for Loan-info-schema, it is a candidate key for Loan-schema
Thus, both schemas of our decomposition are in BCNF
Testing Decomposition for BCNF
To test if a relation is in BCNF the following can be done:
1 To check if a nontrivial dependency α → β causes a violation of BCNF, compute α+ and verify that it includes all attributes of R; that is, that α is a superkey of R
2 To check if a relation schema R is in BCNF, it is sufficient to check only that the dependencies in the given set F do not violate BCNF, rather than checking all dependencies in F+
If none of the dependencies in F causes a violation of BCNF, then none of the dependencies in F+ will cause a violation of BCNF either. This shortcut is not valid for a relation obtained by decomposing R
An alternative BCNF test can be applied to check if a relation Ri in a decomposition of R is in BCNF:
For every subset α of attributes in Ri, check that α+ (the attribute closure of α under F) either includes no attribute of Ri − α, or includes all attributes of Ri
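The first test and one step of the decomposition can be sketched in a few lines. The Python below checks only the dependencies given in F, which is valid for the original schema but, as noted above, not for schemas produced by an earlier decomposition step; the function names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under the dependencies `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violation(schema, fds):
    """Return a dependency in F that violates BCNF on `schema`, or None."""
    for lhs, rhs in fds:
        nontrivial = not rhs <= lhs
        is_superkey = schema <= closure(lhs, fds)
        if nontrivial and not is_superkey:
            return lhs, rhs
    return None

def decompose_once(schema, fds):
    """Apply one step: replace Ri by (alpha, beta) and (Ri - beta)."""
    violation = bcnf_violation(schema, fds)
    if violation is None:
        return [schema]
    alpha, beta = violation
    beta = beta - alpha                      # ensure alpha and beta are disjoint
    return [alpha | beta, schema - beta]

loan_info = {"branch_name", "customer_name", "loan_number", "amount"}
fds = [({"loan_number"}, {"amount", "branch_name"})]
parts = decompose_once(loan_info, fds)

customer = {"customer_name", "customer_street", "customer_city"}
customer_fds = [({"customer_name"}, {"customer_street", "customer_city"})]
customer_parts = decompose_once(customer, customer_fds)
```

For Loan-info-schema this yields the same two schemas as the text: Loan-schema and Borrower-schema.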
10. MULTI-VALUED DEPENDENCIES AND FOURTH NORMAL FORM
MULTI-VALUED DEPENDENCIES (MVD)
A multivalued dependency X →→ Y on relation schema R, where X and Y are both subsets of R, specifies the following constraint on any relation state r of R
If two tuples t1 and t2 exist in r such that t1[X] = t2[X], then two tuples t3 and t4 should also exist in r with the following properties
    t3[X] = t4[X] = t1[X] = t2[X]
    t3[Y] = t1[Y] and t4[Y] = t2[Y]
    t3[Z] = t2[Z] and t4[Z] = t1[Z]
where Z is used to denote (R − (X ∪ Y))
When X →→ Y holds, we say that X multidetermines Y
An MVD X →→ Y is a trivial multivalued dependency if
a. Y is a subset of X, or
b. X ∪ Y = R
Whenever X →→ Y holds, X →→ Z also holds; the pair can be written as X →→ Y | Z
Inference rules for functional and multivalued dependencies
1. IR1 (reflexive rule for FDs): if X ⊇ Y, then X → Y
2. IR2 (augmentation rule for FDs): {X → Y} |= XZ → YZ
3. IR3 (transitive rule for FDs): {X → Y, Y → Z} |= X → Z
4. IR4 (complementation rule for MVDs): {X →→ Y} |= {X →→ (R − (X ∪ Y))}
5. IR5 (augmentation rule for MVDs): if X →→ Y and W ⊇ Z, then WX →→ YZ
6. IR6 (transitive rule for MVDs): {X →→ Y, Y →→ Z} |= X →→ (Z − Y)
7. IR7 (replication rule, FD to MVD): {X → Y} |= X →→ Y
8. IR8 (coalescence rule for FDs and MVDs):
   if X →→ Y and there exists W with the properties that
   a) W ∩ Y is empty
   b) W → Z and
   c) Y ⊇ Z, then X → Z
FOURTH NORMAL FORM
$ relational schema R is in C*# with respect to a set of dependency # if for e(ery non tri(ial
multi(alued dependency J GG K in #
3
5 G is a super .ey for R
E1ample
En a me Dn a me
%mith ,ohn
%mith $nna
CS1254 DATABASE MANAGEMENT
SYSTEMS
10
1
EMP
En a me P n a me Dn a me
%mith G ,ohn
%mith ? $nna
%mith G $nna
%mith ? ,ohn
EMPIPR0,E-!% EMPIDEPE*DE*!%
En a me P n a me
%mith G
%mith ?
If a relation has nontrivial MVDs, then insert, delete and update operations on a single tuple may cause additional tuples to be modified.
To overcome these anomalies, the relation is decomposed into relations that are in 4NF.
Procedure for 4NF:
Input: A universal relation R and a set of functional and multivalued dependencies F
Set D := {R};
While there is a relation schema Q in D that is not in 4NF, do
{
Choose a relation schema Q in D that is not in 4NF;
Find a nontrivial MVD X →→ Y in Q that violates 4NF;
Replace Q in D by two relation schemas (Q - Y) and (X ∪ Y);
};
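One step of the procedure above can be illustrated on the EMP example. The Python sketch below checks the MVD definition directly on a relation state and then performs the split into (Q - Y) and (X ∪ Y). The helper names `mvd_holds` and `project`, and the choice to store the relation as a set of tuples, are assumptions made for illustration.

```python
SCHEMA = ["Ename", "Pname", "Dname"]
EMP = {
    ("Smith", "X", "John"),
    ("Smith", "Y", "Anna"),
    ("Smith", "X", "Anna"),
    ("Smith", "Y", "John"),
}

def project(rows, schema, attrs):
    """Projection of a relation state onto attrs (duplicates removed)."""
    idx = [schema.index(a) for a in attrs]
    return {tuple(t[i] for i in idx) for t in rows}

def mvd_holds(rows, schema, x, y):
    """Check X ->> Y: for t1[X] = t2[X], the tuple with Y from t1 and Z from t2 must exist."""
    z = [a for a in schema if a not in x and a not in y]
    pick = lambda t, attrs: tuple(t[schema.index(a)] for a in attrs)
    present = {(pick(t, x), pick(t, y), pick(t, z)) for t in rows}
    for t1 in rows:
        for t2 in rows:
            if pick(t1, x) == pick(t2, x):
                # This is t3; t4 is covered when the pair is visited in the other order.
                if (pick(t1, x), pick(t1, y), pick(t2, z)) not in present:
                    return False
    return True

# Ename ->> Pname violates 4NF because Ename is not a superkey of EMP.
# Split into (X u Y) = EMP_PROJECTS and (Q - Y) = EMP_DEPENDENTS.
emp_projects = project(EMP, SCHEMA, ["Ename", "Pname"])
emp_dependents = project(EMP, SCHEMA, ["Ename", "Dname"])

# Natural join on Ename, to confirm that the split loses no information.
rejoined = {(e, p, d) for (e, p) in emp_projects
            for (e2, d) in emp_dependents if e == e2}
```

Here `rejoined` equals `EMP`, so the decomposition is nonadditive. Removing any one tuple from EMP makes `mvd_holds` return False, which is the update anomaly described above: the four tuples must be maintained together.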
11. JOIN DEPENDENCIES AND FIFTH NORMAL FORM
JOIN DEPENDENCIES
A join dependency (JD), denoted by JD(R1, R2, ..., Rn), specified on relation schema R, specifies a constraint on the states r of R.
SUPPLY
Sname      Part_name    Proj_name
Smith      Bolt         X
Smith      Nut          Y
Adamsky    Bolt         Y
Walton     Nut          Z
Adamsky    Nail         X
Adamsky    Bolt         X
Smith      Bolt         Y
The constraint states that every legal state r of R should have a nonadditive join decomposition into R1, R2, ..., Rn; i.e., for every such relation r we have
*(π_R1(r), π_R2(r), ..., π_Rn(r)) = r
where π_Ri(r) is the projection of r onto Ri and * is the natural join.
A JD denoted as JD(R1, R2) implies an MVD (R1 ∩ R2) →→ (R1 - R2).
FIFTH NORMAL FORM (5NF)
A relation schema R is in fifth normal form (5NF), or project-join normal form (PJNF), with respect to a set F of functional, multivalued and join dependencies if, for every nontrivial join dependency JD(R1, R2, ..., Rn) in F+, every Ri is a superkey of R.
The SUPPLY relation is decomposed into three relations R1, R2 and R3 that are in 5NF:
R1
Sname      Part_name
Smith      Bolt
Smith      Nut
Adamsky    Bolt
Walton     Nut
Adamsky    Nail

R2
Sname      Proj_name
Smith      X
Smith      Y
Adamsky    Y
Walton     Z
Adamsky    X

R3
Part_name    Proj_name
Bolt         X
Nut          Y
Bolt         Y
Nut          Z
Nail         X
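The join dependency on SUPPLY can be checked mechanically. The Python sketch below projects the seven SUPPLY tuples onto the three two-attribute schemas and joins them back. The helper names `project`, `join2` and `join3` are assumptions made for illustration.

```python
SUPPLY = {
    ("Smith", "Bolt", "X"),
    ("Smith", "Nut", "Y"),
    ("Adamsky", "Bolt", "Y"),
    ("Walton", "Nut", "Z"),
    ("Adamsky", "Nail", "X"),
    ("Adamsky", "Bolt", "X"),
    ("Smith", "Bolt", "Y"),
}

def project(rows, cols):
    """Projection onto the given column positions (duplicates removed)."""
    return {tuple(t[c] for c in cols) for t in rows}

r1 = project(SUPPLY, (0, 1))  # (Sname, Part_name)
r2 = project(SUPPLY, (0, 2))  # (Sname, Proj_name)
r3 = project(SUPPLY, (1, 2))  # (Part_name, Proj_name)

def join2(a, b):
    """Natural join of R1 and R2 on Sname only."""
    return {(s, p, j) for (s, p) in a for (s2, j) in b if s == s2}

def join3(a, b, c):
    """Natural join of all three projections."""
    return {(s, p, j) for (s, p, j) in join2(a, b) if (p, j) in c}
```

Joining only R1 and R2 produces nine tuples, including spurious ones such as (Smith, Nut, X); joining all three projections gives back exactly the seven SUPPLY tuples. That is what JD(R1, R2, R3) asserts for this state.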
ANOMALIES IN DATABASES
There are three types of anomalies. They are
1. Insert Anomalies
2. Update Anomalies
3. Delete Anomalies
1. Insert Anomalies:
The inability to insert part of the information into a relation schema because the remaining part of the information is unavailable is called an insert anomaly.
Example: If there is a guide with no student registered under him, then we cannot insert the guide's information in the schema project.
2. Update Anomalies:
Updating a relation schema that contains redundancy may lead to update anomalies.
Example: If a person changes his address, then the update must be carried out wherever a copy of the address occurs. If every copy is not updated, data inconsistency arises.
3. Delete Anomalies:
If the deletion of some information leads to the loss of some other information, then we say there is a deletion anomaly.
Example: If a guide guides one student and the student discontinues the course, then the information about the guide will be lost.
2 Mark Questions
1. What is meant by functional dependencies?
2. What are the uses of functional dependencies?
3. What is a fully functional dependency?
4. What is a partial functional dependency?
5. What is a transitive functional dependency?
6. What are axioms?
7. Write the inference rules for functional dependencies.
8. What is meant by computing the closure of a set of functional dependencies?
9. Define canonical cover.
10. List the properties of canonical cover.
11. List the disadvantages of relational database systems.
12. What is known as normalization?
13. Summarize the different normal forms.
14. What is first normal form?
15. What is 2NF?
16. Define Boyce-Codd normal form.
17. Explain the desirable properties of decomposition.
18. Describe briefly any two undesirable properties that a bad database design may have.
19. What is the purpose of normalization?
16 Mark Questions
1. Explain in detail about functional dependencies.
2. Explain in detail about first, second and third normal forms.
3. Explain in detail about Boyce-Codd normal form and fifth normal form.
4. Explain in detail decomposition using functional dependencies.
5. Explain in detail decomposition using multi-valued dependencies.
UNIT IV
TRANSACTIONS
Transaction Concepts - Transaction Recovery - ACID Properties - System Recovery - Media Recovery - Two Phase Commit - Save Points - SQL Facilities for Recovery - Concurrency - Need for Concurrency - Locking Protocols - Two Phase Locking - Intent Locking - Deadlock - Serializability - Recovery Isolation Levels - SQL Facilities for Concurrency
1. TRANSACTION CONCEPTS
A transaction is a logical unit of work. It begins with the execution of a BEGIN TRANSACTION operation and ends with the execution of a COMMIT or ROLLBACK operation.
A Sample Transaction (Pseudo code)
BEGIN TRANSACTION;
UPDATE ACC 123 { BALANCE := BALANCE - $100 };
IF any error occurred THEN GO TO UNDO;
END IF;
UPDATE ACC 456 { BALANCE := BALANCE + $100 };
IF any error occurred THEN GO TO UNDO;
END IF;
COMMIT;
GO TO FINISH;
UNDO:
ROLLBACK;
FINISH:
RETURN;
In our example an amount of $100 is transferred from account 123 to account 456.
It is not a single atomic operation; it involves two separate updates on the database.
A transaction thus involves a sequence of database update operations.
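The sample transaction can be tried out with Python's built-in `sqlite3` module. This is a sketch under stated assumptions rather than the notes' own code: SQLite stands in for the DBMS, and the table name `acc`, the starting balances, the CHECK constraint and the function names `make_bank` and `transfer` are all invented for the example. The returned messages follow the message-handling idea in these notes.

```python
import sqlite3

def make_bank():
    """Create an in-memory database with two hypothetical accounts."""
    # isolation_level=None lets us issue BEGIN / COMMIT / ROLLBACK ourselves.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE acc (id INTEGER PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany("INSERT INTO acc VALUES (?, ?)", [(123, 500), (456, 200)])
    return conn

def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one transaction: both updates or neither."""
    conn.execute("BEGIN")
    try:
        for acc_id, delta in ((src, -amount), (dst, amount)):
            cur = conn.execute(
                "UPDATE acc SET balance = balance + ? WHERE id = ?",
                (delta, acc_id),
            )
            if cur.rowcount != 1:  # "any error occurred": account not found
                raise LookupError(f"no such account: {acc_id}")
        conn.execute("COMMIT")
        return "Transfer done"
    except (sqlite3.Error, LookupError):
        conn.execute("ROLLBACK")  # undo whatever part of the transfer was applied
        return "Error - transfer not done"

def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM acc"))
```

A transfer to a nonexistent account fails after the first UPDATE has already been applied, and the ROLLBACK restores the debited balance, so no money disappears.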
The purpose of this transaction is to transform a correct state of the database into another correct state, without necessarily preserving correctness at all intermediate points.
Transaction management guarantees that a transaction executes correctly and keeps the database in a correct state.
It guarantees that if the transaction executes some updates and then a failure occurs before the transaction reaches its planned termination, then those updates will be undone.
Thus the transaction is either executed entirely or totally cancelled.
The system component that provides this atomicity is called the transaction manager, transaction processing monitor or TP monitor.
ROLLBACK and COMMIT are the key to the way it works.
1. COMMIT:
The COMMIT operation signals the successful end of a transaction.
It tells the transaction manager that a logical unit of work has been successfully completed, that the database is in a correct state, and that the updates can be recorded or saved.
2. ROLLBACK:
By contrast, the ROLLBACK operation signals the unsuccessful end of a transaction.
It tells the transaction manager that something has gone wrong, that the database might be in an incorrect state, and that all the updates made by the transaction should be undone.
3. IMPLICIT ROLLBACK:
An explicit ROLLBACK cannot be issued for every kind of transaction failure or error, so the system issues an implicit ROLLBACK for any transaction that fails.
If the transaction does not reach its planned termination, it is rolled back; otherwise it is committed.
4. MESSAGE HANDLING:
A typical transaction will not only update the database, it will also send some kind of message back to the end user indicating what has happened.
Example: "Transfer done" if the COMMIT is reached, or "Error - transfer not done" otherwise.
5. RECOVERY LOG:
The system maintains a log or journal on disk, in which the details of all updates are recorded.
The values of an updated object before and after the update are called the before and after images.
This log is used to bring the database back to its previous state in case of an undo operation.
The log consists of two portions:
a. an active or online portion
b. an archive or offline portion
The online portion is the portion used during normal system operation to record details of updates as they are performed, and it is normally kept on disk.
When the online portion becomes full, its contents are transferred to the offline portion, which can be kept on tape.
6. STATEMENT ATOMICITY:
The system should guarantee that the execution of each individual statement is atomic.
7. PROGRAM EXECUTION IS A SEQUENCE OF TRANSACTIONS:
COMMIT and ROLLBACK terminate the transaction, not the application program.
A single program execution will consist of a sequence of several transactions running one after another.
Figure: Program execution is a sequence of transactions
8. NO NESTED TRANSACTIONS:
An application program can execute a BEGIN TRANSACTION statement only when it has no transaction currently in progress,
i.e., no transaction has other transactions nested inside itself.
9. CORRECTNESS:
Consistent means not violating any known integrity constraint.
Consistency and correctness of the system should be maintained.
If T is a transaction that transforms the database from state D1 to state D2, and if D1 is correct, then D2 is correct as well.
10. MULTIPLE ASSIGNMENT:
Multiple assignment allows any number of individual assignments (i.e., updates) to be performed simultaneously.
Example: UPDATE ACC 123 { BALANCE := BALANCE - $100 }
UPDATE ACC 456 { BALANCE := BALANCE + $100 }
Multiple assignment would make the pair of updates a single atomic statement.
Current products do not support multiple assignment.
2. TRANSACTION RECOVERY
A transaction begins by executing a BEGIN TRANSACTION operation and ends by executing either a COMMIT or a ROLLBACK operation.
COMMIT establishes a commit point or synchpoint.
A commit point corresponds to the successful end of a transaction, at which the database will be in a correct state.
ROLLBACK rolls the database back to the previous commit point.
There will be several transactions executing in parallel in a database.
When a commit point is established:
1. When a transaction is committed, its changes are made permanent, i.e., they are guaranteed to be recorded in the database. Prior to the commit point updates are tentative, i.e., they can subsequently be undone.
2. All database positioning is lost and all tuple locks are released.
Database positioning means that, at the time of execution, each program will typically have addressability to certain tuples in the database; this addressability is lost at a COMMIT point.
Transactions are not only a unit of work but also a unit of recovery.
If a transaction successfully commits, then the system guarantees that its updates will be permanently recorded in the database, even if the system crashes the very next moment.
If the system crashes before the updates are written physically to the database, the system's restart procedure will still record those updates in the database.
The values can be discovered from the relevant records in the log.
The log must be physically written before the COMMIT processing can complete. This is called the write-ahead log rule.
The restart procedure helps in recovering any transactions that completed successfully but whose updates were not physically written prior to the crash.
Implementation issues
1. Database updates are kept in buffers in main memory and not physically written to disk until the transaction commits. That way, if the transaction terminates unsuccessfully, there will be no need to undo any disk updates.
2. Database updates are physically written to the disk when the transaction commits. That way, if the system subsequently crashes, there will be no need to redo any disk updates.
If there is not enough buffer space, a transaction may steal buffer space from another transaction. Updates may also be forced, i.e., written physically at the time of COMMIT.
The write-ahead log rule is elaborated as follows:
1. The log record for a given database update must be physically written to the log before that update is physically written to the database.
2. All other log records for a given transaction must be physically written to the log before the COMMIT log record for that transaction is physically written to the log.
3. COMMIT processing for a given transaction must not complete until the COMMIT log record for that transaction is physically written to the log.
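The effect of the write-ahead log rule can be seen in a toy model. The Python sketch below is a deliberately simplified assumption, not how any real DBMS is built: the class name `WalStore` is invented, a Python list stands in for the physical log, and updates reach the "disk" dictionary only at restart, so only redo is needed.

```python
class WalStore:
    """Toy store: every update is logged before it can reach the database."""

    def __init__(self):
        self.log = []     # stands in for the physical log (survives a crash)
        self.disk = {}    # stands in for the physical database (survives a crash)
        self.buffer = {}  # main-memory buffers (lost in a crash)

    def update(self, txn, key, new_value):
        old_value = self.buffer.get(key, self.disk.get(key))
        # Rule 1: the log record (before and after image) is written first.
        self.log.append(("UPDATE", txn, key, old_value, new_value))
        self.buffer[key] = new_value

    def commit(self, txn):
        # Rules 2 and 3: the COMMIT record follows all of the transaction's
        # other records, and commit completes only once it is in the log.
        self.log.append(("COMMIT", txn))

    def crash(self):
        self.buffer.clear()  # the contents of main memory are lost

    def restart(self):
        committed = {rec[1] for rec in self.log if rec[0] == "COMMIT"}
        for rec in self.log:
            if rec[0] == "UPDATE" and rec[1] in committed:
                self.disk[rec[2]] = rec[4]  # redo using the after image
```

If transaction T1 updates A and commits, T2 updates B without committing, and the system then crashes, `restart` reinstates T1's value for A from the log and ignores T2's update to B.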
:. ACID PROPERTIES
ACID stands for Atomicity, Correctness (Consistency), Isolation and Durability.
• Atomicity: Transactions are atomic.
Consider the following example.
Transaction to transfer $50 from account A to account B:
read(A)
A := A - 50
write(A)
read(B)
B := B + 50
write(B)
read(X), which transfers the data item X from the database to a local buffer belonging to the transaction that executed the read operation.
write(X), which transfers the data item X from the local buffer of the transaction that executed the write back to the database.
Before the execution of transaction Ti the values of accounts A and B are $1000 and $2000, respectively.
Suppose that the transaction fails because of a power failure, hardware failure or system error; then transaction Ti will not execute successfully.
If the failure happens after the write(A) operation but before the write(B) operation, the database will hold the values $950 and $2000.
The system has destroyed $50 as a result of the failure, which leaves the database in an inconsistent state.
The basic idea of atomicity is: the database system keeps track of the old values of any data on which a transaction performs a write; if the transaction does not terminate successfully, then the database system restores the old values.
Atomicity is handled by the transaction-management component.
• Correctness/Consistency:
Transactions transform a correct state of the database into another correct state, without necessarily preserving correctness at all intermediate points.
In our example the database is in a consistent state if the sum of A and B is unchanged by the execution of the transaction.
• Isolation:
Transactions are isolated from one another.
Even though there are many transactions running concurrently, any given transaction's updates are concealed from all the rest, until that transaction commits.
The database will be temporarily inconsistent while the transaction is in progress.
When the amount has been deducted from A but not yet added to B, the database is inconsistent.
If a second concurrently running transaction reads A and B at this intermediate point and computes A + B, it will observe an inconsistent value.
If the second transaction performs updates on A and B based on the inconsistent values that it read, the database will remain inconsistent even after both transactions have completed.
One way to avoid this problem is to execute transactions serially, one after another.
The concurrency-control component maintains the isolation of transactions.
• Durability:
Once a transaction commits, its updates persist in the database, even if there is a subsequent system crash.
A computer system failure may lead to loss of data in main memory, but data written to disk are not lost.
Durability is guaranteed by ensuring the following:
o The updates carried out by the transaction should be written to the disk
o Information stored on the disk should be sufficient to enable the database to reconstruct the updates when the database system restarts after a failure
The recovery-management component is responsible for ensuring durability.
4. SYSTEM RECOVERY
The system must be able to recover not only from purely local failures, such as the failure of an individual transaction, but also from global failures.
A local failure affects only the transaction in which the failure has actually occurred.
A global failure affects all of the transactions in progress at the time of the failure.
Global failures fall into two broad categories:
1. System failures (e.g., power outage), which affect all transactions currently in progress but do not physically damage the database. A system failure is sometimes called a soft crash.
2. Media failures (e.g., head crash on the disk), which cause damage to the database or some portion of it. A media failure is sometimes called a hard crash.
System failure and recovery
During a system failure the contents of main memory are lost.
Any transaction in progress at the time of the failure will not complete successfully, so such transactions must be undone, i.e., rolled back, when the system restarts.
It may also be necessary to redo certain transactions at restart time: those that completed successfully prior to the crash but did not manage to get their updates transferred from the buffers in main memory to the physical database.
Whenever some prescribed number of records has been written to the log, the system automatically takes a checkpoint.
The checkpoint record contains a list of all transactions that were in progress at the time the checkpoint was taken.
To see how a checkpoint works, consider the following:
A system failure has occurred at time tf.
The most recent checkpoint prior to time tf was taken at time tc.
Transactions of type T1 completed (successfully) prior to time tc.
Transactions of type T2 started prior to time tc and completed (successfully) after time tc and before time tf.
Transactions of type T3 also started prior to time tc but did not complete by time tf.
Transactions of type T4 started after time tc and completed (successfully) before time tf.
Finally, transactions of type T5 also started after time tc but did not complete by time tf.
The transactions of types T3 and T5 must be undone, and transactions of types T2 and T4 must be redone. At restart time, the system first goes through the following procedure:
1. Start with two lists of transactions, the UNDO list and the REDO list.
2. Set the UNDO list equal to the list of all transactions given in the most recent checkpoint record, and the REDO list to empty.
3. Search forward through the log, starting from the checkpoint record.
4. If a BEGIN TRANSACTION log record is found for transaction T, add T to the UNDO list.
5. If a COMMIT log record is found for transaction T, move T from the UNDO list to the REDO list.
6. When the end of the log is reached, the UNDO and REDO lists are identified.
The system now works backward through the log, undoing the transactions in the UNDO list.
It then works forward, redoing the transactions in the REDO list.
Restoring the database to a correct state by redoing work is sometimes called forward recovery.
Restoring the database to a correct state by undoing work is called backward recovery.
When all recovery activity is complete, the system is ready to accept new work.
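The six-step restart procedure above can be sketched in Python. The log layout used here (tuples such as `("BEGIN", "T4")` and a checkpoint record that carries the list of active transactions) and the function name `build_lists` are assumptions made for the example; the sample log is arranged so that T1 to T5 behave as described in the text.

```python
def build_lists(log, checkpoint_index):
    """Return (undo, redo) by scanning forward from the checkpoint record."""
    _, active_at_checkpoint = log[checkpoint_index]
    undo = list(active_at_checkpoint)  # step 2: UNDO = transactions in checkpoint
    redo = []                          # step 2: REDO = empty
    for record in log[checkpoint_index + 1:]:  # step 3: search forward
        kind, txn = record
        if kind == "BEGIN":            # step 4
            undo.append(txn)
        elif kind == "COMMIT":         # step 5: move from UNDO to REDO
            undo.remove(txn)
            redo.append(txn)
    return undo, redo                  # step 6

# Hypothetical log matching the five transaction types; the crash follows the last record.
LOG = [
    ("BEGIN", "T1"),
    ("BEGIN", "T2"),
    ("COMMIT", "T1"),
    ("BEGIN", "T3"),
    ("CHECKPOINT", ["T2", "T3"]),  # taken at time tc
    ("BEGIN", "T4"),
    ("COMMIT", "T2"),
    ("BEGIN", "T5"),
    ("COMMIT", "T4"),
]

undo_list, redo_list = build_lists(LOG, 4)
```

For this log the UNDO list comes out as T3 and T5 and the REDO list as T2 and T4, in agreement with the discussion above. T1 appears in neither list because it committed before the checkpoint.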
ARIES
Earlier recovery systems perform UNDO operations before REDO operations.
The ARIES scheme performs REDO before UNDO.
ARIES operates in three broad phases:
1. Analysis: Build the REDO and UNDO lists.
2. Redo: Start from a position in the log determined in the analysis phase and restore the database to the state it was in at the time of the crash.
3. Undo: Undo the effects of transactions that failed to commit.
The name ARIES stands for Algorithms for Recovery and Isolation Exploiting Semantics.
5. TWO PHASE COMMIT
Two-phase commit is important whenever a given transaction can interact with several independent resource managers.
Example:
o Consider a transaction running on an IBM mainframe that updates both an IMS database and a DB2 database. If the transaction completes successfully, then both the IMS updates and the DB2 updates are committed.
o Conversely, if the transaction fails, then both sets of updates must be rolled back.
o It is not acceptable to commit one database update and roll back the other. If that were done, atomicity would not be maintained in the system.
o Therefore, the transaction issues a single global or system-wide COMMIT or ROLLBACK.
o That COMMIT or ROLLBACK is handled by a system component called the coordinator.
o The coordinator's task is to guarantee that the resource managers all commit or all roll back.
o It should guarantee this even if the system fails in the middle of the process.
o The two-phase commit protocol is responsible for maintaining such a guarantee.
WORKING
Assume that the transaction has completed and a COMMIT is issued. On receiving the COMMIT request, the coordinator goes through the following two-phase process:
Prepare:
The resource manager should get ready to "go either way" on the transaction.
Each participant in the transaction should move the records of all updates performed during the transaction from temporary storage to permanent storage, so that it can perform either COMMIT or ROLLBACK as necessary.
The resource manager then replies OK to the coordinator, or NOT OK, depending on whether the write operation succeeded.
Commit:
When the coordinator has received replies from all participants, it takes a decision regarding the transaction and records it in the physical log.
If all replies were OK, the decision is commit; if any reply was NOT OK, the decision is rollback.
The coordinator informs all the participants of its decision.
Each participant must then commit or roll back the transaction locally, as instructed by the coordinator.
If the system fails at some point during the process, the restart procedure looks for the decision of the coordinator.
If the decision is found, then the two-phase commit process can resume from where it left off.
If the decision is not found, then it assumes that the decision is ROLLBACK and the process can complete appropriately.
If the participants are on several systems, as in a distributed system, then some participants may have to wait a long time for the coordinator's decision.
The data communication manager (DC manager) can act as a resource manager in a two-phase commit process.
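The two phases can be sketched in a few lines of Python. This is a single-process illustration under stated assumptions, not a network protocol: the class `Participant`, its `can_commit` flag (which stands in for whether the participant managed to write its log records to permanent storage) and the function `two_phase_commit` are all invented for the example.

```python
class Participant:
    """Hypothetical resource manager taking part in a global transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        # Phase 1: get ready to go either way, then reply OK (True) or NOT OK (False).
        self.state = "prepared" if self.can_commit else "failed"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "rolled back"

def two_phase_commit(participants, coordinator_log):
    # Phase 1 (prepare): collect a reply from every participant.
    replies = [p.prepare() for p in participants]
    # Phase 2 (commit): decide, record the decision, then inform everyone.
    decision = "COMMIT" if all(replies) else "ROLLBACK"
    coordinator_log.append(decision)  # recorded before any participant is told
    for p in participants:
        if decision == "COMMIT":
            p.commit()
        else:
            p.rollback()
    return decision
```

Because the decision is appended to `coordinator_log` before the participants are informed, a restart procedure that finds the decision can resume the second phase, and one that finds nothing can assume ROLLBACK, as described above.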
6. SAVEPOINTS
Transactions cannot be nested within one another.
Transactions cannot be broken down into smaller subtransactions.
Transactions can, however, establish intermediate savepoints while executing.
If a rollback operation is executed in the transaction, then instead of rolling back all the way to the beginning we can roll back to the previous savepoint.
Establishing a savepoint is not the same as performing a COMMIT; updates made by the transaction are still not visible to other transactions until the transaction successfully executes a COMMIT.
7. MEDIA RECOVERY
Media recovery is different from transaction and system recovery.
A media failure is a failure, such as a disk head crash or a disk controller failure, in which some portion of the database has been physically destroyed.
Recovery from such a failure basically involves reloading or restoring the database from a backup or dump copy and then using the log to redo the transactions that completed since that copy was taken.
There is no need to undo transactions that were still in progress at the time of the failure.
Media recovery relies on a dump/restore utility. The dump portion of that utility is used to make backup copies of the database on demand.
Such copies can be kept on tape or other archival storage; it is not necessary that they be on direct access media.
After a media failure, the restore portion of the utility is used to recreate the database from a specified backup copy.
8. SQL FACILITIES FOR RECOVERY
SQL supports transactions and transaction-based recovery.
All executable SQL statements are atomic except CALL and RETURN.
SQL provides analogs of BEGIN TRANSACTION, COMMIT, and ROLLBACK, called START TRANSACTION, COMMIT WORK, and ROLLBACK WORK, respectively.
Syntax for START TRANSACTION:
START TRANSACTION <option commalist>;
The <option commalist> specifies an access mode, an isolation level, or both.
The access mode is either READ ONLY or READ WRITE.
o If neither is specified, READ WRITE is assumed. If READ WRITE is specified, the isolation level must not be READ UNCOMMITTED.
The isolation level takes the form ISOLATION LEVEL <isolation>, where <isolation> can be READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ, or SERIALIZABLE.
The syntax for COMMIT and ROLLBACK is:
COMMIT [WORK] [AND [NO] CHAIN];
ROLLBACK [WORK] [AND [NO] CHAIN];
AND CHAIN causes a START TRANSACTION to be executed automatically after the COMMIT;
AND NO CHAIN is the default.
A CLOSE is executed automatically for every open cursor except for the cursors declared WITH HOLD.
A cursor declared WITH HOLD is not automatically closed at COMMIT.
SQL also supports savepoints.
Syntax: SAVEPOINT <savepoint name>;
This statement creates a savepoint with the specified user-chosen name.
Syntax for rollback: ROLLBACK TO <savepoint name>;
This statement undoes all updates done since the specified savepoint.
Syntax for releasing savepoints: RELEASE <savepoint name>;
This statement drops the specified savepoint. All savepoints are automatically dropped at transaction termination.
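The savepoint statements can be tried out with Python's built-in `sqlite3` module. This is a sketch under assumptions: SQLite stands in for the DBMS (it starts a transaction with BEGIN rather than START TRANSACTION, but its dialect accepts the SAVEPOINT, ROLLBACK TO and RELEASE forms shown above), and the table `t`, the savepoint name `sp1` and the function `savepoint_demo` are invented for the example.

```python
import sqlite3

def savepoint_demo():
    """Insert two rows, undo only the second via a savepoint, then commit."""
    # isolation_level=None: transaction control statements are issued explicitly.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE t (v INTEGER)")
    conn.execute("BEGIN")
    conn.execute("INSERT INTO t VALUES (1)")
    conn.execute("SAVEPOINT sp1")             # establish an intermediate savepoint
    conn.execute("INSERT INTO t VALUES (2)")
    conn.execute("ROLLBACK TO sp1")           # undo updates made since sp1 only
    conn.execute("RELEASE sp1")               # drop the savepoint
    conn.execute("COMMIT")                    # the first insert becomes permanent
    return [row[0] for row in conn.execute("SELECT v FROM t ORDER BY v")]
```

Only the row inserted before the savepoint survives. The rollback to `sp1` does not end the transaction, which still has to reach its own COMMIT.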
2 Mark Questions
1. What is a transaction?
2. What are the two statements regarding transactions?
3. What are the properties of a transaction?
4. What is the recovery management component?
5. When is a transaction rolled back?
Any changes that the aborted transaction made to the database must be undone. Once the changes caused by an aborted transaction have been undone, the transaction has been rolled back.
6. What are the states of a transaction?
7. What is a shadow copy scheme?
8. Give the reasons for allowing concurrency.
9. What is average response time?
10. What are the two types of serializability?
11. Define lock.
12. What are the different modes of lock?
13. Define deadlock.
14. Define the phases of the two phase locking protocol.
15. Define upgrade and downgrade.
16. What is a database graph?
17. What are the two methods for dealing with the deadlock problem?
18. What is a recovery scheme?
19. What are the two types of errors?
20. What are the storage types?
21. Define blocks.
22. What is meant by physical blocks?
23. What is meant by buffer blocks?
24. What is meant by disk buffer?
25. What is meant by log-based recovery?
26. What are uncommitted modifications?
27. Define shadow paging.
28. Define page.
29. Explain current page table and shadow page table.
30. What are the drawbacks of the shadow-paging technique?
31. Define garbage collection.
32. Differentiate the strict two phase locking protocol and the rigorous two phase locking protocol.
33. How are time stamps implemented?
34. What are the timestamps associated with each data item?
35. Why is it necessary to have control of concurrent execution of transactions? How is it made possible?
16 Mark Questions
1. Define serializability. Explain the types of serializability with examples.
2. Explain deadlock with an example.
3. Explain in detail about locking protocols.
4. Explain the need for concurrency control.
5. Discuss transaction recoverability.
6. Explain recovery isolation levels with examples.
7. Explain in detail about ACID properties.
UNIT V
IMPLEMENTATION TECHNIQUES
Overview of Physical Storage Media - Magnetic Disks - RAID - Tertiary Storage - File Organization - Organization of Records in Files - Indexing and Hashing - Ordered Indices - B+ Tree Index Files - B Tree Index Files - Static Hashing - Dynamic Hashing - Query Processing Overview - Catalog Information for Cost Estimation - Selection Operation - Sorting - Join Operation - Database Tuning
1. OVERVIEW OF PHYSICAL STORAGE MEDIA
Classification of Physical Storage Media
• Accessing speed
• Cost per unit of data
• Reliability
o data loss on power failure or system crash
o physical failure of the storage device
• Can differentiate storage into:
o volatile storage: loses contents when power is switched off
o non-volatile storage:
 - Contents persist even when power is switched off
 - Includes secondary and tertiary storage, as well as battery-backed up main memory
Storage Hierarchy
• The various storage media can be organized in a hierarchy according to their speed and their cost.
• The higher levels are expensive, but fast. As we move down the hierarchy, the cost per bit decreases, whereas the access time increases.
• The storage hierarchy includes 3 main categories.
1. Primary storage: fastest media but volatile (cache, main memory).
2. Secondary storage: next level in hierarchy, non-volatile, moderately fast access time; also called on-line storage
E.g., flash memory, magnetic disks
3. Tertiary storage: lowest level in hierarchy, non-volatile, slow access time; also called off-line storage
E.g., magnetic tape, optical storage
Physical Storage Media
• Cache: fastest and most costly form of storage; volatile; managed by the computer system hardware
• Main memory:
o fast access (10s to 100s of nanoseconds; 1 nanosecond = 10^-9 seconds)
o generally too small (or too expensive) to store the entire database
 - capacities of up to a few gigabytes widely used currently
 - capacities have gone up and per-byte costs have decreased steadily and rapidly (roughly a factor of 2 every 2 to 3 years)
o volatile: contents of main memory are usually lost if a power failure or system crash occurs
• Flash memory
o Data survives power failure
o Data can be written at a location only once, but the location can be erased and written to again
 - Can support only a limited number (10K to 1M) of write/erase cycles
 - Erasing of memory has to be done to an entire bank of memory
o Reads are roughly as fast as main memory
o But writes are slow (a few microseconds), and erase is slower
o Cost per unit of storage roughly similar to main memory
o Widely used in embedded devices such as digital cameras
o Is a type of EEPROM (Electrically Erasable Programmable Read-Only Memory)
• Magnetic disk
o Data is stored on a spinning disk, and read/written magnetically
o Primary medium for the long-term storage of data; typically stores the entire database
o Data must be moved from disk to main memory for access, and written back for storage
 - Much slower access than main memory (more on this later)
o direct-access: possible to read data on disk in any order, unlike magnetic tape
o Capacities range up to roughly 400 GB currently
 - Much larger capacity, and much lower cost per byte, than main memory or flash memory
 - Growing constantly and rapidly with technology improvements (factor of 2 to 3 every 2 years)
o Survives power failures and system crashes
 - disk failure can destroy data, but is rare
• Optical storage
o non-volatile; data is read optically from a spinning disk using a laser
o CD-ROM (640 MB) and DVD (4.7 to 17 GB) most popular forms
o Write-once, read-many (WORM) optical disks used for archival storage (CD-R, DVD-R, DVD+R)
o Multiple-write versions also available (CD-RW, DVD-RW, DVD+RW, and DVD-RAM)
o Reads and writes are slower than with magnetic disk
o Juke-box systems, with large numbers of removable disks, a few drives, and a mechanism for automatic loading/unloading of disks, are available for storing large volumes of data
• Tape storage
o non-volatile; used primarily for backup (to recover from disk failure), and for archival data
o sequential-access: much slower than disk
o very high capacity (40 to 300 GB tapes available)
o tape can be removed from the drive, so storage costs are much cheaper than disk, but drives are expensive
o Tape jukeboxes available for storing massive amounts of data
 - hundreds of terabytes (1 terabyte = 10^12 bytes) to even a petabyte (1 petabyte = 10^15 bytes)
2. MAGNETIC DISK
a. Data is stored on a spinning disk, and read/written magnetically
b. Primary medium for the long-term storage of data; typically stores the entire database
c. Data must be moved from disk to main memory for access, and written back for storage
 i. Much slower access than main memory (more on this later)
d. direct-access: possible to read data on disk in any order, unlike magnetic tape
e. Capacities range up to roughly 400 GB currently
 i. Much larger capacity, and much lower cost per byte, than main memory or flash memory
 ii. Growing constantly and rapidly with technology improvements (factor of 2 to 3 every 2 years)
f. Survives power failures and system crashes
 i. disk failure can destroy data, but is rare
Magnetic Hard Disk Mechanism
• Read-write head
a. Positioned very close to the platter surface (almost touching it)
b. Reads or writes magnetically encoded information
• Surface of platter divided into circular tracks
a. Over 50K-100K tracks per platter on typical hard disks
• Each track is divided into sectors
a. A sector is the smallest unit of data that can be read or written
b. Sector size typically 512 bytes
c. Typical sectors per track: 500 (on inner tracks) to 1000 (on outer tracks)
• To read/write a sector
a. disk arm swings to position the head on the right track
b. platter spins continually; data is read/written as the sector passes under the head
• Head-disk assemblies
a. multiple disk platters on a single spindle (1 to 5 usually)
b. One head per platter, mounted on a common arm
• Cylinder i consists of the i-th track of all the platters
• Earlier generation disks were susceptible to head crashes
a. The surface of earlier generation disks had metal-oxide coatings which would disintegrate on a head crash and damage all data on the disk
b. Current generation disks are less susceptible to such disastrous failures, although individual sectors may get corrupted
• The disk controller interfaces between the computer system and the disk drive hardware
CS1254 DATABASE MANAGEMENT
SYSTEMS
12
2
a accepts high-le(el commands to read or write a sector
b initiates actions such as mo(ing the dis. arm to the right trac. and actually reading or writing
the data
c -omputes and attaches chec.sums to each sector to (erify that data is read bac. correctly
i If data is corrupted, ith *e"y hi0h p"obabi'ity sto"ed chec/s#m on5t
match
recomputed chec.sum
d Ensures successful writing by reading bac. sector after writing it
e Performs remapping of bad sectors
D+&0 S 5% &(&t')
Figure: Disk subsystem
• Multiple disks are connected to a computer system through a controller
   a. Controller functionality (checksum, bad sector remapping) is often carried out by individual disks; this reduces load on the controller
• Disk interface standards families
   a. ATA (AT adaptor) range of standards
   b. SATA (Serial ATA)
   c. SCSI (Small Computer System Interconnect) range of standards
   d. Several variants of each standard (different speeds and capabilities)
Performance Measures of Disks
• Access time: the time it takes from when a read or write request is issued to when data transfer begins. Consists of:
   a. Seek time: time it takes to reposition the arm over the correct track
   b. Rotational latency: time it takes for the sector to be accessed to appear under the head
• Data-transfer rate: the rate at which data can be retrieved from or stored to the disk
   a. 25 to 100 MB per second maximum rate, lower for inner tracks
• Mean time to failure (MTTF): the average time the disk is expected to run continuously without any failure
   a. Typically 3 to 5 years
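As a rough illustration of how the access-time components combine, the sketch below adds an average seek time to an average rotational latency. It assumes that, on average, the head waits half a revolution for the sector; the function names and the sample figures (8 ms seek, 6000 RPM) are hypothetical and not taken from these notes.

```python
def average_rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency, assuming the head waits half a revolution."""
    ms_per_revolution = 60_000.0 / rpm  # 60,000 ms in one minute
    return ms_per_revolution / 2.0


def average_access_time_ms(avg_seek_ms: float, rpm: float) -> float:
    """Access time = seek time + rotational latency (both averages)."""
    return avg_seek_ms + average_rotational_latency_ms(rpm)


def transfer_time_ms(num_bytes: int, rate_mb_per_s: float) -> float:
    """Time to move num_bytes once the transfer has started (1 MB = 10**6 bytes)."""
    return num_bytes / (rate_mb_per_s * 1_000_000) * 1000.0


if __name__ == "__main__":
    # Hypothetical drive: 8 ms average seek, 6000 RPM, 25 MB/s transfer rate.
    print(average_access_time_ms(8.0, 6000))
    print(transfer_time_ms(4096, 25.0))
```

The point of the sketch is that, for small transfers, the seek and rotation terms dominate the transfer term by orders of magnitude.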
Optimization of Disk-Block Access
• Block: a contiguous sequence of sectors from a single track
   a. Data is transferred between disk and main memory in blocks
   b. Sizes range from 512 bytes to several kilobytes
      i. Smaller blocks: more transfers from disk
      ii. Larger blocks: more space wasted due to partially filled blocks
      iii. Typical block sizes today range from 4 to 16 kilobytes
Techniques used for accessing data from disk
• Disk-arm-scheduling algorithms order pending accesses to tracks so that disk arm movement is minimized
   a. Elevator algorithm: move the disk arm in one direction (from outer to inner tracks or vice versa), processing the next request in that direction, until there are no more requests in that direction; then reverse direction and repeat
• File organization: optimize block access time by organizing the blocks to correspond to how data will be accessed
   a. E.g., store related information on the same or nearby cylinders
   b. Files may get fragmented over time
      i. E.g., if data is inserted into or deleted from the file
      ii. Or free blocks on disk are scattered, and a newly created file has its blocks scattered over the disk
      iii. Sequential access to a fragmented file results in increased disk arm movement
   c. Some systems have utilities to defragment the file system, in order to speed up file access
• Non-volatile write buffers speed up disk writes by writing blocks to a non-volatile RAM buffer immediately
   a. Non-volatile RAM: battery backed up RAM or flash memory
      i. Even if power fails, the data is safe and will be written to disk when power returns
   b. The controller then writes to disk whenever the disk has no other requests or a request has been pending for some time
   c. Database operations that require data to be safely stored before continuing can continue without waiting for data to be written to disk
   d. Writes can be reordered to minimize disk arm movement
• Log disk: a disk devoted to writing a sequential log of block updates
   a. Used exactly like non-volatile RAM
      i. Writes to the log disk are very fast since no seeks are required
      ii. No need for special hardware (NV-RAM)
• File systems typically reorder writes to disk to improve performance
   a. Journaling file systems write data in safe order to NV-RAM or a log disk
   b. Reordering without journaling: risk of corruption of file system data
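The elevator algorithm described under the disk-access techniques above can be sketched as follows. This is a minimal illustration, not a controller implementation: it assumes all requests are known up front, that tracks are numbered so that "up" means increasing track number, and the names `elevator_order` and `arm_movement` are hypothetical.

```python
def elevator_order(pending_tracks, head, moving_up=True):
    """Order pending track requests the way the elevator algorithm serves them."""
    if moving_up:
        # Serve everything at or beyond the head in the current direction...
        first = sorted(t for t in pending_tracks if t >= head)
        # ...then reverse and serve the rest on the way back.
        second = sorted((t for t in pending_tracks if t < head), reverse=True)
    else:
        first = sorted((t for t in pending_tracks if t <= head), reverse=True)
        second = sorted(t for t in pending_tracks if t > head)
    return first + second


def arm_movement(order, head):
    """Total number of tracks the arm crosses when serving requests in this order."""
    total = 0
    for track in order:
        total += abs(track - head)
        head = track
    return total


if __name__ == "__main__":
    requests = [98, 183, 37, 122, 14, 124, 65, 67]  # made-up request queue
    order = elevator_order(requests, head=53)
    print(order, arm_movement(order, 53))
    print("arrival order:", arm_movement(requests, 53))
```

Comparing `arm_movement` for the elevator order against the arrival order shows why reordering pending requests reduces arm movement.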
3. RAID
• RAID: Redundant Arrays of Independent Disks
   a. Disk organization techniques that manage a large number of disks, providing a view of a single disk of
      i. high capacity and high speed, by using multiple disks in parallel, and
      ii. high reliability, by storing data redundantly so that data can be recovered even if a disk fails
• The chance that some disk out of a set of N disks will fail is much higher than the chance that a specific single disk will fail
• Originally a cost-effective alternative to large, expensive disks
Improvement of Reliability via Redundancy
• Redundancy: store extra information that can be used to rebuild information lost in a disk failure
• E.g., mirroring (or shadowing)
   a. Duplicate every disk. A logical disk consists of two physical disks
   b. Every write is carried out on both disks
      i. Reads can take place from either disk
   c. If one disk in a pair fails, the data is still available on the other
      i. Data loss would occur only if a disk fails, and its mirror disk also fails before the system is repaired
         1. The probability of the combined event is very small
            a. Except for dependent failure modes such as fire, building collapse or electrical power surges
• Mean time to data loss depends on mean time to failure and mean time to repair
Improvement in Performance via Parallelism
Two main goals of parallelism in a disk system:
• Load balance multiple small accesses to increase throughput
• Parallelize large accesses to reduce response time
Improve transfer rate by striping data across multiple disks
1. Bit-level striping: split the bits of each byte across multiple disks
   a. In an array of eight disks, write bit i of each byte to disk i
   b. Each access can read data at eight times the rate of a single disk
   c. But seek/access time is worse than for a single disk
      i. Bit-level striping is not used much any more
2. Block-level striping: with n disks, block i of a file goes to disk (i mod n) + 1
   a. Requests for different blocks can run in parallel if the blocks reside on different disks
   b. A request for a long sequence of blocks can utilize all disks in parallel
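The block-level striping rule above, disk (i mod n) + 1, can be illustrated with a short sketch. The function names are hypothetical; blocks are numbered from 0 and disks from 1, following the formula in the notes.

```python
def disk_for_block(i: int, n: int) -> int:
    """Disk (numbered 1..n) that holds block i under block-level striping."""
    return (i % n) + 1


def stripe_layout(num_blocks: int, n: int) -> dict:
    """Map each disk number to the list of block numbers stored on it."""
    layout = {disk: [] for disk in range(1, n + 1)}
    for i in range(num_blocks):
        layout[disk_for_block(i, n)].append(i)
    return layout


if __name__ == "__main__":
    # Eight consecutive blocks over four disks: consecutive blocks land on
    # different disks, so a long sequential read keeps every disk busy.
    print(stripe_layout(8, 4))
```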
RAID Levels
• Schemes to provide redundancy at lower cost by using disk striping combined with parity bits
   a. Different RAID organizations, or RAID levels, have differing cost, performance and reliability characteristics
• RAID Level 0: Block striping; non-redundant
   a. Used in high-performance applications where data loss is not critical
• RAID Level 1: Mirrored disks with block striping
   a. Offers best write performance
   b. Popular for applications such as storing log files in a database system
• RAID Level 2: Memory-style error-correcting codes (ECC) with bit striping
• RAID Level 3: Bit-interleaved parity
   a. A single parity bit is enough for error correction, not just detection, since we know which disk has failed
      i. When writing data, the corresponding parity bits must also be computed and written to a parity bit disk
      ii. To recover data on a damaged disk, compute the XOR of the bits from the other disks (including the parity bit disk)
   b. Faster data transfer than with a single disk, but fewer I/Os per second since every disk has to participate in every I/O
   c. Subsumes Level 2 (provides all its benefits, at lower cost)
• RAID Level 4: Block-interleaved parity; uses block-level striping, and keeps a parity block on a separate disk for corresponding blocks from N other disks
   a. Provides higher I/O rates for independent block reads than Level 3
      i. A block read goes to a single disk, so blocks stored on different disks can be read in parallel
   b. Provides higher transfer rates for reads of multiple blocks than no striping
   c. Before writing a block, parity data must be computed
      1. More efficient for writing large amounts of data sequentially
• RAID Level 5: Block-interleaved distributed parity; partitions data and parity among all N + 1 disks, rather than storing data on N disks and parity on 1 disk
   a. E.g., with 5 disks, the parity block for the n-th set of blocks is stored on disk (n mod 5) + 1, with the data blocks stored on the other 4 disks
   b. Higher I/O rates than Level 4
      i. Block writes occur in parallel if the blocks and their parity blocks are on different disks
   c. Subsumes Level 4: provides the same benefits, but avoids the bottleneck of the parity disk
• RAID Level 6: P+Q redundancy scheme; similar to Level 5, but stores extra redundant information to guard against multiple disk failures
   a. Better reliability than Level 5 at a higher cost; not used as widely
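The parity idea used by Levels 3 to 5 can be sketched in a few lines: the parity block is the XOR of the data blocks, and a lost block is the XOR of the surviving blocks and the parity. The function names and sample bytes below are hypothetical; the sketch assumes equal-length blocks and a single failed disk whose identity is known.

```python
from functools import reduce


def parity_block(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def recover_block(surviving_blocks, parity):
    """Rebuild the single missing block from the survivors plus the parity block."""
    return parity_block(list(surviving_blocks) + [parity])


def parity_disk(n: int, num_disks: int = 5) -> int:
    """Level 5 placement from the notes: parity for the n-th block set is on disk (n mod 5) + 1."""
    return (n % num_disks) + 1


if __name__ == "__main__":
    data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]  # made-up block contents
    p = parity_block(data)
    # Pretend the disk holding data[1] failed.
    print(recover_block([data[0], data[2]], p) == data[1])
    print([parity_disk(n) for n in range(6)])
```

Rotating the parity block across disks, as `parity_disk` does, is what removes the single parity-disk bottleneck of Level 4.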
Choice of RAID Level
• Factors in choosing RAID level
   a. Monetary cost
   b. Performance: number of I/O operations per second, and bandwidth during normal operation
   c. Performance during failure
   d. Performance during rebuild of a failed disk
      i. Including the time taken to rebuild the failed disk
• RAID 0 is used only when data safety is not important
   a. E.g., data can be recovered quickly from other sources
• Levels 2 and 4 are never used since they are subsumed by 3 and 5
• Level 3 is not used anymore since bit striping forces single block reads to access all disks, wasting disk arm movement, which block striping (Level 5) avoids
• Level 6 is rarely used since Levels 1 and 5 offer adequate safety for almost all applications
• So the competition is between Levels 1 and 5 only
• Level 1 provides much better write performance than Level 5
• Level 1 has higher storage cost than Level 5
• Level 5 is preferred for applications with a low update rate and large amounts of data
• Level 1 is preferred for all other applications
Hardware Issues
• Software RAID: RAID implementations done entirely in software, with no special hardware support
• Hardware RAID: RAID implementations with special hardware
   a. Use non-volatile RAM to record writes that are being executed
   b. Beware: power failure during a write can result in a corrupted disk
      i. E.g., failure after writing one block but before writing the second in a mirrored system
      ii. Such corrupted data must be detected when power is restored
         1. Recovery from corruption is similar to recovery from a failed disk
         2. NV-RAM helps to efficiently detect potentially corrupted blocks
            a. Otherwise all blocks of the disk must be read and compared with the mirror/parity block
• Hot swapping: replacement of a disk while the system is running, without powering down
   a. Supported by some hardware RAID systems
   b. Reduces time to recovery, and improves availability greatly
• Many systems maintain spare disks which are kept online, and used as replacements for failed disks immediately on detection of failure
   a. Reduces time to recovery greatly
• Many hardware RAID systems ensure that a single point of failure will not stop the functioning of the system by using
   a. Redundant power supplies with battery backup
   b. Multiple controllers and multiple interconnections to guard against controller/interconnection failures
4. TERTIARY STORAGE
Optical Disks
• Compact disk-read only memory (CD-ROM)
   a. Disks can be loaded into or removed from a drive
   b. High storage capacity (640 MB per disk)
   c. High seek times of about 100 msec (the optical read head is heavier and slower)
   d. Higher latency (3000 RPM) and lower data-transfer rates (3-6 MB/s) compared to magnetic disks
• Digital Video Disk (DVD)
   a. DVD-5 holds 4.7 GB, and DVD-9 holds 8.5 GB
   b. DVD-10 and DVD-18 are double-sided formats with capacities of 9.4 GB and 17 GB
   c. Other characteristics similar to CD-ROM
• Record-once versions (CD-R and DVD-R) are becoming popular
   a. Data can only be written once, and cannot be erased
   b. High capacity and long lifetime; used for archival storage
   c. Multi-write versions (CD-RW, DVD-RW and DVD-RAM) are also available
Magnetic Tapes
• Hold large volumes of data and provide high transfer rates
   a. A few GB for DAT (Digital Audio Tape) format, 10-40 GB with DLT (Digital Linear Tape) format, 100 GB+ with Ultrium format, and 330 GB with Ampex helical scan format
   b. Transfer rates from a few to 10s of MB/s
• Currently the cheapest storage medium
   a. Tapes are cheap, but the cost of drives is very high
• Very slow access time in comparison to magnetic disks and optical disks
   a. Limited to sequential access
   b. Some formats (Accelis) provide faster seek (10s of seconds) at the cost of lower capacity
• Used mainly for backup, for storage of infrequently used information, and as an off-line medium for transferring information from one system to another
• Tape jukeboxes are used for very large capacity storage
   a. Terabytes ($10^{12}$ bytes) to petabytes ($10^{15}$ bytes)
Storage Access
• A database file is partitioned into fixed-length storage units called blocks. Blocks are units of both storage allocation and data transfer
• The database system seeks to minimize the number of block transfers between the disk and memory. We can reduce the number of disk accesses by keeping as many blocks as possible in main memory
• Buffer: portion of main memory available to store copies of disk blocks
• Buffer manager: subsystem responsible for allocating buffer space in main memory
Buffer Manager
• Programs call on the buffer manager when they need a block from disk
   a. If the block is already in the buffer, the requesting program is given the address of the block in main memory
   b. If the block is not in the buffer,
      i. the buffer manager allocates space in the buffer for the block, replacing (throwing out) some other block, if required, to make space for the new block
      ii. the block that is thrown out is written back to disk only if it was modified since the most recent time that it was written to or fetched from the disk
      iii. once space is allocated in the buffer, the buffer manager reads the block from the disk into the buffer, and passes the address of the block in main memory to the requester
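The request-handling steps above can be sketched as a small in-memory model. This is an illustration under stated assumptions, not a real DBMS component: the "disk" is a plain dictionary, the victim is chosen as the least recently used block (an assumption of this sketch), and the class and method names are hypothetical.

```python
from collections import OrderedDict


class BufferManager:
    def __init__(self, capacity, disk):
        self.capacity = capacity
        self.disk = disk              # dict: block id -> contents (stands in for the disk)
        self.frames = OrderedDict()   # block id -> [contents, dirty flag], oldest first
        self.disk_reads = 0
        self.disk_writes = 0

    def get_block(self, block_id):
        """Return the buffer frame for block_id, reading it from disk if needed."""
        if block_id in self.frames:            # already buffered: no disk access
            self.frames.move_to_end(block_id)
            return self.frames[block_id]
        if len(self.frames) >= self.capacity:  # make space by throwing out a block
            victim_id, (contents, dirty) = self.frames.popitem(last=False)
            if dirty:                          # write back only if modified
                self.disk[victim_id] = contents
                self.disk_writes += 1
        frame = [self.disk[block_id], False]   # read the block from disk
        self.disk_reads += 1
        self.frames[block_id] = frame
        return frame

    def write_block(self, block_id, contents):
        """Modify a block in the buffer and mark it dirty."""
        frame = self.get_block(block_id)
        frame[0] = contents
        frame[1] = True


if __name__ == "__main__":
    disk = {1: b"a", 2: b"b", 3: b"c"}
    bm = BufferManager(capacity=2, disk=disk)
    bm.get_block(1)
    bm.write_block(2, b"B")
    bm.get_block(1)      # hit
    bm.get_block(3)      # evicts block 2, which is dirty and gets written back
    print(disk[2], bm.disk_reads, bm.disk_writes)
```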
Buffer-Replacement Policies
• Most operating systems replace the block least recently used (LRU strategy)
• Idea behind LRU: use the past pattern of block references as a predictor of future references
• Queries have well-defined access patterns (such as sequential scans), and a database system can use the information in a user's query to predict future references
   a. LRU can be a bad strategy for certain access patterns involving repeated scans of data
      i. E.g., when computing the join of 2 relations r and s by nested loops:
            for each tuple tr of r do
               for each tuple ts of s do
                  if the tuples tr and ts match then add them to the result
   b. A mixed strategy with hints on replacement strategy provided by the query optimizer is preferable
• Pinned block: memory block that is not allowed to be written back to disk
• Toss-immediate strategy: frees the space occupied by a block as soon as the final tuple of that block has been processed
• Most recently used (MRU) strategy: the system must pin the block currently being processed. After the final tuple of that block has been processed, the block is unpinned, and it becomes the most recently used block
• The buffer manager can use statistical information regarding the probability that a request will reference a particular relation
   a. E.g., the data dictionary is frequently accessed. Heuristic: keep data-dictionary blocks in the main memory buffer
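To see why LRU can behave badly on repeated scans (such as the inner relation of a nested-loop join), the sketch below counts buffer misses for a reference string under LRU and MRU replacement. It is a simplified model: pinning is ignored, and `count_misses` and the sample reference string are hypothetical.

```python
def count_misses(references, capacity, policy):
    """Count buffer misses for a sequence of block references.

    policy is "LRU" (evict least recently used) or "MRU" (evict most recently used).
    """
    buffer = []   # ordered from least recently used to most recently used
    misses = 0
    for block in references:
        if block in buffer:
            buffer.remove(block)          # hit: will be re-appended as most recent
        else:
            misses += 1
            if len(buffer) >= capacity:
                buffer.pop(0 if policy == "LRU" else -1)
        buffer.append(block)
    return misses


if __name__ == "__main__":
    # Relation s occupies 4 blocks and is scanned three times; only 3 buffer frames.
    scans = [0, 1, 2, 3] * 3
    print("LRU misses:", count_misses(scans, 3, "LRU"))
    print("MRU misses:", count_misses(scans, 3, "MRU"))
```

With a buffer one block smaller than the scanned relation, LRU evicts each block just before it is needed again, while MRU keeps most of the relation resident.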
5. FILE ORGANIZATION
• The database is stored as a collection of files. Each file is a sequence of records. A record is a sequence of fields
Types
1. Fixed-length records
2. Variable-length records
Fixed-Length Records
• Simple approach:
o Store record i starting from byte $n \times (i - 1)$, where n is the size of each record
o Record access is simple, but records may cross blocks
   Modification: do not allow records to cross block boundaries
• Deletion of record i, alternatives:
o move records i + 1, ..., n to i, ..., n − 1
o move record n to i
o do not move records, but link all free records on a free list
Figure: File containing account records
Figure: File with record 2 deleted and all records moved
Figure: File with record 2 deleted and final record moved
Free Lists
• Store the address of the first deleted record in the file header
• Use this first record to store the address of the second deleted record, and so on
• Can think of these stored addresses as pointers, since they point to the location of a record
• More space-efficient representation: reuse the space for normal attributes of free records to store pointers (no pointers are stored in in-use records)
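The free-list scheme can be sketched with a list of slots in which a deleted slot stores the index of the next free slot. This is an in-memory illustration only; the class name `FixedLengthFile` and the use of list indices in place of byte addresses are assumptions of the sketch.

```python
class FixedLengthFile:
    def __init__(self):
        self.slots = []        # slot i holds a record, or (if free) the next free slot index
        self.in_use = []       # parallel flags: True if slot i holds a live record
        self.free_head = None  # "file header" field: index of the first free slot

    def insert(self, record):
        """Reuse the first free slot if there is one, else append at the end."""
        if self.free_head is not None:
            i = self.free_head
            self.free_head = self.slots[i]   # next free slot was stored in the slot itself
            self.slots[i] = record
            self.in_use[i] = True
        else:
            self.slots.append(record)
            self.in_use.append(True)
            i = len(self.slots) - 1
        return i

    def delete(self, i):
        """Push slot i onto the front of the free list; no records are moved."""
        self.slots[i] = self.free_head       # reuse the record space for the pointer
        self.in_use[i] = False
        self.free_head = i


if __name__ == "__main__":
    f = FixedLengthFile()
    for name in ["A-102", "A-305", "A-215"]:   # made-up records
        f.insert(name)
    f.delete(1)
    f.delete(0)
    print(f.free_head, f.slots)
    print(f.insert("A-101"), f.insert("A-110"), f.insert("A-222"))
```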
Figure: File with free list after deletion of records 1, 4 and 6
Variable-Length Records
• Variable-length records arise in database systems in several ways:
o Storage of multiple record types in a file
o Record types that allow variable lengths for one or more fields
o Record types that allow repeating fields (used in some older data models)
• Byte-string representation
o Attach an end-of-record (⊥) control character to the end of each record
o Difficulty with deletion
o Difficulty with growth
Byte-String Representation of Variable-Length Records
Figure: Byte-string representation of variable-length records
Variable-Length Records: Slotted Page Structure
Figure: Slotted page structure
• Slotted page header contains:
o number of record entries
o end of free space in the block
o location and size of each record
• Records can be moved around within a page to keep them contiguous, with no empty space between them; the entry in the header must be updated
• Pointers should not point directly to a record; instead they should point to the entry for the record in the header
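A slotted page can be sketched as a byte array whose header is a list of (offset, length) entries. The sketch below is a simplification: the header is kept as a Python list rather than inside the page bytes, so header space is not charged against the page, and the class name `SlottedPage` is hypothetical.

```python
class SlottedPage:
    def __init__(self, size=64):
        self.data = bytearray(size)
        self.free_end = size   # end of free space; records are packed at the page's end
        self.slots = []        # header entries: (offset, length), or None if deleted

    def insert(self, record: bytes) -> int:
        """Store a record and return its slot number (what outside pointers refer to)."""
        if len(record) > self.free_end:
            raise ValueError("not enough free space in page")
        self.free_end -= len(record)
        self.data[self.free_end:self.free_end + len(record)] = record
        self.slots.append((self.free_end, len(record)))
        return len(self.slots) - 1

    def read(self, slot: int) -> bytes:
        offset, length = self.slots[slot]
        return bytes(self.data[offset:offset + length])

    def delete(self, slot: int) -> None:
        """Delete a record and compact the rest so no holes remain between records."""
        live = [(s, self.read(s)) for s, entry in enumerate(self.slots)
                if entry is not None and s != slot]
        self.slots[slot] = None
        self.free_end = len(self.data)
        for s, record in live:
            self.free_end -= len(record)
            self.data[self.free_end:self.free_end + len(record)] = record
            self.slots[s] = (self.free_end, len(record))   # header entry updated


if __name__ == "__main__":
    page = SlottedPage(32)
    s0 = page.insert(b"alpha")
    s1 = page.insert(b"be")
    s2 = page.insert(b"gamma")
    page.delete(s1)
    print(page.read(s0), page.read(s2), page.free_end)
```

Because callers hold slot numbers rather than byte offsets, records can be moved during compaction without invalidating any outside pointer.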
Fixed-length representation:
o reserved space
o pointers
• Reserved space: can use fixed-length records of a known maximum length; unused space in shorter records is filled with a null or end-of-record symbol
Figure: File using the reserved-space method
Pointer Method
Figure: File using linked lists
• Pointer method
o A variable-length record is represented by a list of fixed-length records, chained together via pointers
o Can be used even if the maximum record length is not known
• Disadvantage of the pointer structure: space is wasted in all records except the first in a chain
• Solution is to allow two kinds of block in a file:
o Anchor block: contains the first records of chains
o Overflow block: contains records other than those that are the first records of chains
Figure: Anchor-block and overflow-block structure
6. ORGANIZATION OF RECORDS IN FILES
• Heap: a record can be placed anywhere in the file where there is space
• Sequential: store records in sequential order, based on the value of the search key of each record
• Hashing: a hash function is computed on some attribute of each record; the result specifies in which block of the file the record should be placed
• Records of each relation may be stored in a separate file. In a clustering file organization, records of several different relations can be stored in the same file
o Motivation: store related records on the same block to minimize I/O
Sequential File Organization
• Suitable for applications that require sequential processing of the entire file
• The records in the file are ordered by a search key
Figure: Sequential file for account records
• Deletion: use pointer chains
• Insertion: locate the position where the record is to be inserted
o if there is free space, insert there
o if there is no free space, insert the record in an overflow block
o in either case, the pointer chain must be updated
• Need to reorganize the file from time to time to restore sequential order
Figure: Sequential file after an insertion
Clustering File Organization
• Simple file structure stores each relation in a separate file
• Can instead store several relations in one file using a clustering file organization
Example: consider two relations, the depositor relation and the customer relation
• E.g., clustering organization of customer and depositor:
Figure: Clustering file structure
o good for queries involving the join of depositor and customer, and for queries involving one single customer and his accounts
o bad for queries involving only customer
o results in variable-size records
Clustering File Structure with Pointer Chains
Figure: Clustering file structure with pointer chains
7. INDEXING AND HASHING
• Comparison of Ordered Indexing and Hashing
• Index Definition in SQL
• Multiple-Key Access
Basic Concepts
• Indexing mechanisms are used to speed up access to desired data
o E.g., author catalog in a library
• Search key: attribute or set of attributes used to look up records in a file
• An index file consists of records (called index entries) of the form (search-key, pointer)
• Index files are typically much smaller than the original file
• Two basic kinds of indices:
o Ordered indices: search keys are stored in sorted order
o Hash indices: search keys are distributed uniformly across buckets using a hash function
Index Evaluation Metrics
Indexing techniques are evaluated on the basis of:
• Access types supported efficiently, e.g.,
o records with a specified value in the attribute
o records with an attribute value falling in a specified range of values
• Access time
• Insertion time
• Deletion time
• Space overhead
8. ORDERED INDICES
• In an ordered index, index entries are stored sorted on the search-key value. E.g., author catalog in a library
• Primary index: in a sequentially ordered file, the index whose search key specifies the sequential order of the file
o Also called clustering index
o The search key of a primary index is usually but not necessarily the primary key
• Secondary index: an index whose search key specifies an order different from the sequential order of the file. Also called non-clustering index
Primary Index
• Index-sequential file: ordered sequential file with a primary index
Figure: Sequential file for account records
Dense and Sparse Indices
There are two types of ordered indices that we can use:
Dense Index Files
• Dense index: an index record appears for every search-key value in the file
Figure: Dense index
Sparse Index Files
• Sparse index: contains index records for only some search-key values
o Applicable when records are sequentially ordered on the search key
• To locate a record with search-key value K we:
o Find the index record with the largest search-key value ≤ K
o Search the file sequentially starting at the record to which the index record points
• Less space and less maintenance overhead for insertions and deletions
• Generally slower than a dense index for locating records
• Good tradeoff: sparse index with an index entry for every block in the file, corresponding to the least search-key value in the block
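The sparse-index lookup described above (one index entry per block, holding the least search-key value in that block) can be sketched with a binary search over the index followed by a sequential scan of one block. The data layout, function names and sample account records are hypothetical; search keys are assumed unique and blocks non-empty.

```python
from bisect import bisect_right


def build_sparse_index(blocks):
    """One index entry per block: the least search-key value in that block."""
    return [block[0][0] for block in blocks]


def lookup(blocks, index_keys, key):
    """Return the record with the given search key, or None if it is absent."""
    # Position of the index entry with the largest search-key value <= key.
    pos = bisect_right(index_keys, key) - 1
    if pos < 0:
        return None                      # key is smaller than every key in the file
    for k, record in blocks[pos]:        # sequential scan within the block
        if k == key:
            return record
        if k > key:
            break                        # passed the spot where key would be
    return None


if __name__ == "__main__":
    # Each block is a sorted list of (account number, branch) pairs.
    blocks = [
        [("A-101", "Downtown"), ("A-110", "Downtown")],
        [("A-215", "Mianus"), ("A-217", "Brighton")],
        [("A-305", "Round Hill")],
    ]
    index_keys = build_sparse_index(blocks)
    print(index_keys)
    print(lookup(blocks, index_keys, "A-217"))
```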
Example of Sparse Index Files
Figure: Sparse index
Secondary Indices
• Frequently, one wants to find all the records whose values in a certain field (which is not the search key of the primary index) satisfy some condition
o Example 1: in the account database stored sequentially by account number, we may want to find all accounts in a particular branch
o Example 2: as above, but where we want to find all accounts with a specified balance or range of balances
• We can have a secondary index with an index record for each search-key value; the index record points to a bucket that contains pointers to all the actual records with that particular search-key value
Secondary Index on balance field of account
Figure: Secondary index on account file, on noncandidate key balance
Primary and Secondary Indices
• Secondary indices have to be dense
• Indices offer substantial benefits when searching for records
• When a file is modified, every index on the file must be updated. Updating indices imposes overhead on database modification
• A sequential scan using a primary index is efficient, but a sequential scan using a secondary index is expensive
o each record access may fetch a new block from disk
Multilevel Index
• If the primary index does not fit in memory, access becomes expensive
• To reduce the number of disk accesses to index records, treat the primary index kept on disk as a sequential file and construct a sparse index on it
o outer index: a sparse index of the primary index
o inner index: the primary index file
• If even the outer index is too large to fit in main memory, yet another level of index can be created, and so on
• Indices at all levels must be updated on insertion or deletion from the file
Index Update: Deletion
• If the deleted record was the only record in the file with its particular search-key value, the search key is deleted from the index also
• Single-level index deletion:
o Dense indices: deletion of a search key is similar to file record deletion
o Sparse indices: if an entry for the search key exists in the index, it is deleted by replacing the entry in the index with the next search-key value in the file (in search-key order). If the next search-key value already has an index entry, the entry is deleted instead of being replaced
Index Update: Insertion
• Single-level index insertion:
o Perform a lookup using the search-key value appearing in the record to be inserted
o Dense indices: if the search-key value does not appear in the index, insert it
o Sparse indices: if the index stores an entry for each block of the file, no change needs to be made to the index unless a new block is created. In this case, the first search-key value appearing in the new block is inserted into the index
• Multilevel insertion (as well as deletion) algorithms are simple extensions of the single-level algorithms
9. B+-TREE INDEX FILES
B+-tree indices are an alternative to indexed-sequential files.
• Disadvantage of indexed-sequential files: performance degrades as the file grows, since many overflow blocks get created. Periodic reorganization of the entire file is required
• Advantage of B+-tree index files: the index automatically reorganizes itself with small, local changes in the face of insertions and deletions. Reorganization of the entire file is not required to maintain performance
• Disadvantage of B+-trees: extra insertion and deletion overhead, space overhead
• Advantages of B+-trees outweigh disadvantages, and they are used extensively
A B+-tree is a rooted tree satisfying the following properties:
• All paths from root to leaf are of the same length
• Each node that is not a root or a leaf has between $\lceil n/2 \rceil$ and $n$ children
• A leaf node has between $\lceil (n-1)/2 \rceil$ and $n-1$ values
• Special cases:
o If the root is not a leaf, it has at least 2 children
o If the root is a leaf (that is, there are no other nodes in the tree), it can have between 0 and $n-1$ values
B+-Tree Node Structure
Figure: Typical node of a B+-tree
o $K_i$ are the search-key values
o $P_i$ are pointers to children (for non-leaf nodes) or pointers to records or buckets of records (for leaf nodes)
• The search keys in a node are ordered: $K_1 < K_2 < K_3 < \dots < K_{n-1}$
Leaf Nodes in B+-Trees
Properties of a leaf node:
• For $i = 1, 2, \dots, n-1$, pointer $P_i$ either points to a file record with search-key value $K_i$, or to a bucket of pointers to file records, each record having search-key value $K_i$. The bucket structure is needed only if the search key does not form a primary key
• If $L_i$, $L_j$ are leaf nodes and $i < j$, then $L_i$'s search-key values are less than $L_j$'s search-key values
• $P_n$ points to the next leaf node in search-key order
Figure: A leaf node for the account B+-tree index (n = 3)
Non-Leaf Nodes in B+-Trees
• Non-leaf nodes form a multi-level sparse index on the leaf nodes. For a non-leaf node with m pointers:
o All the search keys in the subtree to which $P_1$ points are less than $K_1$
o For $2 \le i \le m-1$, all the search keys in the subtree to which $P_i$ points have values greater than or equal to $K_{i-1}$ and less than $K_i$
o All the search keys in the subtree to which $P_m$ points have values greater than or equal to $K_{m-1}$
Example of B+-tree
Figure: B+-tree for account file (n = 3)
Figure: B+-tree for account file (n = 5)
• Leaf nodes must have between 2 and 4 values ($\lceil (n-1)/2 \rceil$ and $n-1$, with $n = 5$)
• Non-leaf nodes other than the root must have between 3 and 5 children ($\lceil n/2 \rceil$ and $n$, with $n = 5$)
• Root must have at least 2 children
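The occupancy bounds quoted above for n = 5 follow directly from the ceiling formulas in the B+-tree definition. The helper below simply evaluates them for a given n; the function name and the dictionary keys are hypothetical.

```python
from math import ceil


def bplus_bounds(n: int) -> dict:
    """Minimum and maximum occupancy for B+-tree nodes with n pointers per node."""
    return {
        "leaf_values": (ceil((n - 1) / 2), n - 1),
        "nonleaf_children": (ceil(n / 2), n),   # applies to non-root internal nodes
        "root_min_children": 2,                 # when the root is not a leaf
    }


if __name__ == "__main__":
    print(bplus_bounds(5))
    print(bplus_bounds(3))
```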
Observations about B+-trees
• Since the inter-node connections are done by pointers, "logically" close blocks need not be "physically" close
• The non-leaf levels of the B+-tree form a hierarchy of sparse indices
• The B+-tree contains a relatively small number of levels (logarithmic in the size of the main file), thus searches can be conducted efficiently
• Insertions and deletions to the main file can be handled efficiently, as the index can be restructured in logarithmic time
Queries on B+-Trees
• Find all records with a search-key value of k:
o Start with the root node
   Examine the node for the smallest search-key value > k
   If such a value exists, assume it is $K_i$. Then follow $P_i$ to the child node
   Otherwise $k \ge K_{m-1}$, where there are m pointers in the node. Then follow $P_m$ to the child node
o If the node reached by following the pointer above is not a leaf node, repeat the above procedure on the node, and follow the corresponding pointer
o Eventually reach a leaf node. If for some i, key $K_i = k$, follow pointer $P_i$ to the desired record or bucket. Else no record with search-key value k exists
• In processing a query, a path is traversed in the tree from the root to some leaf node
• If there are K search-key values in the file, the path is no longer than $\lceil \log_{\lceil n/2 \rceil}(K) \rceil$
procedure find(value V)
   set C = root node
   while C is not a leaf node begin
      let Ki = smallest search-key value in C, if any, greater than V
      if there is no such value then begin
         let m = the number of pointers in the node
         set C = node pointed to by Pm
      end
      else set C = the node pointed to by Pi
   end
   if there is a key value Ki in C such that Ki = V
      then pointer Pi directs us to the desired record or bucket
      else no record with key value V exists
end procedure
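The find procedure above translates almost line for line into Python. The sketch below is an in-memory illustration: the `Node` class is a hypothetical stand-in for a disk block, leaf pointers are the records themselves, and the next-leaf pointer is omitted. `bisect_right` returns the position of the smallest key greater than the search value, or the number of keys if there is none, which selects the last pointer.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Node:
    keys: list            # K_1 .. K_{m-1}, in sorted order
    pointers: list        # children (non-leaf) or records/buckets (leaf)
    is_leaf: bool = False


def find(root: Node, value):
    """Return the record (or bucket) for value, or None if no such key exists."""
    node = root
    while not node.is_leaf:
        # 0-based index of the smallest key greater than value; equals
        # len(node.keys) when no such key exists, i.e. follow the last pointer P_m.
        i = bisect_right(node.keys, value)
        node = node.pointers[i]
    for key, pointer in zip(node.keys, node.pointers):
        if key == value:
            return pointer
    return None


if __name__ == "__main__":
    # Made-up two-level tree over branch names.
    leaf1 = Node(["Brighton", "Downtown"], ["rec-B", "rec-D"], is_leaf=True)
    leaf2 = Node(["Mianus", "Perryridge"], ["rec-M", "rec-P"], is_leaf=True)
    leaf3 = Node(["Redwood", "Round Hill"], ["rec-R", "rec-RH"], is_leaf=True)
    root = Node(["Mianus", "Redwood"], [leaf1, leaf2, leaf3])
    print(find(root, "Perryridge"), find(root, "Clearview"))
```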
Updates on B+-Trees: Insertion
• Find the leaf node in which the search-key value would appear
• If the search-key value is already there in the leaf node, the record is added to the file and, if necessary, a pointer is inserted into the bucket
• If the search-key value is not there, then add the record to the main file and create a bucket if necessary. Then:
o If there is room in the leaf node, insert the (key-value, pointer) pair in the leaf node
o Otherwise, split the node (along with the new (key-value, pointer) entry) as described next
• Splitting a node:
o take the n (search-key value, pointer) pairs (including the one being inserted) in sorted order. Place the first $\lceil n/2 \rceil$ in the original node, and the rest in a new node
o let the new node be p, and let k be the least key value in p. Insert (k, p) in the parent of the node being split. If the parent is full, split it and propagate the split further up
• The splitting of nodes proceeds upwards until a node that is not full is found. In the worst case the root node may be split, increasing the height of the tree by 1
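The leaf-splitting step can be sketched on its own, leaving out the propagation into the parent. The function name `split_leaf` and the record labels are hypothetical; entries are (search-key value, pointer) pairs, and the leaf is assumed to be full (n − 1 entries) before the insertion.

```python
from math import ceil


def split_leaf(entries, new_entry, n):
    """Split a full leaf on insertion.

    Returns (left_entries, right_entries, separator_key), where separator_key
    is the least key in the new right node and is what gets inserted, together
    with a pointer to the new node, into the parent.
    """
    all_entries = sorted(entries + [new_entry], key=lambda entry: entry[0])
    cut = ceil(n / 2)                 # the first ceil(n/2) pairs stay in the original node
    left, right = all_entries[:cut], all_entries[cut:]
    return left, right, right[0][0]


if __name__ == "__main__":
    # n = 3: a full leaf holds 2 entries.
    leaf = [("Brighton", "rec-B"), ("Downtown", "rec-D")]
    print(split_leaf(leaf, ("Clearview", "rec-C"), n=3))
```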
Figure: Result of splitting the node containing Brighton and Downtown on inserting Clearview
Figure: B+-tree before and after insertion of "Clearview"
Updates on B+-Trees: Deletion
• Find the record to be deleted, and remove it from the main file and from the bucket (if present)
• Remove (search-key value, pointer) from the leaf node if there is no bucket or if the bucket has become empty
• If the node has too few entries due to the removal, and the entries in the node and a sibling fit into a single node, then
o Insert all the search-key values in the two nodes into a single node (the one on the left), and delete the other node
o Delete the pair ($K_{i-1}$, $P_i$), where $P_i$ is the pointer to the deleted node, from its parent, recursively using the above procedure
• Otherwise, if the node has too few entries due to the removal, and the entries in the node and a sibling do not fit into a single node, then
o Redistribute the pointers between the node and a sibling such that both have at least the minimum number of entries
o Update the corresponding search-key value in the parent of the node
• The node deletions may cascade upwards until a node which has $\lceil n/2 \rceil$ or more pointers is found. If the root node has only one pointer after deletion, it is deleted and the sole child becomes the root
Examples of B+-Tree Deletion
Figure: Before and after deleting "Downtown"
• The removal of the leaf node containing "Downtown" did not result in its parent having too few pointers. So the cascaded deletions stopped with the deleted leaf node's parent
Figure: Deletion of "Perryridge" from result of previous example
• The node with "Perryridge" becomes underfull (actually empty, in this special case) and is merged with its sibling
• As a result, the "Perryridge" node's parent became underfull, and was merged with its sibling (and an entry was deleted from their parent)
• The root node then had only one child, and was deleted; its child became the new root node
Figure: Before and after deletion of "Perryridge" from earlier example
• The parent of the leaf containing Perryridge became underfull, and borrowed a pointer from its left sibling
• The search-key value in the parent's parent changes as a result
10. B-TREE INDEX FILES
• Similar to a B+-tree, but a B-tree allows search-key values to appear only once; this eliminates redundant
storage of search keys.
• Search keys in nonleaf nodes appear nowhere else in the B-tree; an additional pointer field for each
search key in a nonleaf node must be included.
• Generalized B-tree leaf node
• Nonleaf node: pointers B_i are the bucket or file record pointers.
B-Tree Index File Example
• Advantages of B-tree indices:
o May use fewer tree nodes than a corresponding B+-tree.
o Sometimes possible to find the search-key value before reaching a leaf node.
• Disadvantages of B-tree indices:
o Only a small fraction of all search-key values are found early.
o Nonleaf nodes are larger, so fan-out is reduced. Thus, B-trees typically have greater depth
than the corresponding B+-tree.
o Insertion and deletion are more complicated than in B+-trees.
o Implementation is harder than for B+-trees.
• Typically, the advantages of B-trees do not outweigh the disadvantages.
11. HASHING TECHNIQUES
Introduction
• One disadvantage of sequential file organization is that we must access an index
structure to locate data, or must use binary search, and that results in more I/O
operations.
• File organization based on the technique of hashing allows us to avoid accessing an
index structure.
• Hashing also provides a way of constructing indices.
Types
1. Static Hashing
2. Dynamic Hashing
Static Hashing
• A bucket is a unit of storage containing one or more records (a bucket is typically a disk block).
• In a hash file organization we obtain the bucket of a record directly from its search-key value using
a hash function.
• Hash function h is a function from the set of all search-key values K to the set of all bucket addresses
B.
• The hash function is used to locate records for access, insertion, and deletion.
• Records with different search-key values may be mapped to the same bucket; thus the entire bucket has
to be searched sequentially to locate a record.
Hash Functions
• The worst hash function maps all search-key values to the same bucket; this makes access time
proportional to the number of search-key values in the file.
• An ideal hash function is uniform, i.e., each bucket is assigned the same number of search-key
values from the set of all possible values.
• An ideal hash function is random, so each bucket will have the same number of records assigned to it
irrespective of the actual distribution of search-key values in the file.
• Typical hash functions perform computation on the internal binary representation of the search key.
o For example, for a string search key, the binary representations of all the characters in the
string could be added and the sum modulo the number of buckets could be returned.
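The string example can be written down directly. The sketch below assumes UTF-8 byte values as the "binary representation"; the function name `string_hash` is made up for illustration.

```python
def string_hash(search_key, num_buckets):
    """Add the byte values of all characters and reduce modulo the bucket count."""
    return sum(search_key.encode("utf-8")) % num_buckets
```

Because addition ignores character order, keys such as "ab" and "ba" always land in the same bucket, which shows why this simple function is not perfectly uniform.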
Handling of Bucket Overflows
• Bucket overflow can occur because of
o Insufficient buckets
o Skew in the distribution of records. This can occur due to two reasons:
  - multiple records have the same search-key value
  - the chosen hash function produces a non-uniform distribution of key values
• Although the probability of bucket overflow can be reduced, it cannot be eliminated; it is handled by
using overflow buckets.
• Overflow chaining: the overflow buckets of a given bucket are chained together in a linked list.
• The above scheme is called closed hashing.
o An alternative, called open hashing, which does not use overflow buckets, is not suitable for
database applications.
Figure: Overflow chaining in a hash structure
Hash Indices
• Hashing can be used not only for file organization, but also for index-structure creation.
• A hash index organizes the search keys, with their associated record pointers, into a hash file
structure.
• Strictly speaking, hash indices are always secondary indices.
o If the file itself is organized using hashing, a separate primary hash index on it using the same
search key is unnecessary.
o However, we use the term hash index to refer to both secondary index structures and hash
organized files.
Example of Hash Index
Figure: Hash index on search key account-number of account file
Deficiencies of Static Hashing
• In static hashing, function h maps search-key values to a fixed set B of bucket addresses.
o Databases grow with time. If the initial number of buckets is too small, performance will degrade
due to too many overflows.
o If the file size at some point in the future is anticipated and the number of buckets allocated
accordingly, a significant amount of space will be wasted initially.
o If the database shrinks, again space will be wasted.
o One option is periodic reorganization of the file with a new hash function, but it is very
expensive.
• These problems can be avoided by using techniques that allow the number of buckets to be modified
dynamically.
12. DYNAMIC HASHING
• Good for databases that grow and shrink in size.
• Allows the hash function to be modified dynamically.
• Extendable hashing: one form of dynamic hashing.
o The hash function generates values over a large range, typically b-bit integers, with b = 32.
o At any time use only a prefix of the hash function to index into a table of bucket addresses.
o Let the length of the prefix be i bits, 0 ≤ i ≤ 32.
o Bucket address table size = 2^i. Initially i = 0.
o The value of i grows and shrinks as the size of the database grows and shrinks.
o Multiple entries in the bucket address table may point to a bucket.
o Thus, the actual number of buckets is at most 2^i.
  - The number of buckets also changes dynamically due to coalescing and splitting of
buckets.
General Extendable Hash Structure
o In this structure, i_2 = i_3 = i, whereas i_1 = i - 1.
Figure: General extendable hash structure
Use of Extendable Hash Structure
• Each bucket j stores a value i_j; all the entries that point to the same bucket have the same values on
the first i_j bits.
• To locate the bucket containing search key K_j:
o Compute h(K_j) = X.
o Use the first i high-order bits of X as a displacement into the bucket address table, and
follow the pointer to the appropriate bucket.
• To insert a record with search-key value K_j:
o Follow the same procedure as look-up and locate the bucket, say j.
o If there is room in bucket j, insert the record in the bucket.
o Else the bucket must be split and the insertion re-attempted.
  - Overflow buckets are used instead in some cases.
o To split a bucket j when inserting a record with search-key value K_j:
• If i > i_j (more than one pointer to bucket j)
o Allocate a new bucket z, and set i_j and i_z to the old i_j + 1.
o Make the second half of the bucket address table entries pointing to j point to z.
o Remove and reinsert each record in bucket j.
o Recompute the new bucket for K_j and insert the record in the bucket (further splitting is required if the
bucket is still full).
• If i = i_j (only one pointer to bucket j)
o Increment i and double the size of the bucket address table.
o Replace each entry in the table by two entries that point to the same bucket.
o Recompute the new bucket address table entry for K_j. Now i > i_j, so use the first case above.
• When inserting a value, if the bucket is full after several splits (that is, i reaches some limit b), create
an overflow bucket instead of splitting the bucket entry table further.
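The look-up and insertion procedure can be sketched in Python as follows. This is an illustrative sketch, not a reference implementation: the class names, the use of `zlib.crc32` as the 32-bit hash function, the `max_depth` limit, and letting a bucket grow in place of a real overflow bucket are all assumptions.

```python
import zlib

B = 32  # number of bits produced by the hash function (b in the notes)

class Bucket:
    def __init__(self, depth):
        self.depth = depth   # i_j: prefix bits shared by all keys stored here
        self.items = {}      # search key -> record

class ExtendableHash:
    def __init__(self, bucket_size=2, max_depth=16):
        self.bucket_size = bucket_size
        self.max_depth = max_depth    # assumed limit on the prefix length
        self.i = 0                    # prefix length currently in use
        self.table = [Bucket(0)]      # bucket address table, size 2**i

    def _slot(self, key):
        h = zlib.crc32(str(key).encode("utf-8"))    # stand-in 32-bit hash
        return h >> (B - self.i) if self.i else 0   # first i high-order bits

    def lookup(self, key):
        return self.table[self._slot(key)].items.get(key)

    def insert(self, key, record):
        while True:
            bucket = self.table[self._slot(key)]
            has_room = len(bucket.items) < self.bucket_size
            if key in bucket.items or has_room or bucket.depth == self.max_depth:
                # At max_depth the bucket simply grows; this stands in
                # for chaining an overflow bucket.
                bucket.items[key] = record
                return
            self._split(bucket)       # then re-attempt the insertion

    def _split(self, bucket):
        if self.i == bucket.depth:
            # Only one pointer to the bucket: double the table, replacing
            # each entry by two entries that point to the same bucket.
            self.table = [b for b in self.table for _ in range(2)]
            self.i += 1
        # Now i > i_j: several consecutive entries point to the bucket.
        slots = [s for s, b in enumerate(self.table) if b is bucket]
        bucket.depth += 1
        new_bucket = Bucket(bucket.depth)
        for s in slots[len(slots) // 2:]:     # second half -> new bucket
            self.table[s] = new_bucket
        old_items, bucket.items = bucket.items, {}
        for k, r in old_items.items():        # remove and reinsert records
            self.table[self._slot(k)].items[k] = r
```

A bucket with local depth i_j is always referenced by 2^(i - i_j) table entries, which is why doubling the table needs no data movement.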
• To delete a key value:
o Locate it in its bucket and remove it.
o The bucket itself can be removed if it becomes empty (with appropriate updates to the bucket
address table).
o Coalescing of buckets can be done (can coalesce only with a "buddy" bucket having the same value
of i_j and the same i_j - 1 prefix, if it is present).
o Decreasing the bucket address table size is also possible.
  - Note: decreasing the bucket address table size is an expensive operation and should be
done only if the number of buckets becomes much smaller than the size of the table.
Use of Extendable Hash Structure: Example
Figure: Hash function for branch-name
Figure: Initial hash structure, bucket size = 2
Figure: Hash structure after insertion of one Brighton and two Downtown records
Figure: Hash structure after insertion of Mianus record
Figure: Hash structure after insertion of three Perryridge records
Figure: Hash structure after insertion of Redwood and Round Hill records
Extendable Hashing vs. Other Schemes
• Benefits of extendable hashing:
o Hash performance does not degrade with growth of the file.
o Minimal space overhead.
• Disadvantages of extendable hashing:
o Extra level of indirection to find the desired record.
o The bucket address table may itself become very big (larger than memory).
  - A tree structure is then needed to locate the desired record in the structure.
o Changing the size of the bucket address table is an expensive operation.
• Linear hashing is an alternative mechanism which avoids these disadvantages at the possible cost of
more bucket overflows.
Comparison of Ordered Indexing and Hashing
• Cost of periodic reorganization
• Relative frequency of insertions and deletions
• Is it desirable to optimize average access time at the expense of worst-case access time?
• Expected type of queries:
o Hashing is generally better at retrieving records having a specified value of the key.
o If range queries are common, ordered indices are to be preferred.
13. QUERY PROCESSING
Basic Steps in Query Processing
o Parsing and translation
o Optimization
o Evaluation
• Parsing and translation
o Translate the query into its internal form. This is then translated into relational algebra.
o The parser checks syntax and verifies relations.
• Evaluation
o The query-execution engine takes a query-evaluation plan, executes that plan, and returns the
answers to the query.
Basic Steps in Query Processing: Optimization
• A relational algebra expression may have many equivalent expressions.
E.g., σ_{balance<2500}(Π_{balance}(account)) is equivalent to
Π_{balance}(σ_{balance<2500}(account)).
• Each relational algebra operation can be evaluated using one of several different algorithms.
o Correspondingly, a relational-algebra expression can be evaluated in many ways.
• An annotated expression specifying a detailed evaluation strategy is called an evaluation plan.
o E.g., can use an index on balance to find accounts with balance < 2500,
or can perform a complete relation scan and discard accounts with balance ≥ 2500.
Figure: A query evaluation plan
• Query Optimization: Amongst all equivalent evaluation plans, choose the one with the lowest cost.
o Cost is estimated using statistical information from the database catalog.
  - Example: number of tuples in each relation, size of tuples, etc.
• In this unit on Query Processing we study
o How to measure query costs
o Algorithms for evaluating relational algebra operations
o How to combine algorithms for individual operations in order to
evaluate a complete expression
o How to optimize queries, that is, how to find an evaluation
plan with the lowest estimated cost
Measures of Query Cost
• Cost is generally measured as the total elapsed time for answering a query.
o Many factors contribute to time cost:
  - disk accesses, CPU, or even network communication
• Typically disk access is the predominant cost, and is also relatively easy to estimate. It is measured by
taking into account
o Number of seeks × average-seek-cost
o Number of blocks read × average-block-read-cost
o Number of blocks written × average-block-write-cost
  - The cost to write a block is greater than the cost to read a block, because
data is read back after being written to ensure that the write was successful.
• For simplicity we just use the number of block transfers from disk as the cost measure.
In the formulas that follow, t_T denotes the time to transfer one block and t_S the time for one seek.
• Costs depend on the size of the buffer in main memory.
• Real systems take CPU cost into account, differentiate between sequential and random I/O, and take
buffer size into account.
• We do not include the cost of writing output to disk in our cost formulae.
14. SELECTION OPERATION
• File scan: search algorithms that locate and retrieve records that fulfill a selection
condition.
• Algorithm A1 (linear search). Scan each file block and test all records to see
whether they satisfy the selection condition.
o Cost estimate = b_r block transfers + 1 seek
  - b_r denotes the number of blocks containing records from relation r
o If the selection is on a key attribute, can stop on finding the record
  - cost = (b_r / 2) block transfers + 1 seek
o Linear search can be applied regardless of
  - selection condition, or
  - ordering of records in the file, or
  - availability of indices
• A2 (binary search). Applicable if the selection is an equality comparison on the attribute on which the file
is ordered.
o Assume that the blocks of a relation are stored contiguously.
o Cost estimate (number of disk blocks to be scanned):
  - cost of locating the first tuple by a binary search on the blocks:
    ⌈log_2(b_r)⌉ × (t_T + t_S)
  - If there are multiple records satisfying the selection,
    add the transfer cost of the number of blocks containing records that satisfy the
    selection condition.
Selections Using Indices
• Index scan: search algorithms that use an index.
o The selection condition must be on the search key of the index.
o In the cost formulas below, h_i denotes the height of the index.
• A3 (primary index on candidate key, equality). Retrieve a single record that
satisfies the corresponding equality condition.
o Cost = (h_i + 1) × (t_T + t_S)
• A4 (primary index on nonkey, equality). Retrieve multiple records.
o Records will be on consecutive blocks.
o Let b = number of blocks containing matching records.
o Cost = h_i × (t_T + t_S) + t_S + t_T × b
• A5 (equality on search key of secondary index).
o Retrieve a single record if the search key is a candidate key:
  - Cost = (h_i + 1) × (t_T + t_S)
o Retrieve multiple records if the search key is not a candidate key:
  - each of n matching records may be on a different block
  - Cost = (h_i + n) × (t_T + t_S)
  - Can be very expensive!
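The estimates for A1 through A5 can be compared numerically. The sketch below only transcribes the formulas from these notes; the function names are made up, and the timings used in the usage note (t_T = 0.1 ms, t_S = 4 ms) are assumed values chosen purely for illustration.

```python
import math

def cost_linear(b_r, t_T, t_S, key_equality=False):
    """A1: one seek, then b_r transfers (b_r / 2 on average for a key attribute)."""
    blocks = b_r / 2 if key_equality else b_r
    return t_S + blocks * t_T

def cost_binary(b_r, t_T, t_S):
    """A2: ceil(log2(b_r)) probes, each costing one seek plus one transfer."""
    return math.ceil(math.log2(b_r)) * (t_T + t_S)

def cost_primary_key(h_i, t_T, t_S):
    """A3: traverse h_i index levels, then fetch one record block."""
    return (h_i + 1) * (t_T + t_S)

def cost_primary_nonkey(h_i, b, t_T, t_S):
    """A4: traverse the index, then one seek and b consecutive transfers."""
    return h_i * (t_T + t_S) + t_S + t_T * b

def cost_secondary(h_i, n, t_T, t_S):
    """A5 (nonkey): each of the n matching records may need its own I/O."""
    return (h_i + n) * (t_T + t_S)
```

For a 400-block relation with the assumed timings, a linear scan costs about 44 ms, while fetching 50 records through a secondary index of height 4 costs about 221 ms, which illustrates why A5 "can be very expensive".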
Selections Involving Comparisons
• Can implement selections of the form σ_{A≤v}(r) or σ_{A≥v}(r) by using
o a linear file scan or binary search,
o or by using indices in the following ways:
• A6 (primary index, comparison). (Relation is sorted on A.)
o For σ_{A≥v}(r), use the index to find the first tuple ≥ v and scan the relation sequentially from there.
o For σ_{A≤v}(r), just scan the relation sequentially till the first tuple > v; do not use the index.
• A7 (secondary index, comparison).
o For σ_{A≥v}(r), use the index to find the first index entry ≥ v and scan the index sequentially from
there, to find pointers to records.
o For σ_{A≤v}(r), just scan the leaf pages of the index finding pointers to records, till the first entry > v.
o In either case, retrieve the records that are pointed to:
  - requires an I/O for each record
  - a linear file scan may be cheaper
Implementation of Complex Selections
Conjunction: σ_{θ1 ∧ θ2 ∧ ... ∧ θn}(r)
• A8 (conjunctive selection using one index).
o Select a combination of θ_i and algorithms A1 through A7 that results in the least cost for
σ_{θi}(r).
o Test the other conditions on the tuple after fetching it into the memory buffer.
• A9 (conjunctive selection using multiple-key index).
o Use an appropriate composite (multiple-key) index if available.
• A10 (conjunctive selection by intersection of identifiers).
o Requires indices with record pointers.
o Use the corresponding index for each condition, and take the intersection of all the obtained sets of
record pointers.
o Then fetch the records from the file.
o If some conditions do not have appropriate indices, apply the test in memory.
Disjunction: σ_{θ1 ∨ θ2 ∨ ... ∨ θn}(r)
• A11 (disjunctive selection by union of identifiers).
o Applicable if all conditions have available indices.
  - Otherwise use a linear scan.
o Use the corresponding index for each condition, and take the union of all the obtained sets of record
pointers.
o Then fetch the records from the file.
• Negation: σ_{¬θ}(r)
o Use a linear scan on the file.
o If very few records satisfy ¬θ, and an index is applicable to θ:
  - find the satisfying records using the index and fetch them from the file.
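Algorithms A10 and A11 reduce to set operations on record identifiers. The sketch below assumes each index is a plain dictionary from attribute value to a set of record ids; this layout and the function name are hypothetical and only serve the illustration.

```python
def select_by_identifiers(indices, conditions, mode="and"):
    """Combine equality conditions through their indices.

    indices:    attribute -> {value -> set of record ids}
    conditions: list of (attribute, value) pairs
    mode:       "and" for A10 (intersection), "or" for A11 (union)
    """
    rid_sets = [indices[attr].get(value, set()) for attr, value in conditions]
    if not rid_sets:
        return set()
    if mode == "and":
        return set.intersection(*rid_sets)
    return set.union(*rid_sets)
```

Only the record ids in the returned set need to be fetched from the file, so the number of record I/Os is known before any data block is read.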
14. SORTING
• We may build an index on the relation, and then use the index to read the relation in sorted order.
This may lead to one disk block access for each tuple.
• For relations that fit in memory, techniques like quicksort can be used. For relations that don't fit
in memory, external sort-merge is a good choice.
External Sort-Merge
o Let M denote the memory size (in pages).
• Create sorted runs.
Let i be 0 initially.
Repeatedly do the following till the end of the relation:
(a) Read M blocks of the relation into memory
(b) Sort the in-memory blocks
(c) Write the sorted data to run R_i; increment i.
Let the final value of i be N.
• Merge the runs (N-way merge).
We assume (for now) that N < M.
o Use N blocks of memory to buffer input runs, and 1 block to buffer output. Read the first
block of each run into its buffer page.
o repeat
  - Select the first record (in sort order) among all buffer pages.
  - Write the record to the output buffer. If the output buffer is full, write it to disk.
  - Delete the record from its input buffer page.
    If the buffer page becomes empty then
    read the next block (if any) of the run into the buffer.
o until all input buffer pages are empty.
• If N ≥ M, several merge passes are required.
1. In each pass, contiguous groups of M - 1 runs are merged.
2. A pass reduces the number of runs by a factor of M - 1, and creates runs longer by the same
factor.
  - E.g., if M = 11, and there are 90 runs, one pass reduces the number of runs to 9, each
10 times the size of the initial runs.
o Repeated passes are performed till all runs have been merged into one.
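The two phases can be sketched in Python. This is a simplified model under stated assumptions: the relation is a list of blocks (lists of records), runs are kept as in-memory lists rather than disk files, and `heapq.merge` stands in for the buffered (M - 1)-way merge.

```python
import heapq

def external_sort_merge(blocks, M):
    """Sort a relation given as a list of blocks, using M buffer blocks.

    Returns the sorted records and the number of merge passes performed.
    """
    if M < 3:
        raise ValueError("need at least 3 buffer blocks to merge runs")
    # Run creation: read M blocks at a time, sort them, write out a run.
    runs = []
    for start in range(0, len(blocks), M):
        records = [rec for block in blocks[start:start + M] for rec in block]
        runs.append(sorted(records))
    # Merge passes: merge contiguous groups of M - 1 runs until one remains.
    passes = 0
    while len(runs) > 1:
        runs = [list(heapq.merge(*runs[g:g + M - 1]))
                for g in range(0, len(runs), M - 1)]
        passes += 1
    return (runs[0] if runs else []), passes
```

With 12 one-record blocks and M = 3, run creation yields 4 runs and two merge passes are needed, in line with ⌈log_{M-1}(b_r / M)⌉ = ⌈log_2(4)⌉ = 2.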
Example: External Sorting Using Sort-Merge
• Cost analysis:
o Total number of merge passes required: ⌈log_{M-1}(b_r / M)⌉.
o Block transfers for initial run creation as well as in each pass is 2 b_r.
  - For the final pass, we don't count the write cost:
    we ignore the final write cost for all operations since the output of an operation
    may be sent to the parent operation without being written to disk.
  - Thus the total number of block transfers for external sorting:
    b_r × (2 ⌈log_{M-1}(b_r / M)⌉ + 1)
• Cost of seeks
o During run generation: one seek to read each run and one seek to write each run
  - 2 ⌈b_r / M⌉
o During the merge phase
  - Buffer size: b_b (read/write b_b blocks at a time)
  - Need 2 ⌈b_r / b_b⌉ seeks for each merge pass,
    except the final one, which does not require a write
  - Total number of seeks:
    2 ⌈b_r / M⌉ + ⌈b_r / b_b⌉ × (2 ⌈log_{M-1}(b_r / M)⌉ - 1)
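The sorting cost formulas can be evaluated with a short helper. This is a sketch under assumptions: the function name is made up, and the number of merge passes is obtained by repeatedly dividing the run count by M - 1 instead of calling a floating-point logarithm, which gives the same value as the formula in ordinary cases.

```python
import math

def external_sort_cost(b_r, M, b_b=1):
    """Return (merge passes, block transfers, seeks) for external sort-merge."""
    if M < 3 or b_r <= M:
        raise ValueError("sketch assumes M >= 3 and a relation larger than memory")
    runs = math.ceil(b_r / M)                 # initial sorted runs
    passes = 0
    while runs > 1:                           # each pass merges M - 1 runs
        runs = math.ceil(runs / (M - 1))
        passes += 1
    transfers = b_r * (2 * passes + 1)
    seeks = 2 * math.ceil(b_r / M) + math.ceil(b_r / b_b) * (2 * passes - 1)
    return passes, transfers, seeks
```

For example, sorting the 400-block customer relation with M = 11 and b_b = 1 takes 2 merge passes, 2000 block transfers, and 1274 seeks according to these formulas.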
15. JOIN OPERATION
• Several different algorithms to implement joins
1. Nested-loop join
2. Block nested-loop join
3. Indexed nested-loop join
4. Merge-join
5. Hash-join
• Choice based on cost estimate
• Examples use the following information
o Number of records of customer: 10,000; depositor: 5000
o Number of blocks of customer: 400; depositor: 100
Nested-Loop Join
• To compute the theta join r ⋈_θ s
for each tuple t_r in r do begin
    for each tuple t_s in s do begin
        test pair (t_r, t_s) to see if they satisfy the join condition θ
        if they do, add t_r · t_s to the result
    end
end
• r is called the outer relation and s the inner relation of the join.
• Requires no indices and can be used with any kind of join condition.
• Expensive since it examines every pair of tuples in the two relations.
• In the worst case, if there is enough memory only to hold one block of each relation, the estimated
cost is n_r × b_s + b_r block transfers, plus n_r + b_r seeks.
• If the smaller relation fits entirely in memory, use that as the inner relation.
o Reduces cost to b_r + b_s block transfers and 2 seeks.
• Assuming worst case memory availability, the cost estimate is
o with depositor as the outer relation:
  - 5000 × 400 + 100 = 2,000,100 block transfers,
  - 5000 + 100 = 5100 seeks
o with customer as the outer relation:
  - 10,000 × 100 + 400 = 1,000,400 block transfers and 10,400 seeks
• If the smaller relation (depositor) fits entirely in memory, the cost estimate will be 500 block transfers.
Block Nested-Loop Join
• Variant of nested-loop join in which every block of the inner relation is paired with every block of the outer
relation.
for each block B_r of r do begin
    for each block B_s of s do begin
        for each tuple t_r in B_r do begin
            for each tuple t_s in B_s do begin
                check if (t_r, t_s) satisfy the join condition;
                if they do, add t_r · t_s to the result.
            end
        end
    end
end
• Worst case estimate: b_r × b_s + b_r block transfers + 2 × b_r seeks
• Best case: b_r + b_s block transfers + 2 seeks
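The block nested-loop pseudocode translates almost line for line into Python. In this sketch the relations are lists of blocks, tuples are Python tuples, and the transfer counter models the worst case with one buffer block per relation; these representation choices are assumptions for illustration.

```python
def block_nested_loop_join(r_blocks, s_blocks, theta):
    """Join two relations stored as lists of blocks (lists of tuples).

    Returns the joined tuples and the number of block transfers counted
    under the worst-case assumption of one buffer block per relation.
    """
    result, transfers = [], 0
    for B_r in r_blocks:
        transfers += 1                      # read one block of r
        for B_s in s_blocks:
            transfers += 1                  # read one block of s
            for t_r in B_r:
                for t_s in B_s:
                    if theta(t_r, t_s):     # join condition
                        result.append(t_r + t_s)
    return result, transfers
```

The counter reproduces the worst-case estimate b_r × b_s + b_r; for two blocks in each relation it reports 2 × 2 + 2 = 6 transfers.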
Indexed Nested-Loop Join
• Index lookups can replace file scans if
o the join is an equi-join or natural join, and
o an index is available on the inner relation's join attribute.
• For each tuple t_r in the outer relation r, use the index to look up tuples in s that satisfy the join
condition with tuple t_r.
• Worst case: the buffer has space for only one page of r, and, for each tuple in r, we perform an index
lookup on s.
• Cost of the join: b_r × (t_T + t_S) + n_r × c
o where c is the cost of traversing the index.
Example of Nested-Loop Join Costs
• Compute depositor ⋈ customer, with depositor as the outer relation.
• Let customer have a primary B+-tree index on the join attribute customer-name, which contains 20
entries in each index node.
• Since customer has 10,000 tuples, the height of the tree is 4, and one more access is needed to find
the actual data.
• depositor has 5000 tuples.
• Cost of block nested-loops join
o 400 × 100 + 100 = 40,100 block transfers + 2 × 100 = 200 seeks
• Cost of indexed nested-loops join
o 100 + 5000 × 5 = 25,100 block transfers and seeks.
o CPU cost is likely to be less than that for block nested-loops join.
Merge-Join
• Sort both relations on their join attribute (if not already sorted on the join attributes).
• Merge the sorted relations to join them.
o The join step is similar to the merge stage of the sort-merge algorithm.
o The main difference is the handling of duplicate values in the join attribute: every pair with the same
value on the join attribute must be matched.
• Can be used only for equi-joins and natural joins.
• Each block needs to be read only once (assuming all tuples for any given value of the join attributes
fit in memory).
• Thus the cost of merge join is:
b_r + b_s block transfers + ⌈b_r / b_b⌉ + ⌈b_s / b_b⌉ seeks
+ the cost of sorting if the relations are unsorted.
• Hybrid merge-join: applicable if one relation is sorted, and the other has a secondary B+-tree index on the join
attribute.
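A minimal in-memory sketch of merge-join is given below, with emphasis on the duplicate handling described above. Relations are lists of tuples and `key_r` / `key_s` are hypothetical accessor functions for the join attribute; block I/O is not modelled.

```python
def merge_join(r, s, key_r, key_s):
    """Equi-join two lists of tuples by sorting and merging on the join key."""
    r = sorted(r, key=key_r)
    s = sorted(s, key=key_s)
    result, i, j = [], 0, 0
    while i < len(r) and j < len(s):
        kr, ks = key_r(r[i]), key_s(s[j])
        if kr < ks:
            i += 1
        elif kr > ks:
            j += 1
        else:
            # Find the group of s tuples that share this join value.
            j_end = j
            while j_end < len(s) and key_s(s[j_end]) == kr:
                j_end += 1
            # Match every r tuple with this value against the whole group.
            while i < len(r) and key_r(r[i]) == kr:
                result.extend(r[i] + t_s for t_s in s[j:j_end])
                i += 1
            j = j_end
    return result
```

The inner group scan is what distinguishes the join from a plain merge: two r tuples and two s tuples with the same value produce four output tuples.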
Hash-Join
• Applicable for equi-joins and natural joins.
• A hash function h is used to partition tuples of both relations.
• h maps JoinAttrs values to {0, 1, ..., n}, where JoinAttrs denotes the common attributes of r and s
used in the natural join.
o r_0, r_1, ..., r_n denote partitions of r tuples.
  - Each tuple t_r ∈ r is put in partition r_i, where i = h(t_r[JoinAttrs]).
o s_0, s_1, ..., s_n denote partitions of s tuples.
  - Each tuple t_s ∈ s is put in partition s_i, where i = h(t_s[JoinAttrs]).
• Note: In the book, r_i is denoted as H_ri, s_i is denoted as H_si, and n is denoted as n_h.
• r tuples in r_i need only be compared with s tuples in s_i; they need not be compared with s tuples in
any other partition.
Hash-Join Algorithm
The hash-join of r and s is computed as follows.
• Partition the relation s using hashing function h. When partitioning a relation,
one block of memory is reserved as the output buffer for each partition.
• Partition r similarly.
• For each i:
o Load s_i into memory and build an in-memory hash index on it using the join
attribute. This hash index uses a different hash function than the earlier one,
h.
o Read the tuples in r_i from the disk one by one. For each tuple t_r, locate each
matching tuple t_s in s_i using the in-memory hash index. Output the concatenation of their
attributes.
• Relation s is called the build input and r is called the probe input.
• The value n and the hash function h are chosen such that each s_i fits in memory.
• Recursive partitioning is required if the number of partitions n is greater than the number of pages M of
memory.
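The partition, build, and probe steps can be sketched in Python. Several simplifications are assumed here: partitions are in-memory lists instead of disk files, Python's built-in `hash` modulo `n` plays the role of h (giving partitions 0 to n - 1), and a dictionary serves as the in-memory hash index with its own internal hash function.

```python
from collections import defaultdict

def hash_join(r, s, key_r, key_s, n=4):
    """Equi-join r (probe input) with s (build input) using n partitions."""
    h = lambda value: hash(value) % n          # partitioning hash function
    r_parts, s_parts = defaultdict(list), defaultdict(list)
    for t_s in s:                              # partition s
        s_parts[h(key_s(t_s))].append(t_s)
    for t_r in r:                              # partition r similarly
        r_parts[h(key_r(t_r))].append(t_r)
    result = []
    for i in range(n):
        # Build: in-memory hash index on partition s_i.
        index = defaultdict(list)
        for t_s in s_parts[i]:
            index[key_s(t_s)].append(t_s)
        # Probe: look up each r_i tuple and output concatenations.
        for t_r in r_parts[i]:
            for t_s in index.get(key_r(t_r), []):
                result.append(t_r + t_s)
    return result
```

Since matching tuples always hash to the same partition number, no r tuple is ever compared with an s tuple from another partition.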
Handling of Overflows
• Partitioning is said to be skewed if some partitions have significantly more tuples than some others.
• Hash-table overflow occurs in partition s_i if s_i does not fit in memory. Reasons could be
o Many tuples in s with the same value for the join attributes
o Bad hash function
• Overflow resolution can be done in the build phase.
o Partition s_i is further partitioned using a different hash function.
o Partition r_i must be similarly partitioned.
• Overflow avoidance performs partitioning carefully to avoid overflows during the build phase.
o E.g., partition the build relation into many partitions, then combine them.
• Both approaches fail with large numbers of duplicates.
o Fallback option: use block nested-loops join on the overflowed partitions.
Cost of Hash-Join
• If recursive partitioning is not required, the cost of hash join is
3(b_r + b_s) + 4 n_h block transfers.
• If recursive partitioning is required, the total cost estimate is:
2(b_r + b_s) ⌈log_{M-1}(b_s) - 1⌉ + b_r + b_s block transfers +
2(⌈b_r / b_b⌉ + ⌈b_s / b_b⌉) ⌈log_{M-1}(b_s) - 1⌉ seeks
• If the entire build input can be kept in main memory, no partitioning is required.
o The cost estimate goes down to b_r + b_s.
Hybrid Hash-Join
• Useful when memory sizes are relatively large, and the build input is bigger than memory.
• Main feature of hybrid hash join:
Keep the first partition of the build relation in memory.
• E.g., with a memory size of 25 blocks, depositor can be partitioned into five partitions, each of size 20
blocks.
• Division of memory:
o The first partition occupies 20 blocks of memory.
o 1 block is used for input, and 1 block each for buffering the other 4 partitions.
• customer is similarly partitioned into five partitions, each of size 80 blocks.
o The first is used right away for probing, instead of being written out.
• Cost of 3(80 + 320) + 20 + 80 = 1300 block transfers for
hybrid hash join, instead of 1500 with plain hash-join.
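The two totals quoted in the example can be reproduced with a few lines of arithmetic. The helper names below are made up, partitions are assumed to be of equal size, and the 4 n_h term for partially filled blocks is left at zero to match the figures in the notes.

```python
def hash_join_transfers(b_r, b_s, n_h=0):
    """Plain hash join without recursive partitioning: 3(b_r + b_s) + 4 n_h."""
    return 3 * (b_r + b_s) + 4 * n_h

def hybrid_hash_join_transfers(b_r, b_s, parts):
    """Hybrid hash join: the first partition of each relation is read only once."""
    first_r, first_s = b_r // parts, b_s // parts   # equal-size partitions assumed
    rest = (b_r - first_r) + (b_s - first_s)        # written out and re-read
    return 3 * rest + first_r + first_s
```

With customer at 400 blocks, depositor at 100 blocks, and five partitions, the functions give 1500 and 1300 block transfers respectively.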
Complex Joins
• Join with a conjunctive condition:
r ⋈_{θ1 ∧ θ2 ∧ ... ∧ θn} s
o Either use nested loops/block nested loops, or
o Compute the result of one of the simpler joins r ⋈_{θi} s
  - the final result comprises those tuples in the intermediate result that satisfy the remaining
conditions
θ1 ∧ ... ∧ θ(i-1) ∧ θ(i+1) ∧ ... ∧ θn
• Join with a disjunctive condition:
r ⋈_{θ1 ∨ θ2 ∨ ... ∨ θn} s
o Either use nested loops/block nested loops, or
o Compute as the union of the records in the individual joins r ⋈_{θi} s:
(r ⋈_{θ1} s) ∪ (r ⋈_{θ2} s) ∪ ... ∪ (r ⋈_{θn} s)
Other Operations
• Duplicate elimination can be implemented via hashing or sorting.
o On sorting, duplicates will come adjacent to each other, and all but one of a set of duplicates can
be deleted.
o Hashing is similar: duplicates will come into the same bucket.
• Projection:
o Perform projection on each tuple, followed by duplicate elimination.
• Aggregation can be implemented in a manner similar to duplicate elimination.
o Sorting or hashing can be used to bring tuples in the same group together, and then the
aggregate functions can be applied on each group.
• Set operations (∪, ∩ and −): can either use a variant of merge-join after sorting, or a variant of
hash-join.
With hashing, both relations are partitioned with the same hash function, an in-memory hash index is
built on each partition r_i, and s_i is then processed as follows:
o r ∪ s:
  - Add tuples in s_i to the hash index if they are not already in it.
  - At the end of s_i, add the tuples in the hash index to the result.
o r ∩ s:
  - Output tuples in s_i to the result if they are already there in the hash index.
o r − s:
  - For each tuple in s_i, if it is there in the hash index, delete it from the index.
  - At the end of s_i, add the remaining tuples in the hash index to the result.
• Outer join can be computed either as
o a join followed by addition of null-padded non-participating tuples, or
o by modifying the join algorithms.
• Modifying merge join to compute r ⟕ s
o In r ⟕ s, non-participating tuples are those in r − Π_R(r ⋈ s).
o Modify merge-join to compute r ⟕ s: during merging, for every tuple t_r from r that does
not match any tuple in s, output t_r padded with nulls.
o Right outer-join and full outer-join can be computed similarly.
• Modifying hash join to compute r ⟕ s
o If r is the probe relation, output non-matching r tuples padded with nulls.
o If r is the build relation, when probing keep track of which r tuples matched s tuples. At the end of
s_i, output non-matched r tuples padded with nulls.
2 Mark Questions
1. Define query optimization.
2. What is an index?
3. What are called jukebox systems?
4. What are the types of storage devices?
5. What is called remapping of bad sectors?
6. Define access time.
7. Define seek time.
8. Define average seek time.
9. Define rotational latency time.
10. Define average latency time.
11. What is meant by data-transfer rate?
12. What is meant by mean time to failure?
13. What is a block and a block number?
14. What are called journaling file systems?
15. What is the use of RAID?
16. What is called mirroring?
17. What is called mean time to repair?
18. What is called bit-level striping?
19. What is called block-level striping?
20. What are the two main goals of parallelism?
21. What are the factors to be taken into account when choosing a RAID level?
22. What is meant by software and hardware RAID systems?
23. Define hot swapping.
24. What are the ways in which variable-length records arise in database systems?
25. What is the use of a slotted-page structure and what is the information present in the header?
26. What are the two types of blocks in the fixed-length representation? Define them.
27. What is known as heap file organization?
28. What is known as sequential file organization?
29. What is hashing file organization?
30. What is known as clustering file organization?
31. What are the types of indices?
32. What are the techniques to be evaluated for both ordered indexing and hashing?
33. What is known as a search key?
34. What is a primary index?
35. What are called index-sequential files?
36. What are the two types of indices?
37. What are called multilevel indices?
38. What is a B-tree?
39. What is a B+-tree index?
40. What is a hash index?
41. What is called query processing?
42. What are the steps involved in query processing?
43. What is called an evaluation primitive?
44. What is called a query evaluation plan?
45. What is called a query execution engine?
46. What are called index scans?
47. What is called external sorting?
48. What is called recursive partitioning?
49. What is called an N-way merge?
50. What is known as the fudge factor?
51. How do we choose the best evaluation plan for a query?
1@ M.0 Q5 '&t+o- &
1. E1plain (arious hashing techniEues
7 E1plain R$ID
< E1plain the steps in &uery processing
C E1plain +3 !ree and +-tree
B a) E1plain the types of #ile 0rgani/ation
b) E1plain some basic algorithms used in selection operation5 %orting and ,oin operation
9 E1plain physical storage media with e1ample
"r evio us $ ear %nna &ni versit y 'u est io ns
+E)+!eeh DE6REE EG$MI*$!I0*5 M$?),>*E 788;
#ifth %emester
(Regulation 788C) -omputer
%cience and Engineering
-% 4<84 c D$!$+$%E M$*$6EME*! %?%!EM%
(-ommon to Information !echnology)
(-ommon to +E (Partc!ime) #ourth %emester Regulation 788B)
!ime = !hree hours Ma1imum 7 488 mar.s
$nswer $'' EuestionH P$R!
$c (48 17 M 78 nar.s)
4 'ist fi(e responsibilities of the D+ Manager
7 6i(e the limitations of E-R model 2ow do you o(ercome thisK
< Define Euery language 6i(e the classification of Euery language
C :hy it is necessary to decompose a relationK
B 6i(e any two um antages of sparse inde1 o(er dense inde1
9 *ame the different types of Doins supported in %&'
; :hat are the types of transparencies that a distributed database must supportK :(hyK
F :hat benefit is pro(ided by strict-two-phase loc.ingK :hat are the disad(antages resultK
L +riefly write the o(erall process of data warehousing
48 :hat is an acti(e databaseUK
P$R! + c (B 1 49 M F8 mar.s)
44 (a) (i) :hat are the types of .nowledge disco(ered during data miningK E1plain with suitable e1ample
(ii) 2ighlight the features of obDect oriented database (F)
0r
(b) (i) :hat is nested relationsK 6i(e e1ample (F)
(ii) E1plain the structure of GM' with suitable e1ample (F)
12. (a) (i) Compare file systems with database systems. (8)
(ii) Explain the architecture of a DBMS. (8)
Or
(b) (i) What are the steps involved in designing a database application? Explain with an application. (10)
(ii) List the possible types of relations that may exist between two entities. How would you realize them as tables for a binary relation? (6)
13. (a) (i) What are the relational algebra operations supported in SQL? Write the SQL statement for each operation. (8)
(ii) Justify the need for normalization with examples. (8)
Or
(b) (i) What is normalization? Explain 1NF, 2NF, 3NF and BCNF with simple examples. (8)
(ii) What is an FD? Explain the role of FDs in the process of normalization. (8)
14. (a) (i) Explain the security features provided in commercial query languages. (8)
(ii) What are the steps involved in query processing? How would you estimate the cost of a query? (8)
Or
(b) (i) Explain the different properties of indexes in detail. (8)
(ii) Explain various hashing techniques. (8)
15. (a) (i) Explain the four important properties of a transaction that a DBMS must ensure to maintain the database. (8)
(ii) What is RAID? List the different levels in RAID technology and explain their features. (8)
Or
(b) (i) What is concurrency control? How is it implemented in a DBMS? Explain. (8)
(ii) Explain various recovery techniques during a transaction in detail. (8)
BE/BTech DEGREE EXAMINATION, NOVEMBER/DECEMBER 2006
Fifth Semester
Computer Science and Engineering
CS 1301 - DATABASE MANAGEMENT SYSTEMS
(Common to Information Technology and BE Part-Time R 2005 Fourth Semester)
(Regulation 2004)
Time: Three hours Maximum: 100 marks
Answer ALL questions
PART A - (10 x 2 = 20 marks)
1. Compare database systems with file systems.
2. Give the distinction between primary key, candidate key, and super key.
3. Write an SQL statement to find the names and loan numbers of all customers who have a loan at the Chennai branch.
4. What is a multivalued dependency?
5. Give the measures of the quality of a disk.
6. What are the two types of ordered indices?
7. List out the ACID properties.
8. What is shadow paging?
9. Compare DBMS versus object-oriented DBMS.
10. What is data warehousing?
PART B - (5 x 16 = 80 marks)
11. (a) (i) Describe the system structure of a database system. (12)
(ii) List out the functions of a DBA. (4)
Or
(b) (i) Illustrate the issues to be considered while developing an ER diagram. (8)
(ii) Consider the relational database
employee(empname, street, city)
works(empname, companyname, salary)
company(companyname, city)
manages(empname, managername)
Give an expression in the relational algebra for each request:
(1) Find the names of all employees who work for First Bank Corporation.
(2) Find the names, street addresses, and cities of residence of all employees who work for First Bank Corporation and earn more than 200000 per annum.
(3) Find the names of all employees in this database who live in the same city as the company for which they work.
(4) Find the names of all the employees who earn more than every employee of Small Bank Corporation. (4 x 2 = 8)
12. (a) (i) Discuss triggers. How do triggers offer a powerful mechanism for dealing with changes to a database? Explain with a suitable example. (10)
(ii) What are nested queries? Explain with an example. (6)
Or
(b) (i) What is normalization? Give the various normal forms of a relational schema, define a relation which is in BCNF, and explain with a suitable example. (12)
(ii) Compare BCNF versus 3NF. (4)
13. (a) (i) Describe the different RAID levels. (10)
(ii) Explain why the allocation of records to blocks affects database system performance significantly. (6)
Or
(b) (i) Describe the structure of a B+ tree and give the algorithm for search in the B+ tree with an example. (12)
(ii) Give the comparison between ordered indexing and hashing. (4)
14. (a) (i) Explain the different forms of serializability. (10)
(ii) What different types of schedules are acceptable for recoverability? (6)
Or
(b) (i) Discuss the two-phase locking protocol and the timestamp-based protocol. (12)
(ii) Write short notes on log-based recovery. (4)
15. (a) (i) Discuss in detail object-relational databases and their advantages. (8)
(ii) Illustrate the issues in implementing distributed databases. (8)
Or
(b) (i) Give the basic structure of XML and its document schema. (8)
(ii) What are the two important classes of data mining problems? Explain rule discovery using those classes. (8)
BE/BTech DEGREE EXAMINATION, NOVEMBER/DECEMBER 2007
Third Semester
Computer Science and Engineering
CS 234 - DATABASE MANAGEMENT SYSTEMS
Time: Three hours Maximum: 100 marks
Answer ALL questions
PART A - (10 x 2 = 20 marks)
1. What is physical data independence and why is it important?
2. Define the concept of aggregation. Give two examples of where this concept is useful.
3. State the various operators used in relational algebra.
4. What is the need for triggers in SQL? State their usage.
5. What is an ordered index? Give an example.
6. Define trivial functional dependencies.
7. What are the steps involved in query processing?
8. Distinguish between timestamp-based protocols and lock-based protocols.
9. Compare OLTP and OLAP.
10. What are data fragmentations? State the various fragmentations with examples.
PART B - (5 x 16 = 80 marks)
11. (a) (i) Differentiate between the hierarchical, network, and relational models with examples. (8)
(ii) Explain clearly the steps involved in the database development process while building an application. (8)
Or
(b) (i) What are the main differences between a file-processing system and a DBMS? State the disadvantages of a DBMS. (8)
(ii) Explain the distinction among the terms primary key, foreign key, and super key with a suitable example. (8)
12. (a) (i) What are the various aggregate operators that SQL supports? Give a suitable example for each aggregate operator. (8)
(ii) Justify the need for static SQL and dynamic SQL. Consider the relation Employee(empno, name, age, salary). Write an embedded SQL statement in the C language to retrieve all the employee records whose salary is between 10,000 and 20,000. (8)
Or
(b) Consider the employee-company relation. Write SQL statements for the following queries. (4 x 4 = 16)
Employee-company relation:
employees(person_name, street, city)
works(person_name, company_name, salary)
company(company_name, city)
manages(person_name, manager_name)
(i) Find all employees in the database who live in the same cities as the companies for which they work.
(ii) Modify the database so that David now lives in Mumbai.
(iii) Find all employees who are under the manager John.
(iv) Give all managers in the database a 10 percent raise.
13. (a) (i) Draw the ER diagram for the Employee_company relation given above. (8)
(ii) What is the difference between a primary index and a secondary index? (8)
Or
(b) State the goal of decomposition/normalization. Explain the different levels of normalization with examples.
14. (a) (i) What are the different drawbacks in concurrent usage of a database? (8)
(ii) What is the difference between implicit and explicit locking? Explain in detail with examples. (8)
Or
(b) (i) How does a DBMS back out the changes made to the database if a system failure occurs in the middle of a transaction? (8)
(ii) List the ACID properties. Explain the usefulness of each. (8)
15. (a) (i) Explain two important classes of data mining problems in detail with examples. (8)
(ii) Explain the features of a distributed database system. List out some applications of distributed database systems. (8)
Or
(b) (i) Compare object-oriented versus object-relational databases. State with suitable examples. (8)
(ii) Describe the various components of a data warehouse and explain the multidimensional data model with an example. (8)