
Database Management Systems

BITS Pilani Dr.R.Gururaj


CS&IS Dept.
Hyderabad Campus
Course Content

1. Introduction and Overview of DBMS


2. Conceptual Database Design (ER Modeling)
3. Relational Model
4. Relational Algebra and Calculus
5. SQL
6. Schema Refinement and Normal Forms
7. Disk Storage
8. Hashing and Indexing
9. Transaction Management and Concurrency
Control
10.Database Recovery

BITS Pilani, Hyderabad Campus


Books

1. R Ramakrishnan & J Gehrke, Database Management


Systems, Mc Graw Hill, 3rd Ed., 2003.

2. Elmasri, Ramez, Shamkant B. Navathe, Fundamentals of


Database Systems, Pearson Education, 5th Ed., 2007

3. Date C.J., An Introduction to Database Systems, Pearson, 8th


Ed., 2006.

4. Korth H F and A Silberschatz, Database System Concepts,


MGHISE, 3rd Ed., 1997.

BITS Pilani, Hyderabad Campus


Lecture Session-1
Introduction to DBMS
Content
 Database Systems
 DBMS
 Database System environment
 Traditional file systems for storing data
 Advantages of DBMS over traditional file systems

BITS Pilani, Hyderabad Campus


Introduction
Databases and the systems that manage them have become significant
components of present-day businesses of every kind.

These databases help businesses to perform their day-to-day


activities in an efficient and effective manner.

• Banking
• Travel ticket reservation
• Library catalog search

In each of these, application programs access the database.


Advances in technology have given rise to new concepts:
 Multimedia databases
 GIS
 Web data
 Data warehousing and mining
BITS Pilani, Hyderabad Campus
Data: Known facts that can be recorded and that have
implicit meaning.
Ex. Name, Tel_no, city etc.

This data can be stored in a file on a computer.

Database: Is a collection of related data.

 It is a collection of logically related data.

 A database is designed, built and populated with


data for a specific purpose.

BITS Pilani, Hyderabad Campus


DBMS
DBMS: Is a collection of programs that enables users to create and maintain
databases in a convenient and effective manner.

DBMS is a software system that facilitates the following:

1.Defining the database: This includes defining the structures, data types,
constraints, indexes etc.
This description is stored in the database catalog (data dictionary) and is known as meta-data.

2.Constructing the database: This means storing data into the database
structures and storing on some storage medium.

3.Manipulating database for various applications: This encompasses activities like


– querying the database, inserting new records into the database, updating some
data items, and deleting certain items from the database.
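For illustration, the three activities map directly onto SQL statements (SQL is covered in later sessions; the ACCOUNT table used here is only an example):

-- Defining: declare the structure, data types, and constraints
CREATE TABLE ACCOUNT (
    acc_no    CHAR(10) PRIMARY KEY,
    cust_name VARCHAR(30) NOT NULL,
    balance   DECIMAL(12, 2)
);

-- Constructing: store data into the defined structure
INSERT INTO ACCOUNT VALUES ('A1', 'Raju', 5000.00);

-- Manipulating: query, update, and delete the stored data
SELECT cust_name, balance FROM ACCOUNT WHERE balance > 1000;
UPDATE ACCOUNT SET balance = balance + 500 WHERE acc_no = 'A1';
DELETE FROM ACCOUNT WHERE acc_no = 'A1';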

What is DBMS?
What is a Database System?

BITS Pilani, Hyderabad Campus


BITS Pilani, Hyderabad Campus
Traditional file systems for
storing the data
If we take the example of savings bank enterprise, information about
customers and savings accounts etc. need to be stored.

One way to keep the information on computers is to store in files


provided by operating systems (OS).

Disadvantages of the above System


 Difficulty in accessing data (possible operations need to be hard-coded in programs).
 Redundancy leading to inconsistency.
 Inconsistent changes made by concurrent users.
 No recovery on crash.
 The security provided by OS in the form of password is not
sufficient.
 Data Integrity is not maintained.

BITS Pilani, Hyderabad Campus


Advantages of using DBMS

 Data independence
 Efficient data access
 Data integrity and security
 Data Administration
 Concurrent access and Crash recovery
 Reduced application development time

BITS Pilani, Hyderabad Campus


Disadvantages of DBMS

1. Extra cost due to SW, HW and training.


2. Not suitable or effective for certain applications (e.g., applications with real-time
constraints or with only a few well-defined operations).
3. Some applications need data manipulations that cannot be expressed in the query language.

BITS Pilani, Hyderabad Campus


Summary

 What is Data, Database, and DBMS


 Importance of DBMS
 Storing data in Traditional file systems
 Advantages of DBMS over traditional file systems

BITS Pilani, Hyderabad Campus


Database Management Systems

BITS Pilani Dr.R.Gururaj


CS&IS Dept.
Hyderabad Campus
Lecture Session-2
DBMS Concepts

Content

 Describing and Storing data in DBMS


 Three schema Architecture
 Data independence
 Queries
 Transactions
 Structure of a DBMS
 People who work with DBMS

BITS Pilani, Hyderabad Campus


Describing and storing data in
DBMS
Data model

Is a collection of high-level data description constructs that hide many


low-level details.
DBMS allows a user to define the data to be stored in terms of a data model.

Semantic data models: more abstract, high-level data models that make it easier for a user to
come up with a good initial description of the data in an enterprise. They contain a wide
variety of constructs that help describe real-world enterprise data.
Ex. ER model
Representational / Implementation data models: DBMS-specific data models built around just a few basic constructs.
Ex. Relational data model, Object data model

A database design in terms of a semantic model serves as a useful starting point and
is subsequently translated into a database design in terms of the data model the
DBMS supports.

BITS Pilani, Hyderabad Campus


Relational Model:

The central data description construct in this model is a relation,


which can be thought of as a set of records.

Schema: Description of data in terms of a data model is called a schema.


A relation schema specifies the name of the relation and the name and type of each field.

Ex. Student (sid: string; name: string; age: integer)

every row follows the schema of the relation.

BITS Pilani, Hyderabad Campus


Instance of a relation:
Student
sid name age
A120 Raju 21
A134 Kiran 19
C110 John 22

A schema can be regarded as a template for describing a student.

We can specify integrity constraints which are conditions that need to be

satisfied by records in the relation. Ex. uniqueness
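A possible SQL declaration of this schema (column sizes are illustrative), with the uniqueness of sid enforced as a primary key constraint:

CREATE TABLE Student (
    sid  VARCHAR(10),   -- sid: string
    name VARCHAR(30),   -- name: string
    age  INT,           -- age: integer
    CONSTRAINT student_pk PRIMARY KEY (sid)   -- uniqueness constraint on sid
);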

BITS Pilani, Hyderabad Campus


The following are some important representational data models (DBMS Specific)

1. Network Model: Though the basic structure is a record, the relationships are captured
using links. The database can be seen as an arbitrary network of records connected by links.
Ex.: GE's Integrated Data Store (IDS), early 1960s
2. Hierarchical Model: The records containing data are organized as a collection of trees.
Ex.: IBM's IMS (Information Management System), late 1960s
3. Relational Model (early 1970s): Data and relationships are captured as tables and keys;
the basic storage structure is the record (row).
Ex.: Oracle, IBM's DB2, MySQL, Informix, Sybase, MS Access, Ingres, etc.
4. Object Data Model: Objects created through object-oriented programs can be stored in the database.
Ex.: Object Store
5. Object-Relational Model: Objects can be stored in tables.
Ex.: Oracle, Informix

BITS Pilani, Hyderabad Campus


Database Schema
Database Schema: The description of a database is called the database schema.

Three-Schema Architecture
A database can be described using three different levels of abstractions.
Description at each level can be defined by a schema. For each abstraction we
focus on one of the specific issues such as user views, concepts, storage etc.

1. External schema: Used to describe the database at external level.


Also described in terms of the data model of that DBMS. This allows data
access to be customized at the level of individual users/groups/applications.
Any external schema has one or more views and relations from the conceptual
schema. This schema design is guided by end user requirements.
2. Conceptual schema (logical schema) Describes the stored data in terms of the
data model specific to that DBMS. In RDBMS conceptual schema describes
all relations that are stored in the database. Arriving at good choice of
relations, fields and constraints is known as conceptual database design.
3. Physical schema: Describes the physical storage strategy for the database.
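In relational terms, the conceptual schema is the set of stored base tables, and an external schema can be provided through views; a minimal sketch (names are illustrative):

-- Conceptual level: a base relation
CREATE TABLE EMPLOYEE (
    ssn    CHAR(9) PRIMARY KEY,
    name   VARCHAR(30),
    salary DECIMAL(10, 2),
    dno    INT
);

-- External level: a customized user view that hides salary
CREATE VIEW EMP_PUBLIC AS
SELECT ssn, name, dno
FROM EMPLOYEE;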

BITS Pilani, Hyderabad Campus


Three Schema Architecture

External Schema 1 External Schema 2 External Schema 3

External Level

Conceptual Level Conceptual Schema

Physical Schema
Physical/Internal
Level

Storage

Three schema architecture of


DBMS

BITS Pilani, Hyderabad Campus


Data Independence

Data Independence:
The three-schema architecture, which results from the three levels of
abstraction over the database, leads to data independence.

1. Logical data independence: changes in conceptual level schema


should not affect the application level or external level schemas.

2. Physical data independence: The changes in physical features of


storage, i.e., changes to the physical storage format should not affect
schema at conceptual level.

The above data independence is one of the important advantages of


DBMS.

The DBMS stores the description of the schemas in the system catalog.


BITS Pilani, Hyderabad Campus
Queries
Queries in RDBMS
The ease with which information can be obtained from a database
often determines its value to the user.

RDBMS allows users to pose a rich class of questions in the form of


queries.

Relational data model has powerful query languages:

Formal query languages (based on strong mathematical logic)


Relational algebra
Relational Calculus
Commercial query language
SQL

BITS Pilani, Hyderabad Campus


Transactions

Transaction Management
A transaction is a collection of operations that perform a single logical
operation or function in a database application.

Each transaction is a unit of atomicity.

A transaction is an atomic unit of work that is either completed in its


entirety or not done at all.

Concurrent Transactions

Incomplete Transactions and system crash


For recovery purposes, the system needs to keep track of
when the transaction starts, terminates, and commits or aborts.
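A minimal sketch of a transaction as an atomic unit, assuming a simple ACCOUNT table; the exact transaction-control syntax (BEGIN / START TRANSACTION) varies across DBMSs:

BEGIN;   -- start the transaction (START TRANSACTION in some systems)

UPDATE ACCOUNT SET balance = balance - 500 WHERE acc_no = 'A1';
UPDATE ACCOUNT SET balance = balance + 500 WHERE acc_no = 'A2';

COMMIT;  -- both updates become permanent together
-- If an error occurs before COMMIT, a ROLLBACK (or crash recovery) undoes the whole unit.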

BITS Pilani, Hyderabad Campus


DBMS Structure

[Figure: DBMS structure. Web forms, application front ends, and the SQL interface issue SQL commands to the Query Engine. The Query Engine works with the Transaction Manager, Lock Manager, Concurrency Control Manager, Recovery Manager, and the Buffer/Disk/File Manager, which access the index files, system catalog, and data blocks on disk.]

BITS Pilani, Hyderabad Campus


People who work with DBMS

 Database Implementers
 End users
 Application Programmers
 Database administrator (DBA)

DBA’s role:
1. Design of physical & Conceptual schemas
2. Security and authorization
3. Data availability , recovery and backup
4. Database tuning- modifying the schemas to meet the
requirements

BITS Pilani, Hyderabad Campus


Summary
 How data is described in a DBMS
 What is a data model
 What is a schema
 What is three schema architecture of a DBMS
 What is data independence
 Queries and Query languages
 Transaction management
 Components of a DBMS
 People working with DBMS

BITS Pilani, Hyderabad Campus


Contents

1. Mapping ER to Relational model

2. Mapping EER to Relational model

1 CSC 352/ISC 332 Dr.R.Gururaj BITS Pilani, Hyderabad Campus


ER-to-Relational Mapping

Step 1: Mapping of Regular Entity Types.


 For each regular (strong) entity type E in the ER schema, create
a relation R that includes all the simple attributes of E.
 Choose one of the key attributes of E as the primary key for R.
 If the chosen key of E is composite, the set of simple attributes
that form it will together form the primary key of R.

Example: We create the relations EMPLOYEE, DEPARTMENT, and


PROJECT in the relational schema corresponding to the regular
entities in the ER diagram.

 SSN, DNUMBER, and PNUMBER are the primary keys for the
relations EMPLOYEE, DEPARTMENT, and PROJECT as shown.

2 CSC 352/ISC 332 Dr.R.Gururaj BITS Pilani, Hyderabad Campus


[Figure: ER diagram for the COMPANY database, showing the entity types EMPLOYEE, DEPARTMENT, PROJECT and the weak entity type DEPENDENT, their attributes, and the relationship types WORKS_FOR, MANAGES, CONTROLS, SUPERVISION, WORKS_ON, and DEPENDENTS_OF with (min, max) structural constraints.]

CSC 352/ISC 332 Dr.R.Gururaj BITS Pilani, Hyderabad Campus


3 CSC 352/ISC 332 Dr.R.Gururaj BITS Pilani, Hyderabad Campus
Step 2: Mapping of Weak Entity Types
 For each weak entity type W in the ER schema with owner entity type
E, create a relation R & include all simple attributes (or simple
components of composite attributes) of W as attributes of R.
 Also, include as foreign key attributes of R the primary key attribute(s)
of the relation(s) that correspond to the owner entity type(s).
 The primary key of R is the combination of the primary key(s) of the
owner(s) and the partial key of the weak entity type W, if any.

Example: Create the relation DEPENDENT in this step to


correspond to the weak entity type DEPENDENT.
 Include the primary key SSN of the EMPLOYEE relation as a foreign key
attribute of DEPENDENT (renamed to ESSN).
 The primary key of the DEPENDENT relation is the combination {ESSN,
DEPENDENT_NAME} because DEPENDENT_NAME is the partial key of
DEPENDENT.
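One possible SQL declaration of this mapping (data types are illustrative):

CREATE TABLE DEPENDENT (
    ESSN           CHAR(9)     NOT NULL,   -- owner's key, foreign key to EMPLOYEE
    DEPENDENT_NAME VARCHAR(15) NOT NULL,   -- partial key of the weak entity type
    SEX            CHAR,
    BDATE          DATE,
    RELATIONSHIP   VARCHAR(10),
    PRIMARY KEY (ESSN, DEPENDENT_NAME),
    FOREIGN KEY (ESSN) REFERENCES EMPLOYEE (SSN)
);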

4 CSC 352/ISC 332 Dr.R.Gururaj BITS Pilani, Hyderabad Campus


Step 3: Mapping of Binary 1:1 Relation Types

For each binary 1:1 relationship type R in the ER schema, identify the relations S
and T that correspond to the entity types participating in R.

There are three possible approaches:


1. Foreign Key approach: Choose one of the relations-say S-and include a foreign
key in S that refers to the primary key of T. It is better to choose an entity type
with total participation in R in the role of S.
• Example: 1:1 relation MANAGES is mapped by choosing the participating
entity type DEPARTMENT to serve in the role of S, because its
participation in the MANAGES relationship type is total.
2. Merged relation option: An alternate mapping of a 1:1 relationship type is
possible by merging the two entity types and the relationship into a single
relation. This may be appropriate when both participations are total.
3. Cross-reference or relationship relation option: The third alternative is to set
up a third relation R for the purpose of cross-referencing the primary keys of
the two relations S and T representing the entity types.
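A sketch of the foreign key approach for MANAGES, carried by DEPARTMENT (the side with total participation); the UNIQUE constraint on MGRSSN, which keeps the relationship 1:1, is one possible choice:

CREATE TABLE DEPARTMENT (
    DNUMBER      INT PRIMARY KEY,
    DNAME        VARCHAR(15) NOT NULL UNIQUE,
    MGRSSN       CHAR(9) NOT NULL UNIQUE,   -- FK representing MANAGES; UNIQUE enforces 1:1
    MGRSTARTDATE DATE,                      -- attribute of the MANAGES relationship
    FOREIGN KEY (MGRSSN) REFERENCES EMPLOYEE (SSN)
);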

5 CSC 352/ISC 332 Dr.R.Gururaj BITS Pilani, Hyderabad Campus


Step 4: Mapping of Binary 1:N Relationship Types.
 For each regular binary 1:N relationship type R, identify the
relation S that represent the participating entity type at the N-
side of the relationship type.
 Include as foreign key in S the primary key of the relation T
that represents the other entity type participating in R.
 Include any simple attributes of the 1:N relationship type as
attributes of S.
Example: 1:N relationship types WORKS_FOR,
CONTROLS, and SUPERVISION in the figure.
 For WORKS_FOR we include the primary key DNUMBER of
the DEPARTMENT relation as foreign key in the EMPLOYEE
relation and call it DNO.
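A trimmed sketch of the resulting EMPLOYEE relation (only a few attributes shown):

CREATE TABLE EMPLOYEE (
    SSN    CHAR(9) PRIMARY KEY,
    FNAME  VARCHAR(15) NOT NULL,
    LNAME  VARCHAR(15) NOT NULL,
    SALARY DECIMAL(10, 2),
    DNO    INT NOT NULL,   -- FK to the 1-side (DEPARTMENT), representing WORKS_FOR
    FOREIGN KEY (DNO) REFERENCES DEPARTMENT (DNUMBER)
);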

6 CSC 352/ISC 332 Dr.R.Gururaj BITS Pilani, Hyderabad Campus


[Figure: the COMPANY ER diagram repeated for reference (see the earlier slide).]

7 CSC 352/ISC 332 Dr.R.Gururaj BITS Pilani, Hyderabad Campus


Step 5: Mapping of Binary M:N Relationship Types.
 For each regular binary M:N relationship type R, create a new relation
S to represent R.
 Include as foreign key attributes in S the primary keys of the relations
that represent the participating entity types; their combination will form
the primary key of S.
 Also include any simple attributes of the M:N relationship type (or simple
components of composite attributes) as attributes of S.
Example: The M:N relationship type WORKS_ON from the ER
diagram is mapped by creating a relation WORKS_ON in the
relational database schema.
 The primary keys of the PROJECT and EMPLOYEE relations are included as
foreign keys in WORKS_ON and renamed PNO and ESSN, respectively.
 Attribute HOURS in WORKS_ON represents the HOURS attribute of the
relationship type. The primary key of the WORKS_ON relation is the
combination of the foreign key attributes {ESSN, PNO}.
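The corresponding SQL declaration could look like this (types are illustrative):

CREATE TABLE WORKS_ON (
    ESSN  CHAR(9) NOT NULL,
    PNO   INT     NOT NULL,
    HOURS DECIMAL(4, 1),               -- simple attribute of the M:N relationship
    PRIMARY KEY (ESSN, PNO),           -- combination of the two foreign keys
    FOREIGN KEY (ESSN) REFERENCES EMPLOYEE (SSN),
    FOREIGN KEY (PNO)  REFERENCES PROJECT (PNUMBER)
);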

8 CSC 352/ISC 332 Dr.R.Gururaj BITS Pilani, Hyderabad Campus


Step 6: Mapping of Multivalued attributes.
 For each multivalued attribute A, create a new relation R.
 This relation R will include an attribute corresponding to A, plus the
primary key attribute K-as a foreign key in R-of the relation that
represents the entity type or relationship type that has A as an attribute.
 The primary key of R is the combination of A and K. If the multivalued
attribute is composite, we include its simple components.
Example: The relation DEPT_LOCATIONS is created.
 The attribute DLOCATION represents the multivalued attribute
LOCATIONS of DEPARTMENT, while DNUMBER-as foreign key-represents
the primary key of the DEPARTMENT relation.
 The primary key of R is the combination of {DNUMBER, DLOCATION}.
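A sketch of the corresponding table:

CREATE TABLE DEPT_LOCATIONS (
    DNUMBER   INT         NOT NULL,    -- FK to DEPARTMENT
    DLOCATION VARCHAR(15) NOT NULL,    -- one value of the multivalued attribute
    PRIMARY KEY (DNUMBER, DLOCATION),
    FOREIGN KEY (DNUMBER) REFERENCES DEPARTMENT (DNUMBER)
);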

9 CSC 352/ISC 332 Dr.R.Gururaj BITS Pilani, Hyderabad Campus


Step 7: Mapping of N-ary Relationship Types.
 For each n-ary relationship type R, where n>2, create a new
relation S to represent R.
 Include as foreign key attributes in S the primary keys of the
relations that represent the participating entity types.
 Also include any simple attributes of the n-ary relationship type
(or simple components of composite attributes) as attributes of
S.
Example: The relationship type SUPPLY in the ER diagram on the next
slide.
 This can be mapped to the relation SUPPLY shown in the relational
schema, whose primary key is the combination of the three foreign
keys {SNAME, PARTNO, PROJNAME}
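A sketch of the SUPPLY relation; the referenced SUPPLIER, PART, and PROJECT tables and the QUANTITY attribute are assumptions, since the ER figure is not reproduced here:

CREATE TABLE SUPPLY (
    SNAME    VARCHAR(15) NOT NULL,
    PARTNO   INT         NOT NULL,
    PROJNAME VARCHAR(15) NOT NULL,
    QUANTITY INT,                      -- illustrative simple attribute of the relationship
    PRIMARY KEY (SNAME, PARTNO, PROJNAME),
    FOREIGN KEY (SNAME)    REFERENCES SUPPLIER (SNAME),
    FOREIGN KEY (PARTNO)   REFERENCES PART (PARTNO),
    FOREIGN KEY (PROJNAME) REFERENCES PROJECT (PROJNAME)
);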

10 CSC 352/ISC 332 Dr.R.Gururaj BITS Pilani, Hyderabad Campus


11 CSC 352/ISC 332 Dr.R.Gururaj BITS Pilani, Hyderabad Campus
EER-to-Relational Mapping

13 CSC 352/ISC 332 Dr.R.Gururaj BITS Pilani, Hyderabad Campus


EER-to-Relational Mapping

Step8: Options for Mapping Specialization or Generalization.


Convert each specialization with m subclasses {S1,
S2,….,Sm} and generalized superclass C, where the
attributes of C are {k,a1,…an} and k is the (primary) key,
into relational schemas using one of the four following
options:
• Option 8A: Multiple relations-Superclass and subclasses
• Option 8B: Multiple relations-Subclass relations only
• Option 8C: Single relation with one type attribute
• Option 8D: Single relation with multiple type attributes

14 CSC 352/ISC 332 Dr.R.Gururaj BITS Pilani, Hyderabad Campus


• Option 8A: Multiple relations-Superclass and subclasses
Create a relation L for C with attributes Attrs(L) = {k, a1, ..., an} and PK(L) = k.
Create a relation Li for each subclass Si, 1 ≤ i ≤ m, with the attributes
Attrs(Li) = {k} ∪ {attributes of Si} and PK(Li) = k. This option works for any
specialization (total or partial, disjoint or overlapping).
• Option 8B: Multiple relations-Subclass relations only
Create a relation Li for each subclass Si, 1 ≤ i ≤ m, with the attributes
Attrs(Li) = {attributes of Si} ∪ {k, a1, ..., an} and PK(Li) = k. This option only
works for a specialization whose subclasses are total (every entity in
the superclass must belong to at least one of the subclasses).
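A sketch of Option 8A for an EMPLOYEE specialization; the subclass names SECRETARY and ENGINEER and their attributes are assumed from the attributes shown in the Option 8C example (Typing speed, EngType):

-- Option 8A: one relation for the superclass and one per subclass
CREATE TABLE EMPLOYEE (
    SSN     CHAR(9) PRIMARY KEY,
    FNAME   VARCHAR(15),
    LNAME   VARCHAR(15),
    ADDRESS VARCHAR(30)
);

CREATE TABLE SECRETARY (
    SSN          CHAR(9) PRIMARY KEY,       -- same key k as the superclass
    TYPING_SPEED INT,
    FOREIGN KEY (SSN) REFERENCES EMPLOYEE (SSN)
);

CREATE TABLE ENGINEER (
    SSN      CHAR(9) PRIMARY KEY,
    ENG_TYPE VARCHAR(15),
    FOREIGN KEY (SSN) REFERENCES EMPLOYEE (SSN)
);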

15 CSC 352/ISC 332 Dr.R.Gururaj BITS Pilani, Hyderabad Campus


Option 8A: Multiple
relations-Superclass
and subclasses

16 CSC 352/ISC 332 Dr.R.Gururaj BITS Pilani, Hyderabad Campus


Option 8B: Multiple
relations-Subclass
relations only

17 CSC 352/ISC 332 Dr.R.Gururaj BITS Pilani, Hyderabad Campus


Option 8C: Single
relation with
one type attribute

EMPLOYEE(Ssn, Fname, Minit, Lname, Birthdate, Address, Jobtype, Typing_speed, Tgrade, EngType)

18 CSC 352/ISC 332 Dr.R.Gururaj BITS Pilani, Hyderabad Campus


Option 8D: Single
relation with multiple
type attributes

PART(Part_no, Descr, Mflag, Drawing_no, Batch_no, Man_date, Pflag, Supp_name, List_price)

19 CSC 352/ISC 332 Dr.R.Gururaj BITS Pilani, Hyderabad Campus


ER-Relational mapping for
Company Database

BITS Pilani, Hyderabad Campus


Exercise 1

Explain how you would map the following EER/ER


Constructs to Relational model. Give simple
examples.

Mapping specialization.

Mapping 1:1 binary relationship, where one entity


type has total participation, and the other entity
type has partial participation.

Mapping Complex attribute of an entity type.

Ternary (3-ary) relationship.


20 CSC 352/ISC 332 Dr.R.Gururaj BITS Pilani, Hyderabad Campus
21 CSC 352/ISC 332 Dr.R.Gururaj BITS Pilani, Hyderabad Campus
Summary

We learnt ER/EER to relational mapping.

24 CSC 352/ISC 332 Dr.R.Gururaj BITS Pilani, Hyderabad Campus


Database Management Systems

BITS Pilani Dr.R.Gururaj


CS&IS Dept.
Hyderabad Campus
Lecture Session-3
Conceptual Database Design (ER Modeling)

Content

 Steps in Database Design Process


 ER Concepts (Entities, Attributes, Associations etc.)
 ER Notations
 Class Hierarchies
 Conceptual modeling using UML

BITS Pilani, Hyderabad Campus


Major Steps in Database
Design Process

Requirement analysis
Understanding the domain
Identifying the data to be stored
Identifying the constraints
Conceptual Database design
E-R modeling/UML
Logical Database Design
Designing tables and relationships
Refinement of schema
Physical database design
Indexing
Clustering
 Storage formats

BITS Pilani, Hyderabad Campus


ER Modeling
ER Model is a popular high-level (conceptual) data model.
It is an approach to designing Semantic Conceptual schema of a Database.
ER model allows us to describe the data involved in a real-world environment in
terms of objects and their relationships, which are widely used in design of
database.
ER model provides preliminary concepts or idea about the data representation
which is later modified to achieve final detailed design.

Important concepts/notions used in ER modeling are-

Entity is an object in real-world or some idea or concept which can be


distinguished from other objects.
Ex.: person, school, class, department, weather, salary, temperature etc.
Entity has independent existence.

Each entity belongs to an Entity type that defines the structure.

Entity Set is a Collection of similar objects.


BITS Pilani, Hyderabad Campus
Concepts used in ER
Attribute: reflects a property of an object or entity. We have following
types of attributes.
> Simple attribute
> Composite attribute
> Single valued attribute
> Multi-valued attribute
> Derived attribute
> Stored attribute
Candidate Key (simply called a key): Is an Attribute of an entity type
whose value can uniquely identify an entity in a set.

Primary key: one of the candidate keys can become PK of an entity


type.

Alternate keys: The candidate keys other than the PK, are known as
alternate keys.

BITS Pilani, Hyderabad Campus


Concepts used in ER

Relationship: The association between entities is known as


relationship.
Domain of an attribute: The set of possible values is known as domain
of an attribute

BITS Pilani, Hyderabad Campus


Notations used in ER
Notations used in ER modeling are shown below.

Entity Type

Weak Entity Type

Relationship Type

Identifying Relationship type

Attribute

BITS Pilani, Hyderabad Campus


Notations used in ER

Key Attribute

Multivalued Attribute

Composite Attribute

Derived Attribute

BITS Pilani, Hyderabad Campus


Notations used in ER

E1 ══ R ── E2 : total participation of E1 in R (shown by a double line)

E1 ──1── R ──N── E2 : cardinality ratio 1:N for E1:E2 in R

R ──(min, max)── E : structural constraint (min, max) on the participation of E in R

BITS Pilani, Hyderabad Campus


Relationships in ER
Relationships (e.g., Manager MANAGES Employee)

Degree of a Relationship
• If two entity types are involved, it is a binary relationship type (e.g., Manager MANAGES Employee).
• If three entity types are involved, it is a ternary relationship type (e.g., Sales Assistant SELLS Product to Customer).
• Unary relationships, also known as recursive relationships, involve a single entity type (e.g., Employee MANAGES Employee).
• In general it is possible to have an n-ary relationship (e.g., quaternary).

BITS Pilani, Hyderabad Campus


Relationships in ER
Cardinality of a relationship

Relationships are rarely one-to-one. For example, a manager usually manages more than one
employee. This is described by the cardinality of the relationship, for which there are four
possible categories:

One-to-one (1:1), e.g., Man IS_MARRIED_TO Woman (1:1)
One-to-many (1:M), e.g., Manager MANAGES Employee (1:m)
Many-to-one (M:1), e.g., Student STUDIES Course (m:1)
Many-to-many (M:N), e.g., Lecturer TEACHES Student (m:n)

BITS Pilani, Hyderabad Campus


Relationships in ER
Participation Constraint

If all the entities of an entity type are involved in the relationship, that entity type's
participation in the relationship is said to be total. In the example below, if each employee
is associated with at least one department, then the participation of EMP in the WORKS_FOR
relationship with DEPT is total. If only some entities of the set are involved, the
participation is partial.

[Figure: EMP ── R ── DEPT drawn in UML association notation, showing the association roles (Worker, Employer), the multiplicities (* and 1), and the association name and direction (Works_for).]

BITS Pilani, Hyderabad Campus


ER Diagram for the Company DB
schema, with all role names
[Figure: ER diagram for the COMPANY database schema with all role names shown on the relationship types WORKS_FOR, MANAGES, CONTROLS, SUPERVISION, WORKS_ON, and DEPENDENTS_OF (Employee, Department, Manager, Controlling department, Supervisor, Controlled project, Dependent).]

BITS Pilani, Hyderabad Campus


Class Hierarchies
Sometimes it is natural to classify the entities in a set into subclasses.

[Figure: Employee (eid, name, age) with an ISA hierarchy to the subclasses Hourly_Emp (No_Hrs) and Contract_Emp (Cid).]

Specialization: Employee is specialized into Hourly_Emp and Contract_Emp.

Generalization: Hourly_Emp and Contract_Emp are generalized into Employee.

BITS Pilani, Hyderabad Campus


UML for Conceptual data
modeling
We can model a database at conceptual level using UML.
UML constructs can be drawn as diagrams.
It encompasses a broader spectrum of the software design process than ER modeling.
We can do:
 Business modeling (describe the business process involved in the SW)
 System modeling (specify requirements)
 Conceptual database modeling (like ER)
 Physical DB modeling (model indexes and table spaces)
 Hardware System modeling (describe hardware system configuration)

Class diagrams can be used to describe the database at conceptual level, like ER
diagrams.

BITS Pilani, Hyderabad Campus


Summary
 Various steps in database design process
 What is ER modeling
 Concepts and notations used in ER
 Class hierarchies in ER
 Use of UML for Conceptual database modeling

BITS Pilani, Hyderabad Campus


Database Management Systems

BITS Pilani Dr.R.Gururaj


CS&IS Dept.
Hyderabad Campus
Lecture Session-4
Relational Data Model & Relational Constraints

Content

1. What is Relational model


2. Characteristics
3. Relational constrains
4. Representation of schemas

BITS Pilani, Hyderabad Campus


Relational Model
Edgar Codd proposed Relational Data Model in 1970.
It is a representational or implementation data model.

Using this representational (or implementation) model we


represent a database as collection of relations.

The notion of relation here is different from the notion of


relationship used in ER modeling.

Relation is the main construct for representing data in


relational model.
Every relation consists of a relation schema and Relation
instance.

BITS Pilani, Hyderabad Campus


A relation schema is denoted by R(A1, A2, ..., An), where R is the relation name and
A1, A2, ..., An is the attribute list.

The number of columns in a relation is known as its degree or arity.

Relation instance or relation state r of R (thought of as a table):
Each row in the table represents a collection of related data.
Each row contains facts about some entity of some entity set.

For R = (A1, A2, ..., An), r(R) is a set of tuples, r = {t1, t2, ..., tn}.
r is an instance of R; each t is a tuple, i.e., an ordered list of values
t = (v1, v2, ..., vn), where each vi is an element of the domain of Ai.

BITS Pilani, Hyderabad Campus


Entities of each type/set are stored as rows in a single relation.

Hence in general, a relation corresponds to a single entity type in


ER.
In some cases a relationship between two entities can have some
specific attributes which can be captured in a relation (table).

A row is called a tuple.


The columns of the table represent attributes of that entity-set.

The column header is known as attribute or field.


Data type or format of an attribute: is the format of data for that
attribute. Ex. Character strings, numeric, alphanumeric etc.

The set of values that can appear in a column is called the
domain of that attribute.
BITS Pilani, Hyderabad Campus
A relational database schema is denoted by S = {R1, R2, ..., Rn}, where S is the
database name and R1, ..., Rn are the relations (tables) in the database.

BITS Pilani, Hyderabad Campus


Attribute A of relation R is accessed by notation- R.A.

Ex: Student (name, age, branch). Here Student is the relation name.
Student.age - denotes age attribute of Student relation.

Characteristics of a Relation:

Ordering of tuples is not significant.

Ordering of values in a tuple is important.

Values in a tuple under each column must be atomic (simple & single).

BITS Pilani, Hyderabad Campus


BITS Pilani, Hyderabad Campus
Relational Model Terminology

Informal Terms                Formal Terms
Table                         Relation
Column Header                 Attribute
All Possible Column Values    Domain
Row                           Tuple
Table Definition              Schema of a Relation
Populated Table               State of the Relation

BITS Pilani, Hyderabad Campus


Relational Constraints

Constraints are restrictions on the data of a relation.

Domain-level constraints – the format of the data, e.g., character, numeric, etc.
Semantic constraints – e.g., NOT NULL.
Entity integrity constraints – primary key, unique key.
Referential integrity constraints – foreign key.
Dependencies –
Functional dependency: which attribute's value determines the value of another attribute.

This concept is used in database design.

BITS Pilani, Hyderabad Campus


Referential integrity
The referential integrity constraint is specified between two relations and is used to
maintain consistency among the tuples of the two relations.

Example: R1(a, b, c) and R2(p, q, r), where attribute c of R1 is a foreign key (FK)
referring to the primary key (PK) p of R2; that is, c in R1 refers to p in R2.

The FK attribute c of R1 has the same domain as the primary key attribute p of R2.

The attribute c in R1 is said to reference the attribute p in R2.

The value of the FK in a tuple t of R1 either occurs as a value under p in R2 for some
tuple, or is NULL.

R1 is known as the referencing relation.
R2 is known as the referenced relation.

Constraints can be specified while defining the structure & also as triggers.

BITS Pilani, Hyderabad Campus


Relational Schema
Representation

BITS Pilani, Hyderabad Campus


Relational Schema
Representation

BITS Pilani, Hyderabad Campus


Operations on Relations and
constraints
The following table indicates the constraints that need to be checked when
performing certain operations on a relation.

Operation on relation    Constraints to be checked
Insert                   NULL / NOT NULL, PK, unique, FK, format, domain
Delete                   FK
Update                   NULL / NOT NULL, PK, unique, FK, domain, and semantic

BITS Pilani, Hyderabad Campus


Actions need to be taken when an FK is defined, for operations like insert, update, and delete.

Example (as before): R1(a, b, c) and R2(p, q, r), where c in R1 refers to the primary key p of R2.

If we insert a tuple in R1 whose value for c does not occur under p in R2, the insert is not allowed.

If a tuple in R2 is deleted: cascade, don't allow, set to default, or set to NULL.

If the value of p in a tuple of R2 is updated: cascade, don't allow, set to default, or set to NULL.
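Such actions can be declared together with the foreign key itself; a sketch with illustrative table and column names:

CREATE TABLE EMP (
    ENO  CHAR(5) PRIMARY KEY,
    NAME VARCHAR(30),
    DNO  INT,                              -- FK column (nullable, so SET NULL is possible)
    FOREIGN KEY (DNO) REFERENCES DEPT (DNUMBER)
        ON DELETE SET NULL                 -- action when the referenced DEPT row is deleted
        ON UPDATE CASCADE                  -- action when DEPT.DNUMBER is changed
);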

BITS Pilani, Hyderabad Campus


Summary
 What are basics of relational model
 Relation instance
 Relational data constraints
 Referential integrity
 Relational scheme representation

BITS Pilani, Hyderabad Campus


Database Management Systems

BITS Pilani Dr.R.Gururaj


CS&IS Dept.
Hyderabad Campus
Lecture Session-5
ER to Relational Mapping
Content

1. Mapping Regular Entity types


2. Mapping Weak Entity types
3. Mapping 1:1 Relationships
4. Mapping 1:N Relationships
5. Mapping N:M Relationships
6. Mapping Multivalued attributes
7. Mapping ternary relationships
8. Mapping Class Hierarchies

BITS Pilani, Hyderabad Campus


Mapping entity types
1. Mapping of Regular Entity Types.
 For each regular (strong) entity type E in the ER schema, create
a relation R that includes all the simple attributes of E.
 Choose one of the key attributes of E as the primary key for R.
 If the chosen key of E is composite, the set of simple attributes
that form it will together form the primary key of R.

BITS Pilani, Hyderabad Campus


2. Mapping of Weak Entity Types
 For each weak entity type W in the ER schema with owner
entity type E, create a relation R & include all simple
attributes (or simple components of composite attributes)
of W as attributes of R.
 Also, include as foreign key attributes of R the primary key
attribute(s) of the relation(s) that correspond to the owner
entity type(s).
 The primary key of R is the combination of the primary
key(s) of the owner(s) and the partial key of the weak
entity type W, if any.

BITS Pilani, Hyderabad Campus


Mapping Relationship types
3. Mapping of Binary 1:1 Relation Types

For each binary 1:1 relationship type R in the ER schema, identify


the relations S and T that correspond to the entity types
participating in R.

There are three possible approaches:


1. Foreign Key approach: Choose one of the relations-say S-and include a
foreign key in S that refers to the primary key of T. It is better to choose
an entity type with total participation in R in the role of S.
2. Merged relation option: An alternate mapping of a 1:1 relationship
type is possible by merging the two entity types and the relationship
into a single relation. This may be appropriate when both
participations are total.
3. Cross-reference or relationship relation option: The third alternative
is to set up a third relation R for the purpose of cross-referencing the
primary keys of the two relations S and T representing the entity types.

BITS Pilani, Hyderabad Campus


4. Mapping of Binary 1:N Relationship Types.
 For each regular binary 1:N relationship type R, identify the
relation S that represent the participating entity type at the N-
side of the relationship type.
 Include as foreign key in S the primary key of the relation T
that represents the other entity type participating in R.
 Include any simple attributes of the 1:N relationship type as
attributes of S.

BITS Pilani, Hyderabad Campus


5. Mapping of Binary M:N Relationship Types.
 For each regular binary M:N relationship type R, create a new
relation S to represent R.
 Include as foreign key attributes in S the primary keys of the
relations that represent the participating entity types; their
combination will form the primary key of S.
 Also include any simple attributes of the M:N relationship
type (or simple components of composite attributes) as
attributes of S.

BITS Pilani, Hyderabad Campus


Mapping Multivalued
attributes
6. Mapping of Multivalued attributes.
 For each multivalued attribute A, create a new relation R.
 This relation R will include an attribute corresponding to A, plus the
primary key attribute K-as a foreign key in R-of the relation that
represents the entity type or relationship type that has A as an attribute.
 The primary key of R is the combination of A and K. If the multivalued
attribute is composite, we include its simple components.

BITS Pilani, Hyderabad Campus


Mapping n-ary relationships

7. Mapping of N-ary Relationship Types.


 For each n-ary relationship type R, where n>2, create a new
relation S to represent R.
 Include as foreign key attributes in S the primary keys of the
relations that represent the participating entity types.
 Also include any simple attributes of the n-ary relationship type
(or simple components of composite attributes) as attributes of
S.
Example: The relationship type SUPPLY in the ER diagram on the next
slide.
 This can be mapped to the relation SUPPLY shown in the relational
schema, whose primary key is the combination of the three foreign
keys {SNAME, PARTNO, PROJNAME}

BITS Pilani, Hyderabad Campus


Mapping Class hierarchies

BITS Pilani, Hyderabad Campus


Mapping Class hierarchies

8. Options for Mapping Specialization or Generalization.


• Option 8A: Multiple relations-Superclass and subclasses
• Option 8B: Multiple relations-Subclass relations only

BITS Pilani, Hyderabad Campus


Option 8A: Multiple
relations-Superclass
and subclasses

BITS Pilani, Hyderabad Campus


Option 8B: Multiple
relations-Subclass
relations only

BITS Pilani, Hyderabad Campus


ER-Diagram for Company
Database
[Figure: ER diagram for the COMPANY database (EMPLOYEE, DEPARTMENT, PROJECT, DEPENDENT and their relationship types), repeated here for reference.]

BITS Pilani, Hyderabad Campus


ER-Relational mapping for
Company Database

BITS Pilani, Hyderabad Campus


Summary
 We have learnt the rules and guidelines for mapping ER to
Relational model.
 Rules for mapping Entity types
 Rules for mapping Relationships
 Rules for mapping Class hierarchies

BITS Pilani, Hyderabad Campus


Database Management Systems

BITS Pilani Dr.R.Gururaj


CS&IS Dept.
Hyderabad Campus
Lecture Session-6
Relational Algebra & Relational Calculus

Content

 Query languages & Formal query languages for Relational data model
 Introduction to Relational Algebra
 Relational operators
 Set operators
 Join operators
 Aggregate functions
 Grouping operator
 Relational Calculus concepts

BITS Pilani, Hyderabad Campus


Query Languages for
Relational data model
Querying means extracting data from the database for the purpose of
processing it.

Every data model has some formal query languages to support


specification of data retrieval and manipulation requests.

Formal query languages


1. Relational Algebra
2. Relational Calculus
(a) Tuple Relational Calculus
(b) Domain Relational Calculus
Commercial query languages
1. Structured Query Language (SQL)
2. Query by Example (QBE)

BITS Pilani, Hyderabad Campus


Introduction to
Relational Algebra
Relational Algebra is a formal query language for relational data model.

A basic set of relational model operations constitute the relational


algebra.
These operations enable the user to specify basic data retrieval
requests.
The result of a relational algebra query is also a new relation which may
have been formed from one or more relations.

A sequence of relational algebraic operations forms a relational


algebraic expression, whose result is also a relation.

BITS Pilani, Hyderabad Campus


Operations in
Relational Algebra
A. Set Operations

o Union,
o Intersection,
o Difference,
o Cartesian product.

B. Relational Operations

o Select,
o Project,
o join,
o Division etc.

BITS Pilani, Hyderabad Campus


Select operation: selects the subset of tuples that satisfy a selection condition.
The symbol used is σ (sigma).

Ex: σ dno=4 (EMP)

The above expression selects all tuples from the EMP table where the
value of the column 'dno' is 4.

The general form of the select operation is σ <selection condition> (R).

Projection operation:
Selects certain columns. The symbol is Π (pi).

Π name, age, dno (EMP)

selects the columns name, age, and dno for all tuples of the table EMP.
BITS Pilani, Hyderabad Campus
Note:
We can apply the operations in sequence, or we can nest them in a single expression.
Ex.:
Π name, age (σ dno=5 (EMP))
The above expression selects the name and age of employees working in dno 5.

The above query can also be written as

R1 ← σ dno=5 (EMP)
R2 ← Π name, age (R1)

R1 and R2 are the names given to the intermediate results (relations).

BITS Pilani, Hyderabad Campus


Union:
If two relations R1 and R2 are union-compatible (i.e., have the same type of tuples),
we can merge them with the union operation.

Duplicate tuples are eliminated. Ex: R1 ∪ R2.

Intersection: R1 ∩ R2

Only the tuples appearing in both R1 and R2 are selected.

Difference: R1 − R2

Only those tuples appearing in R1 but not in R2 are selected.

Note: (R1 − R2) is not the same as (R2 − R1).

BITS Pilani, Hyderabad Campus


Cross product (Cartesian product): R1 × R2

R1 has 3 rows and 2 columns: (a11, a12), (a21, a22), (a31, a32)
R2 has 3 rows and 2 columns: (b11, b12), (b21, b22), (b31, b32)

R1 × R2 combines every tuple of R1 with every tuple of R2:
(a11, a12, b11, b12)  (a11, a12, b21, b22)  (a11, a12, b31, b32)
(a21, a22, b11, b12)  (a21, a22, b21, b22)  (a21, a22, b31, b32)
(a31, a32, b11, b12)  (a31, a32, b21, b22)  (a31, a32, b31, b32)

Number of rows = 3 × 3 = 9; number of columns = 2 + 2 = 4.

BITS Pilani, Hyderabad Campus


Rename operator

The symbol is ρ (rho).

Ex: ρ S(b1, b2, b3) (R)

renames R to S, and the new names of the attributes are b1, b2, b3.

ρ S (R)

renames R to S, keeping the same attribute names.

BITS Pilani, Hyderabad Campus


Division (÷)
Used when we want to check that all of a set of criteria are met.

Let R(A, B) and S(A), and let T ← R ÷ S.

T selects the values of the B column of R that appear (in R) with every value listed
under A in S. Hence the only column of T is B.

Join (⋈): used to join tuples from different tables based on a join condition.
The result is a new relation whose tuples have a larger arity.

D ← DEPT ⋈ Mgrssn=ssn EMP

joins tuples from DEPT and EMP where Mgrssn in DEPT is equal to ssn
in EMP, and stores the new tuples in relation D.

BITS Pilani, Hyderabad Campus


Theta join: joining on a condition whose comparison involves operators such as
=, <, ≤, >, ≥, ≠.

Equijoin: a special type of join where the join condition uses only the '=' (equals) operator.

Natural join: an equijoin on the attributes of R and S having the same name.

In the resulting relation the common attribute is listed only once.

Ex. D ← DEPT * EMP

The join is on the common attribute with the same name (Dept Name).
BITS Pilani, Hyderabad Campus


Employee                        Dept                    Employee * Dept

Name     EmpID  Dept Name       Dept Name   Manager     Name     EmpID  Dept Name  Manager
Harry    3415   Finance         Finance     George      Harry    3415   Finance    George
Sally    2241   Sales           Sales       Harriet     Sally    2241   Sales      Harriet
George   3401   Finance         Production  Charles     George   3401   Finance    George
Harriet  2202   Sales                                   Harriet  2202   Sales      Harriet
BITS Pilani, Hyderabad Campus


Inner join (R ⋈ S): combines tuples from R and S only if they meet the join condition;
tuples that do not meet the condition are not shown in the result. (This is the usual type of join.)
Outer join: displays the tuples of one (or both) of the relations even if there is no
matching tuple in the other relation.
Left outer join (R ⟕ S): in the result, in addition to all the matching tuples from R and S,
all the remaining tuples from the left-side relation (R) also appear, padded with NULL
values for the attributes of S.
Right outer join (R ⟖ S): in the result, the matching tuples from R and S appear; in
addition, all the remaining tuples from S appear, with NULL values for the attributes of R.
Full outer join (R ⟗ S): in the result, all tuples from both R and S appear, with NULL
values for the other relation's attributes where there is no match.

BITS Pilani, Hyderabad Campus


Additional Relational Operations
Aggregate functions: SUM, AVERAGE, MAX, MIN, COUNT

Grouping:

The tuples of a relation are first grouped by the value of some attribute, and then
aggregate functions are applied to the individual groups.
The symbol used is £.

Ex. Dno £ COUNT ssn (EMP)

The above expression first groups the tuples of the EMP table by Dno and then applies
the COUNT function to each group; this outputs the number of employees in each department.

The result relation has the schema (Dno, Count_ssn).

BITS Pilani, Hyderabad Campus


Company Database Schema (set of tables/relations)

BITS Pilani, Hyderabad Campus


1. Get the list of employee IDs who have no dependents.

It is equivalent to:
{set of all employee IDs} − {set of employee IDs having dependents}

R1 ← Π ssn (Employee)
R2 ← Π essn (Dependent)
Result ← R1 − R2

BITS Pilani, Hyderabad Campus


2. Get the list of employee IDs who have more than two dependents.

R1 ← essn £ COUNT Dependent_name (Dependent)

Result ← Π essn (σ Count_Dependent_name>2 (R1))

Sample R1:
essn   Count_Dependent_name
101    3
102    1

BITS Pilani, Hyderabad Campus


3. Get the list of projects controlled by the department named 'ACCOUNTS'.

R1 ← σ Dname='ACCOUNTS' (Department)

Result ← Π pnumber, pname (Project ⋈ Dnum=Dnumber (R1))

BITS Pilani, Hyderabad Campus


4. Get the list of employee IDs working on all projects

R1 ← Π essn, pno (Works_on)
R2 ← Π pnumber (Project)
Result ← R1 ÷ R2

[Illustration: a small example relation R(A, B) divided by S(B); the result contains the A values that appear in R with every B value of S.]
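Relational algebra's division has no direct SQL operator; a common way to express this query in SQL (covered in later sessions) is with nested NOT EXISTS over the same COMPANY tables:

-- Employees working on all projects: no project exists that they do not work on
SELECT E.ssn
FROM   EMPLOYEE E
WHERE  NOT EXISTS (SELECT P.pnumber
                   FROM   PROJECT P
                   WHERE  NOT EXISTS (SELECT *
                                      FROM   WORKS_ON W
                                      WHERE  W.essn = E.ssn
                                        AND  W.pno  = P.pnumber));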

BITS Pilani, Hyderabad Campus


5. Find the projects controlled by departments located in Mumbai.

R1 ← Π Dnumber (σ Dlocation='Mumbai' (Dept_locations))

R2 ← Π pnumber, pname (Project ⋈ Dnum=Dnumber (R1))

BITS Pilani, Hyderabad Campus


Tuple Relational Calculus
Relational Calculus is a formal query language for relational model
where we write one declarative expression to specify a retrieval
request and hence there is no description of how to evaluate the query.

A calculus expression specifies what is to be retrieved rather than how


to do it.

Hence, relational calculus is a non-procedural language, whereas relational algebra,
discussed in the previous section, is procedural: there we write a sequence of
operations to retrieve data.

Any expression for data retrieval written in relational algebra can also
be written in relational calculus, and vice versa.

Hence the expressive power of relational algebra and relational calculus is
the same.
BITS Pilani, Hyderabad Campus
Tuple Relational Calculus (TRC) is based on specifying a number of tuple
variables.

Each tuple variable usually ranges over a particular database relation.


Variables can take values of individual tuples from the relation.
A simple relational calculus query is in the form-

{t | condition (t)}

t – tuple variable

condition (t) – is a conditional expression involving t.

Result is a set of all tuples that satisfy the conditions specified in


condition (t).

BITS Pilani, Hyderabad Campus


Ex. Find all employees whose salary is above 50,000

{t | EMP (t) and t. salary > 50,000}

Selects all tuples from EMP such that for each tuple selected, the salary
value is > 50,000.

The expression EMP(t) specifies from where the tuple t must be chosen.

Hence EMP relation in this case is known as a range relation.

Note: The above query retrieves all the attributes of relation EMP.

BITS Pilani, Hyderabad Campus


The universal (∀) and existential (∃) quantifiers can be
applied to tuple variables.
Ex.:
{t.name, t.age | EMP(t) and (∃d) (Dept(d) and d.dname =
'Research' and d.dno = t.dno)}

retrieves the name and age of all employees who work
for the 'Research' department.

If the tuple variable t occurs with a ∀ or ∃ quantifier, the
variable is known as a bound variable; otherwise it is called
a free variable.

BITS Pilani, Hyderabad Campus


Safe Relation Calculus Expression

Is one that guarantees to yield a finite set of tuples as result.

Ex. {t | not (EMP (t))}

Is unsafe because it yields all tuples in the universe that are


not in EMP relation, which are infinitely numerous.

An expression is safe if all values in its result are from the


domain of the expression.

BITS Pilani, Hyderabad Campus


Relational Completeness:

This notion is used to compare high level query


languages.

Any relational query language L is considered to be


relationally complete if we can express in L any query that
is expressed in relational algebra (RA) or relational
calculus (RC).

BITS Pilani, Hyderabad Campus


Summary
 What is a query language
 Formal query languages for Relational data model
 Basic concepts of Relational Algebra
 Operations in Relational Algebra
 Relational Calculus
 Examples

BITS Pilani, Hyderabad Campus


Database Management Systems

BITS Pilani Dr.R.Gururaj


CS&IS Dept.
Hyderabad Campus
Lecture Session-7
Structured Query Language (SQL)-1

Content

 Introduction to SQL
 Features of SQL
 DDL Statements
 DML commands

BITS Pilani, Hyderabad Campus


Introduction to SQL

SQL (Structured Query language) is the most widely used


commercial query language for relational databases.

SQL was introduced by IBM(1970).

 The present standard SQL -3 or SQL – 99 was introduced in


1999 by ANSI (American National Standards Institute) and
ISO jointly.

SQL is a user friendly query language.

 Now-a-days almost all relational databases like – Oracle,


MySQL, IBM’s DB2, Informix etc., support SQL.

BITS Pilani, Hyderabad Campus


SQL is a high-level declarative language to specify data retrieval
requests for data stored in relational databases.

 Its declarative because we just specify what to be extracted,


rather than how to do it.

SQL is relationally complete, meaning that any query that is


expressed in relational algebra or calculus can also be written
in SQL.

SQL also supports additional features that are not existing in


formal languages.

SQL is a standard and many vendors implement it in their own


way without deviating from the standard specifications.

BITS Pilani, Hyderabad Campus


Features of SQL
1. DDL (Data Definition Language) Set of commands to support creation,
deletion and modification of table structures and views.

2. DML (Data Manipulation Language) Set of commands to pose queries,


insert new tuples, and update/delete existing tuples.

3. Embedded SQL: Allows users to call SQL code from host languages like
C, C++ & Java.

4. Triggers: Actions executed by the DBMS whenever changes to the


database meet specified conditions. Action to be performed and the set of
conditions can be defined in “Triggers”.

5. Transaction Management: to perform roll-back / commit actions.

6. Indexes: Indexes can be created to speed up the access to data stored in


DB.

BITS Pilani, Hyderabad Campus


DDL Commands
The DDL (Create) statement for creating Employee table.
CREATE TABLE EMPLOYEE(
FNAME VARCHAR(15) NOT NULL,
MINIT CHAR,
LNAME VARCHAR(15) NOT NULL,
SSN CHAR(9),
BDATE DATE,
ADDRESS VARCHAR(30),
SEX CHAR,
SALARY DECIMAL(10, 2),
SUPERSSN CHAR(9),
DNO INT NOT NULL DEFAULT 1,
PRIMARY KEY (SSN),
FOREIGN KEY (SUPERSSN) REFERENCES EMPLOYEE (SSN),
FOREIGN KEY (DNO) REFERENCES DEPARTMENT (DNUMBER)
);

ALTER TABLE EMPLOYEE ADD CONSTRAINT EMPFK FOREIGN KEY (DNO)
REFERENCES DEPARTMENT(DNUMBER) ON DELETE SET DEFAULT ON UPDATE CASCADE;
-- (the ON DELETE action may instead be SET NULL or CASCADE)

BITS Pilani, Hyderabad Campus


CREATE TABLE DEPARTMENT(
DNAME VARCHAR(15) NOT NULL,
DNUMBER INT NOT NULL,
MGRSSN CHAR(9) NOT NULL,
MGRSTARTDATE DATE,

PRIMARY KEY (DNUMBER),


UNIQUE (DNAME),
FOREIGN KEY (MGRSSN)REFERENCES EMPLOYEE (SSN));

BITS Pilani, Hyderabad Campus


DROPPING TABLE EMP
Drop table EMP;

Adding New column to EMP

ALTER TABLE EMP ADD CITY VARCHAR(20);

TO DROP A COLUMN

ALTER TABLE EMP DROP COLUMN AGE CASCADE;   (or RESTRICT)

We can also give names to constraints and later use the names
to access those constraints and alter them.

BITS Pilani, Hyderabad Campus


DML Commands

DML (Data Manipulation)

 Selecting tuples, columns (querying)


 Inserting new tuples
 Updating existing tuples
 Deleting existing tuples

BITS Pilani, Hyderabad Campus


Basic Query Statements
SQL has the SELECT statement for retrieving information from the database.
This SELECT is not the same as the select (σ) operation of relational
algebra. All the queries mentioned here are specified on the COMPANY
database given in Fig. 3.1.

THE SELECT – FROM – WHERE CONSTRUCT:

SELECT < attribute list> // attribute names to be retrieved


FROM < table list > // names of relation involved
WHERE < condition> // Boolean expression to identify the
tuples to be extracted.

BITS Pilani, Hyderabad Campus


Ex. 1
SELECT bdate, address
FROM EMPLOYEE
WHERE Fname = ‘john’;

Retrieves the birthdate & address of the employees whose first name
is ‘John’.
Ex. 2 Join operation
SELECT Fname, Lname, Address
FROM EMPLOYEE, DEPARTMENT
WHERE Dname = 'Research' AND Dnumber = Dno;

Retrieves the first name, last name, and address from the joined tuples of
employee and department. The join condition is that Dnumber in the
department table equals Dno in the employee table. We can also alias
(rename) tables to avoid ambiguity.

BITS Pilani, Hyderabad Campus


Ex. 3 SELECT E.Fname, S.Fname
FROM EMPLOYEE AS E, EMPLOYEE AS S
WHERE E.superssn = S.ssn;

Retrieves the employee’s first name and his immediate


supervisor’s first name. This is an example of self joining.

Ex. 4 SELECT ssn


FROM EMPLOYEE;

Retrieves all ‘ssn’ from employee table.

BITS Pilani, Hyderabad Campus


Ex. 5

SELECT ssn, dname


FROM EMPLOYEE, DEPARTMENT;

This will retrieve ssn, Dname from the relation which is result of cross
product of employee and department tables.

Ex. 6: SELECT *
FROM EMPLOYEE
WHERE Dno = 5;

The above query will retrieve all the columns from employee table for
the tuples where Dno = 5.

BITS Pilani, Hyderabad Campus


Ex. 7: SELECT ALL salary
FROM EMPLOYEE;

Retrieves all salaries (including duplicates) from employee table.

Ex. 8 SELECT DISTINCT salary


FROM EMPLOYEE;
Retrieves distinct values for ‘salary’ attribute

We also have following operations in SQL


Union (for Union)
Except (for Difference)
Intersect (for Intersection)

Duplicate tuples are eliminated from the result.

BITS Pilani, Hyderabad Campus


Substring Comparisons in SQL
The character ‘%’ replaces an arbitrary number of characters, and ‘_‘
(underscore) replaces a single character.
Ex. 9 To retrieve all employees whose address is in Houston, Texas

SELECT fname
FROM EMPLOYEE
WHERE Address LIKE ‘% Houston, Texas %’;

Ex. 10 To retrieve the resulting salaries if every employee working in the


‘Accounts’ project is given a 10% raise.

SELECT Fname, 1.1* salary


FROM EMPLOYEE, WORKS_ON, PROJECT
WHERE ssn = essn AND pno = pnumber AND pname = ‘Accounts’;

BITS Pilani, Hyderabad Campus


Ex. 11
Retrieve all employees in department 5 whose salary is between
30,000 and 40,000.
SELECT *
FROM EMPLOYEE
WHERE (Salary BETWEEN 30000 AND 40000) and Dno= 5;

Order By:

The default ordering of the result is ascending. We can specify the key
word DESC if we wish a descending order of values.
Ex. 12
SELECT Fname, Dno, age
FROM EMPLOYEE
WHERE salary > 30000
ORDER BY Dno;

BITS Pilani, Hyderabad Campus


Summary
 What is SQL
 What are the features supported by SQL
 How to create relational schemas using SQL
 How to specify queries in SQL

BITS Pilani, Hyderabad Campus


Database Management Systems

BITS Pilani Dr.R.Gururaj


CS&IS Dept.
Hyderabad Campus
Lecture Session-8
Structured Query Language (SQL)-2

Content

 Nested queries and correlated nested queries


 Use of EXISTS and NOT EXISTS
 Explicit join operations
 Aggregate functions
 Group by and Having classes
 Insert/ Update / Delete operations
 Views

BITS Pilani, Hyderabad Campus


Nested Queries

Ex.1 Retrieve the name of each employee who has a dependent


with the same name as the employee.

SELECT E.Fname
FROM EMPLOYEE AS E
WHERE E.ssn IN(SELECT ESSN FROM DEPENDENT
WHERE E.FNAME = DEPENDENT_NAME);

Correlated Nested Queries:

Whenever a condition in the WHERE clause of a nested query


references some attribute of a relation declared in the outer query, then
the two queries are said to be correlated.

BITS Pilani, Hyderabad Campus


Use of NOT EXISTS clause
Ex. 2
Retrieve the names, salary of employees who have no dependents

SELECT Fname, Salary


FROM EMPLOYEE
WHERE NOT EXISTS (SELECT * FROM DEPENDENT WHERE SSN
= ESSN);

We can also use ‘EXISTS’ to check the existence of at least one tuple
in the result.

It is also possible to use an explicit set of values in the WHERE –


clause.
We can also check whether a value is NULL

BITS Pilani, Hyderabad Campus


Renaming Attributes in the Result
Ex. 3
SELECT name AS Emp_name
FROM EMPLOYEE
WHERE Dno = 5;

BITS Pilani, Hyderabad Campus


Join Operation
We can also perform
Join – using key word ‘JOIN’
Natural join – using key word ‘NATURAL JOIN’
Left outer join – using key word ‘LEFT OUTER JOIN’
Right outer join – using key word ‘RIGHT OUTER JOIN’

Aggregate Functions and Grouping


COUNT
SUM
MAX
MIN
AVG

BITS Pilani, Hyderabad Campus


Ex. 4 SELECT SUM (Salary), AVG (Salary) from EMPLOYEE;

Ex. 5 To retrieve number of rows in Employee table


SELECT count (*)
FROM EMPLOYEE;

Ex. 6 Retrieve the name of employees who have two or more dependents

SELECT Fname
FROM EMPLOYEE
WHERE (SELECT COUNT (*) FROM DEPENDENT WHERE SSN
= ESSN) > = 2;

BITS Pilani, Hyderabad Campus


Group by

Ex. 7 For each department retrieve the department number and no of


employees.
SELECT dno, count (*)
FROM EMPLOYEE
GROUP BY Dno;
Group by and Having clause

Ex. 8 Retrieve the department number and no of employees for the


departments which have more than 5 employees working for it.

SELECT dno, count (*)


FROM EMPLOYEE
GROUP BY Dno
HAVING count(*)>5;
BITS Pilani, Hyderabad Campus
INSERT operation
For Inserting a new tuple into the relation
General Form
INSERT INTO <table name>
VALUES(v1, v2, v3,………….vn);

Ex. 9 INSERT INTO DEPARTMENT


VALUES(‘MARKETING’,10, 103, ‘2000-06-25’);
Deleting a tuple

Ex. 10 DELETE FROM <table name>


WHERE <condition>;

Ex. 11 DELETE FROM DEPARTMENT


WHERE dnumber=10;
If we don’t specify the condition all tuples are deleted.
BITS Pilani, Hyderabad Campus
Update command

Ex. 12 UPDATE EMPLOYEE


SET salary = 60000
WHERE ssn = 141;

Updates tuples in Employee table for the tuples with ssn = 141, sets
the value of the attribute salary to 60,000

BITS Pilani, Hyderabad Campus


Views in SQL

A view in SQL is a single table that is derived from other tables.

These other tables are known as base tables.

A view does not necessarily exist in physical form; it can be considered as a virtual table.

The tuples of base tables are actually stored in database.

This limits the updates on views.

In fact, when a view is updated, it is the corresponding base tables that actually have to be updated.

This makes update operations on views complex.

BITS Pilani, Hyderabad Campus


Creating View
CREATE VIEW EMP_DETAILS
AS SELECT name, salary, dname, age, dloc
FROM EMPLOYEE, DEPARTMENT
WHERE dno = dnumber;

Whenever the view is used, a temporary table is generated with the specified attributes from the specified base tables.

View definitions are stored in the database, not the result of the view. From then onwards the view can be treated as a table and queries can be posed on it.

BITS Pilani, Hyderabad Campus


Ex. SELECT name, dname FROM EMP_DETAILS
WHERE dno = 5;

Here EMP_DETAILS is a view. When this query is executed, the view definition for EMP_DETAILS is evaluated first, and then the SELECT and WHERE operations are performed on the resulting temporary table.

BITS Pilani, Hyderabad Campus


Note:

• A view is always up to date.


• Updates are generally not possible on views.
• Meant for querying only.
• Sometimes it is possible to store views for some duration.
• Those views are known as materialized views.

BITS Pilani, Hyderabad Campus


Example SQL statements

EMPLOYEE

FNAME MINIT LNAME SSN BDATE ADDRESS SEX SALARY SUPERSSN DNO

DEPARTMENT
DNAME DNUMBER MGRSSN MGRSTARTDATE

DEPT_LOCATIONS

DNUMBER DLOCATION

PROJECT

PNAME PNUMBER PLOCATION DNUM

WORKS_ON

ESSN PNO HOURS

DEPENDENT

ESSN DEPENDENT_NAME SEX BDATE RELATIONSHIP

BITS Pilani, Hyderabad Campus


1. Get the list of employee IDs who have no dependents.

select ssn
from Employee
where ssn NOT IN ( select essn
from Dependent
);

(select ssn from Employee)


except
(select essn from Dependent);

BITS Pilani, Hyderabad Campus


2. Get the list of employee IDs who have more than two
dependents.

select essn
from Dependent
group by essn
having count(*) > 2;

BITS Pilani, Hyderabad Campus


3. Get the list of projects controlled by department with
name “ACCOUNTS”.

select pnumber, pname
from Project
where Dnum IN ( select dnumber
                from Department
                where Dname=‘ACCOUNTS’);

select pnumber, pname
from Project, Department
where Dnum=Dnumber AND Dname= ‘ACCOUNTS’;

BITS Pilani, Hyderabad Campus


4. Get the list of employee IDs working on all projects

Select essn
From Works_on
Group By essn
Having COUNT(*) = (select COUNT(*) from project);

select E.essn
from Works_on as E
where ((select pno from Works_on where essn=E.essn)
        CONTAINS
        (select pnumber from Project));
BITS Pilani, Hyderabad Campus
5. Find the projects controlled by departments located in
Mumbai.

select pnumber, pname


from project
where dnum IN (select dnumber
               from Dept_locations
               where Dlocation=‘Mumbai’);

BITS Pilani, Hyderabad Campus


6. Update the salary of those employees working with
department- HR , to Rs. 20000

update Employee
set salary=20000
where dno = (select dnumber
             from Department
             where Dname=‘HR’);

BITS Pilani, Hyderabad Campus


7. Delete the records of employees who get salary less than
5000.

delete
from Employee
where salary < 5000;

delete
from Employee;
(Note: without a WHERE clause, this deletes all employee records.)

BITS Pilani, Hyderabad Campus


Summary
 How to write nested queries in SQL
 Writing queries using the clauses EXISTS, NOT EXISTS, BETWEEN AND, IN, NOT IN
 How to perform explicit JOIN operations
 How to use GROUP BY and HAVING
 The concept of views in SQL
 Some examples on SQL

BITS Pilani, Hyderabad Campus


Database Management Systems

BITS Pilani Dr.R.Gururaj


CS&IS Dept.
Hyderabad Campus
Lecture Session-9
Schema Refinement -1

Content

 Introduction to Schema Refinement


 Functional Dependencies
 Inference Rules
 Normalization
 Normal Forms (1NF and 2NF)

BITS Pilani, Hyderabad Campus


Schema Refinement (Database Design)

All database applications have certain constraints that must hold for the
data.

This set of constraints helps the system to accept only correct and valid data.

A DBMS must provide facilities for defining and enforcing these


constraints.

Types of Integrity Constraints for Relational data model


Domain constraints – Data type, Null, Check for certain range.

Entity constraints – Primary key and Unique key

Referential integrity – Foreign key

BITS Pilani, Hyderabad Campus


A good database design practice is essential to develop
good relational schemas at logical level.

Good database design is needed for:

Clarity in understanding the database and

To formulate good queries

This is achieved by schema refinement performed on the conceptual schema, which is the result of mapping the high-level conceptual schema (ER) to a data-model-specific conceptual schema (relational schema).
BITS Pilani, Hyderabad Campus
Functional Dependencies

Functional Dependency is a constraint between two sets


of attributes from the database.

If a relational database schema has n attributes A1, A2, A3, ....., An, then think of it as a universal database schema R = {A1, A2, A3, ....., An}.

This is not a real table; it is a conceptual device for developing the formal theory of data dependencies.

BITS Pilani, Hyderabad Campus


Functional Dependency
Denoted by X → Y between two sets of attributes in R, and specifies a constraint on the possible tuples that can form a relation instance r of R.
Values of the Y component are determined by the X component, (or) Y is functionally dependent on X.
Thus, X functionally determines Y in a relation schema R if and only if, whenever two tuples of r(R) agree on their X values, they must necessarily agree on their Y values. (The reverse, Y → X, need not hold.)
Ex: ssn → ename; {ssn, pnumber} → Hours

Note: FDs cannot be inferred. They should be defined by someone


who knows the semantics of the database very well.
BITS Pilani, Hyderabad Campus
Diagrammatic Notation

Department:  Dnumber → {Dname, Mgrssn, Mgrstartdate}

Works_on:  {Essn, Pno} → Hours

BITS Pilani, Hyderabad Campus


Inference rules for FDs
If F denotes a set of FDs, we can infer some new FDs from the specified FDs; the set of all functional dependencies that can be inferred from F is called the closure of F, denoted F+.

If F = { ssn → {Ename, Address, Dnumber},
         Dnumber → {Dname, Dlocation} }

we can infer new FDs such as:
ssn → {Dname, Dlocation}
ssn → ssn
Dnumber → Dname

Also, if X → Z we can say that XY → Z.


BITS Pilani, Hyderabad Campus
Inference Rules for FDs

Rule 1 (IR1) (Reflexive): If X ⊇ Y, then X → Y. (Such FDs are called trivial; all others are non-trivial.)
Rule 2 (IR2) (Augmentation): If X → Y, then XZ → YZ.
Rule 3 (IR3) (Transitive): If X → Y and Y → Z, then X → Z.
Rule 4 (IR4) (Decomposition or projective rule): If X → YZ, then X → Y and X → Z.
Rule 5 (IR5) (Union rule): If X → A and X → B, then X → AB.
Rule 6 (IR6) (Pseudo-transitive rule): If X → Y and WY → Z, then WX → Z.

We can find the closure F+ of F by repeated application of rules IR1 to IR3. These three rules are called Armstrong's inference rules.
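As an illustrative aside (not from the original slides), the closure computation can be mechanized. The short Python sketch below computes the attribute closure X+ of a set of attributes X under a set of FDs; an FD X → Y is in F+ exactly when Y is contained in X+. The attribute and FD names used are the ones from the example above.

def attribute_closure(attrs, fds):
    # attrs: iterable of attribute names; fds: list of (lhs_set, rhs_set) pairs.
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the closure already covers the LHS, the RHS is implied (IR3 in effect).
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure

# FDs from the example: ssn -> {Ename, Address, Dnumber}, Dnumber -> {Dname, Dlocation}
F = [({'ssn'}, {'Ename', 'Address', 'Dnumber'}),
     ({'Dnumber'}, {'Dname', 'Dlocation'})]

print(attribute_closure({'ssn'}, F))
# contains Dname and Dlocation, so ssn -> {Dname, Dlocation} is in F+, as inferred above.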

BITS Pilani, Hyderabad Campus


Equivalence of sets of FDs

F covers E if every FD in E is in F+
F and E are equivalent if E+ = F+

A set of FDs F is minimal if it satisfies the following conditions.

 Every dependency in F has a single attribute on its RHS.
 We cannot replace any dependency X → A in F with a dependency Y → A, where Y is a proper subset of X, and still have a set of dependencies that is equivalent to F.
 We cannot remove any dependency from F and still have a set of dependencies equivalent to F.

BITS Pilani, Hyderabad Campus


Normalization & Normal forms

The normalization process was first proposed by Raymond Boyce and Edgar Codd in 1972.

Normalization of data is the process of analyzing relation schemas, based on their FDs and primary keys/keys, to achieve the desirable properties of (i) minimal redundancy and (ii) minimal anomalies.

BITS Pilani, Hyderabad Campus


In the process of normalization, unsatisfactory relations that do not meet the requirements are decomposed into smaller relations.

Every normal form (NF) requires certain conditions to be satisfied.

The normal form of a relation refers to the highest NF condition that it satisfies.

BITS Pilani, Hyderabad Campus


Schema Refinement (Database Design) encompasses
(i) Normalization – bringing the database to the desired level of NF
(ii) Checking for other desired properties like –
Lossless join property
Dependency preserving property

The above properties are desirable during the process of decomposition.

Some definitions useful in database design

Key: a minimal superkey, also called a candidate key. One of the candidate keys becomes the PK; the other candidate keys are called alternate keys.

Key attribute: an attribute which is part of some key (any candidate key).

We study the general definitions of NFs in terms of keys, not just the PK. A relation can have any number of keys but has only one PK.
BITS Pilani, Hyderabad Campus
1. First Normal Form (INF)

It states that the domain of any attribute must include only atomic (single / simple / individual) values.
In the example given below, under the column Dloc each row has more than one value.

Ex.: Dept(DId, Dname, Dloc)
     10   Engg   {HYD, CHENNAI}
     20   Mark   {HYD, MUMBAI}

BITS Pilani, Hyderabad Campus


2.Second Normal Form (2NF)
It is based on full functional dependency.
X → A is a full functional dependency if removing any attribute from X causes the FD to no longer hold.

Condition for 2NF: all non-key attributes are fully functionally dependent on the key, (or) no non-key attribute should be dependent on a part of the key (partial dependency).

Ex.: R(eid, pnum, Hours, ename, ploc) with key {eid, pnum}

BITS Pilani, Hyderabad Campus


Here, ename is a non-key attribute determined by {eid}, which is part of the key. Hence we say that ename is not fully functionally dependent on the key.
The relation shown is not in 2NF. We can decompose it into three relations as shown below.

R1(eid, ename)    R2(pnum, ploc)    R3(eid, pnum, hours)

BITS Pilani, Hyderabad Campus


Summary
 What is the Schema Refinement process
 Functional Dependencies
 What are the Inference Rules
 What is Normalization & 1 NF and 2NF

BITS Pilani, Hyderabad Campus


Database Management Systems

BITS Pilani Dr.R.Gururaj


CS&IS Dept.
Hyderabad Campus
Lecture Session-10
Schema Refinement-2
Content

 3 NF and BCNF
 Decomposition requirements
 Lossless join decomposition
 Dependency preserving decomposition
 Examples

BITS Pilani, Hyderabad Campus


Recap of 1NF and 2 NF

1. First Normal Form (INF)

It states that the domain of any attribute must include only atomic (single / simple / individual) values.
In the example given below, under the column Dloc each row has more than one value.

Ex.: Dept(DId, Dname, Dloc)
     10   Engg   {HYD, CHENNAI}
     20   Mark   {HYD, MUMBAI}

BITS Pilani, Hyderabad Campus


2.Second Normal Form (2NF)
It is based on full functional dependency.
X → A is a full functional dependency if removing any attribute from X causes the FD to no longer hold.

Condition for 2NF: all non-key attributes are fully functionally dependent on the key, (or) no non-key attribute should be dependent on a part of the key (partial dependency).

Ex.: R(eid, pnum, Hours, ename, ploc) with key {eid, pnum}

BITS Pilani, Hyderabad Campus


3. Third Normal form (3NF)

It is based on transitive dependency.
According to this, a relation should not have a non-key attribute functionally determined by another non-key attribute, i.e., there should be no transitive dependency.

Ex.: R(ename, eid, address, dnum, dname, dloc) with key eid

This is not in 3NF, because dname is transitively dependent on eid (via dnum).

BITS Pilani, Hyderabad Campus


Now we can decompose the above into 2 relations.

R1(ename, eid, address, dnum)    R2(dnum, dname, dloc)

Condition for 3NF

For each FD X → A in the database:
i) X must be a superkey, or
ii) A is a key attribute
BITS Pilani, Hyderabad Campus
BCNF (Boyce Codd Normal Form)
It is a stricter form of 3NF

Condition

For each FD XA


X must be a superkey

4th NF: Is based on multivalued dependency

5th NF: Is based on join dependency normally database designers


go up to 3NF only, and 4NF & 5NF are beyond the scope of our
discussion.

BITS Pilani, Hyderabad Campus


Decomposition and Desirable properties

As we have seen, decomposition (of a bigger relation R


into smaller ones), is a major step in the process of
normalization.

But during this activity of decomposition, we need to make sure that the decomposition is lossless and dependency preserving.

BITS Pilani, Hyderabad Campus


Loss-less join Decomposition

Let C represent a set of constraints on the database. A decomposition {R1, R2, ....., Rm} of a relation schema R is a lossless join decomposition for R if, for every relation instance r of R that is legal under C:

    π_R1(r) ⋈ π_R2(r) ⋈ ..... ⋈ π_Rm(r) = r

Here π_R1(r) is the projection of r on R1, r is a relation instance of R, and F is the set of FDs defined on R. (For a binary decomposition, R is replaced by {R1, R2}.)

BITS Pilani, Hyderabad Campus


Test for Lossless join property

BITS Pilani, Hyderabad Campus


BITS Pilani, Hyderabad Campus
Dependency Preserving Decomposition

Given a set of dependencies F on R, the projection of F on Ri, denoted π_Ri(F) (where Ri is a subset of R), is the set of FDs X → Y in F+ such that the attributes in X ∪ Y are all contained in Ri.

A decomposition {R1, R2, ....., Rm} of R is dependency preserving if

    ( π_R1(F) ∪ π_R2(F) ∪ ..... ∪ π_Rm(F) )+  =  F+

where π_R1(F) is the projection of F on R1.

BITS Pilani, Hyderabad Campus


This dependency preserving condition makes sure that no
FD in original relation is lost as a result of decomposition.
The FDs represent constraints (business logic).

Note:
• Not every BCNF decomposition is dependency preserving.
• A limited amount of redundancy in 3NF, in the form of transitive dependency, is better than losing FDs as a result of bringing a 3NF schema to BCNF.

BITS Pilani, Hyderabad Campus


Summary
 Recap of 2NF
 What is 3NF and BCNF
 Decomposition into 3NF and BCNF
 Lossless join decomposition
 Dependency preserving decomposition

BITS Pilani, Hyderabad Campus


Database Management Systems

BITS Pilani Dr.R.Gururaj


CS&IS Dept.
Hyderabad Campus
Lecture Session-11
Data Storage

Content

 Disk pack features


 Records and Files
 File operations
 Ordered and unordered files

BITS Pilani, Hyderabad Campus


Disk Storage

 Disk is the preferred secondary storage device


for high storage capacity and low cost.
 Data stored as magnetized areas on magnetic
disk surfaces.
 A disk pack contains several magnetic disks
connected to a rotating spindle.
 Disks are divided into concentric circular
tracks on each disk surface.
 Track capacities vary typically from 4 to 50 Kbytes
or more

BITS Pilani, Hyderabad Campus


BITS Pilani, Hyderabad Campus
 A track is divided into smaller blocks or sectors.

 The division of a track into sectors is hard-coded on the disk


surface and cannot be changed.

 A track is divided into blocks.


1. The block size B is fixed for each system.
Typical block sizes range from B=512 bytes to B=4096 bytes.
2. Whole blocks are transferred between disk and main memory
for processing.

BITS Pilani, Hyderabad Campus


BITS Pilani, Hyderabad Campus
 A read-write head moves to the track that contains the block to be
transferred.
Disk rotation moves the block under the read-write head for reading or
writing.
 A physical disk block (hardware) address consists of:
 a cylinder number (imaginary collection of tracks of same radius from
all recorded surfaces)
 the track number or surface number (within the cylinder)
 and block number (within track).
 Reading or writing a disk block is time consuming because of:
   - the seek time s (time to position the head on the required track), typically 3-7 msec;
   - the rotational delay (latency) rd (time until the beginning of the required block rotates under the head), about 3-4 msec at 15000 rpm;
   - the block transfer time, which is smaller than the above two.

BITS Pilani, Hyderabad Campus


Files and Records

• A file is a sequence of records, where each record is a


collection of data values (or data items).
• A file descriptor (or file header) includes information that
describes the file, such as the field names and their data types,
and the addresses of the file blocks on disk.
• Records are stored on disk blocks.
• The blocking factor (bfr) for a file is the (average) number of
file records stored in a disk block.
• A file can have fixed-length records or variable-length records.

BITS Pilani, Hyderabad Campus


• File records can be unspanned or spanned
– Unspanned: no record can span two blocks
– Spanned: a record can be stored in more than one block
• The physical disk blocks that are allocated to hold the records
of a file can be contiguous, linked.
• In a file of fixed-length records, all records have the same
format. Usually, unspanned blocking is used with such files.
• Files of variable-length records require additional information
to be stored in each record, such as separator characters and
field types.
– Usually spanned blocking is used with such files.

BITS Pilani, Hyderabad Campus


File operations

Typical file operations include:


 OPEN: Readies the file for access, and associates a pointer that will refer to a
current file record at each point in time.
 FIND: Searches for the first file record that satisfies a certain condition, and makes it
the current file record.
 FINDNEXT: Searches for the next file record (from the current record) that satisfies
a certain condition, and makes it the current file record.
 READ: Reads the current file record into a program variable.
 INSERT: Inserts a new record into the file & makes it the current file record.
 DELETE: Removes the current file record from the file, usually by marking the record
to indicate that it is no longer valid.
 MODIFY: Changes the values of some fields of the current file record.
 CLOSE: Terminates access to the file.
 REORGANIZE: Reorganizes the file records.
For example, the records marked deleted are physically removed from the file or a new
organization of the file records is created.
 READ_ORDERED: Read the file blocks in order of a specific field of the file.

BITS Pilani, Hyderabad Campus


Unordered Files

Also called a heap or a pile file.


New records are inserted at the end of the file.
A linear search through the file records is necessary to
search for a record.
– This requires reading and searching half the file
blocks on the average, and is hence quite
expensive.
Record insertion is quite efficient.
Reading the records in order of a particular field
requires sorting the file records.
BITS Pilani, Hyderabad Campus
Ordered Files

• Also called a sequential file.


• File records are kept sorted by the values of an ordering field.
• Insertion is expensive: records must be inserted in the correct
order.
A binary search can be used to search for a record on its
ordering field value.
– This requires reading and searching about log2(b) of the b file blocks on the average, an improvement over linear search.
• Reading the records in order of the ordering field is quite
efficient.
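As a rough, illustrative comparison (the block count below is assumed, borrowed from the indexing example later in these notes, not stated on this slide):

import math

b = 11429                          # assumed number of file blocks
linear_avg = b / 2                 # unordered (heap) file: about half the blocks on average
binary = math.ceil(math.log2(b))   # ordered file: binary search on the ordering field

print(linear_avg, binary)          # roughly 5714.5 block accesses vs 14 block accesses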

BITS Pilani, Hyderabad Campus


BITS Pilani, Hyderabad Campus
Summary
 What is Disk storage
 Disk characteristics
 Disk pack structure
 Files and Records
 Ordered and unordered files

BITS Pilani, Hyderabad Campus


Database Management Systems

BITS Pilani Dr.R.Gururaj


CS&IS Dept.
Hyderabad Campus
Lecture Session-12
Hashing Techniques
Content

1. Introduction to hashing
2. Internal hashing
3. Collision
4. External hashing
5. Static hashing
6. Dynamic hashing

BITS Pilani, Hyderabad Campus


Hashing

Hashing technique is an alternative to indexing, for fast retrieval of data


records based on search key.

The search field is called as hash field of the file.

In most cases the hash field is also a key field of the file, in which case it
is called as hash key.

The basic idea of hashing is that a hash function h, when supplied a


hash field value K of a record produces the address B of the disk block
that contains the record with specified key value.

BITS Pilani, Hyderabad Campus


h : K → B

(the hash function h maps a hash key value K to a disk block address B)

Once the disk block is known, the actual search for the record within the
block is carried out in main memory buffer.

For most records we require only one block access.

BITS Pilani, Hyderabad Campus


Internal Hashing
Used for internal files. A hash table is implemented through the use of an array of records with M locations, indexed 0 to (M-1).

The most common hash function used is h(K) = K mod M.
This gives the index of the location in the array.
For example, if M = 10 and the key value is 24, then K mod M = 24 mod 10 = 4.
Hence the record with key value 24 will be stored at index 4 (the 5th location) of the array.
If two or more records are hashed to the same location, it is called a collision.
Then we need to find some other location for the new record. This process is known as collision resolution.

BITS Pilani, Hyderabad Campus


Methods for collision resolution

Open addressing: when a collision occurs, try alternate cells until an empty cell is found.
Chaining: for this various overflow locations are kept by
extending the array by number of overflow positions. A pointer
field is added to each record location. Collision is resolved by
allocating an unused overflow position.
Multiple hashing: We apply a second hash function if the first
hashing results in a collision.

The goal of a good hashing function is to distribute the records


uniformly over the address space so as to minimize collisions
while not leaving many unused locations.
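As a small illustration of internal hashing with chaining (a sketch only; the table size M and the keys are made up, matching the earlier h(K) = K mod M example):

M = 10                               # number of array locations, indexed 0..M-1
table = [[] for _ in range(M)]       # chaining: each location holds a list of records

def insert(key, record):
    slot = key % M                   # h(K) = K mod M
    table[slot].append((key, record))

def lookup(key):
    slot = key % M
    for k, rec in table[slot]:       # a collision is resolved by scanning the chain
        if k == key:
            return rec
    return None

insert(24, 'rec-24')                 # 24 mod 10 = 4, as in the slide's example
insert(34, 'rec-34')                 # also hashes to 4: a collision, kept in the same chain
print(lookup(34))                    # 'rec-34'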

BITS Pilani, Hyderabad Campus


External Hashing

Hashing used for disk files is called as external hashing. The


disk block contains records. A single disk block or cluster of
contiguous blocks is known as a bucket.
The hashing function maps a key value into a relative bucket number (h : K → bucket number, for buckets 0 to M-1). A table maintained in the file header converts the bucket number into the corresponding disk block address.

[Figure: bucket numbers 0 .. M-1 mapped, via the header table, to block addresses on disk]

BITS Pilani, Hyderabad Campus


BITS Pilani, Hyderabad Campus
The above scheme is called as static hashing because the
number of buckets allocated is fixed. This is a big constraint
for files that are dynamic.

When a bucket is filled to capacity and if the new record is


hashed on to the same bucket, then chaining is adopted,
where a pointer is maintained in each bucket to a linked list
of overflow records for the bucket.

The pointers are record pointers, which include both the block address and a relative record position within that block.

BITS Pilani, Hyderabad Campus


Handling overflows in Static External
Hashing

BITS Pilani, Hyderabad Campus


Dynamic Hashing

This scheme allows us to expand or shrink the


hash address space dynamically.

Each result of applying the hash function is a nonnegative integer and hence can be represented by a binary bit pattern. We call this the hash value of the record.

Records are distributed among the buckets


based on the values of the leading bits in their
hash value.
BITS Pilani, Hyderabad Campus
Extendible Hashing

The first technique is called extendible hashing.

This scheme stores a directory structure in addition to the file. This access structure is based on the result of applying the hash function to the search field. The major advantage of extendible hashing is that, as the file grows, performance does not degrade because of chaining, as it does in static hashing. In extendible hashing no additional space is wasted on allocations for future growth; instead, additional buckets are allocated dynamically as needed. The only overhead in this scheme is that the directory structure needs to be searched before the buckets are accessed.
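A minimal sketch of the directory-lookup step only (bucket splitting and directory doubling are omitted; the hash width, global depth and bucket names below are assumptions for illustration):

def hash_value(key, bits=16):
    # Any hash that yields a fixed-width non-negative integer will do here.
    return hash(key) & ((1 << bits) - 1)

def directory_index(key, global_depth, bits=16):
    h = hash_value(key, bits)
    # The leading global_depth bits of the hash value select the directory entry.
    return h >> (bits - global_depth)

d = 2                                                   # global depth: directory has 2**d entries
directory = ['bucket-00', 'bucket-01', 'bucket-10', 'bucket-11']   # hypothetical bucket ids
print(directory[directory_index('some-key', d)])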
BITS Pilani, Hyderabad Campus
BITS Pilani, Hyderabad Campus
BITS Pilani, Hyderabad Campus
Linear Hashing

In the second scheme, called linear hashing, no directory structure is used. Here, instead of one hash function, a family of hash functions is used. When an overflow occurs with one hash function h_i, the bucket that overflows is split into two, and the records in the original bucket are distributed among the two buckets using the next hash function h_(i+1)(K). Hence we have multiple hash functions.

BITS Pilani, Hyderabad Campus


BITS Pilani, Hyderabad Campus
Summary

• What is hashing
• Internal hashing
• External hashing
• What is static external hashing
• What is dynamic hashing
• How Extendible and Linear hashing techniques work

BITS Pilani, Hyderabad Campus


Database Management Systems

BITS Pilani Dr.R.Gururaj


CS&IS Dept.
Hyderabad Campus
Lecture Session-13
Indexing -1

Content

 What is Indexing
 Primary and Secondary indexes
 Dense and Sparse Indexing
 Multilevel Indexing
 Designing Primary and Multilevel Indexes

BITS Pilani, Hyderabad Campus


Introduction to Indexing

An index for a file works in much the same way as a catalog


in a library.
In a library cards are kept in alphabetical order. So we don’t
have to search all cards.
In real world databases, indexes may be too large to be
handled efficiently.
Hence some sophisticated techniques are to be used.

Techniques for efficient retrieval of required records from


disk are:
• Hashing
• Indexing

BITS Pilani, Hyderabad Campus


The criteria for evaluating the hashing or indexing techniques –
 Access time
 Insertion time (new indexes or new records)
 Deletion time
 space overhead

Some times more than one indexing may be required for a file.

The attribute/field used for constructing the index structure for a file is called an ‘indexing field/attribute’.

BITS Pilani, Hyderabad Campus


If the index field is a key, it is called as search key or indexing key.

Indexes on key attributes:

1. Built on ordering key(PK) – Primary index


2. Non-ordering Key - Secondary index on key attribute

Indexes on non-key attributes:


1. Ordering non-key -- Clustering Index
2. Non-ordering non-key attribute – Secondary index on non-key

Hence, a file can have at most one primary index or one clustering index, but not both.

BITS Pilani, Hyderabad Campus


Indexing

Ordering field + key attribute         → primary index
Ordering field + non-key attribute     → clustering index
Non-ordering field + key attribute     → secondary index (on a key)
Non-ordering field + non-key attribute → secondary index (on a non-key)

BITS Pilani, Hyderabad Campus


Data record: Similar kind of records(of a relation/table) are
stored in a single file containing blocks. These are called
data records and will have fields specified on the relation.

Index record: Like data records, index records are also


stored in database. Any index record normally has two
fields.

Value:   the key value
Pointer: the location address of the record (or block) containing that key

BITS Pilani, Hyderabad Campus


[Figure: an index file of (key, pointer) entries, e.g., keys 25, 30, 41, 84, each pointing to the corresponding record in the data file of data records]

BITS Pilani, Hyderabad Campus


Dense Index : In this, an index record appears for every data file record.

Sparse Index : Index records are created only for some data file
records. This occupies less space. Sparse index can be on primary or
secondary key.

A primary index and clustering index are non-dense.

BITS Pilani, Hyderabad Campus


Primary Indexing

[Figure: index blocks hold (key, pointer-to-block) entries, e.g., 2, 15, 25, 30, 45, 60, where each key is the first (anchor) key of a data block; the data blocks hold the ordered data records]
BITS Pilani, Hyderabad Campus


Dense and Sparse Indexing

[Figure: for a data file with records keyed 24, 32, 36, 40, 50, 54, 56, 60, a dense index has one entry per record, while a sparse index has entries only for some records, e.g., 24, 40, 54]

BITS Pilani, Hyderabad Campus


Secondary Indexing
( Built on non-ordering non-key attribute)

[Figure: a secondary index on a non-key field, with index entries for field values 28, 35, 39, 45 pointing to buckets of record pointers, which in turn point to the data records having that field value]

BITS Pilani, Hyderabad Campus




Designing a Primary index

Assume that we have an ordered file with 80000 records


stored on disk. Block size is 512 Bytes. Record length is
fixed and it is 70 Bytes. Key field(PK) length is 6 Bytes
and block pointer is 4 Bytes. Assume unspanned record
organization

Design a Primary index on primary key.

BITS Pilani, Hyderabad Campus


Size of disk block = 512 Bytes; record length = 70 Bytes
Block pointer = 4 Bytes; key field = 6 Bytes; total records = 80000
No. of records per block (bfr) = floor(512/70) = floor(7.31) = 7
No. of data blocks needed = ceil(80000/7) = 11429
Index record length = key + pointer = 6 + 4 = 10 Bytes
Blocking factor for the index (bfri) = floor(512/10) = 51 (known as the fan-out)
No. of index blocks = ceil(11429/51) = 225

No. of block accesses = ceil(log2 225) + 1 = 8 + 1 = 9
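The same arithmetic, written as a short Python check (a sketch using the numbers given in the problem):

import math

block, record, key, ptr, records = 512, 70, 6, 4, 80000

bfr          = block // record                          # 7 records per data block
data_blocks  = math.ceil(records / bfr)                 # 11429
bfri         = block // (key + ptr)                     # 51 index entries per block (fan-out)
index_blocks = math.ceil(data_blocks / bfri)            # 225
accesses     = math.ceil(math.log2(index_blocks)) + 1   # binary search on the index + 1 data block

print(bfr, data_blocks, bfri, index_blocks, accesses)   # 7 11429 51 225 9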

BITS Pilani, Hyderabad Campus


BITS Pilani, Hyderabad Campus
Multilevel Indexing (Two levels)

[Figure: a two-level primary index. The first (base) level indexes the data blocks with (key, block pointer) entries; the second level indexes the first-level index blocks in the same way.]

BITS Pilani, Hyderabad Campus


Designing a multilevel index

Assume that we have an ordered file with 80000 records


stored on disk. Block size is 512 Bytes. Record length is
fixed and it is 70 Bytes. Key field(PK) length is 6 Bytes
and block pointer is 4 Bytes. Assume unspanned record
organization

Design a multilevel index on primary key.


How many levels are there.
How many blocks are there in each index level.

BITS Pilani, Hyderabad Campus


Size of the disk block = 512 Bytes; record length = 70 Bytes
Block pointer = 4 Bytes; key field = 6 Bytes; total records = 80000
No. of records per block (bfr) = floor(512/70) = floor(7.31) = 7
No. of data blocks needed = ceil(80000/7) = 11429
Index record length = key + pointer = 6 + 4 = 10 Bytes
Blocking factor for the index (fan-out) = floor(512/10) = 51
No. of index blocks in the first level = ceil(11429/51) = 225
No. of index blocks in the 2nd level = ceil(225/51) = 5
No. of index blocks in the 3rd level = ceil(5/51) = 1  (top level)

No. of levels = t = 3
No. of block accesses = no. of index levels + 1 = t + 1 = 4
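The level-by-level computation generalizes to a short loop (a sketch, using the same numbers as the problem):

import math

block, record, key, ptr, records = 512, 70, 6, 4, 80000

bfr     = block // record                # 7 records per data block
blocks  = math.ceil(records / bfr)       # 11429 data blocks to be indexed
fan_out = block // (key + ptr)           # 51 index entries per index block

levels = []
entries = blocks
while entries > 1:                       # add index levels until the top level fits in 1 block
    entries = math.ceil(entries / fan_out)
    levels.append(entries)

print(levels)                            # [225, 5, 1]  ->  t = 3 index levels
print(len(levels) + 1)                   # block accesses = t + 1 = 4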

BITS Pilani, Hyderabad Campus


BITS Pilani, Hyderabad Campus
Action on deletion of records
If the record deleted is the last record with that key value, delete the corresponding entry in the index file too. If the index is dense, delete the entry just like a record in a file. If it is sparse, delete the entry and replace it with the next key value, if that value is not already present.

Action on Inserting a new record


If the index is dense, insert the new key into the index. If it is sparse, no change is needed unless a new block is created.

BITS Pilani, Hyderabad Campus


Summary
 What is Indexing and its importance
 How Primary and Secondary indexes work
 Examples of Dense and Sparse Indexes
 What is Multilevel Indexing
 Some example problems on designing Primary and Multilevel
Indexes

BITS Pilani, Hyderabad Campus


Database Management Systems

BITS Pilani Dr.R.Gururaj


CS&IS Dept.
Hyderabad Campus
Lecture Session-14
B+ Tree Indexing

Content

 What is Tree Indexing


 B+ tree
 Inserting and deleting keys into B+ Trees
 B Tree
 Constructing a B+ tree
 Designing a B+ Tree node structure

BITS Pilani, Hyderabad Campus


Tree Indexing

Adopting Tree structure for implementing indexes


A tree consists of internal nodes and leaves. The number of arcs on the path from a node to the root is known as its path length.
The height of a non-empty tree is equal to the maximum level of any node in the tree. For an empty tree the height is zero.

Binary tree
Each node has at most two children (left and right). Hence at the ith level, the maximum number of nodes present is 2^(i-1) (the root is at level 1).

Complete binary tree: All nodes except at last level are present.

Binary Search Trees: for each node in the tree, all values stored in its left
subtree are less than value stored in the node and all values stored in the
right subtree are greater than the value in the node.

BITS Pilani, Hyderabad Campus


Multilevel Search Tree of order m
(or)
M-way search tree

• Each node has at most m children and (m-1) keys

• Keys in each node are in ascending order


24 32 40 60
K1 K2 K3 K4

Child 1 Child 2 Child 3 Child 4 Child 5

No of children = (m) = 5
No of keys = (m -1) = (5-1) = 4

BITS Pilani, Hyderabad Campus


B+ Tree Indexing

B+ Tree is a multilevel search tree used to implement


dynamic multilevel indexing. The primary disadvantage of
implementing multilevel indexes is that the performance
degrades as the file grows. It can be remedied by
reorganization, but frequent reorganization is not advisable.
B+ tree is best suited for multilevel indexing of files, because
it is dynamic.

B+ Tree of Order p
It is a balanced tree, (all leaves are at same level).
Each internal node is of the form- 24 32 40 60
K1 K2 K3 K4

Child 1 Child 2 Child 3 Child 4 Child 5

BITS Pilani, Hyderabad Campus


B+ Trees

For a B+ tree of order p

 Within each internal node, K1 < K2 < K3 < …
 P1, P2, … are tree pointers
 K1, K2, K3, … are key values, in ascending order from left to right
 Each internal node has at most p (the order) pointers.
 Each internal node except the root has at least ceil(p/2) tree pointers to the next level. The root has at least 2 pointers.

BITS Pilani, Hyderabad Campus


 An internal node with q pointers has (q -1) field values.

 All record pointers are available at leaf node only.

 Once we get a key value at leaf node, from there


accessing next value in sequence is easy because all keys
at leaf level are in ascending order.

BITS Pilani, Hyderabad Campus


EX: B+ Tree of order 3, i.e., p = 3
(max. no. of pointers in any node = 3; every internal node other than the root has at least ceil(3/2) = 2 pointers)

[Figure: example B+ tree with root key 17; internal nodes with keys such as 5, 14, 19, 40; keys such as 3, 4, 10 at the leaf level, where the record pointers are attached]

BITS Pilani, Hyderabad Campus


B- Tree

BITS Pilani, Hyderabad Campus


Note
In a B+ tree record pointer for a record with given
key can be found only at leaf node.
But if it is in case of B-tree it can happen at
intermediate node also.
Hence in B+ tree search, success or failure can be
declared only after reaching leaf_level.
Whereas in a B-tree, the search can be successful at an intermediate level as well; only on failure do we reach the leaf level.

BITS Pilani, Hyderabad Campus


Constructing a B+ Tree

Construct a B+ tree with given specifications. The order of the tree, p=3
and p_leaf = 2. The tree should be such that all the keys in the subtree pointed to by the pointer preceding a key must be less than or equal to that key value, and all the keys in the subtree pointed to by the pointer succeeding the key must be greater than the key.

Insert the following keys in same order- 56, 22, 78, 42, 102, 90, 96, 35.
Show how the tree will expand after each insertion, and the final tree.

Next, delete 56, 46, 22 in the same order and show the status of the tree
after each deletion.

BITS Pilani, Hyderabad Campus


BITS Pilani, Hyderabad Campus
Node design for B+ tree

We need to design a B+ tree indexing for Student


relation, on student_id attribute; the key of the relation.
The attribute student_id is of 4 bytes length. Other
attributes are- student_age(4 bytes), student_name(20
bytes), student_address(40 bytes), student_branch(3
bytes). The Disk block size is 1024 Bytes. If the tree-
pointer takes 4 bytes, for the above situation, design the
best possible number of pointers per node(internal) of the
above B+ tree. Each internal node is a disk block which
contains search key values and pointers to subtrees.

BITS Pilani, Hyderabad Campus


Disk block size=1024 Bytes
Size of B+ tree node= size of disk block
Each tree pointer points to disk block and takes 4 Bytes.
Each key (student_id) takes 4 Bytes
In a B+ tree node, No. of pointers = no. keys +1
Assume that no. keys = n
Then no. pointers= n+1
Then the space needed for a node = (no. of keys * size of each key) + (no. of pointers * size of each pointer) <= 1024
(n*4) + (n+1)*4 <= 1024
4n + 4n + 4 <= 1024
8n + 4 <= 1024
8n <= 1024 - 4 = 1020
n <= 127.5, i.e., n = 127
Hence in each internal node, no. of keys = 127 and no. of pointers = 128.
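The same node-capacity calculation as a short check (a sketch with the values from the problem):

block_size, key_size, ptr_size = 1024, 4, 4

# A node with n keys has n+1 tree pointers and must fit in one disk block:
#   n*key_size + (n+1)*ptr_size <= block_size
n = (block_size - ptr_size) // (key_size + ptr_size)

print(n, n + 1)    # 127 keys and 128 pointers per internal node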

BITS Pilani, Hyderabad Campus


Summary
 What is Tree Indexing
 B tree and B+ tree concepts
 Constructing a B+ tree (Insert/Delete operations)
 Designing a B+ Tree node structure

BITS Pilani, Hyderabad Campus


Database Management Systems

BITS Pilani Dr.R.Gururaj


CS&IS Dept.
Hyderabad Campus
Lecture Session-15
Transaction Processing

Content

 What is Transaction Model


 Significance of Transaction Model
 States of a transaction
 ACID Properties

BITS Pilani, Hyderabad Campus


Introduction to the Transaction Model

Multi-user Database systems:


Multiple users access the database simultaneously (multiprocessing).

Concurrency: concurrent access to the data therefore occurs.

Transaction:

A transaction is a collection of operations that perform a single logical


operation or function in a database application.

Each transaction is a unit of atomicity.

BITS Pilani, Hyderabad Campus


Storage Types
Volatile Storage: Ex- main memory, cache
Nonvolatile storage : Ex- disk, tapes, etc.

Storage Hierarchy
DB system resides on nonvolatile storage.

Database is partitioned into blocks of fixed length storage,


which are units for storage allocation and transfer.

Transactions input and output data from disk to main memory,


and main memory to the disk.

The data transfer is done in terms of blocks.

A buffer block is the main-memory counterpart of a disk block; it has the same size as a disk block.
BITS Pilani, Hyderabad Campus
The block movement between disk and main memory is initiated by
following operations.

Input (X) – The physical block with data item X is brought from disk into
main memory.

Output (X) – Buffer block containing the data item X is sent to disk to
replace the appropriate physical block.

BITS Pilani, Hyderabad Campus


[Figure: Input(A) copies disk block A from the disk into the buffer blocks in main memory; Output(B) copies buffer block B from main memory back to the disk]

BITS Pilani, Hyderabad Campus


Transactions interact with DB by transferring data from program variables to
the DB and DB to program variables.

This transfer of data is achieved through the following two operations.

I. read(X, xi) - where xi is a local variable and X is a DB data item; it represents the operation xi ← X.
   If the block with data item X is not in main memory, issue Input(X).
   Assign xi the value of X from the buffer block.

II. write(X, xi) - performs X ← xi.
   If the block with X is not in main memory, issue Input(X), then assign the value of xi to X in the buffer block.

Note: a transaction must read an item before using it, but it need not write every item it reads. The modified buffer blocks can be written back onto the disk later, for example during page replacement in main memory.

BITS Pilani, Hyderabad Campus


Steps followed by Transactions while accessing data for processing

Read(X, xi)   (uses Input(X))
Modify xi
Write(X, xi)  (uses Output(X))

If the system crashes before the buffer block containing the new value is output to the disk, the new value is lost.

BITS Pilani, Hyderabad Campus


A transaction is an atomic unit of work that is
either completed in its entirety or not done at
all.
– For recovery purposes, the system needs to keep
track of when the transaction starts, terminates,
and commits or aborts.

BITS Pilani, Hyderabad Campus


Transaction Model

A transaction is a program unit that access and update several data


items.
Read ( ) and Write ( ) are the basic operations.

[Diagram: before execution, the data is in a consistent state; during execution of the transaction the data may temporarily be in an inconsistent state; after execution it reaches the next consistent state. Difficulties arise when a failure occurs in the middle of this process.]

Hence, as a result of failure, state of the system will not reflect the state of
the real world that the database is supposed to capture.

We call that state as inconsistent state.

It is important to define transactions such that they preserve consistency.

BITS Pilani, Hyderabad Campus


ACID Properties of a Transaction

Transaction should possess the following properties called as ACID


properties.

Atomicity: A transaction is an atomic unit of processing. It is either


performed in its entirety or not performed at all.

Consistency Preservation: The successful execution of a transaction


takes the database from one consistent state to another.

Isolation: A transaction should be executed as if it is not interfered by


any other transaction.

Durability: The changes applied to the data by a transaction must be


permanent.

BITS Pilani, Hyderabad Campus


Transaction States

Active State: Initial state when a transaction starts.

Partially committed State: This state is reached when the last


statement is executed and the outcome is not written to the DB.

Failed State: After discovering that the normal execution cannot be


continued a transaction is aborted and reaches failed state.

Committed State: Is reached after successful completion of the transaction.

Terminated State: Is reached after failure or success of the transaction.

BITS Pilani, Hyderabad Campus


BITS Pilani, Hyderabad Campus
Those transactions which are not completely successful are called
failed transactions.

In order to ensure atomicity property, failed transactions should have


no effect on the database.

Hence the state of the DB must be restored to the state it was in just
before the transaction started its execution.

We say such transaction is rolled back.

BITS Pilani, Hyderabad Campus


When the transaction rolls back, the modifications done to the DB by the half-complete transaction are removed, so that the state reflects the state before the start of execution of the transaction.

A transaction which is completely successful is called committed


transaction.

A committed transaction brings the DB to new consistent state. The


effects of committed transactions cannot be rolled back.

Once a transaction is aborted, it must be terminated and new


transaction must be started.

A transaction reaches committed state if it has partially committed and


it is guaranteed that it will never be aborted.

BITS Pilani, Hyderabad Campus


Summary
 What is a transaction
 Basic database operations performed by a transaction
 Properties of a transaction
 States of a transaction
 Transaction execution and the database consistency

BITS Pilani, Hyderabad Campus


Database Management Systems

BITS Pilani Dr.R.Gururaj


CS&IS Dept.
Hyderabad Campus
Lecture Session-16
Concurrent Transactions and Schedules

Content

 Concurrent Transactions
 Transaction Schedule
 Serial and Concurrent Schedules
 Need for Concurrency Control
 Conflicting Operations
 Conflict Equivalent Schedule
 Test for Conflict Serializability
 View Equivalent Schedule
 View Serializability

BITS Pilani, Hyderabad Campus


Introduction

 Multiprogramming in modern systems increases the throughput


drastically, as the resources are shared by more than one process.

 Similarly in a DBMS multiple transactions are executed concurrently.

A transaction is a collection of operations that perform a single logical


operation or function in a database application. Each transaction is a
unit of atomicity.

 Here, for transactions we consider data items as resources


because transactions process data by accessing them.

 When multiple transactions access data elements in a concurrent


way, this may destroy the consistency of the database.

BITS Pilani, Hyderabad Campus


Transaction Schedule

The descriptions that specify the execution sequence of instructions in


a set of transactions are called as schedules.

Hence schedule can describe the execution sequence of more than


one transaction.
Here, T1 and T2 are transactions. (Read(A) reads data item A; Write(B) writes data item B.)

T1: Read(A); A = A + 50; Read(B); B = B + A; Write(B)
T2: Read(B); B = B + 75; Write(B)

In this schedule the transactions T1 and T2 are executed in a serial manner, i.e., first all the instructions of T1 are executed, and then the instructions of T2. Hence the above schedule is known as a serial schedule.

BITS Pilani, Hyderabad Campus


In a serial schedule, instructions belonging to one single transaction
appear together.

A serial schedule does not exploit the concurrency. Hence, it is less


efficient.

If the transactions are executed concurrently then the resources can be


utilized more efficiently hence more throughput is achieved.

BITS Pilani, Hyderabad Campus


A serial schedule always results in correct database state that reflect
the real world situations.

When the instructions of different transactions of a schedule are


executed in an interleaved manner use call such schedules are called
concurrent schedules.
This kind of concurrent schedules may result in incorrect database
state.
[Example: a concurrent schedule in which the instructions of T1 (Read(A); A = A + 50; ...) and T2 (Read(A); A = A + 100; ...) are interleaved in time before the corresponding Write operations are issued.]

BITS Pilani, Hyderabad Campus


Why Concurrency control is needed?
The Lost Update Problem
This occurs when two transactions that access the same database
items have their operations interleaved in a way that makes the value of
some database item incorrect.
The Temporary Update (or Dirty Read) Problem
This occurs when one transaction updates a database item and then
the transaction fails for some reason (see Section 17.1.4).
The updated item is accessed by another transaction before it is
changed back to its original value.
The Incorrect Summary Problem
If one transaction is calculating an aggregate summary function on a
number of records while other transactions are updating some of these
records, the aggregate function may calculate some values before they
are updated and others after they are updated.

BITS Pilani, Hyderabad Campus


It is desirable that a schedule, after execution must leave
the database in a consistent state.

The result of a concurrent execution must be same as the


result of executing the transactions in serial way.

A concurrent schedule whose result is same as that of a


serial schedule is called as concurrent serializable
schedule.

BITS Pilani, Hyderabad Campus


Conflicting Operations

For transactions T1 & T2 the order of read operation on any


data element does not matter.

{T1R(Q), T2R(Q)} or {T2R(Q), T1R(Q)} does not matter.

The result is same and does not lead to any conflict.

Here, Q is the data element.

But {T1R(Q), T2W(Q)} is not same as {T2W(Q), T1R(Q)}

If Ii and Ij are operations (instructions) of two different transactions on the same data item, and at least one of these instructions is a WRITE operation, then we say that Ii and Ij are conflicting operations.

Here, I stands for instruction, and i and j identify the transactions.


BITS Pilani, Hyderabad Campus
Hence it is evident that if we swap non-conflicting operations of a
concurrent schedule, it will not affect the final result.
Look at the following example.
S1 (concurrent schedule with T1 & T2 accessing data items A and B):
    T1: R(A), W(A); T2: R(A), W(A); T1: R(B), W(B); T2: R(B), W(B)

S2 (from S1, swap W(A) of T2 with R(B) of T1, because they are non-conflicting):
    T1: R(A), W(A); T2: R(A); T1: R(B); T2: W(A); T1: W(B); T2: R(B), W(B)

S3 (from S2, swap R(A) of T2 with R(B) of T1, and W(A) of T2 with W(B) of T1, since they are non-conflicting):
    T1: R(A), W(A), R(B); T2: R(A); T1: W(B); T2: W(A), R(B), W(B)

BITS Pilani, Hyderabad Campus


S4 (from S3, swap R(A) of T2 with W(B) of T1):
    T1: R(A), W(A), R(B), W(B); T2: R(A), W(A), R(B), W(B)

Now the final schedule S4 is a serial schedule.

BITS Pilani, Hyderabad Campus


Conflict Equivalent Schedules

If a schedule S can be transformed into a schedule S′ by a series of swaps of non-conflicting instructions, we say that S and S′ are conflict equivalent.

Further, we say that a schedule S is conflict serializable if it is conflict equivalent to a serial schedule.

In our example, S4 is a serial schedule and is conflict equivalent to S1. Hence S1 is a conflict serializable schedule.

BITS Pilani, Hyderabad Campus


Ex.: a schedule in which one transaction issues R( ) and W( ) on a data item, and the other transaction issues W( ) on the same item in between.
In this schedule we cannot perform any swap between the instructions of T1 and T2, since every pair of these operations conflicts. Hence it is not conflict serializable.

BITS Pilani, Hyderabad Campus


Test for Conflict Serializability

Let S be a schedule.

We construct a precedence graph.

Each transaction participating in the schedule will become a


vertex.

The set of edges consists of all edges Ti → Tj for which one of the following three conditions holds (a small sketch of this test follows the list):

Ti executes W(Q) before Tj executes R(Q)
Ti executes R(Q) before Tj executes W(Q)
Ti executes W(Q) before Tj executes W(Q)
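A compact sketch of this test (illustrative only; it assumes the schedule is given as a list of (transaction, operation, data item) triples in execution order):

def conflict_serializable(schedule):
    # schedule: list of (txn, op, item) with op in {'R', 'W'}.
    edges = set()
    for i, (ti, op_i, q_i) in enumerate(schedule):
        for tj, op_j, q_j in schedule[i + 1:]:
            # A conflicting pair: same item, different transactions, at least one write.
            if q_i == q_j and ti != tj and 'W' in (op_i, op_j):
                edges.add((ti, tj))            # ti precedes tj in the precedence graph
    return not has_cycle(edges)                # serializable iff the graph is acyclic

def has_cycle(edges):
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set())
    state = {v: 0 for v in graph}              # 0 = unvisited, 1 = on current path, 2 = done
    def dfs(v):
        state[v] = 1
        for w in graph[v]:
            if state[w] == 1 or (state[w] == 0 and dfs(w)):
                return True
        state[v] = 2
        return False
    return any(state[v] == 0 and dfs(v) for v in list(graph))

# Schedule S1 discussed earlier:
S1 = [('T1','R','A'), ('T1','W','A'), ('T2','R','A'), ('T2','W','A'),
      ('T1','R','B'), ('T1','W','B'), ('T2','R','B'), ('T2','W','B')]
print(conflict_serializable(S1))               # True (only the edge T1 -> T2, no cycle)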

BITS Pilani, Hyderabad Campus


Ex.: Consider a schedule over T0 and T1 in which T0 writes A before T1 writes A, and T1 reads B before T0 writes B. In fact, this schedule is not conflict serializable.

T1 writes A after T0 writes A, hence we draw an edge T0 → T1.
T1 reads B before T0 writes B, hence we draw an edge T1 → T0.

At any moment of time while developing the graph in the above manner, if we see a cycle then the schedule is not conflict serializable. If there are no cycles at the end, then it is conflict serializable.
Hence the above schedule (with the cycle T0 → T1 → T0) is not serializable.
BITS Pilani, Hyderabad Campus
Now let us consider the schedule S1 discussed earlier, which is conflict serializable:

    T1: R(A), W(A); T2: R(A), W(A); T1: R(B), W(B); T2: R(B), W(B)

Now let us draw a precedence graph for the above schedule. T1 writes A before T2 reads A; hence we have an edge T1 → T2 (every other conflicting pair also gives T1 → T2).

We have only one edge in this graph, and no cycles. Hence it is conflict serializable.

BITS Pilani, Hyderabad Campus


Serial schedule:
– A schedule S is serial if, for every transaction T
participating in the schedule, all the operations of
T are executed consecutively in the schedule.
– Otherwise, the schedule is called nonserial
schedule.
Serializable schedule:
– A schedule S is serializable if it is equivalent to
some serial schedule of the same n transactions.

BITS Pilani, Hyderabad Campus


Result equivalent:
– Two schedules are called result equivalent if they
produce the same final state of the database.
Conflict equivalent:
– Two schedules are said to be conflict equivalent if
the order of any two conflicting operations is the
same in both schedules.
Conflict serializable:
– A schedule S is said to be conflict serializable if it is
conflict equivalent to some serial schedule S’.
BITS Pilani, Hyderabad Campus
• Being serializable is not the same as being
serial
• Being serializable implies that the schedule is a
correct schedule.
– It will leave the database in a consistent state.
– The interleaving is appropriate and will result in a
state as if the transactions were serially executed,
yet will achieve efficiency due to concurrent
execution.

BITS Pilani, Hyderabad Campus


View Equivalent Schedules

Two schedules S and S (where same set of transactions participate in both


schedules), are said to be view equivalent if the following three conditions are met.

For each data item Q, if the transaction Ti reads the initial value of Q in S, then
transaction Ti must in schedule S, also read the initial value of Q.

For each data item Q, if transaction Ti executes read (Q) in S, and the value
produced by transaction Tj (if any) then transaction Ti must in schedule S also read
the value of Q that was produced by transaction Tj.

For each data item Q, the transaction if any that performs the final write(Q)
operation in schedule S must perform the final write(Q) operation in schedule S.

Now, we say that a schedule S is view serializable if it is view equivalent to a serial


schedule.

Note: Every conflict serializable schedule is view serializable. But not all view
serializable schedules are conflict serializable.
BITS Pilani, Hyderabad Campus
Summary
 What are concurrent Transactions
 What are Serial and Concurrent Schedules
 Why Concurrency Control needed in DBMS
 Conflict Equivalent Schedule and its importance
 Test for Conflict Serializability

BITS Pilani, Hyderabad Campus


Database Management Systems

BITS Pilani Dr.R.Gururaj


CS&IS Dept.
Hyderabad Campus
Lecture Session-17
Concurrency Control

Contents

 Introduction to Concurrency Control


 Implementing Serializability
 Lock-based protocols
 Deadlock condition
 Two-phase locking protocol
 Timestamp-based protocols

BITS Pilani, Hyderabad Campus


Introduction

 In a DBMS multiple transactions are executed concurrently.

 If the transactions are executed concurrently then the resources can


be utilized more efficiently hence more throughput is achieved.

 Here, for transactions we consider data items as resources


because transactions process data by accessing them.

 When multiple transactions access data elements in a concurrent


way, this may destroy the consistency of the database.

BITS Pilani, Hyderabad Campus


Implementing Serializabilty

One way to ensure serializability is to allow the transactions to access the


data items in a mutually exclusive manner.

This is to make sure that when one transaction access a data item no other
transaction can modify that data item.

The following techniques implement mutual exclusion and control


concurrency.

1. Lock-based protocols

2. Timestamp-based protocols

BITS Pilani, Hyderabad Campus


1. Concurrency Control Using Locks: A data item may be locked in
various modes.

i) Shared (denoted by S): if a transaction obtains a shared mode lock on


a data item Q, it can read Q but not modify Q.

(ii) Exclusive (denoted by X): if this lock is obtained, a transaction can


read or write the data item.

BITS Pilani, Hyderabad Campus


Lock Compatibility Matrix
        S       X
S     True    False
X     False   False

This says that if a transaction Ti obtains a lock on a data item in S-


mode, other transaction can get a lock on the same item in S-mode
but not in X-mode.

If a transaction obtains a lock in X-mode on a data item no other


transaction can obtain a lock on the same data item in any mode.

BITS Pilani, Hyderabad Campus


Deadlock

The Mutual exclusion mechanism leads to deadlock situation.

For example, if transaction Ti holds a lock on a data item (Q) in


X-mode and waits for a lock on another data item (P) which is
locked by another transaction Tj in X-mode, further to release
the lock on P, Tj must acquire a lock on Q, which is locked by
Ti. This is a circular wait condition and results in a deadlock
situation.

BITS Pilani, Hyderabad Campus


Wait-for Graph

Deadlock condition can be determined by a wait-for graph.

All transactions of the schedule become vertices.

And we have an edge between two transactions Ti and Tj. if Ti


is waiting for Tj to release a lock on a data item.
If the graph has a cycle then we can say that the schedule will
result in a deadlock.

BITS Pilani, Hyderabad Campus


Ex.: Consider transactions T1, T2, T3, T4.
(S(A): lock A in shared mode; X(B): lock B in exclusive mode; R(A): read A; W(B): write B.)

T1 requests S(B) after T2 holds X(B)  →  edge T1 → T2
T3 requests X(A) after T1 holds S(A)  →  edge T3 → T1
T2 requests X(C) after T3 holds S(C)  →  edge T2 → T3
T4 requests X(B) after T2 holds X(B)  →  edge T4 → T2

Wait-for graph edges: T1 → T2, T2 → T3, T3 → T1, T4 → T2.

BITS Pilani, Hyderabad Campus


In the above graph there exists a cycle hence this schedule
leads to deadlock.
If a transaction Ti requests a lock and transaction Tj holds a conflicting lock, the lock manager can use one of the following policies to prevent deadlocks.

Timestamp based:
Wait-Die: If Ti is older than Tj it is allowed to wait otherwise
aborted.
Wound-wait: If Ti older than Tj allowed to run by aborting Tj
else Ti will wait.

Priority based
Wait-Die: If Ti has higher priority than Tj it is allowed to wait
otherwise aborted.
Wound-wait: If Ti has higher priority, it is allowed to run by aborting Tj; else Ti will wait.
BITS Pilani, Hyderabad Campus
Two-phase locking protocol:

This protocol ensures serializability.

According to this protocol, each transaction issues lock and unlock requests in two phases:
(i) Growing phase: In this phase, a transaction may obtain locks but may not release any lock.
(ii) Shrinking phase: In this phase, a transaction may release locks but may not obtain any new locks.

The two-phase locking protocol ensures conflict serializability.

It does not ensure freedom from deadlock.
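
A minimal, hypothetical Python sketch of the two-phase discipline is shown below; the class TwoPhaseLockingTxn is illustrative and simply refuses new lock requests once the shrinking phase has begun (it does not model a real lock manager).

class TwoPhaseLockingTxn:
    def __init__(self, name):
        self.name = name
        self.growing = True          # still in the growing phase
        self.held = set()

    def lock(self, item, mode):
        if not self.growing:
            raise RuntimeError('2PL violation: lock requested in shrinking phase')
        self.held.add((item, mode))  # assume the lock manager grants the lock

    def unlock(self, item, mode):
        self.growing = False         # the first unlock starts the shrinking phase
        self.held.discard((item, mode))

t = TwoPhaseLockingTxn('T1')
t.lock('A', 'S')
t.lock('B', 'X')
t.unlock('A', 'S')                   # shrinking phase starts here
# t.lock('C', 'S')                   # would raise: violates two-phase locking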

BITS Pilani, Hyderabad Campus


2. Timestamp-based Concurrency Control

Maintaining an ordering between every pair of conflicting transactions is important.

If we decide this ordering in advance, we can achieve serializability. Timestamping is a method to fix the ordering.

Each transaction is assigned a unique fixed timestamp.


If TS(Ti) < TS(Tj), this implies that Ti should be executed before Tj.

BITS Pilani, Hyderabad Campus


The time-stamps determine the serializability order.
Each data item is associated with two timestamp values.

W-timestamp(Q) – the largest timestamp of any transaction that successfully executed Write(Q).

R-timestamp(Q) – the largest timestamp of any transaction that successfully executed Read(Q).

These values are updated whenever read(Q) or write(Q) is executed.

BITS Pilani, Hyderabad Campus


Timestamp ordering Protocol:

This protocol operates as follows:


(i) Suppose transaction Ti issues read(Q):
If TS(Ti) < W-timestamp(Q), Ti needs to read a value of Q that has already been overwritten.
Hence the read operation is rejected and Ti is rolled back.
If TS(Ti) ≥ W-timestamp(Q), the read operation is executed.

(ii) Suppose Ti issues write(Q):
If TS(Ti) < R-timestamp(Q), the value of Q being produced by Ti was needed earlier and the system assumed it would never be produced.
Hence Ti is rejected and rolled back.
If TS(Ti) < W-timestamp(Q), Ti is attempting to write an obsolete value of Q.
Hence Ti is rejected and rolled back.
Otherwise, the write operation is executed.

A sketch of these checks is given below.
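
A compact, hypothetical Python sketch of these checks, with dictionaries rts and wts standing in for R-timestamp(Q) and W-timestamp(Q), and initial timestamps assumed to be 0:

rts, wts = {}, {}                      # R-timestamp(Q) and W-timestamp(Q) per item

def read(ts, q):
    if ts < wts.get(q, 0):             # Q was already overwritten by a later txn
        return 'reject: roll back Ti'
    rts[q] = max(rts.get(q, 0), ts)    # record the successful read
    return 'read executed'

def write(ts, q):
    if ts < rts.get(q, 0):             # a later txn already read the old value of Q
        return 'reject: roll back Ti'
    if ts < wts.get(q, 0):             # Ti would write an obsolete value of Q
        return 'reject: roll back Ti'
    wts[q] = ts
    return 'write executed'

print(write(10, 'Q'))   # write executed; W-timestamp(Q) = 10
print(read(5, 'Q'))     # rejected: TS(Ti)=5 < W-timestamp(Q)=10
print(read(12, 'Q'))    # read executed; R-timestamp(Q) = 12
print(write(11, 'Q'))   # rejected: TS(Ti)=11 < R-timestamp(Q)=12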

BITS Pilani, Hyderabad Campus


Summary
 Concepts related to Concurrency Control
 Approaches for Implementing Serializability
 How lock-based protocols work
 Detecting the Deadlock condition and resolving
 Two-phase locking protocol
 How timestamp-based protocol works

BITS Pilani, Hyderabad Campus


Database Management Systems

BITS Pilani Dr.R.Gururaj


CS&IS Dept.
Hyderabad Campus
Lecture Session-18
Database Recovery

Content

 Introduction to Recovery
 Recovery strategies
 Log-based recovery
 Check-pointing
 Shadow paging

BITS Pilani, Hyderabad Campus


Introduction to Recovery

Before a transaction executes, the data is in a consistent state. During execution, the data may pass through inconsistent states, and difficulties arise when a failure occurs in the middle of this process. After the transaction completes, the database moves to the next consistent state.

As a result of the failure of a transaction, the state of the system may no longer reflect the state of the real world that the database is supposed to capture. We call such a state an inconsistent state.

When this happens, we must make sure that the database is restored to the consistent state which existed before the start of the failed transaction.

This process is known as the recovery process.
BITS Pilani, Hyderabad Campus
Recovery Techniques

If a transaction T performed multiple database


modifications, several output operations may be required
and a failure may occur after some of these modifications
have been made but before all of them are made.

In order to restore the most recent consistent state, we must first write the information describing the modifications to the system log, without modifying the database itself.

This helps us to remove the modifications done by a failed transaction.
Now we discuss some important recovery techniques.
BITS Pilani, Hyderabad Campus
Log-based Recovery

Database System Log:


Each log record describes a single database write operation
and contains the following details.

•Transaction name
•Data item name
•Old value
•New value

BITS Pilani, Hyderabad Campus


Types of log records

< Ti start> - indicates transaction Ti started

<Ti, Xj, V1, V2> - transaction Ti has performed a write operation on data item Xj; Xj had value V1 before the write and has value V2 after the write.
<Ti commit > - transaction Ti commits.

With these log records we have the ability to undo or redo a


modification that has already been output to the DB.

BITS Pilani, Hyderabad Campus


I. Deferred Database Modification

This technique ensures atomicity by recording all database modifications (updates) in the log, but deferring (postponing) the actual updates to the database until the transaction commits.

As no data item is written to the database before the commit record of the transaction, the log needs to hold only the new values. Hence only a redo operation is performed during recovery.

The redo(Ti) operation sets the value of all data items updated by transaction Ti to their new values.

All new values are found in the log records.


BITS Pilani, Hyderabad Campus
Redoing is needed when all modifications are recorded in the log but we are not certain that they were successfully written to the database.
Ex.
Log                      Database
<T1 start>
<T1, A, 900>
<T1, C, 800>
<T1 commit>
                         A = 900
                         C = 800
<T2 start>
<T2, B, 700>
<T2 commit>
                         B = 700

On failure, a transaction needs to be redone if and only if the log contains both its <start> and <commit> records.
Otherwise nothing needs to be done for it.
BITS Pilani, Hyderabad Campus
Log                      Database
<T1 start>
<T1, A, 900>
<T1, C, 800>
<T1 commit>
                         A = 900
                         C = 800
<T2 start>
<T2, B, 700>
<T2 commit>
                         B = 700
<T3 start>
<T3, C, 200>
//FAIL//

Here T3 has no <commit> record in the log, so on recovery nothing is done for T3, while T1 and T2 are redone (a recovery sketch follows).
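
A minimal, hypothetical Python sketch of redo-only recovery under deferred modification; the tuple-based log format is an assumption made for this illustration.

def recover_deferred(log, db):
    committed = {rec[1] for rec in log if rec[0] == 'commit'}
    for rec in log:
        if rec[0] == 'write' and rec[1] in committed:
            _, _, item, new_value = rec
            db[item] = new_value            # redo: reapply the new value
    return db

log = [('start', 'T1'), ('write', 'T1', 'A', 900), ('write', 'T1', 'C', 800),
       ('commit', 'T1'),
       ('start', 'T3'), ('write', 'T3', 'C', 200)]   # T3 never committed
# 1000 and 700 are arbitrary pre-crash values, not taken from the slides.
print(recover_deferred(log, {'A': 1000, 'C': 700}))  # {'A': 900, 'C': 800}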

BITS Pilani, Hyderabad Campus


II. Immediate Database Modification
In this technique, database modifications are output to the database while the transaction is still in the active state.
On failure, an undo operation is therefore needed for incomplete transactions, and a redo may be required for committed transactions.
System Log               Database
<T1 start>
<T1, A, 600, 900>
                         A = 900
<T1, C, 300, 800>
                         C = 800
<T1 commit>
<T2 start>
<T2, B, 400, 700>
                         B = 700
<T2 commit>
BITS Pilani, Hyderabad Campus
System Log               Database
<T1 start>
<T1, A, 600, 900>
                         A = 900
<T1, C, 300, 800>
                         C = 800
<T1 commit>
<T2 start>
<T2, B, 400, 700>
                         B = 700
<T2 commit>
<T3 start>
<T3, C, 100, 200>
                         C = 200
//FAIL//

Here T3 is incomplete at the time of failure, so undo(T3) restores C to 100, while the committed transactions T1 and T2 are redone (a recovery sketch follows).
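
A hypothetical Python sketch of undo/redo recovery under immediate modification, using the old and new values stored in the log; the log format here is again an assumption.

def recover_immediate(log, db):
    committed = {r[1] for r in log if r[0] == 'commit'}
    started = {r[1] for r in log if r[0] == 'start'}
    for rec in reversed(log):                         # undo incomplete transactions
        if rec[0] == 'write' and rec[1] in started - committed:
            _, _, item, old, new = rec
            db[item] = old
    for rec in log:                                   # redo committed transactions
        if rec[0] == 'write' and rec[1] in committed:
            _, _, item, old, new = rec
            db[item] = new
    return db

log = [('start', 'T1'), ('write', 'T1', 'A', 600, 900), ('commit', 'T1'),
       ('start', 'T3'), ('write', 'T3', 'C', 100, 200)]   # T3 failed before commit
print(recover_immediate(log, {'A': 900, 'C': 200}))        # {'A': 900, 'C': 100}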
BITS Pilani, Hyderabad Campus
Checkpointing

In case of failure, the log needs to be searched to


determine the transactions that need to be redone or
undone.

But this searching is time-consuming, and most of the time the algorithm will redo transactions that have already written their updates to the database; redoing them is a waste of time.

In order to reduce this overhead, checkpointing is used.

BITS Pilani, Hyderabad Campus


Log:
  <T1 start>
  <T1, D, 20>
  <T1 commit>
  [checkpoint]
  <T4 start>
  <T4, B, 12>
  <T4, A, 20>
  <T4 commit>
  <T2 start>
  <T2, B, 15>
  <T3 start>
  <T3, A, 35>
  <T2, D, 25>

On recovery:
  - T1 committed before the latest checkpoint, hence no action is needed for it.
  - T4 is redone, as its commit occurred after the latest checkpoint.
  - T2 and T3 are ignored because they did not reach their commit point.

BITS Pilani, Hyderabad Campus


Sequence of actions in checkpointing

 Output all log records currently in main memory onto stable storage

 Output all modified buffer blocks to the disk.

 Output log record <check point> on to stable storage.

 During the recovery process, redo/undo operations are considered only for transactions whose log records occur after (or span) the latest <checkpoint> record in the log. A sketch follows.
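
A hypothetical sketch of how the latest checkpoint limits the work done at recovery, redo-only and matching the deferred-modification example above; the log format is assumed.

# Only transactions whose <commit> record appears after the latest checkpoint
# are redone; transactions committed before it are already safely on disk,
# and uncommitted transactions are ignored (deferred modification).

def recover_with_checkpoint(log, db):
    last_cp = max((i for i, r in enumerate(log) if r[0] == 'checkpoint'),
                  default=-1)
    redo_set = {r[1] for r in log[last_cp + 1:] if r[0] == 'commit'}
    for rec in log:
        if rec[0] == 'write' and rec[1] in redo_set:
            _, _, item, new_value = rec
            db[item] = new_value
    return db

log = [('start', 'T1'), ('write', 'T1', 'D', 20), ('commit', 'T1'),
       ('checkpoint',),
       ('start', 'T4'), ('write', 'T4', 'B', 12), ('write', 'T4', 'A', 20),
       ('commit', 'T4'),
       ('start', 'T2'), ('write', 'T2', 'B', 15),
       ('start', 'T3'), ('write', 'T3', 'A', 35), ('write', 'T2', 'D', 25)]
print(recover_with_checkpoint(log, {}))   # {'B': 12, 'A': 20} -- only T4 redone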

BITS Pilani, Hyderabad Campus


Shadow paging

This technique is an alternative to the log-based recovery method. The database is partitioned into fixed-length blocks called pages. These pages need not be in any particular order on disk; a page table is used to find the location of the ith page.

In shadow paging, two page tables are used: the current page table and the shadow page table. When a transaction starts, both page tables are identical. The shadow page table is never changed during the execution of the transaction, while the current page table may change whenever the transaction performs a write operation. All input and output operations use the current page table. When a page is modified, it is written to a different location on disk; the old block, which contains the older values, still exists and can be accessed through the shadow page table.
This is sufficient to recover from a failure.
This technique does not require a log, and no redo/undo operations are needed. A sketch is given below.
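
A small, hypothetical Python sketch of the copy-on-write idea behind shadow paging; the in-memory dictionaries standing in for the disk and the two page tables are assumptions of this illustration.

disk = {0: 'A=100', 1: 'B=200'}               # disk page id -> page contents
shadow_page_table = {0: 0, 1: 1}              # logical page -> disk page
current_page_table = dict(shadow_page_table)  # copied at transaction start
next_free = 2                                 # next unused disk page

def write_page(logical_page, new_contents):
    """Write a modified page to a fresh disk location (copy-on-write);
    only the current page table is updated, never the shadow table."""
    global next_free
    disk[next_free] = new_contents
    current_page_table[logical_page] = next_free
    next_free += 1

write_page(0, 'A=900')
# On commit, the current page table would become the new shadow page table.
# On failure, the current page table is simply discarded:
print(disk[shadow_page_table[0]])    # 'A=100' - old value still reachable
print(disk[current_page_table[0]])   # 'A=900' - new value via current table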
BITS Pilani, Hyderabad Campus
Shadow paging

(Figure: the shadow page table and the current page table, each pointing to pages on disk.)

BITS Pilani, Hyderabad Campus


Summary

 The importance of the recovery mechanism in a DBMS


 Various recovery strategies
 Log-based recovery scheme
 How Deferred and Immediate modification techniques
work
 The concept of Checkpointing in recovery
 How Shadow paging recovery technique works

BITS Pilani, Hyderabad Campus


Database Management Systems

BITS Pilani Dr.R.Gururaj


CS&IS Dept.
Hyderabad Campus
Conclusion to DBMS course

1. Introduction and Overview of DBMS


 Introduction to database systems
 Advantages
 Three schema architecture
 Data Independence
 Architecture
 Database users

BITS Pilani, Hyderabad Campus


2. Conceptual Database Design
(ER Modeling)
 Database Design process
 ER constructs
 Notations
 Class hierarchies

BITS Pilani, Hyderabad Campus


3. Relational Data model and Constraints
 Relations, tuples, and keys
 Integrity Constraints

4. Mapping from ER to Relational Schemas


 Mapping Entities, Relationships, and Constraints
 Mapping Class hierarchies

BITS Pilani, Hyderabad Campus


5. Relational Algebra and Calculus
 Relational operators
 Join operation
 Grouping

6. SQL-99
 DDL
 DML
 Views in SQL

BITS Pilani, Hyderabad Campus


7. Functional Dependencies
 FDs
 Inference rules

8. Database Design and Normal Forms


 Rules for Normal forms
 Decomposition
 Lossless and Dependency preserving Decomposition

BITS Pilani, Hyderabad Campus


9. Storage and File structures
 Disk storage
 File and Record Organization

10. Hashing
 Internal Hashing
 Collision resolution
 Static and Dynamic External Hashing

BITS Pilani, Hyderabad Campus


11. Indexing
 Primary and Secondary Indexing
 Sparse and Dense Indexing
 Multilevel Indexing
 B+ Tree Indexing

12. Transaction Model


 Advantages
 States
 Transaction Schedules

BITS Pilani, Hyderabad Campus


13. Concurrent Transactions
 Concurrent Transactions and Schedules
 Advantages and Disadvantages
 Serial and Serializable Schedules
 Conflict Serializability

14. Concurrency Control


 Serializability
 Lock-based Protocols
 Timestamp-based protocols
 Deadlocks

BITS Pilani, Hyderabad Campus


15. Database Recovery
 Log-based Recovery
 Deferred and Immediate modification
techniques
 Checkpointing
 Shadow paging

BITS Pilani, Hyderabad Campus


BITS Pilani, Hyderabad Campus
Course No: SEWP ZC322
Course Title: Database
Management Systems
Database State for COMPANY
ER DIAGRAM – COMPANY DATABASE

Entity Types:
EMPLOYEE,
DEPARTMENT, PROJECT,
DEPENDENT

Relationship Types:
WORKS_FOR,
MANAGES,
WORKS_ON,
CONTROLS,
SUPERVISION,
DEPENDENTS_OF
Database State for BLOOD BANK
ER DIAGRAM – COMPANY DATABASE

IDENTIFY THE FOLLOWING:


• Entity
• Attributes
• Relationships
• Entity Integrity
• Referential Integrity
ER DIAGRAM – BLOOD BANK DATABASE
Database State for MERCHANT PAYMENT
PROCESSING
MERCHANTS: ID, NAME, CODE, TYPE, CANPROCESSSALE, CANPROCESSCREDIT, CUSTOMERID, ROWVERSION

CUSTOMERS: ID, NAME, ADDRESSLINE1, ADDRESSLINE2, CITY, STATE, ZIPCODE, CUNTRYCODE, CONTACTNAME, CONTACTEMAIL, CONTACTPHONE, SIZE, ROWVERSION

TRANSACTIONS: ID, REFERENCEID, TYPE, CARDNUMBER, CARDHOLDER, AMOUNT, REQUESTDATE, RESPONSEDATE, ISAPPROVED, RESPONSECODE, CUSTOMERID, MERCHANTID, TERMINALID, ROWVERSION

TERMINALS: ID, CODE, CANPROCESSSALE, CANPROCESSCREDIT, MERCHANTID, ROWVERSION

COUNRIES: CODE, NAME
ER DIAGRAM – MERCHANT PAYMENT
PROCESSING
Thank You
Course No: SEWP ZC322
Course Title: Database
Management Systems
BIOMETRIC SYSTEM
 A biometric system provides many benefits to organizations. It enables an employer to have full control over all employees' working hours. It helps control labor costs by reducing over-payments, which are often caused by transcription errors, interpretation errors and intentional errors. Manual processes are also eliminated, as well as the staff needed to maintain them.
 It is often difficult to comply with labor regulation, but a time and
attendance system is invaluable for ensuring compliance with labor
regulations regarding proof of attendance.
 Companies with large employee numbers might need to install
several time clock stations in order to speed up the process of
getting all employees to clock in or out quickly or to record activity
in dispersed locations. In the business world of today we all know
one simple truth…TIME IS MONEY! We work to keep the amount of
time it takes to complete even the simplest tasks down to the
minimum.
VEHICLE COMPARER FRAMEWORK
 Vehicle comparison is one of the most common comparisons people make when choosing a type of vehicle.
 The Vehicle Comparer Framework allows users to find the most economical models available, or to choose a specific vehicle and decide on purchasing it.
 The Vehicle Comparer Framework also allows you to compare performance, engine details and running costs.
Thank You
