
Database Management Systems

BITS Pilani Dr.R.Gururaj


CS&IS Dept.
Hyderabad Campus
Course Content

1. Introduction and Overview of DBMS


2. Conceptual Database Design (ER Modeling)
3. Relational Model
4. Relational Algebra and Calculus
5. SQL
6. Schema Refinement and Normal Forms
7. Disk Storage
8. Hashing and Indexing
9. Transaction Management and Concurrency
Control
10.Database Recovery

BITS Pilani, Hyderabad Campus


Books

1. R Ramakrishnan & J Gehrke, Database Management


Systems, Mc Graw Hill, 3rd Ed., 2003.

2. Elmasri, Ramez, Shamkant B. Navathe, Fundamentals of


Database Systems, Pearson Education, 5th Ed., 2007

3. Date C.J., An Introduction to Database Systems, Pearson, 8th


Ed., 2006.

4. Korth H F and A Silberschatz, Database System Concepts,


MGHISE, 3rd Ed., 1997.

BITS Pilani, Hyderabad Campus


Lecture Session-1
Introduction to DBMS
Content
 Database Systems
 DBMS
 Database System environment
 Traditional file systems for storing data
 Advantages of DBMS over traditional file systems

BITS Pilani, Hyderabad Campus


Introduction
Databases and the systems that manage them have become significant
components of present-day businesses of every kind.

These databases help businesses to perform their day-to-day


activities in an efficient and effective manner.

• Banking
• Travel ticket reservation
• Library catalog search

In each of these, application programs access the database.


Advances in technology have given rise to new concepts:
 Multimedia databases
 GIS
 Web data
 Data warehousing and mining
BITS Pilani, Hyderabad Campus
Data: Known facts that can be recorded and that have
implicit meaning.
Ex. Name, Tel_no, city etc.

This data can be stored in a file on a computer.

Database: Is a collection of related data.

 It is a collection of logically related data.

 A database is designed, built and populated with


data for a specific purpose.

BITS Pilani, Hyderabad Campus


DBMS
DBMS: Is a collection of programs that enables users to create and maintain
databases in a convenient and effective manner.

DBMS is a software system that facilitates the following:

1.Defining the database: This includes defining the structures, data types,
constraints, indexes etc.
This description is stored in the database catalog (data dictionary) and is known as meta-data.

2.Constructing the database: This means storing data into the database
structures and storing on some storage medium.

3.Manipulating database for various applications: This encompasses activities like


– querying the database, inserting new records into the database, updating some
data items, and deleting certain items from the database.
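For illustration, the three activities map directly onto SQL statements (SQL is covered in later sessions; the ACCOUNT table used here is only an example):

-- Defining: declare the structure, data types, and constraints
CREATE TABLE ACCOUNT (
    acc_no    CHAR(10) PRIMARY KEY,
    cust_name VARCHAR(30) NOT NULL,
    balance   DECIMAL(12, 2)
);

-- Constructing: store data into the defined structure
INSERT INTO ACCOUNT VALUES ('A1', 'Raju', 5000.00);

-- Manipulating: query, update, and delete the stored data
SELECT cust_name, balance FROM ACCOUNT WHERE balance > 1000;
UPDATE ACCOUNT SET balance = balance + 500 WHERE acc_no = 'A1';
DELETE FROM ACCOUNT WHERE acc_no = 'A1';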

What is DBMS?
What is a Database System?

BITS Pilani, Hyderabad Campus


BITS Pilani, Hyderabad Campus
Traditional file systems for
storing the data
If we take the example of savings bank enterprise, information about
customers and savings accounts etc. need to be stored.

One way to keep the information on computers is to store in files


provided by operating systems (OS).

Disadvantages of the above System


 Difficulty in accessing data (possible operations need to be hard-coded in programs).
 Redundancy leading to inconsistency.
 Inconsistent changes made by concurrent users.
 No recovery on crash.
 The security provided by OS in the form of password is not
sufficient.
 Data Integrity is not maintained.

BITS Pilani, Hyderabad Campus


Advantages of using DBMS

 Data independence
 Efficient data access
 Data integrity and security
 Data Administration
 Concurrent access and Crash recovery
 Reduced application development time

BITS Pilani, Hyderabad Campus


Disadvantages of DBMS

1. Extra cost due to SW, HW and training.


2. Not suitable or effective for certain applications (e.g., applications with real-time
constraints or with only a few well-defined operations).
3. Some applications need data manipulations that cannot be expressed in the query language.

BITS Pilani, Hyderabad Campus


Summary

 What is Data, Database, and DBMS


 Importance of DBMS
 Storing data in Traditional file systems
 Advantages of DBMS over traditional file systems

BITS Pilani, Hyderabad Campus


Database Management Systems

BITS Pilani Dr.R.Gururaj


CS&IS Dept.
Hyderabad Campus
Lecture Session-2
DBMS Concepts

Content

 Describing and Storing data in DBMS


 Three schema Architecture
 Data independence
 Queries
 Transactions
 Structure of a DBMS
 People who work with DBMS

BITS Pilani, Hyderabad Campus


Describing and storing data in
DBMS
Data model

Is a collection of high-level data description constructs that hide many


low-level details.
DBMS allows a user to define the data to be stored in terms of a data model.

Semantic data models: more abstract, high-level data models that make it easier for a user to
come up with a good initial description of the data in an enterprise. They contain a wide
variety of constructs that help describe real-world enterprise data.
Ex. ER model
Representational / Implementation data models: DBMS-specific data models built around just a few basic constructs.
Ex. Relational data model, Object data model

A database design in terms of a semantic model serves as a useful starting point and
is subsequently translated into a database design in terms of the data model the
DBMS supports.

BITS Pilani, Hyderabad Campus


Relational Model:

The central data description construct in this model is a relation,


which can be thought of as a set of records.

Schema: Description of data in terms of a data model is called a schema.


A relation schema specifies the name of the relation and the name and type of each field.

Ex. Student (sid: string; name: string; age: integer)

every row follows the schema of the relation.

BITS Pilani, Hyderabad Campus


Instance of a relation:
Student
sid name age
A120 Raju 21
A134 Kiran 19
C110 John 22

A schema can be regarded as a template for describing a student.

We can specify integrity constraints which are conditions that need to be

satisfied by records in the relation. Ex. uniqueness
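A possible SQL declaration of this schema (column sizes are illustrative), with the uniqueness of sid enforced as a primary key constraint:

CREATE TABLE Student (
    sid  VARCHAR(10),   -- sid: string
    name VARCHAR(30),   -- name: string
    age  INT,           -- age: integer
    CONSTRAINT student_pk PRIMARY KEY (sid)   -- uniqueness constraint on sid
);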

BITS Pilani, Hyderabad Campus


The following are some important representational data models (DBMS Specific)

1. Network Model: Though the basic structure is a record, the relationships are captured
using links. The database can be seen as an arbitrary network of records connected by links.
Ex.: GE's Integrated Data Store (IDS), early 1960s
2. Hierarchical Model: The records containing data are organized as a collection of trees.
Ex.: IBM's IMS (Information Management System), late 1960s
3. Relational Model (early 1970s): Data and relationships are captured as tables and keys;
the basic storage structure is the record (row).
Ex.: Oracle, IBM's DB2, MySQL, Informix, Sybase, MS Access, Ingres, etc.
4. Object Data Model: Objects created through object-oriented programs can be stored in the database.
Ex.: Object Store
5. Object-Relational Model: Objects can be stored in tables.
Ex.: Oracle, Informix

BITS Pilani, Hyderabad Campus


Database Schema
Database Schema: The description of a database is called the database schema.

Three-Schema Architecture
A database can be described using three different levels of abstractions.
Description at each level can be defined by a schema. For each abstraction we
focus on one of the specific issues such as user views, concepts, storage etc.

1. External schema: Used to describe the database at external level.


Also described in terms of the data model of that DBMS. This allows data
access to be customized at the level of individual users/groups/applications.
Any external schema has one or more views and relations from the conceptual
schema. This schema design is guided by end user requirements.
2. Conceptual schema (logical schema) Describes the stored data in terms of the
data model specific to that DBMS. In RDBMS conceptual schema describes
all relations that are stored in the database. Arriving at good choice of
relations, fields and constraints is known as conceptual database design.
3. Physical schema: Describes the physical storage strategy for the database.
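In relational terms, the conceptual schema is the set of stored base tables, and an external schema can be provided through views; a minimal sketch (names are illustrative):

-- Conceptual level: a base relation
CREATE TABLE EMPLOYEE (
    ssn    CHAR(9) PRIMARY KEY,
    name   VARCHAR(30),
    salary DECIMAL(10, 2),
    dno    INT
);

-- External level: a customized user view that hides salary
CREATE VIEW EMP_PUBLIC AS
SELECT ssn, name, dno
FROM EMPLOYEE;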

BITS Pilani, Hyderabad Campus


Three Schema Architecture

External Schema 1 External Schema 2 External Schema 3

External Level

Conceptual Level Conceptual Schema

Physical Schema
Physical/Internal
Level

Storage

Three schema architecture of


DBMS

BITS Pilani, Hyderabad Campus


Data Independence

Data Independence:
The three-schema architecture, which results from the three levels of
abstraction over the database, leads to data independence.

1. Logical data independence: changes in conceptual level schema


should not affect the application level or external level schemas.

2. Physical data independence: The changes in physical features of


storage, i.e., changes to the physical storage format should not affect
schema at conceptual level.

The above data independence is one of the important advantages of


DBMS.

The DBMS stores the description of the schemas in the system catalog.


BITS Pilani, Hyderabad Campus
Queries
Queries in RDBMS
The ease with which information can be obtained from a database
often determines its value to the user.

RDBMS allows users to pose a rich class of questions in the form of


queries.

Relational data model has powerful query languages:

Formal query languages (based on strong mathematical logic)


Relational algebra
Relational Calculus
Commercial query language
SQL

BITS Pilani, Hyderabad Campus


Transactions

Transaction Management
A transaction is a collection of operations that perform a single logical
operation or function in a database application.

Each transaction is a unit of atomicity.

A transaction is an atomic unit of work that is either completed in its


entirety or not done at all.

Concurrent Transactions

Incomplete Transactions and system crash


For recovery purposes, the system needs to keep track of
when the transaction starts, terminates, and commits or aborts.
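A minimal sketch of a transaction as an atomic unit, assuming a simple ACCOUNT table; the exact transaction-control syntax (BEGIN / START TRANSACTION) varies across DBMSs:

BEGIN;   -- start the transaction (START TRANSACTION in some systems)

UPDATE ACCOUNT SET balance = balance - 500 WHERE acc_no = 'A1';
UPDATE ACCOUNT SET balance = balance + 500 WHERE acc_no = 'A2';

COMMIT;  -- both updates become permanent together
-- If an error occurs before COMMIT, a ROLLBACK (or crash recovery) undoes the whole unit.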

BITS Pilani, Hyderabad Campus


DBMS Structure

[Figure: DBMS structure. Web forms, application front ends, and the SQL interface issue SQL commands to the Query Engine. The Query Engine works with the Transaction Manager, Lock Manager, Concurrency Control Manager, Recovery Manager, and the Buffer/Disk/File Manager, which access the index files, system catalog, and data blocks on disk.]

BITS Pilani, Hyderabad Campus


People who work with DBMS

 Database Implementers
 End users
 Application Programmers
 Database administrator (DBA)

DBA’s role:
1. Design of physical & Conceptual schemas
2. Security and authorization
3. Data availability , recovery and backup
4. Database tuning- modifying the schemas to meet the
requirements

BITS Pilani, Hyderabad Campus


Summary
 How data is described in a DBMS
 What is a data model
 What is a schema
 What is three schema architecture of a DBMS
 What is data independence
 Queries and Query languages
 Transaction management
 Components of a DBMS
 People working with DBMS

BITS Pilani, Hyderabad Campus


Contents

1. Mapping ER to Relational model

2. Mapping EER to Relational model

1 CSC 352/ISC 332 Dr.R.Gururaj BITS Pilani, Hyderabad Campus


ER-to-Relational Mapping

Step 1: Mapping of Regular Entity Types.


 For each regular (strong) entity type E in the ER schema, create
a relation R that includes all the simple attributes of E.
 Choose one of the key attributes of E as the primary key for R.
 If the chosen key of E is composite, the set of simple attributes
that form it will together form the primary key of R.

Example: We create the relations EMPLOYEE, DEPARTMENT, and


PROJECT in the relational schema corresponding to the regular
entities in the ER diagram.

 SSN, DNUMBER, and PNUMBER are the primary keys for the
relations EMPLOYEE, DEPARTMENT, and PROJECT as shown.

2 CSC 352/ISC 332 Dr.R.Gururaj BITS Pilani, Hyderabad Campus


[Figure: ER diagram for the COMPANY database, showing the entity types EMPLOYEE, DEPARTMENT, PROJECT and the weak entity type DEPENDENT, their attributes, and the relationship types WORKS_FOR, MANAGES, CONTROLS, SUPERVISION, WORKS_ON, and DEPENDENTS_OF with (min, max) structural constraints.]

CSC 352/ISC 332 Dr.R.Gururaj BITS Pilani, Hyderabad Campus


3 CSC 352/ISC 332 Dr.R.Gururaj BITS Pilani, Hyderabad Campus
Step 2: Mapping of Weak Entity Types
 For each weak entity type W in the ER schema with owner entity type
E, create a relation R & include all simple attributes (or simple
components of composite attributes) of W as attributes of R.
 Also, include as foreign key attributes of R the primary key attribute(s)
of the relation(s) that correspond to the owner entity type(s).
 The primary key of R is the combination of the primary key(s) of the
owner(s) and the partial key of the weak entity type W, if any.

Example: Create the relation DEPENDENT in this step to


correspond to the weak entity type DEPENDENT.
 Include the primary key SSN of the EMPLOYEE relation as a foreign key
attribute of DEPENDENT (renamed to ESSN).
 The primary key of the DEPENDENT relation is the combination {ESSN,
DEPENDENT_NAME} because DEPENDENT_NAME is the partial key of
DEPENDENT.
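One possible SQL declaration of this mapping (data types are illustrative):

CREATE TABLE DEPENDENT (
    ESSN           CHAR(9)     NOT NULL,   -- owner's key, foreign key to EMPLOYEE
    DEPENDENT_NAME VARCHAR(15) NOT NULL,   -- partial key of the weak entity type
    SEX            CHAR,
    BDATE          DATE,
    RELATIONSHIP   VARCHAR(10),
    PRIMARY KEY (ESSN, DEPENDENT_NAME),
    FOREIGN KEY (ESSN) REFERENCES EMPLOYEE (SSN)
);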

4 CSC 352/ISC 332 Dr.R.Gururaj BITS Pilani, Hyderabad Campus


Step 3: Mapping of Binary 1:1 Relation Types

For each binary 1:1 relationship type R in the ER schema, identify the relations S
and T that correspond to the entity types participating in R.

There are three possible approaches:


1. Foreign Key approach: Choose one of the relations-say S-and include a foreign
key in S that refers to the primary key of T. It is better to choose an entity type
with total participation in R in the role of S.
• Example: 1:1 relation MANAGES is mapped by choosing the participating
entity type DEPARTMENT to serve in the role of S, because its
participation in the MANAGES relationship type is total.
2. Merged relation option: An alternate mapping of a 1:1 relationship type is
possible by merging the two entity types and the relationship into a single
relation. This may be appropriate when both participations are total.
3. Cross-reference or relationship relation option: The third alternative is to set
up a third relation R for the purpose of cross-referencing the primary keys of
the two relations S and T representing the entity types.
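A sketch of the foreign key approach for MANAGES, carried by DEPARTMENT (the side with total participation); the UNIQUE constraint on MGRSSN, which keeps the relationship 1:1, is one possible choice:

CREATE TABLE DEPARTMENT (
    DNUMBER      INT PRIMARY KEY,
    DNAME        VARCHAR(15) NOT NULL UNIQUE,
    MGRSSN       CHAR(9) NOT NULL UNIQUE,   -- FK representing MANAGES; UNIQUE enforces 1:1
    MGRSTARTDATE DATE,                      -- attribute of the MANAGES relationship
    FOREIGN KEY (MGRSSN) REFERENCES EMPLOYEE (SSN)
);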

5 CSC 352/ISC 332 Dr.R.Gururaj BITS Pilani, Hyderabad Campus


Step 4: Mapping of Binary 1:N Relationship Types.
 For each regular binary 1:N relationship type R, identify the
relation S that represent the participating entity type at the N-
side of the relationship type.
 Include as foreign key in S the primary key of the relation T
that represents the other entity type participating in R.
 Include any simple attributes of the 1:N relationship type as
attributes of S.
Example: 1:N relationship types WORKS_FOR,
CONTROLS, and SUPERVISION in the figure.
 For WORKS_FOR we include the primary key DNUMBER of
the DEPARTMENT relation as foreign key in the EMPLOYEE
relation and call it DNO.
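A trimmed sketch of the resulting EMPLOYEE relation (only a few attributes shown):

CREATE TABLE EMPLOYEE (
    SSN    CHAR(9) PRIMARY KEY,
    FNAME  VARCHAR(15) NOT NULL,
    LNAME  VARCHAR(15) NOT NULL,
    SALARY DECIMAL(10, 2),
    DNO    INT NOT NULL,   -- FK to the 1-side (DEPARTMENT), representing WORKS_FOR
    FOREIGN KEY (DNO) REFERENCES DEPARTMENT (DNUMBER)
);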

6 CSC 352/ISC 332 Dr.R.Gururaj BITS Pilani, Hyderabad Campus


[Figure: the COMPANY ER diagram repeated for reference (see the earlier slide).]

7 CSC 352/ISC 332 Dr.R.Gururaj BITS Pilani, Hyderabad Campus


Step 5: Mapping of Binary M:N Relationship Types.
 For each regular binary M:N relationship type R, create a new relation
S to represent R.
 Include as foreign key attributes in S the primary keys of the relations
that represent the participating entity types; their combination will form
the primary key of S.
 Also include any simple attributes of the M:N relationship type (or simple
components of composite attributes) as attributes of S.
Example: The M:N relationship type WORKS_ON from the ER
diagram is mapped by creating a relation WORKS_ON in the
relational database schema.
 The primary keys of the PROJECT and EMPLOYEE relations are included as
foreign keys in WORKS_ON and renamed PNO and ESSN, respectively.
 Attribute HOURS in WORKS_ON represents the HOURS attribute of the
relationship type. The primary key of the WORKS_ON relation is the
combination of the foreign key attributes {ESSN, PNO}.
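The corresponding SQL declaration could look like this (types are illustrative):

CREATE TABLE WORKS_ON (
    ESSN  CHAR(9) NOT NULL,
    PNO   INT     NOT NULL,
    HOURS DECIMAL(4, 1),               -- simple attribute of the M:N relationship
    PRIMARY KEY (ESSN, PNO),           -- combination of the two foreign keys
    FOREIGN KEY (ESSN) REFERENCES EMPLOYEE (SSN),
    FOREIGN KEY (PNO)  REFERENCES PROJECT (PNUMBER)
);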

8 CSC 352/ISC 332 Dr.R.Gururaj BITS Pilani, Hyderabad Campus


Step 6: Mapping of Multivalued attributes.
 For each multivalued attribute A, create a new relation R.
 This relation R will include an attribute corresponding to A, plus the
primary key attribute K-as a foreign key in R-of the relation that
represents the entity type or relationship type that has A as an attribute.
 The primary key of R is the combination of A and K. If the multivalued
attribute is composite, we include its simple components.
Example: The relation DEPT_LOCATIONS is created.
 The attribute DLOCATION represents the multivalued attribute
LOCATIONS of DEPARTMENT, while DNUMBER-as foreign key-represents
the primary key of the DEPARTMENT relation.
 The primary key of R is the combination of {DNUMBER, DLOCATION}.
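A sketch of the corresponding table:

CREATE TABLE DEPT_LOCATIONS (
    DNUMBER   INT         NOT NULL,    -- FK to DEPARTMENT
    DLOCATION VARCHAR(15) NOT NULL,    -- one value of the multivalued attribute
    PRIMARY KEY (DNUMBER, DLOCATION),
    FOREIGN KEY (DNUMBER) REFERENCES DEPARTMENT (DNUMBER)
);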

9 CSC 352/ISC 332 Dr.R.Gururaj BITS Pilani, Hyderabad Campus


Step 7: Mapping of N-ary Relationship Types.
 For each n-ary relationship type R, where n>2, create a new
relation S to represent R.
 Include as foreign key attributes in S the primary keys of the
relations that represent the participating entity types.
 Also include any simple attributes of the n-ary relationship type
(or simple components of composite attributes) as attributes of
S.
Example: The relationship type SUPPLY in the ER diagram on the next
slide.
 This can be mapped to the relation SUPPLY shown in the relational
schema, whose primary key is the combination of the three foreign
keys {SNAME, PARTNO, PROJNAME}
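A sketch of the SUPPLY relation; the referenced SUPPLIER, PART, and PROJECT tables and the QUANTITY attribute are assumptions, since the ER figure is not reproduced here:

CREATE TABLE SUPPLY (
    SNAME    VARCHAR(15) NOT NULL,
    PARTNO   INT         NOT NULL,
    PROJNAME VARCHAR(15) NOT NULL,
    QUANTITY INT,                      -- illustrative simple attribute of the relationship
    PRIMARY KEY (SNAME, PARTNO, PROJNAME),
    FOREIGN KEY (SNAME)    REFERENCES SUPPLIER (SNAME),
    FOREIGN KEY (PARTNO)   REFERENCES PART (PARTNO),
    FOREIGN KEY (PROJNAME) REFERENCES PROJECT (PROJNAME)
);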

10 CSC 352/ISC 332 Dr.R.Gururaj BITS Pilani, Hyderabad Campus


11 CSC 352/ISC 332 Dr.R.Gururaj BITS Pilani, Hyderabad Campus
EER-to-Relational Mapping

13 CSC 352/ISC 332 Dr.R.Gururaj BITS Pilani, Hyderabad Campus


EER-to-Relational Mapping

Step8: Options for Mapping Specialization or Generalization.


Convert each specialization with m subclasses {S1,
S2,….,Sm} and generalized superclass C, where the
attributes of C are {k,a1,…an} and k is the (primary) key,
into relational schemas using one of the four following
options:
• Option 8A: Multiple relations-Superclass and subclasses
• Option 8B: Multiple relations-Subclass relations only
• Option 8C: Single relation with one type attribute
• Option 8D: Single relation with multiple type attributes

14 CSC 352/ISC 332 Dr.R.Gururaj BITS Pilani, Hyderabad Campus


• Option 8A: Multiple relations-Superclass and subclasses
Create a relation L for C with attributes Attrs(L) = {k, a1, ..., an} and PK(L) = k.
Create a relation Li for each subclass Si, 1 ≤ i ≤ m, with the attributes
Attrs(Li) = {k} ∪ {attributes of Si} and PK(Li) = k. This option works for any
specialization (total or partial, disjoint or overlapping).
• Option 8B: Multiple relations-Subclass relations only
Create a relation Li for each subclass Si, 1 ≤ i ≤ m, with the attributes
Attrs(Li) = {attributes of Si} ∪ {k, a1, ..., an} and PK(Li) = k. This option only
works for a specialization whose subclasses are total (every entity in
the superclass must belong to at least one of the subclasses).
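A sketch of Option 8A for an EMPLOYEE specialization; the subclass names SECRETARY and ENGINEER and their attributes are assumed from the attributes shown in the Option 8C example (Typing speed, EngType):

-- Option 8A: one relation for the superclass and one per subclass
CREATE TABLE EMPLOYEE (
    SSN     CHAR(9) PRIMARY KEY,
    FNAME   VARCHAR(15),
    LNAME   VARCHAR(15),
    ADDRESS VARCHAR(30)
);

CREATE TABLE SECRETARY (
    SSN          CHAR(9) PRIMARY KEY,       -- same key k as the superclass
    TYPING_SPEED INT,
    FOREIGN KEY (SSN) REFERENCES EMPLOYEE (SSN)
);

CREATE TABLE ENGINEER (
    SSN      CHAR(9) PRIMARY KEY,
    ENG_TYPE VARCHAR(15),
    FOREIGN KEY (SSN) REFERENCES EMPLOYEE (SSN)
);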

15 CSC 352/ISC 332 Dr.R.Gururaj BITS Pilani, Hyderabad Campus


Option 8A: Multiple
relations-Superclass
and subclasses

16 CSC 352/ISC 332 Dr.R.Gururaj BITS Pilani, Hyderabad Campus


Option 8B: Multiple
relations-Subclass
relations only

17 CSC 352/ISC 332 Dr.R.Gururaj BITS Pilani, Hyderabad Campus


Option 8C: Single
relation with
one type attribute

EMPLOYEE(Ssn, Fname, Minit, Lname, Birthdate, Address, Jobtype, Typing_speed, Tgrade, EngType)

18 CSC 352/ISC 332 Dr.R.Gururaj BITS Pilani, Hyderabad Campus


Option 8D: Single
relation with multiple
type attributes

PART(Part_no, Descr, Mflag, Drawing_no, Batch_no, Man_date, Pflag, Supp_name, List_price)

19 CSC 352/ISC 332 Dr.R.Gururaj BITS Pilani, Hyderabad Campus


ER-Relational mapping for
Company Database

BITS Pilani, Hyderabad Campus


Exercise 1

Explain how you would map the following EER/ER


Constructs to Relational model. Give simple
examples.

Mapping specialization.

Mapping 1:1 binary relationship, where one entity


type has total participation, and the other entity
type has partial participation.

Mapping Complex attribute of an entity type.

Ternary (3-ary) relationship.


20 CSC 352/ISC 332 Dr.R.Gururaj BITS Pilani, Hyderabad Campus
21 CSC 352/ISC 332 Dr.R.Gururaj BITS Pilani, Hyderabad Campus
Summary

We learnt ER/EER to relational mapping.

24 CSC 352/ISC 332 Dr.R.Gururaj BITS Pilani, Hyderabad Campus


Database Management Systems

BITS Pilani Dr.R.Gururaj


CS&IS Dept.
Hyderabad Campus
Lecture Session-3
Conceptual Database Design (ER Modeling)

Content

 Steps in Database Design Process


 ER Concepts (Entities, Attributes, Associations etc.)
 ER Notations
 Class Hierarchies
 Conceptual modeling using UML

BITS Pilani, Hyderabad Campus


Major Steps in Database
Design Process

Requirement analysis
Understanding the domain
Identifying the data to be stored
Identifying the constraints
Conceptual Database design
E-R modeling/UML
Logical Database Design
Designing tables and relationships
Refinement of schema
Physical database design
Indexing
Clustering
 Storage formats

BITS Pilani, Hyderabad Campus


ER Modeling
ER Model is a popular high-level (conceptual) data model.
It is an approach to designing Semantic Conceptual schema of a Database.
ER model allows us to describe the data involved in a real-world environment in
terms of objects and their relationships, which are widely used in design of
database.
ER model provides preliminary concepts or idea about the data representation
which is later modified to achieve final detailed design.

Important concepts/notions used in ER modeling are-

Entity is an object in real-world or some idea or concept which can be


distinguished from other objects.
Ex.: person, school, class, department, weather, salary, temperature etc.
Entity has independent existence.

Each entity belongs to an Entity type that defines the structure.

Entity Set is a Collection of similar objects.


BITS Pilani, Hyderabad Campus
Concepts used in ER
Attribute: reflects a property of an object or entity. We have following
types of attributes.
> Simple attribute
> Composite attribute
> Single valued attribute
> Multi-valued attribute
> Derived attribute
> Stored attribute
Candidate Key (simply called a key): Is an Attribute of an entity type
whose value can uniquely identify an entity in a set.

Primary key: one of the candidate keys can become PK of an entity


type.

Alternate keys: The candidate keys other than the PK, are known as
alternate keys.

BITS Pilani, Hyderabad Campus


Concepts used in ER

Relationship: The association between entities is known as


relationship.
Domain of an attribute: The set of possible values is known as domain
of an attribute

BITS Pilani, Hyderabad Campus


Notations used in ER
Notations used in ER modeling are shown below.

Entity Type

Weak Entity Type

Relationship Type

Identifying Relationship type

Attribute

BITS Pilani, Hyderabad Campus


Notations used in ER

Key Attribute

Multivalued Attribute

Composite Attribute

Derived Attribute

BITS Pilani, Hyderabad Campus


Notations used in ER

E1 ══ R ── E2 : total participation of E1 in R (shown by a double line)

E1 ──1── R ──N── E2 : cardinality ratio 1:N for E1:E2 in R

R ──(min, max)── E : structural constraint (min, max) on the participation of E in R

BITS Pilani, Hyderabad Campus


Relationships in ER
Relationships (e.g., Manager MANAGES Employee)

Degree of a Relationship
• If two entity types are involved, it is a binary relationship type (e.g., Manager MANAGES Employee).
• If three entity types are involved, it is a ternary relationship type (e.g., Sales Assistant SELLS Product to Customer).
• Unary relationships, also known as recursive relationships, involve a single entity type (e.g., Employee MANAGES Employee).
• In general it is possible to have an n-ary relationship (e.g., quaternary).

BITS Pilani, Hyderabad Campus


Relationships in ER
Cardinality of a relationship

Relationships are rarely one-to-one. For example, a manager usually manages more than one
employee. This is described by the cardinality of the relationship, for which there are four
possible categories:

One-to-one (1:1), e.g., Man IS_MARRIED_TO Woman (1:1)
One-to-many (1:M), e.g., Manager MANAGES Employee (1:m)
Many-to-one (M:1), e.g., Student STUDIES Course (m:1)
Many-to-many (M:N), e.g., Lecturer TEACHES Student (m:n)

BITS Pilani, Hyderabad Campus


Relationships in ER
Participation Constraint

If all the entities of an entity type are involved in the relationship, that entity type's
participation in the relationship is said to be total. In the example below, if each employee
is associated with at least one department, then the participation of EMP in the WORKS_FOR
relationship with DEPT is total. If only some entities of the set are involved, the
participation is partial.

[Figure: EMP ── R ── DEPT drawn in UML association notation, showing the association roles (Worker, Employer), the multiplicities (* and 1), and the association name and direction (Works_for).]

BITS Pilani, Hyderabad Campus


ER Diagram for the Company DB
schema, with all role names
[Figure: ER diagram for the COMPANY database schema with all role names shown on the relationship types WORKS_FOR, MANAGES, CONTROLS, SUPERVISION, WORKS_ON, and DEPENDENTS_OF (Employee, Department, Manager, Controlling department, Supervisor, Controlled project, Dependent).]

BITS Pilani, Hyderabad Campus


Class Hierarchies
Sometimes it is natural to classify the entities in a set into subclasses.

[Figure: Employee (eid, name, age) with an ISA hierarchy to the subclasses Hourly_Emp (No_Hrs) and Contract_Emp (Cid).]

Specialization: Employee is specialized into Hourly_Emp and Contract_Emp.

Generalization: Hourly_Emp and Contract_Emp are generalized into Employee.

BITS Pilani, Hyderabad Campus


UML for Conceptual data
modeling
We can model a database at conceptual level using UML.
UML constructs can be drawn as diagrams.
It encompasses a broader spectrum of the software design process than ER modeling.
We can do:
 Business modeling (describe the business process involved in the SW)
 System modeling (specify requirements)
 Conceptual database modeling (like ER)
 Physical DB modeling (model indexes and table spaces)
 Hardware System modeling (describe hardware system configuration)

Class diagrams can be used to describe the database at conceptual level, like ER
diagrams.

BITS Pilani, Hyderabad Campus


Summary
 Various steps in database design process
 What is ER modeling
 Concepts and notations used in ER
 Class hierarchies in ER
 Use of UML for Conceptual database modeling

BITS Pilani, Hyderabad Campus


Database Management Systems

BITS Pilani Dr.R.Gururaj


CS&IS Dept.
Hyderabad Campus
Lecture Session-4
Relational Data Model & Relational Constraints

Content

1. What is Relational model


2. Characteristics
3. Relational constrains
4. Representation of schemas

BITS Pilani, Hyderabad Campus


Relational Model
Edgar Codd proposed Relational Data Model in 1970.
It is a representational or implementation data model.

Using this representational (or implementation) model we


represent a database as collection of relations.

The notion of relation here is different from the notion of


relationship used in ER modeling.

Relation is the main construct for representing data in


relational model.
Every relation consists of a relation schema and Relation
instance.

BITS Pilani, Hyderabad Campus


A relation schema is denoted by R(A1, A2, ..., An), where R is the relation name and
A1, A2, ..., An is the attribute list.

The number of columns in a relation is known as its degree or arity.

Relation instance or relation state r of R (thought of as a table):
Each row in the table represents a collection of related data.
Each row contains facts about some entity of some entity set.

For R = (A1, A2, ..., An), r(R) is a set of tuples, r = {t1, t2, ..., tn}.
r is an instance of R; each t is a tuple, i.e., an ordered list of values
t = (v1, v2, ..., vn), where each vi is an element of the domain of Ai.

BITS Pilani, Hyderabad Campus


Entities of each type/set are stored as rows in a single relation.

Hence in general, a relation corresponds to a single entity type in


ER.
In some cases a relationship between two entities can have some
specific attributes which can be captured in a relation (table).

A row is called a tuple.


The columns of the table represent attributes of that entity-set.

The column header is known as attribute or field.


Data type or format of an attribute: is the format of data for that
attribute. Ex. Character strings, numeric, alphanumeric etc.

The set of values that can appear in a column is called the
domain of that attribute.
BITS Pilani, Hyderabad Campus
A relational database schema is denoted by S = {R1, R2, ..., Rn}, where S is the
database name and R1, ..., Rn are the relations (tables) in the database.

BITS Pilani, Hyderabad Campus


Attribute A of relation R is accessed by notation- R.A.

Ex: Student (name, age, branch). Here Student is the relation name.
Student.age - denotes age attribute of Student relation.

Characteristics of a Relation:

Ordering of tuples is not significant.

Ordering of values in a tuple is important.

Values in a tuple under each column must be atomic (simple & single).

BITS Pilani, Hyderabad Campus


BITS Pilani, Hyderabad Campus
Relational Model Terminology

Informal Terms                Formal Terms
Table                         Relation
Column Header                 Attribute
All Possible Column Values    Domain
Row                           Tuple
Table Definition              Schema of a Relation
Populated Table               State of the Relation

BITS Pilani, Hyderabad Campus


Relational Constraints

Constraints are restrictions on the data of a relation.

Domain-level constraints – the format of the data, e.g., character, numeric, etc.
Semantic constraints – e.g., NOT NULL.
Entity integrity constraints – primary key, unique key.
Referential integrity constraints – foreign key.
Dependencies –
Functional dependency: which attribute's value determines the value of another attribute.

This concept is used in database design.

BITS Pilani, Hyderabad Campus


Referential integrity
The referential integrity constraint is specified between two relations and is used to
maintain consistency among the tuples of the two relations.

Example: R1(a, b, c) and R2(p, q, r), where attribute c of R1 is a foreign key (FK)
referring to the primary key (PK) p of R2; that is, c in R1 refers to p in R2.

The FK attribute c of R1 has the same domain as the primary key attribute p of R2.

The attribute c in R1 is said to reference the attribute p in R2.

The value of the FK in a tuple t of R1 either occurs as a value under p in R2 for some
tuple, or is NULL.

R1 is known as the referencing relation.
R2 is known as the referenced relation.

Constraints can be specified while defining the structure & also as triggers.

BITS Pilani, Hyderabad Campus


Relational Schema
Representation

BITS Pilani, Hyderabad Campus


Relational Schema
Representation

BITS Pilani, Hyderabad Campus


Operations on Relations and
constraints
The following table indicates the constraints that need to be checked when
performing certain operations on a relation.

Operation on relation    Constraints to be checked
Insert                   NULL / NOT NULL, PK, unique, FK, format, domain
Delete                   FK
Update                   NULL / NOT NULL, PK, unique, FK, domain, and semantic

BITS Pilani, Hyderabad Campus


Actions need to be taken when an FK is defined, for operations like insert, update, and delete.

Example (as before): R1(a, b, c) and R2(p, q, r), where c in R1 refers to the primary key p of R2.

If we insert a tuple in R1 whose value for c does not occur under p in R2, the insert is not allowed.

If a tuple in R2 is deleted: cascade, don't allow, set to default, or set to NULL.

If the value of p in a tuple of R2 is updated: cascade, don't allow, set to default, or set to NULL.
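Such actions can be declared together with the foreign key itself; a sketch with illustrative table and column names:

CREATE TABLE EMP (
    ENO  CHAR(5) PRIMARY KEY,
    NAME VARCHAR(30),
    DNO  INT,                              -- FK column (nullable, so SET NULL is possible)
    FOREIGN KEY (DNO) REFERENCES DEPT (DNUMBER)
        ON DELETE SET NULL                 -- action when the referenced DEPT row is deleted
        ON UPDATE CASCADE                  -- action when DEPT.DNUMBER is changed
);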

BITS Pilani, Hyderabad Campus


Summary
 What are basics of relational model
 Relation instance
 Relational data constraints
 Referential integrity
 Relational scheme representation

BITS Pilani, Hyderabad Campus


Database Management Systems

BITS Pilani Dr.R.Gururaj


CS&IS Dept.
Hyderabad Campus
Lecture Session-5
ER to Relational Mapping
Content

1. Mapping Regular Entity types


2. Mapping Weak Entity types
3. Mapping 1:1 Relationships
4. Mapping 1:N Relationships
5. Mapping N:M Relationships
6. Mapping Multivalued attributes
7. Mapping ternary relationships
8. Mapping Class Hierarchies

BITS Pilani, Hyderabad Campus


Mapping entity types
1. Mapping of Regular Entity Types.
 For each regular (strong) entity type E in the ER schema, create
a relation R that includes all the simple attributes of E.
 Choose one of the key attributes of E as the primary key for R.
 If the chosen key of E is composite, the set of simple attributes
that form it will together form the primary key of R.

BITS Pilani, Hyderabad Campus


2. Mapping of Weak Entity Types
 For each weak entity type W in the ER schema with owner
entity type E, create a relation R & include all simple
attributes (or simple components of composite attributes)
of W as attributes of R.
 Also, include as foreign key attributes of R the primary key
attribute(s) of the relation(s) that correspond to the owner
entity type(s).
 The primary key of R is the combination of the primary
key(s) of the owner(s) and the partial key of the weak
entity type W, if any.

BITS Pilani, Hyderabad Campus


Mapping Relationship types
3. Mapping of Binary 1:1 Relation Types

For each binary 1:1 relationship type R in the ER schema, identify


the relations S and T that correspond to the entity types
participating in R.

There are three possible approaches:


1. Foreign Key approach: Choose one of the relations-say S-and include a
foreign key in S that refers to the primary key of T. It is better to choose
an entity type with total participation in R in the role of S.
2. Merged relation option: An alternate mapping of a 1:1 relationship
type is possible by merging the two entity types and the relationship
into a single relation. This may be appropriate when both
participations are total.
3. Cross-reference or relationship relation option: The third alternative
is to set up a third relation R for the purpose of cross-referencing the
primary keys of the two relations S and T representing the entity types.

BITS Pilani, Hyderabad Campus


4. Mapping of Binary 1:N Relationship Types.
 For each regular binary 1:N relationship type R, identify the
relation S that represent the participating entity type at the N-
side of the relationship type.
 Include as foreign key in S the primary key of the relation T
that represents the other entity type participating in R.
 Include any simple attributes of the 1:N relationship type as
attributes of S.

BITS Pilani, Hyderabad Campus


5. Mapping of Binary M:N Relationship Types.
 For each regular binary M:N relationship type R, create a new
relation S to represent R.
 Include as foreign key attributes in S the primary keys of the
relations that represent the participating entity types; their
combination will form the primary key of S.
 Also include any simple attributes of the M:N relationship
type (or simple components of composite attributes) as
attributes of S.

BITS Pilani, Hyderabad Campus


Mapping Multivalued
attributes
6. Mapping of Multivalued attributes.
 For each multivalued attribute A, create a new relation R.
 This relation R will include an attribute corresponding to A, plus the
primary key attribute K-as a foreign key in R-of the relation that
represents the entity type or relationship type that has A as an attribute.
 The primary key of R is the combination of A and K. If the multivalued
attribute is composite, we include its simple components.

BITS Pilani, Hyderabad Campus


Mapping n-ary relationships

7. Mapping of N-ary Relationship Types.


 For each n-ary relationship type R, where n>2, create a new
relation S to represent R.
 Include as foreign key attributes in S the primary keys of the
relations that represent the participating entity types.
 Also include any simple attributes of the n-ary relationship type
(or simple components of composite attributes) as attributes of
S.
Example: The relationship type SUPPLY in the ER diagram on the next
slide.
 This can be mapped to the relation SUPPLY shown in the relational
schema, whose primary key is the combination of the three foreign
keys {SNAME, PARTNO, PROJNAME}

BITS Pilani, Hyderabad Campus


Mapping Class hierarchies

BITS Pilani, Hyderabad Campus


Mapping Class hierarchies

8. Options for Mapping Specialization or Generalization.


• Option 8A: Multiple relations-Superclass and subclasses
• Option 8B: Multiple relations-Subclass relations only

BITS Pilani, Hyderabad Campus


Option 8A: Multiple
relations-Superclass
and subclasses

BITS Pilani, Hyderabad Campus


Option 8B: Multiple
relations-Subclass
relations only

BITS Pilani, Hyderabad Campus


ER-Diagram for Company
Database
[Figure: ER diagram for the COMPANY database (EMPLOYEE, DEPARTMENT, PROJECT, DEPENDENT and their relationship types), repeated here for reference.]

BITS Pilani, Hyderabad Campus


ER-Relational mapping for
Company Database

BITS Pilani, Hyderabad Campus


Summary
 We have learnt the rules and guidelines for mapping ER to
Relational model.
 Rules for mapping Entity types
 Rules for mapping Relationships
 Rules for mapping Class hierarchies

BITS Pilani, Hyderabad Campus


Database Management Systems

BITS Pilani Dr.R.Gururaj


CS&IS Dept.
Hyderabad Campus
Lecture Session-6
Relational Algebra & Relational Calculus

Content

 Query languages & Formal query languages for Relational data model
 Introduction to Relational Algebra
 Relational operators
 Set operators
 Join operators
 Aggregate functions
 Grouping operator
 Relational Calculus concepts

BITS Pilani, Hyderabad Campus


Query Languages for
Relational data model
Querying means extracting data from the database for the purpose of
processing it.

Every data model has some formal query languages to support


specification of data retrieval and manipulation requests.

Formal query languages


1. Relational Algebra
2. Relational Calculus
(a) Tuple Relational Calculus
(b) Domain Relational Calculus
Commercial query languages
1. Structured Query Language (SQL)
2. Query by Example (QBE)

BITS Pilani, Hyderabad Campus


Introduction to
Relational Algebra
Relational Algebra is a formal query language for relational data model.

A basic set of relational model operations constitute the relational


algebra.
These operations enable the user to specify basic data retrieval
requests.
The result of a relational algebra query is also a new relation which may
have been formed from one or more relations.

A sequence of relational algebraic operations forms a relational


algebraic expression, whose result is also a relation.

BITS Pilani, Hyderabad Campus


Operations in
Relational Algebra
A. Set Operations

o Union,
o Intersection,
o Difference,
o Cartesian product.

B. Relational Operations

o Select,
o Project,
o join,
o Division etc.

BITS Pilani, Hyderabad Campus


Select operation: selects the subset of tuples that satisfy a selection condition.
The symbol used is σ (sigma).

Ex: σ dno=4 (EMP)

The above expression selects all tuples from the EMP table where the
value of the column 'dno' is 4.

The general form of the select operation is σ <selection condition> (R).

Projection operation:
Selects certain columns. The symbol is Π (pi).

Π name, age, dno (EMP)

selects the columns name, age, and dno for all tuples of the table EMP.
BITS Pilani, Hyderabad Campus
Note:
We can apply the operations in sequence, or we can nest them in a single expression.
Ex.:
Π name, age (σ dno=5 (EMP))
The above expression selects the name and age of employees working in dno 5.

The above query can also be written as

R1 ← σ dno=5 (EMP)
R2 ← Π name, age (R1)

R1 and R2 are the names given to the intermediate results (relations).

BITS Pilani, Hyderabad Campus


Union:
If two relations R1 and R2 are union-compatible (i.e., have the same type of tuples),
we can merge them with the union operation.

Duplicate tuples are eliminated. Ex: R1 ∪ R2.

Intersection: R1 ∩ R2

Only the tuples appearing in both R1 and R2 are selected.

Difference: R1 − R2

Only those tuples appearing in R1 but not in R2 are selected.

Note: (R1 − R2) is not the same as (R2 − R1).

BITS Pilani, Hyderabad Campus


Cross product (Cartesian product): R1 × R2

R1 has 3 rows and 2 columns: (a11, a12), (a21, a22), (a31, a32)
R2 has 3 rows and 2 columns: (b11, b12), (b21, b22), (b31, b32)

R1 × R2 combines every tuple of R1 with every tuple of R2:
(a11, a12, b11, b12)  (a11, a12, b21, b22)  (a11, a12, b31, b32)
(a21, a22, b11, b12)  (a21, a22, b21, b22)  (a21, a22, b31, b32)
(a31, a32, b11, b12)  (a31, a32, b21, b22)  (a31, a32, b31, b32)

Number of rows = 3 × 3 = 9; number of columns = 2 + 2 = 4.

BITS Pilani, Hyderabad Campus


Rename operator

The symbol is ρ (rho).

Ex: ρ S(b1, b2, b3) (R)

renames R to S, and the new names of the attributes are b1, b2, b3.

ρ S (R)

renames R to S, keeping the same attribute names.

BITS Pilani, Hyderabad Campus


Division (÷)
Used when we want to check that all of a set of criteria are met.

Let R(A, B) and S(A), and let T ← R ÷ S.

T selects the values of the B column of R that appear (in R) with every value listed
under A in S. Hence the only column of T is B.

Join (⋈): used to join tuples from different tables based on a join condition.
The result is a new relation whose tuples have a larger arity.

D ← DEPT ⋈ Mgrssn=ssn EMP

joins tuples from DEPT and EMP where Mgrssn in DEPT is equal to ssn
in EMP, and stores the new tuples in relation D.

BITS Pilani, Hyderabad Campus


Theta join: joining on a condition whose comparison involves operators such as
=, <, ≤, >, ≥, ≠.

Equijoin: a special type of join where the join condition uses only the '=' (equals) operator.

Natural join: an equijoin on the attributes of R and S having the same name.

In the resulting relation the common attribute is listed only once.

Ex. D ← DEPT * EMP

The join is on the common attribute with the same name (Dept Name).
BITS Pilani, Hyderabad Campus


Employee                        Dept                    Employee * Dept

Name     EmpID  Dept Name       Dept Name   Manager     Name     EmpID  Dept Name  Manager
Harry    3415   Finance         Finance     George      Harry    3415   Finance    George
Sally    2241   Sales           Sales       Harriet     Sally    2241   Sales      Harriet
George   3401   Finance         Production  Charles     George   3401   Finance    George
Harriet  2202   Sales                                   Harriet  2202   Sales      Harriet
BITS Pilani, Hyderabad Campus


Inner join (R ⋈ S): combines tuples from R and S only if they meet the join condition;
tuples that do not meet the condition are not shown in the result. (This is the usual type of join.)
Outer join: displays the tuples of one (or both) of the relations even if there is no
matching tuple in the other relation.
Left outer join (R ⟕ S): in the result, in addition to all the matching tuples from R and S,
all the remaining tuples from the left-side relation (R) also appear, padded with NULL
values for the attributes of S.
Right outer join (R ⟖ S): in the result, the matching tuples from R and S appear; in
addition, all the remaining tuples from S appear, with NULL values for the attributes of R.
Full outer join (R ⟗ S): in the result, all tuples from both R and S appear, with NULL
values for the other relation's attributes where there is no match.

BITS Pilani, Hyderabad Campus


Additional Relational Operations
Aggregate functions: SUM, AVERAGE, MAX, MIN, COUNT

Grouping:

The tuples of a relation are first grouped by the value of some attribute, and then
aggregate functions are applied to the individual groups.
The symbol used is £.

Ex. Dno £ COUNT ssn (EMP)

The above expression first groups the tuples of the EMP table by Dno and then applies
the COUNT function to each group; this outputs the number of employees in each department.

The result relation has the schema (Dno, Count_ssn).

BITS Pilani, Hyderabad Campus


Company Database Schema (set of tables/relations)

BITS Pilani, Hyderabad Campus


1. Get the list of employee IDs who have no dependents.

It is equivalent to:
{set of all employee IDs} − {set of employee IDs having dependents}

R1 ← Π ssn (Employee)
R2 ← Π essn (Dependent)
Result ← R1 − R2

BITS Pilani, Hyderabad Campus


2. Get the list of employee IDs who have more than two dependents.

R1 ← essn £ COUNT Dependent_name (Dependent)

Result ← Π essn (σ Count_Dependent_name>2 (R1))

Sample R1:
essn   Count_Dependent_name
101    3
102    1

BITS Pilani, Hyderabad Campus


3. Get the list of projects controlled by the department named 'ACCOUNTS'.

R1 ← σ Dname='ACCOUNTS' (Department)

Result ← Π pnumber, pname (Project ⋈ Dnum=Dnumber (R1))

BITS Pilani, Hyderabad Campus


4. Get the list of employee IDs working on all projects

R1 ← Π essn, pno (Works_on)
R2 ← Π pnumber (Project)
Result ← R1 ÷ R2

[Illustration: a small example relation R(A, B) divided by S(B); the result contains the A values that appear in R with every B value of S.]
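Relational algebra's division has no direct SQL operator; a common way to express this query in SQL (covered in later sessions) is with nested NOT EXISTS over the same COMPANY tables:

-- Employees working on all projects: no project exists that they do not work on
SELECT E.ssn
FROM   EMPLOYEE E
WHERE  NOT EXISTS (SELECT P.pnumber
                   FROM   PROJECT P
                   WHERE  NOT EXISTS (SELECT *
                                      FROM   WORKS_ON W
                                      WHERE  W.essn = E.ssn
                                        AND  W.pno  = P.pnumber));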

BITS Pilani, Hyderabad Campus


5. Find the projects controlled by departments located in Mumbai.

R1 ← Π Dnumber (σ Dlocation='Mumbai' (Dept_locations))

R2 ← Π pnumber, pname (Project ⋈ Dnum=Dnumber (R1))

BITS Pilani, Hyderabad Campus


Tuple Relational Calculus
Relational Calculus is a formal query language for relational model
where we write one declarative expression to specify a retrieval
request and hence there is no description of how to evaluate the query.

A calculus expression specifies what is to be retrieved rather than how


to do it.

Hence, relational calculus is a non-procedural language, whereas relational algebra,
discussed in the previous section, is procedural: there we write a sequence of
operations to retrieve data.

Any expression for data retrieval written in relational algebra can also
be written in relational calculus, and vice versa.

Hence the expressive power of relational algebra and relational calculus is
the same.
BITS Pilani, Hyderabad Campus
Tuple Relational Calculus (TRC) is based on specifying a number of tuple
variables.

Each tuple variable usually ranges over a particular database relation.


Variables can take values of individual tuples from the relation.
A simple relational calculus query is in the form-

{t | condition (t)}

t – tuple variable

condition (t) – is a conditional expression involving t.

Result is a set of all tuples that satisfy the conditions specified in


condition (t).

BITS Pilani, Hyderabad Campus


Ex. Find all employees whose salary is above 50,000

{t | EMP (t) and t. salary > 50,000}

Selects all tuples from EMP such that for each tuple selected, the salary
value is > 50,000.

The expression EMP(t) specifies from where the tuple t must be chosen.

Hence EMP relation in this case is known as a range relation.

Note: The above query retrieves all the attributes of relation EMP.

BITS Pilani, Hyderabad Campus


The universal (∀) and existential (∃) quantifiers can be
applied to tuple variables.
Ex.:
{t.name, t.age | EMP(t) and (∃d) (Dept(d) and d.dname =
'Research' and d.dno = t.dno)}

retrieves the name and age of all employees who work
for the 'Research' department.

If the tuple variable t occurs with a ∀ or ∃ quantifier, the
variable is known as a bound variable; otherwise it is called
a free variable.

BITS Pilani, Hyderabad Campus


Safe Relation Calculus Expression

Is one that guarantees to yield a finite set of tuples as result.

Ex. {t | not (EMP (t))}

Is unsafe because it yields all tuples in the universe that are


not in EMP relation, which are infinitely numerous.

An expression is safe if all values in its result are from the


domain of the expression.

BITS Pilani, Hyderabad Campus


Relational Completeness:

This notion is used to compare high level query


languages.

Any relational query language L is considered to be


relationally complete if we can express in L any query that
is expressed in relational algebra (RA) or relational
calculus (RC).

BITS Pilani, Hyderabad Campus


Summary
 What is a query language
 Formal query languages for Relational data model
 Basic concepts of Relational Algebra
 Operations in Relational Algebra
 Relational Calculus
 Examples

BITS Pilani, Hyderabad Campus


Database Management Systems

BITS Pilani Dr.R.Gururaj


CS&IS Dept.
Hyderabad Campus
Lecture Session-7
Structured Query Language (SQL)-1

Content

 Introduction to SQL
 Features of SQL
 DDL Statements
 DML commands

BITS Pilani, Hyderabad Campus


Introduction to SQL

SQL (Structured Query language) is the most widely used


commercial query language for relational databases.

SQL was introduced by IBM(1970).

 The present standard SQL -3 or SQL – 99 was introduced in


1999 by ANSI (American National Standards Institute) and
ISO jointly.

SQL is a user friendly query language.

 Now-a-days almost all relational databases like – Oracle,


MySQL, IBM’s DB2, Informix etc., support SQL.

BITS Pilani, Hyderabad Campus


SQL is a high-level declarative language to specify data retrieval
requests for data stored in relational databases.

 Its declarative because we just specify what to be extracted,


rather than how to do it.

SQL is relationally complete, meaning that any query that is


expressed in relational algebra or calculus can also be written
in SQL.

SQL also supports additional features that are not existing in


formal languages.

SQL is a standard and many vendors implement it in their own


way without deviating from the standard specifications.

BITS Pilani, Hyderabad Campus


Features of SQL
1. DDL (Data Definition Language) Set of commands to support creation,
deletion and modification of table structures and views.

2. DML (Data Manipulation Language) Set of commands to pose queries,


insert new tuples, and update/delete existing tuples.

3. Embedded SQL: Allows users to call SQL code from host languages like
C, C++ & Java.

4. Triggers: Actions executed by the DBMS whenever changes to the


database meet specified conditions. Action to be performed and the set of
conditions can be defined in “Triggers”.

5. Transaction Management: to perform roll-back / commit actions.

6. Indexes: Indexes can be created to speed up the access to data stored in


DB.

BITS Pilani, Hyderabad Campus


DDL Commands
The DDL (Create) statement for creating Employee table.
CREATE TABLE EMPLOYEE(
FNAME VARCHAR(15) NOT NULL,
MINIT CHAR,
LNAME VARCHAR(15) NOT NULL,
SSN CHAR(9),
BDATE DATE,
ADDRESS VARCHAR(30),
SEX CHAR,
SALARY DECIMAL(10, 2),
SUPERSSN CHAR(9),
DNO INT NOT NULL DEFAULT 1,
PRIMARY KEY (SSN),
FOREIGN KEY (SUPERSSN) REFERENCES EMPLOYEE (SSN),
FOREIGN KEY (DNO) REFERENCES DEPARTMENT (DNUMBER)
);

ALTER TABLE EMPLOYEE ADD CONSTRAINT EMPFK FOREIGN KEY (DNO)
REFERENCES DEPARTMENT(DNUMBER) ON DELETE SET DEFAULT ON UPDATE CASCADE;
-- (the ON DELETE action may instead be SET NULL or CASCADE)

BITS Pilani, Hyderabad Campus


CREATE TABLE DEPARTMENT(
DNAME VARCHAR(15) NOT NULL,
DNUMBER INT NOT NULL,
MGRSSN CHAR(9) NOT NULL,
MGRSTARTDATE DATE,

PRIMARY KEY (DNUMBER),


UNIQUE (DNAME),
FOREIGN KEY (MGRSSN)REFERENCES EMPLOYEE (SSN));

BITS Pilani, Hyderabad Campus


DROPPING TABLE EMP
Drop table EMP;

Adding New column to EMP

ALTER TABLE EMP ADD CITY VARCHAR(20);

TO DROP A COLUMN

ALTER TABLE EMP DROP COLUMN AGE CASCADE;   (or RESTRICT)

We can also give names to constraints and later use the names
to access those constraints and alter them.

BITS Pilani, Hyderabad Campus


DML Commands

DML (Data Manipulation)

 Selecting tuples, columns (querying)


 Inserting new tuples
 Updating existing tuples
 Deleting existing tuples

BITS Pilani, Hyderabad Campus


Basic Query Statements
SQL has the SELECT statement for retrieving information from the database.
This SELECT is not the same as the select (σ) operation of relational
algebra. All the queries mentioned here are specified on the COMPANY
database given in Fig. 3.1.

THE SELECT – FROM – WHERE CONSTRUCT:

SELECT < attribute list> // attribute names to be retrieved


FROM < table list > // names of relation involved
WHERE < condition> // Boolean expression to identify the
tuples to be extracted.

BITS Pilani, Hyderabad Campus


Ex. 1
SELECT bdate, address
FROM EMPLOYEE
WHERE Fname = ‘john’;

Retrieves the birthdate & address of the employees whose first name
is ‘John’.
Ex. 2 Join operation
SELECT Fname, Lname, Address
FROM EMPLOYEE, DEPARTMENT
WHERE Dname = 'Research' AND Dnumber = Dno;

Retrieves the first name, last name, and address from the joined tuples of
employee and department. The join condition is that Dnumber in the
department table equals Dno in the employee table. We can also alias
(rename) tables to avoid ambiguity.

BITS Pilani, Hyderabad Campus


Ex. 3 SELECT E.Fname, S.Fname
FROM EMPLOYEE AS E, EMPLOYEE AS S
WHERE E.superssn = S.ssn;

Retrieves the employee’s first name and his immediate


supervisor’s first name. This is an example of self joining.

Ex. 4 SELECT ssn


FROM EMPLOYEE;

Retrieves all ‘ssn’ from employee table.

BITS Pilani, Hyderabad Campus


Ex. 5

SELECT ssn, dname


FROM EMPLOYEE, DEPARTMENT;

This will retrieve ssn, Dname from the relation which is result of cross
product of employee and department tables.

Ex. 6: SELECT *
FROM EMPLOYEE
WHERE Dno = 5;

The above query will retrieve all the columns from employee table for
the tuples where Dno = 5.

BITS Pilani, Hyderabad Campus


Ex. 7: SELECT ALL salary
FROM EMPLOYEE;

Retrieves all salaries (including duplicates) from employee table.

Ex. 8 SELECT DISTINCT salary


FROM EMPLOYEE;
Retrieves distinct values for ‘salary’ attribute

We also have following operations in SQL


Union (for Union)
Except (for Difference)
Intersect (for Intersection)

Duplicate tuples are eliminated from the result.

BITS Pilani, Hyderabad Campus


Substring Comparisons in SQL
The character ‘%’ replaces an arbitrary number of characters, and ‘_‘
(underscore) replaces a single character.
Ex. 9 To retrieve all employees whose address is in Houston, Texas

SELECT fname
FROM EMPLOYEE
WHERE Address LIKE ‘% Houston, Texas %’;

Ex. 10 To retrieve the resulting salaries if every employee working in the


‘Accounts’ project is given a 10% raise.

SELECT Fname, 1.1* salary


FROM EMPLOYEE, WORKS_ON, PROJECT
WHERE ssn = essn AND pno = pnumber AND pname = ‘Accounts’;

BITS Pilani, Hyderabad Campus


Ex. 11
Retrieve all employees in department 5 whose salary is between
30,000 and 40,000.
SELECT *
FROM EMPLOYEE
WHERE (Salary BETWEEN 30000 AND 40000) and Dno= 5;

Order By:

The default ordering of the result is ascending. We can specify the key
word DESC if we wish a descending order of values.
Ex. 12
SELECT Fname, Dno, age
FROM EMPLOYEE
WHERE salary > 30000
ORDER BY Dno;

BITS Pilani, Hyderabad Campus


Summary
 What is SQL
 What are the features supported by SQL
 How to create relational schemas using SQL
 How to specify queries in SQL

BITS Pilani, Hyderabad Campus


Database Management Systems

BITS Pilani Dr.R.Gururaj


CS&IS Dept.
Hyderabad Campus
Lecture Session-8
Structured Query Language (SQL)-2

Content

 Nested queries and correlated nested queries


 Use of EXISTS and NOT EXISTS
 Explicit join operations
 Aggregate functions
 Group by and Having classes
 Insert/ Update / Delete operations
 Views

BITS Pilani, Hyderabad Campus


Nested Queries

Ex.1 Retrieve the name of each employee who has a dependent


with the same name as the employee.

SELECT E.Fname
FROM EMPLOYEE AS E
WHERE E.ssn IN(SELECT ESSN FROM DEPENDENT
WHERE E.FNAME = DEPENDENT_NAME);

Correlated Nested Queries:

Whenever a condition in the WHERE clause of a nested query


references some attribute of a relation declared in the outer query, then
the two queries are said to be correlated.

BITS Pilani, Hyderabad Campus


Use of NOT EXISTS clause
Ex. 2
Retrieve the names, salary of employees who have no dependents

SELECT Fname, Salary


FROM EMPLOYEE
WHERE NOT EXISTS (SELECT * FROM DEPENDENT WHERE SSN
= ESSN);

We can also use ‘EXISTS’ to check the existence of at least one tuple
in the result.

It is also possible to use an explicit set of values in the WHERE –


clause.
We can also check whether a value is NULL

BITS Pilani, Hyderabad Campus


Renaming Attributes in the Result
Ex. 3
SELECT name AS Emp_name
FROM EMPLOYEE
WHERE Dno = 5;

BITS Pilani, Hyderabad Campus


Join Operation
We can also perform
Join – using key word ‘JOIN’
Natural join – using key word ‘NATURAL JOIN’
Left outer join – using key word ‘LEFT OUTER JOIN’
Right outer join – using key word ‘RIGHT OUTER JOIN’

Aggregate Functions and Grouping


COUNT
SUM
MAX
MIN
AVG

BITS Pilani, Hyderabad Campus


Ex. 4 SELECT SUM (Salary), AVG (Salary) from EMPLOYEE;

Ex. 5 To retrieve number of rows in Employee table


SELECT count (*)
FROM EMPLOYEE;

Ex. 6 Retrieve the name of employees who have two or more dependents

SELECT Fname
FROM EMPLOYEE
WHERE (SELECT COUNT (*) FROM DEPENDENT WHERE SSN
= ESSN) > = 2;

BITS Pilani, Hyderabad Campus


Group by

Ex. 7 For each department retrieve the department number and no of


employees.
SELECT dno, count (*)
FROM EMPLOYEE
GROUP BY Dno;
Group by and Having clause

Ex. 8 Retrieve the department number and no of employees for the


departments which have more than 5 employees working for it.

SELECT dno, count (*)


FROM EMPLOYEE
GROUP BY Dno
HAVING count(*)>5;
BITS Pilani, Hyderabad Campus
INSERT operation
For Inserting a new tuple into the relation
General Form
INSERT INTO <table name>
VALUES(v1, v2, v3,………….vn);

Ex. 9 INSERT INTO DEPARTMENT


VALUES(‘MARKETING’,10, 103, ‘2000-06-25’);
Deleting a tuple

Ex. 10 DELETE FROM <table name>


WHERE <condition>;

Ex. 11 DELETE FROM DEPARTMENT


WHERE dnumber=10;
If we don’t specify the condition all tuples are deleted.
BITS Pilani, Hyderabad Campus
Update command

Ex. 12 UPDATE EMPLOYEE


SET salary = 60000
WHERE ssn = 141;

Updates tuples in Employee table for the tuples with ssn = 141, sets
the value of the attribute salary to 60,000

BITS Pilani, Hyderabad Campus


Views in SQL

A view in SQL is a single table that is derived from other tables.

These other tables are known as base tables.

A view does not necessarily exist in physical form; it can be considered as a virtual table.

The tuples of base tables are actually stored in database.

This limits the updates on views.

In fact, when a view is updated, it is the corresponding base tables that actually have to be updated.

This makes update operations on views complex.

BITS Pilani, Hyderabad Campus


Creating View
CREATE VIEW EMP_DETAILS
AS SELECT name, salary, dname, age, dloc
FROM EMPLOYEE, DEPARTMENT
WHERE dno = dnumber;

Whenever the view is used, a temporary table is generated with the specified attributes from the specified base tables.

View definitions are stored in the database, not the result of the view. From then onwards the view can be treated as a table and queries can be posed on it.

BITS Pilani, Hyderabad Campus


Ex. SELECT name, dname FROM EMP_DETAILS
WHERE dno = 5;

Here EMP_DETAILS is a view. When this query is executed, the view definition for EMP_DETAILS is evaluated first, and then the SELECT and WHERE operations are performed on the resulting temporary table.

BITS Pilani, Hyderabad Campus


Note:

• A view is always up to date.


• Updates are generally not possible on views.
• Meant for querying only.
• Sometimes it is possible to store views for some duration.
• Those views are known as materialized views.

BITS Pilani, Hyderabad Campus


Example SQL statements

EMPLOYEE

FNAME MINIT LNAME SSN BDATE ADDRESS SEX SALARY SUPERSSN DNO

DEPARTMENT
DNAME DNUMBER MGRSSN MGRSTARTDATE

DEPT_LOCATIONS

DNUMBER DLOCATION

PROJECT

PNAME PNUMBER PLOCATION DNUM

WORKS_ON

ESSN PNO HOURS

DEPENDENT

ESSN DEPENDENT_NAME SEX BDATE RELATIONSHIP

BITS Pilani, Hyderabad Campus


1. Get the list of employee IDs who have no dependents.

select ssn
from Employee
where ssn NOT IN ( select essn
from Dependent
);

(select ssn from Employee)


except
(select essn from Dependent);

BITS Pilani, Hyderabad Campus


2. Get the list of employee IDs who have more than two
dependents.

select essn
from Dependent
group by essn
having count(*) > 2;

BITS Pilani, Hyderabad Campus


3. Get the list of projects controlled by department with
name “ACCOUNTS”.

select pnumber, pname
from Project
where Dnum IN ( select dnumber
                from Department
                where Dname=‘ACCOUNTS’);

select pnumber, pname
from Project, Department
where Dnum=Dnumber AND Dname= ‘ACCOUNTS’;

BITS Pilani, Hyderabad Campus


4. Get the list of employee IDs working on all projects

Select essn
From Works_on
Group By essn
Having COUNT(*) = (select COUNT(*) from project);

select E.essn
from Works_on as E
where ((select pno from Works_on where essn=E.essn)
        CONTAINS
        (select pnumber from Project));
BITS Pilani, Hyderabad Campus
5. Find the projects controlled by departments located in
Mumbai.

select pnumber, pname


from project
where dnum IN (select dnumber
               from Dept_locations
               where Dlocation=‘Mumbai’);

BITS Pilani, Hyderabad Campus


6. Update the salary of those employees working with
department- HR , to Rs. 20000

update Employee
set salary=20000
where dno = (select dnumber
             from Department
             where Dname=‘HR’);

BITS Pilani, Hyderabad Campus


7. Delete the records of employees who get salary less than
5000.

delete
from Employee
where salary < 5000;

delete
from Employee;
(Note: without a WHERE clause, this deletes all employee records.)

BITS Pilani, Hyderabad Campus


Summary
 How to write nested queries in SQL
 Writing queries using the clauses EXISTS, NOT EXISTS, BETWEEN AND, IN, NOT IN
 How to perform explicit JOIN operations
 How to use GROUP BY and HAVING
 The concept of views in SQL
 Some examples on SQL

BITS Pilani, Hyderabad Campus


Database Management Systems

BITS Pilani Dr.R.Gururaj


CS&IS Dept.
Hyderabad Campus
Lecture Session-9
Schema Refinement -1

Content

 Introduction to Schema Refinement


 Functional Dependencies
 Inference Rules
 Normalization
 Normal Forms (1NF and 2NF)

BITS Pilani, Hyderabad Campus


Schema Refinement (Database Design)

All database applications have certain constraints that must hold for the
data.

This set of constraints helps the system to accept only correct and valid data.

A DBMS must provide facilities for defining and enforcing these


constraints.

Types of Integrity Constraints for Relational data model


Domain constraints – Data type, Null, Check for certain range.

Entity constraints – Primary key and Unique key

Referential integrity – Foreign key

BITS Pilani, Hyderabad Campus


A good database design practice is essential to develop
good relational schemas at logical level.

Good database design is needed for:

Clarity in understanding the database and

To formulate good queries

This is achieved by schema refinement performed on the conceptual schema, which is the result of mapping the high-level conceptual schema (ER) to a data-model-specific conceptual schema (relational schema).
BITS Pilani, Hyderabad Campus
Functional Dependencies

Functional Dependency is a constraint between two sets


of attributes from the database.

If a relational database schema has n attributes A1, A2, A3, ....., An, then think of it as a universal database schema R = {A1, A2, A3, ....., An}.

This is not a real table; it is a conceptual device for developing the formal theory of data dependencies.

BITS Pilani, Hyderabad Campus


Functional Dependency
Denoted by X → Y between two sets of attributes in R, and specifies a constraint on the possible tuples that can form a relation instance r of R.
Values of the Y component are determined by the X component, (or) Y is functionally dependent on X.
Thus, X functionally determines Y in a relation schema R if and only if, whenever two tuples of r(R) agree on their X values, they must necessarily agree on their Y values. (The reverse, Y → X, need not hold.)
Ex: ssn → ename; {ssn, pnumber} → Hours

Note: FDs cannot be inferred. They should be defined by someone


who knows the semantics of the database very well.
BITS Pilani, Hyderabad Campus
Diagrammatic Notation

Department:  Dnumber → {Dname, Mgrssn, Mgrstartdate}

Works_on:  {Essn, Pno} → Hours

BITS Pilani, Hyderabad Campus


Inference rules for FDs
If F denotes a set of FDs, we can infer some new FDs from the specified FDs; the set of all functional dependencies that can be inferred from F is called the closure of F, denoted F+.

If F = { ssn → {Ename, Address, Dnumber},
         Dnumber → {Dname, Dlocation} }

we can infer new FDs such as:
ssn → {Dname, Dlocation}
ssn → ssn
Dnumber → Dname

Also, if X → Z we can say that XY → Z.


BITS Pilani, Hyderabad Campus
Inference Rules for FDs

Rule 1 (IR1) (Reflexive): If X ⊇ Y, then X → Y. (Such FDs are called trivial; all others are non-trivial.)
Rule 2 (IR2) (Augmentation): If X → Y, then XZ → YZ.
Rule 3 (IR3) (Transitive): If X → Y and Y → Z, then X → Z.
Rule 4 (IR4) (Decomposition or projective rule): If X → YZ, then X → Y and X → Z.
Rule 5 (IR5) (Union rule): If X → A and X → B, then X → AB.
Rule 6 (IR6) (Pseudo-transitive rule): If X → Y and WY → Z, then WX → Z.

We can find the closure F+ of F by repeated application of rules IR1 to IR3. These three rules are called Armstrong's inference rules.
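As an illustrative aside (not from the original slides), the closure computation can be mechanized. The short Python sketch below computes the attribute closure X+ of a set of attributes X under a set of FDs; an FD X → Y is in F+ exactly when Y is contained in X+. The attribute and FD names used are the ones from the example above.

def attribute_closure(attrs, fds):
    # attrs: iterable of attribute names; fds: list of (lhs_set, rhs_set) pairs.
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the closure already covers the LHS, the RHS is implied (IR3 in effect).
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure

# FDs from the example: ssn -> {Ename, Address, Dnumber}, Dnumber -> {Dname, Dlocation}
F = [({'ssn'}, {'Ename', 'Address', 'Dnumber'}),
     ({'Dnumber'}, {'Dname', 'Dlocation'})]

print(attribute_closure({'ssn'}, F))
# contains Dname and Dlocation, so ssn -> {Dname, Dlocation} is in F+, as inferred above.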

BITS Pilani, Hyderabad Campus


Equivalence of sets of FDs

F covers E if every FD in E is in F+
F and E are equivalent if E+ = F+

A set of FDs F is minimal if it satisfies the following conditions.

 Every dependency in F has a single attribute on its RHS.
 We cannot replace any dependency X → A in F with a dependency Y → A, where Y is a proper subset of X, and still have a set of dependencies that is equivalent to F.
 We cannot remove any dependency from F and still have a set of dependencies equivalent to F.

BITS Pilani, Hyderabad Campus


Normalization & Normal forms

The normalization process was first proposed by Raymond Boyce and Edgar Codd in 1972.

Normalization of data is the process of analyzing relation schemas, based on their FDs and primary keys/keys, to achieve the desirable properties of (i) minimal redundancy and (ii) minimal anomalies.

BITS Pilani, Hyderabad Campus


In the process of normalization, unsatisfactory relations that do not meet the requirements are decomposed into smaller relations.

Every normal form (NF) requires certain conditions to be satisfied.

The normal form of a relation refers to the highest NF condition that it satisfies.

BITS Pilani, Hyderabad Campus


Schema Refinement (Database Design) encompasses
(i) Normalization – bringing the database to the desired level of NF
(ii) Checking for other desired properties like –
Lossless join property
Dependency preserving property

The above properties are desirable during the process of decomposition.

Some definitions useful in database design

Key: a minimal superkey, also called a candidate key. One of the candidate keys becomes the PK; the other candidate keys are called alternate keys.

Key attribute: an attribute which is part of some key (any candidate key).

We study the general definitions of NFs in terms of keys, not just the PK. A relation can have any number of keys but has only one PK.
BITS Pilani, Hyderabad Campus
1. First Normal Form (INF)

It states that the domain of any attribute must include only atomic (single / simple / individual) values.
In the example given below, under the column Dloc each row has more than one value.

Ex.: Dept(DId, Dname, Dloc)
     10   Engg   {HYD, CHENNAI}
     20   Mark   {HYD, MUMBAI}

BITS Pilani, Hyderabad Campus


2.Second Normal Form (2NF)
It is based on full functional dependency.
X → A is a full functional dependency if removing any attribute from X causes the FD to no longer hold.

Condition for 2NF: all non-key attributes are fully functionally dependent on the key, (or) no non-key attribute should be dependent on a part of the key (partial dependency).

Ex.: R(eid, pnum, Hours, ename, ploc) with key {eid, pnum}

BITS Pilani, Hyderabad Campus


Here, ename is a non-key attribute determined by {eid}, which is part of the key. Hence we say that ename is not fully functionally dependent on the key.
The relation shown is not in 2NF. We can decompose it into three relations as shown below.

R1(eid, ename)    R2(pnum, ploc)    R3(eid, pnum, hours)

BITS Pilani, Hyderabad Campus


Summary
 What is the Schema Refinement process
 Functional Dependencies
 What are the Inference Rules
 What is Normalization & 1 NF and 2NF

BITS Pilani, Hyderabad Campus


Database Management Systems

BITS Pilani Dr.R.Gururaj


CS&IS Dept.
Hyderabad Campus
Lecture Session-10
Schema Refinement-2
Content

 3 NF and BCNF
 Decomposition requirements
 Lossless join decomposition
 Dependency preserving decomposition
 Examples

BITS Pilani, Hyderabad Campus


Recap of 1NF and 2 NF

1. First Normal Form (INF)

It states that the domain of any attribute must include only atomic (single / simple / individual) values.
In the example given below, under the column Dloc each row has more than one value.

Ex.: Dept(DId, Dname, Dloc)
     10   Engg   {HYD, CHENNAI}
     20   Mark   {HYD, MUMBAI}

BITS Pilani, Hyderabad Campus


2.Second Normal Form (2NF)
It is based on full functional dependency.
X → A is a full functional dependency if removing any attribute from X causes the FD to no longer hold.

Condition for 2NF: all non-key attributes are fully functionally dependent on the key, (or) no non-key attribute should be dependent on a part of the key (partial dependency).

Ex.: R(eid, pnum, Hours, ename, ploc) with key {eid, pnum}

BITS Pilani, Hyderabad Campus


3. Third Normal form (3NF)

It is based on transitive dependency.
According to this, a relation should not have a non-key attribute functionally determined by another non-key attribute, i.e., there should be no transitive dependency.

Ex.: R(ename, eid, address, dnum, dname, dloc) with key eid

This is not in 3NF, because dname is transitively dependent on eid (via dnum).

BITS Pilani, Hyderabad Campus


Now we can decompose the above into 2 relations.

R1(ename, eid, address, dnum)    R2(dnum, dname, dloc)

Condition for 3NF

For each FD X → A in the database:
i) X must be a superkey, or
ii) A is a key attribute
BITS Pilani, Hyderabad Campus
BCNF (Boyce Codd Normal Form)
It is a stricter form of 3NF

Condition

For each FD XA


X must be a superkey

4th NF: Is based on multivalued dependency

5th NF: Is based on join dependency normally database designers


go up to 3NF only, and 4NF & 5NF are beyond the scope of our
discussion.

BITS Pilani, Hyderabad Campus


Decomposition and Desirable properties

As we have seen, decomposition (of a bigger relation R


into smaller ones), is a major step in the process of
normalization.

But during this activity of decomposition, we need to make sure that the decomposition is lossless and dependency preserving.

BITS Pilani, Hyderabad Campus


Loss-less join Decomposition

Let C represent a set of constraints on the database. A decomposition {R1, R2, ....., Rm} of a relation schema R is a lossless join decomposition for R if, for every relation instance r of R that is legal under C:

    π_R1(r) ⋈ π_R2(r) ⋈ ..... ⋈ π_Rm(r) = r

Here π_R1(r) is the projection of r on R1, r is a relation instance of R, and F is the set of FDs defined on R. (For a binary decomposition, R is replaced by {R1, R2}.)

BITS Pilani, Hyderabad Campus


Test for Lossless join property

BITS Pilani, Hyderabad Campus


BITS Pilani, Hyderabad Campus
Dependency Preserving Decomposition

Given a set of dependencies F on R, the projection of F on Ri, denoted π_Ri(F) (where Ri is a subset of R), is the set of FDs X → Y in F+ such that the attributes in X ∪ Y are all contained in Ri.

A decomposition {R1, R2, ....., Rm} of R is dependency preserving if

    ( π_R1(F) ∪ π_R2(F) ∪ ..... ∪ π_Rm(F) )+  =  F+

where π_R1(F) is the projection of F on R1.

BITS Pilani, Hyderabad Campus


This dependency preserving condition makes sure that no
FD in original relation is lost as a result of decomposition.
The FDs represent constraints (business logic).

Note:
• Not every BCNF decomposition is dependency preserving.
• A limited amount of redundancy in 3NF, in the form of transitive dependency, is better than losing FDs as a result of bringing a 3NF schema to BCNF.

BITS Pilani, Hyderabad Campus


Summary
 Recap of 2NF
 What is 3NF and BCNF
 Decomposition into 3NF and BCNF
 Lossless join decomposition
 Dependency preserving decomposition

BITS Pilani, Hyderabad Campus


Database Management Systems

BITS Pilani Dr.R.Gururaj


CS&IS Dept.
Hyderabad Campus
Lecture Session-11
Data Storage

Content

 Disk pack features


 Records and Files
 File operations
 Ordered and unordered files

BITS Pilani, Hyderabad Campus


Disk Storage

 Disk is the preferred secondary storage device


for high storage capacity and low cost.
 Data stored as magnetized areas on magnetic
disk surfaces.
 A disk pack contains several magnetic disks
connected to a rotating spindle.
 Disks are divided into concentric circular
tracks on each disk surface.
 Track capacities vary typically from 4 to 50 Kbytes
or more

BITS Pilani, Hyderabad Campus


BITS Pilani, Hyderabad Campus
 A track is divided into smaller blocks or sectors.

 The division of a track into sectors is hard-coded on the disk


surface and cannot be changed.

 A track is divided into blocks.


1. The block size B is fixed for each system.
Typical block sizes range from B=512 bytes to B=4096 bytes.
2. Whole blocks are transferred between disk and main memory
for processing.

BITS Pilani, Hyderabad Campus


BITS Pilani, Hyderabad Campus
 A read-write head moves to the track that contains the block to be
transferred.
Disk rotation moves the block under the read-write head for reading or
writing.
 A physical disk block (hardware) address consists of:
 a cylinder number (imaginary collection of tracks of same radius from
all recorded surfaces)
 the track number or surface number (within the cylinder)
 and block number (within track).
 Reading or writing a disk block is time consuming because of:
   - the seek time s (time to position the head on the required track), typically 3-7 msec;
   - the rotational delay (latency) rd (time until the beginning of the required block rotates under the head), about 3-4 msec at 15000 rpm;
   - the block transfer time, which is smaller than the above two.

BITS Pilani, Hyderabad Campus


Files and Records

• A file is a sequence of records, where each record is a


collection of data values (or data items).
• A file descriptor (or file header) includes information that
describes the file, such as the field names and their data types,
and the addresses of the file blocks on disk.
• Records are stored on disk blocks.
• The blocking factor (bfr) for a file is the (average) number of
file records stored in a disk block.
• A file can have fixed-length records or variable-length records.

BITS Pilani, Hyderabad Campus


• File records can be unspanned or spanned
– Unspanned: no record can span two blocks
– Spanned: a record can be stored in more than one block
• The physical disk blocks that are allocated to hold the records
of a file can be contiguous, linked.
• In a file of fixed-length records, all records have the same
format. Usually, unspanned blocking is used with such files.
• Files of variable-length records require additional information
to be stored in each record, such as separator characters and
field types.
– Usually spanned blocking is used with such files.

BITS Pilani, Hyderabad Campus


File operations

Typical file operations include:


 OPEN: Readies the file for access, and associates a pointer that will refer to a
current file record at each point in time.
 FIND: Searches for the first file record that satisfies a certain condition, and makes it
the current file record.
 FINDNEXT: Searches for the next file record (from the current record) that satisfies
a certain condition, and makes it the current file record.
 READ: Reads the current file record into a program variable.
 INSERT: Inserts a new record into the file & makes it the current file record.
 DELETE: Removes the current file record from the file, usually by marking the record
to indicate that it is no longer valid.
 MODIFY: Changes the values of some fields of the current file record.
 CLOSE: Terminates access to the file.
 REORGANIZE: Reorganizes the file records.
For example, the records marked deleted are physically removed from the file or a new
organization of the file records is created.
 READ_ORDERED: Read the file blocks in order of a specific field of the file.

BITS Pilani, Hyderabad Campus


Unordered Files

Also called a heap or a pile file.


New records are inserted at the end of the file.
A linear search through the file records is necessary to
search for a record.
– This requires reading and searching half the file
blocks on the average, and is hence quite
expensive.
Record insertion is quite efficient.
Reading the records in order of a particular field
requires sorting the file records.
BITS Pilani, Hyderabad Campus
Ordered Files

• Also called a sequential file.


• File records are kept sorted by the values of an ordering field.
• Insertion is expensive: records must be inserted in the correct
order.
A binary search can be used to search for a record on its
ordering field value.
– This requires reading and searching about log2(b) of the b file blocks on the average, an improvement over linear search.
• Reading the records in order of the ordering field is quite
efficient.
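As a rough, illustrative comparison (the block count below is assumed, borrowed from the indexing example later in these notes, not stated on this slide):

import math

b = 11429                          # assumed number of file blocks
linear_avg = b / 2                 # unordered (heap) file: about half the blocks on average
binary = math.ceil(math.log2(b))   # ordered file: binary search on the ordering field

print(linear_avg, binary)          # roughly 5714.5 block accesses vs 14 block accesses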

BITS Pilani, Hyderabad Campus


BITS Pilani, Hyderabad Campus
Summary
 What is Disk storage
 Disk characteristics
 Disk pack structure
 Files and Records
 Ordered and unordered files

BITS Pilani, Hyderabad Campus


Database Management Systems

BITS Pilani Dr.R.Gururaj


CS&IS Dept.
Hyderabad Campus
Lecture Session-12
Hashing Techniques
Content

1. Introduction to hashing
2. Internal hashing
3. Collision
4. External hashing
5. Static hashing
6. Dynamic hashing

BITS Pilani, Hyderabad Campus


Hashing

Hashing technique is an alternative to indexing, for fast retrieval of data


records based on search key.

The search field is called as hash field of the file.

In most cases the hash field is also a key field of the file, in which case it
is called as hash key.

The basic idea of hashing is that a hash function h, when supplied a


hash field value K of a record produces the address B of the disk block
that contains the record with specified key value.

BITS Pilani, Hyderabad Campus


h : K → B

(the hash function h maps a hash key value K to a disk block address B)

Once the disk block is known, the actual search for the record within the
block is carried out in main memory buffer.

For most records we require only one block access.

BITS Pilani, Hyderabad Campus


Internal Hashing
Used for internal files. A hash table is implemented through the use of an array of records with M locations, indexed 0 to (M-1).

The most common hash function used is h(K) = K mod M.
This gives the index of the location in the array.
For example, if M = 10 and the key value is 24, then K mod M = 24 mod 10 = 4.
Hence the record with key value 24 will be stored at index 4 (the 5th location) of the array.
If two or more records are hashed to the same location, it is called a collision.
Then we need to find some other location for the new record. This process is known as collision resolution.

BITS Pilani, Hyderabad Campus


Methods for collision resolution

Open addressing: when a collision occurs, try alternate cells until an empty cell is found.
Chaining: for this various overflow locations are kept by
extending the array by number of overflow positions. A pointer
field is added to each record location. Collision is resolved by
allocating an unused overflow position.
Multiple hashing: We apply a second hash function if the first
hashing results in a collision.

The goal of a good hashing function is to distribute the records


uniformly over the address space so as to minimize collisions
while not leaving many unused locations.
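As a small illustration of internal hashing with chaining (a sketch only; the table size M and the keys are made up, matching the earlier h(K) = K mod M example):

M = 10                               # number of array locations, indexed 0..M-1
table = [[] for _ in range(M)]       # chaining: each location holds a list of records

def insert(key, record):
    slot = key % M                   # h(K) = K mod M
    table[slot].append((key, record))

def lookup(key):
    slot = key % M
    for k, rec in table[slot]:       # a collision is resolved by scanning the chain
        if k == key:
            return rec
    return None

insert(24, 'rec-24')                 # 24 mod 10 = 4, as in the slide's example
insert(34, 'rec-34')                 # also hashes to 4: a collision, kept in the same chain
print(lookup(34))                    # 'rec-34'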

BITS Pilani, Hyderabad Campus


External Hashing

Hashing used for disk files is called as external hashing. The


disk block contains records. A single disk block or cluster of
contiguous blocks is known as a bucket.
The hashing function maps a key value into a relative bucket number (h : K → bucket number, for buckets 0 to M-1). A table maintained in the file header converts the bucket number into the corresponding disk block address.

[Figure: bucket numbers 0 .. M-1 mapped, via the header table, to block addresses on disk]

BITS Pilani, Hyderabad Campus


BITS Pilani, Hyderabad Campus
The above scheme is called as static hashing because the
number of buckets allocated is fixed. This is a big constraint
for files that are dynamic.

When a bucket is filled to capacity and if the new record is


hashed on to the same bucket, then chaining is adopted,
where a pointer is maintained in each bucket to a linked list
of overflow records for the bucket.

The pointers are record pointers, which include both the block address and a relative record position within that block.

BITS Pilani, Hyderabad Campus


Handling overflows in Static External
Hashing

BITS Pilani, Hyderabad Campus


Dynamic Hashing

This scheme allows us to expand or shrink the


hash address space dynamically.

Each result of applying the hash function is a nonnegative integer and hence can be represented by a binary bit pattern. We call this the hash value of the record.

Records are distributed among the buckets


based on the values of the leading bits in their
hash value.
BITS Pilani, Hyderabad Campus
Extendible Hashing

The first technique is called extendible hashing.

This scheme stores a directory structure in addition to the file. This access structure is based on the result of applying the hash function to the search field. The major advantage of extendible hashing is that, as the file grows, performance does not degrade because of chaining, as it does in static hashing. In extendible hashing no additional space is wasted on allocations for future growth; instead, additional buckets are allocated dynamically as needed. The only overhead in this scheme is that the directory structure needs to be searched before the buckets are accessed.
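A minimal sketch of the directory-lookup step only (bucket splitting and directory doubling are omitted; the hash width, global depth and bucket names below are assumptions for illustration):

def hash_value(key, bits=16):
    # Any hash that yields a fixed-width non-negative integer will do here.
    return hash(key) & ((1 << bits) - 1)

def directory_index(key, global_depth, bits=16):
    h = hash_value(key, bits)
    # The leading global_depth bits of the hash value select the directory entry.
    return h >> (bits - global_depth)

d = 2                                                   # global depth: directory has 2**d entries
directory = ['bucket-00', 'bucket-01', 'bucket-10', 'bucket-11']   # hypothetical bucket ids
print(directory[directory_index('some-key', d)])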
BITS Pilani, Hyderabad Campus
BITS Pilani, Hyderabad Campus
BITS Pilani, Hyderabad Campus
Linear Hashing

In the second scheme, called linear hashing, no directory structure is used. Here, instead of one hash function, a family of hash functions is used. When an overflow occurs with one hash function h_i, the bucket that overflows is split into two, and the records in the original bucket are distributed among the two buckets using the next hash function h_(i+1)(K). Hence we have multiple hash functions.

BITS Pilani, Hyderabad Campus


BITS Pilani, Hyderabad Campus
Summary

• What is hashing
• Internal hashing
• External hashing
• What is static external hashing
• What is dynamic hashing
• How Extendible and Linear hashing techniques work

BITS Pilani, Hyderabad Campus


Database Management Systems

BITS Pilani Dr.R.Gururaj


CS&IS Dept.
Hyderabad Campus
Lecture Session-13
Indexing -1

Content

 What is Indexing
 Primary and Secondary indexes
 Dense and Sparse Indexing
 Multilevel Indexing
 Designing Primary and Multilevel Indexes

BITS Pilani, Hyderabad Campus


Introduction to Indexing

An index for a file works in much the same way as a catalog


in a library.
In a library cards are kept in alphabetical order. So we don’t
have to search all cards.
In real world databases, indexes may be too large to be
handled efficiently.
Hence some sophisticated techniques are to be used.

Techniques for efficient retrieval of required records from


disk are:
• Hashing
• Indexing

BITS Pilani, Hyderabad Campus


The criteria for evaluating the hashing or indexing techniques –
 Access time
 Insertion time (new indexes or new records)
 Deletion time
 space overhead

Some times more than one indexing may be required for a file.

The attribute/field used for constructing the index structure for a file is called an ‘indexing field/attribute’.

BITS Pilani, Hyderabad Campus


If the index field is a key, it is called as search key or indexing key.

Indexes on key attributes:

1. Built on ordering key(PK) – Primary index


2. Non-ordering Key - Secondary index on key attribute

Indexes on non-key attributes:


1. Ordering non-key -- Clustering Index
2. Non-ordering non-key attribute – Secondary index on non-key

Hence, a file can have at most one primary index or one clustering index, but not both.

BITS Pilani, Hyderabad Campus


Indexing

Ordering field + key attribute         → primary index
Ordering field + non-key attribute     → clustering index
Non-ordering field + key attribute     → secondary index (on a key)
Non-ordering field + non-key attribute → secondary index (on a non-key)

BITS Pilani, Hyderabad Campus


Data record: Similar kind of records(of a relation/table) are
stored in a single file containing blocks. These are called
data records and will have fields specified on the relation.

Index record: Like data records, index records are also


stored in database. Any index record normally has two
fields.

Value:   the key value
Pointer: the location address of the record (or block) containing that key

BITS Pilani, Hyderabad Campus


[Figure: an index file of (key, pointer) entries, e.g., keys 25, 30, 41, 84, each pointing to the corresponding record in the data file of data records]

BITS Pilani, Hyderabad Campus


Dense Index : In this, an index record appears for every data file record.

Sparse Index : Index records are created only for some data file
records. This occupies less space. Sparse index can be on primary or
secondary key.

A primary index and clustering index are non-dense.

BITS Pilani, Hyderabad Campus


Primary Indexing

[Figure: index blocks hold (key, pointer-to-block) entries, e.g., 2, 15, 25, 30, 45, 60, where each key is the first (anchor) key of a data block; the data blocks hold the ordered data records]
BITS Pilani, Hyderabad Campus


Dense and Sparse Indexing

[Figure: for a data file with records keyed 24, 32, 36, 40, 50, 54, 56, 60, a dense index has one entry per record, while a sparse index has entries only for some records, e.g., 24, 40, 54]

BITS Pilani, Hyderabad Campus


Secondary Indexing
( Built on non-ordering non-key attribute)

[Figure: a secondary index on a non-key field, with index entries for field values 28, 35, 39, 45 pointing to buckets of record pointers, which in turn point to the data records having that field value]

BITS Pilani, Hyderabad Campus




Designing a Primary index

Assume that we have an ordered file with 80000 records


stored on disk. Block size is 512 Bytes. Record length is
fixed and it is 70 Bytes. Key field(PK) length is 6 Bytes
and block pointer is 4 Bytes. Assume unspanned record
organization

Design a Primary index on primary key.

BITS Pilani, Hyderabad Campus


Size of disk block = 512 Bytes; record length = 70 Bytes
Block pointer = 4 Bytes; key field = 6 Bytes; total records = 80000
No. of records per block (bfr) = floor(512/70) = floor(7.31) = 7
No. of data blocks needed = ceil(80000/7) = 11429
Index record length = key + pointer = 6 + 4 = 10 Bytes
Blocking factor for the index (bfri) = floor(512/10) = 51 (known as the fan-out)
No. of index blocks = ceil(11429/51) = 225

No. of block accesses = ceil(log2 225) + 1 = 8 + 1 = 9
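The same arithmetic, written as a short Python check (a sketch using the numbers given in the problem):

import math

block, record, key, ptr, records = 512, 70, 6, 4, 80000

bfr          = block // record                          # 7 records per data block
data_blocks  = math.ceil(records / bfr)                 # 11429
bfri         = block // (key + ptr)                     # 51 index entries per block (fan-out)
index_blocks = math.ceil(data_blocks / bfri)            # 225
accesses     = math.ceil(math.log2(index_blocks)) + 1   # binary search on the index + 1 data block

print(bfr, data_blocks, bfri, index_blocks, accesses)   # 7 11429 51 225 9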

BITS Pilani, Hyderabad Campus


BITS Pilani, Hyderabad Campus
Multilevel Indexing (Two levels)

[Figure: a two-level primary index. The first (base) level indexes the data blocks with (key, block pointer) entries; the second level indexes the first-level index blocks in the same way.]

BITS Pilani, Hyderabad Campus


Designing a multilevel index

Assume that we have an ordered file with 80000 records


stored on disk. Block size is 512 Bytes. Record length is
fixed and it is 70 Bytes. Key field(PK) length is 6 Bytes
and block pointer is 4 Bytes. Assume unspanned record
organization

Design a multilevel index on primary key.


How many levels are there.
How many blocks are there in each index level.

BITS Pilani, Hyderabad Campus


Size of the disk block = 512 Bytes; record length = 70 Bytes
Block pointer = 4 Bytes; key field = 6 Bytes; total records = 80000
No. of records per block (bfr) = floor(512/70) = floor(7.31) = 7
No. of data blocks needed = ceil(80000/7) = 11429
Index record length = key + pointer = 6 + 4 = 10 Bytes
Blocking factor for the index (fan-out) = floor(512/10) = 51
No. of index blocks in the first level = ceil(11429/51) = 225
No. of index blocks in the 2nd level = ceil(225/51) = 5
No. of index blocks in the 3rd level = ceil(5/51) = 1  (top level)

No. of levels = t = 3
No. of block accesses = no. of index levels + 1 = t + 1 = 4
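The level-by-level computation generalizes to a short loop (a sketch, using the same numbers as the problem):

import math

block, record, key, ptr, records = 512, 70, 6, 4, 80000

bfr     = block // record                # 7 records per data block
blocks  = math.ceil(records / bfr)       # 11429 data blocks to be indexed
fan_out = block // (key + ptr)           # 51 index entries per index block

levels = []
entries = blocks
while entries > 1:                       # add index levels until the top level fits in 1 block
    entries = math.ceil(entries / fan_out)
    levels.append(entries)

print(levels)                            # [225, 5, 1]  ->  t = 3 index levels
print(len(levels) + 1)                   # block accesses = t + 1 = 4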

BITS Pilani, Hyderabad Campus


BITS Pilani, Hyderabad Campus
Action on deletion of records
If the record deleted is the last record with that key value, delete the corresponding entry in the index file too. If the index is dense, delete the entry just like a record in a file. If it is sparse, delete the entry and replace it with the next key value, if that value is not already present.

Action on Inserting a new record


If the index is dense, insert the new key into the index. If it is sparse, no change is needed unless a new block is created.

BITS Pilani, Hyderabad Campus


Summary
 What is Indexing and its importance
 How Primary and Secondary indexes work
 Examples of Dense and Sparse Indexes
 What is Multilevel Indexing
 Some example problems on designing Primary and Multilevel
Indexes

BITS Pilani, Hyderabad Campus


Database Management Systems

BITS Pilani Dr.R.Gururaj


CS&IS Dept.
Hyderabad Campus
Lecture Session-14
B+ Tree Indexing

Content

 What is Tree Indexing


 B+ tree
 Inserting and deleting keys into B+ Trees
 B Tree
 Constructing a B+ tree
 Designing a B+ Tree node structure

BITS Pilani, Hyderabad Campus


Tree Indexing

Adopting Tree structure for implementing indexes


A tree consists of internal nodes and leaves. The number of arcs on the path from a node to the root is known as its path length.
The height of a non-empty tree is equal to the maximum level of any node in the tree. For an empty tree the height is zero.

Binary tree
Each node has at most two children (left and right). Hence at the ith level, the maximum number of nodes present is 2^(i-1) (the root is at level 1).

Complete binary tree: All nodes except at last level are present.

Binary Search Trees: for each node in the tree, all values stored in its left
subtree are less than value stored in the node and all values stored in the
right subtree are greater than the value in the node.

BITS Pilani, Hyderabad Campus


Multilevel Search Tree of order m
(or)
M-way search tree

• Each node has at most m children and (m-1) keys

• Keys in each node are in ascending order


24 32 40 60
K1 K2 K3 K4

Child 1 Child 2 Child 3 Child 4 Child 5

No of children = (m) = 5
No of keys = (m -1) = (5-1) = 4

BITS Pilani, Hyderabad Campus


B+ Tree Indexing

B+ Tree is a multilevel search tree used to implement


dynamic multilevel indexing. The primary disadvantage of
implementing multilevel indexes is that the performance
degrades as the file grows. It can be remedied by
reorganization, but frequent reorganization is not advisable.
B+ tree is best suited for multilevel indexing of files, because
it is dynamic.

B+ Tree of Order p
It is a balanced tree, (all leaves are at same level).
Each internal node is of the form- 24 32 40 60
K1 K2 K3 K4

Child 1 Child 2 Child 3 Child 4 Child 5

BITS Pilani, Hyderabad Campus


B+ Trees

For a B+ tree of order p

 Within each internal node, K1 < K2 < K3 < …
 P1, P2, … are tree pointers
 K1, K2, K3, … are key values, in ascending order from left to right
 Each internal node has at most p (the order) pointers.
 Each internal node except the root has at least ceil(p/2) tree pointers to the next level. The root has at least 2 pointers.

BITS Pilani, Hyderabad Campus


 An internal node with q pointers has (q -1) field values.

 All record pointers are available at leaf node only.

 Once we get a key value at leaf node, from there


accessing next value in sequence is easy because all keys
at leaf level are in ascending order.

BITS Pilani, Hyderabad Campus


EX: B+ Tree of order 3, i.e., p = 3
(max. no. of pointers in any node = 3; every internal node other than the root has at least ceil(3/2) = 2 pointers)

[Figure: example B+ tree with root key 17; internal nodes with keys such as 5, 14, 19, 40; keys such as 3, 4, 10 at the leaf level, where the record pointers are attached]

BITS Pilani, Hyderabad Campus


B- Tree

BITS Pilani, Hyderabad Campus


Note
In a B+ tree record pointer for a record with given
key can be found only at leaf node.
But if it is in case of B-tree it can happen at
intermediate node also.
Hence in B+ tree search, success or failure can be
declared only after reaching leaf_level.
Whereas in a B-tree, the search can be successful at an intermediate level as well; only on failure do we reach the leaf level.

BITS Pilani, Hyderabad Campus


Constructing a B+ Tree

Construct a B+ tree with given specifications. The order of the tree, p=3
and p_leaf = 2. The tree should be such that all the keys in the subtree pointed to by the pointer preceding a key must be less than or equal to that key value, and all the keys in the subtree pointed to by the pointer succeeding the key must be greater than the key.

Insert the following keys in same order- 56, 22, 78, 42, 102, 90, 96, 35.
Show how the tree will expand after each insertion, and the final tree.

Next, delete 56, 46, 22 in the same order and show the status of the tree
after each deletion.

BITS Pilani, Hyderabad Campus


BITS Pilani, Hyderabad Campus
Node design for B+ tree

We need to design a B+ tree indexing for Student


relation, on student_id attribute; the key of the relation.
The attribute student_id is of 4 bytes length. Other
attributes are- student_age(4 bytes), student_name(20
bytes), student_address(40 bytes), student_branch(3
bytes). The Disk block size is 1024 Bytes. If the tree-
pointer takes 4 bytes, for the above situation, design the
best possible number of pointers per node(internal) of the
above B+ tree. Each internal node is a disk block which
contains search key values and pointers to subtrees.

BITS Pilani, Hyderabad Campus


Disk block size=1024 Bytes
Size of B+ tree node= size of disk block
Each tree pointer points to disk block and takes 4 Bytes.
Each key (student_id) takes 4 Bytes
In a B+ tree node, No. of pointers = no. keys +1
Assume that no. keys = n
Then no. pointers= n+1
Then the space needed for a node = (no. of keys * size of each key) + (no. of pointers * size of each pointer) <= 1024
(n*4) + (n+1)*4 <= 1024
4n + 4n + 4 <= 1024
8n + 4 <= 1024
8n <= 1024 - 4 = 1020
n <= 127.5, i.e., n = 127
Hence in each internal node, no. of keys = 127 and no. of pointers = 128.
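The same node-capacity calculation as a short check (a sketch with the values from the problem):

block_size, key_size, ptr_size = 1024, 4, 4

# A node with n keys has n+1 tree pointers and must fit in one disk block:
#   n*key_size + (n+1)*ptr_size <= block_size
n = (block_size - ptr_size) // (key_size + ptr_size)

print(n, n + 1)    # 127 keys and 128 pointers per internal node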

BITS Pilani, Hyderabad Campus


Summary
 What is Tree Indexing
 B tree and B+ tree concepts
 Constructing a B+ tree (Insert/Delete operations)
 Designing a B+ Tree node structure

BITS Pilani, Hyderabad Campus


Database Management Systems

BITS Pilani Dr.R.Gururaj


CS&IS Dept.
Hyderabad Campus
Lecture Session-15
Transaction Processing

Content

 What is Transaction Model


 Significance of Transaction Model
 States of a transaction
 ACID Properties

BITS Pilani, Hyderabad Campus


Introduction to the Transaction Model

Multi-user Database systems:


Multiple users access the database simultaneously (multiprocessing).

Concurrency: concurrent access to the data therefore occurs.

Transaction:

A transaction is a collection of operations that perform a single logical


operation or function in a database application.

Each transaction is a unit of atomicity.

BITS Pilani, Hyderabad Campus


Storage Types
Volatile Storage: Ex- main memory, cache
Nonvolatile storage : Ex- disk, tapes, etc.

Storage Hierarchy
DB system resides on nonvolatile storage.

Database is partitioned into blocks of fixed length storage,


which are units for storage allocation and transfer.

Transactions input and output data from disk to main memory,


and main memory to the disk.

The data transfer is done in terms of blocks.

A buffer block is the main-memory counterpart of a disk block; it has the same size as a disk block.
BITS Pilani, Hyderabad Campus
The block movement between disk and main memory is initiated by
following operations.

Input (X) – The physical block with data item X is brought from disk into
main memory.

Output (X) – Buffer block containing the data item X is sent to disk to
replace the appropriate physical block.

BITS Pilani, Hyderabad Campus


[Figure: Input(A) copies disk block A from the disk into the buffer blocks in main memory; Output(B) copies buffer block B from main memory back to the disk]

BITS Pilani, Hyderabad Campus


Transactions interact with DB by transferring data from program variables to
the DB and DB to program variables.

This transfer of data is achieved through the following two operations.

I. read(X, xi) - where xi is a local variable and X is a DB data item; it represents the operation xi ← X.
   If the block with data item X is not in main memory, issue Input(X).
   Assign xi the value of X from the buffer block.

II. write(X, xi) - performs X ← xi.
   If the block with X is not in main memory, issue Input(X), then assign the value of xi to X in the buffer block.

Note: a transaction must read an item before using it, but it need not write every item it reads. The modified buffer blocks can be written back onto the disk later, for example during page replacement in main memory.

BITS Pilani, Hyderabad Campus


Steps followed by Transactions while accessing data for processing

Read(X, xi)   (uses Input(X))
Modify xi
Write(X, xi)  (uses Output(X))

If the system crashes before the buffer block containing the new value is output to the disk, the new value is lost.

BITS Pilani, Hyderabad Campus


A transaction is an atomic unit of work that is
either completed in its entirety or not done at
all.
– For recovery purposes, the system needs to keep
track of when the transaction starts, terminates,
and commits or aborts.

BITS Pilani, Hyderabad Campus


Transaction Model

A transaction is a program unit that access and update several data


items.
Read ( ) and Write ( ) are the basic operations.

[Diagram: before execution, the data is in a consistent state; during execution of the transaction the data may temporarily be in an inconsistent state; after execution it reaches the next consistent state. Difficulties arise when a failure occurs in the middle of this process.]

Hence, as a result of failure, state of the system will not reflect the state of
the real world that the database is supposed to capture.

We call that state as inconsistent state.

It is important to define transactions such that they preserve consistency.

BITS Pilani, Hyderabad Campus


ACID Properties of a Transaction

Transaction should possess the following properties called as ACID


properties.

Atomicity: A transaction is an atomic unit of processing. It is either


performed in its entirety or not performed at all.

Consistency Preservation: The successful execution of a transaction


takes the database from one consistent state to another.

Isolation: A transaction should be executed as if it is not interfered by


any other transaction.

Durability: The changes applied to the data by a transaction must be


permanent.

BITS Pilani, Hyderabad Campus


Transaction States

Active State: Initial state when a transaction starts.

Partially committed State: This state is reached when the last


statement is executed and the outcome is not written to the DB.

Failed State: After discovering that the normal execution cannot be


continued a transaction is aborted and reaches failed state.

Committed State: Is reached after successful completion of the transaction.

Terminated State: Is reached after failure or success of the transaction.

BITS Pilani, Hyderabad Campus


BITS Pilani, Hyderabad Campus
Those transactions which are not completely successful are called
failed transactions.

In order to ensure atomicity property, failed transactions should have


no effect on the database.

Hence the state of the DB must be restored to the state it was in just
before the transaction started its execution.

We say such transaction is rolled back.

BITS Pilani, Hyderabad Campus


When the transaction rolls back, the modifications done to the DB by the half-complete transaction are removed, so that the state reflects the state before the start of execution of the transaction.

A transaction which is completely successful is called committed


transaction.

A committed transaction brings the DB to new consistent state. The


effects of committed transactions cannot be rolled back.

Once a transaction is aborted, it must be terminated and new


transaction must be started.

A transaction reaches committed state if it has partially committed and


it is guaranteed that it will never be aborted.

BITS Pilani, Hyderabad Campus


Summary
 What is a transaction
 Basic database operations performed by a transaction
 Properties of a transaction
 States of a transaction
 Transaction execution and the database consistency

BITS Pilani, Hyderabad Campus


Database Management Systems

BITS Pilani Dr.R.Gururaj


CS&IS Dept.
Hyderabad Campus
Lecture Session-16
Concurrent Transactions and Schedules

Content

 Concurrent Transactions
 Transaction Schedule
 Serial and Concurrent Schedules
 Need for Concurrency Control
 Conflicting Operations
 Conflict Equivalent Schedule
 Test for Conflict Serializability
 View Equivalent Schedule
 View Serializability

BITS Pilani, Hyderabad Campus


Introduction

 Multiprogramming in modern systems increases the throughput


drastically, as the resources are shared by more than one process.

 Similarly in a DBMS multiple transactions are executed concurrently.

A transaction is a collection of operations that perform a single logical


operation or function in a database application. Each transaction is a
unit of atomicity.

 Here, for transactions we consider data items as resources


because transactions process data by accessing them.

 When multiple transactions access data elements in a concurrent


way, this may destroy the consistency of the database.

BITS Pilani, Hyderabad Campus


Transaction Schedule

The descriptions that specify the execution sequence of instructions in


a set of transactions are called as schedules.

Hence schedule can describe the execution sequence of more than


one transaction.
Here, T1 and T2 are transactions. (Read(A) reads data item A; Write(B) writes data item B.)

T1: Read(A); A = A + 50; Read(B); B = B + A; Write(B)
T2: Read(B); B = B + 75; Write(B)

In this schedule the transactions T1 and T2 are executed in a serial manner, i.e., first all the instructions of T1 are executed, and then the instructions of T2. Hence the above schedule is known as a serial schedule.

BITS Pilani, Hyderabad Campus


In a serial schedule, instructions belonging to one single transaction
appear together.

A serial schedule does not exploit the concurrency. Hence, it is less


efficient.

If the transactions are executed concurrently then the resources can be


utilized more efficiently hence more throughput is achieved.

BITS Pilani, Hyderabad Campus


A serial schedule always results in correct database state that reflect
the real world situations.

When the instructions of different transactions of a schedule are


executed in an interleaved manner use call such schedules are called
concurrent schedules.
This kind of concurrent schedules may result in incorrect database
state.
[Example: a concurrent schedule in which the instructions of T1 (Read(A); A = A + 50; ...) and T2 (Read(A); A = A + 100; ...) are interleaved in time before the corresponding Write operations are issued.]

BITS Pilani, Hyderabad Campus


Why Concurrency control is needed?
The Lost Update Problem
This occurs when two transactions that access the same database
items have their operations interleaved in a way that makes the value of
some database item incorrect.
The Temporary Update (or Dirty Read) Problem
This occurs when one transaction updates a database item and then
the transaction fails for some reason (see Section 17.1.4).
The updated item is accessed by another transaction before it is
changed back to its original value.
The Incorrect Summary Problem
If one transaction is calculating an aggregate summary function on a
number of records while other transactions are updating some of these
records, the aggregate function may calculate some values before they
are updated and others after they are updated.

BITS Pilani, Hyderabad Campus


It is desirable that a schedule, after execution must leave
the database in a consistent state.

The result of a concurrent execution must be same as the


result of executing the transactions in serial way.

A concurrent schedule whose result is same as that of a


serial schedule is called as concurrent serializable
schedule.

BITS Pilani, Hyderabad Campus


Conflicting Operations

For transactions T1 & T2 the order of read operation on any


data element does not matter.

{T1R(Q), T2R(Q)} or {T2R(Q), T1R(Q)} does not matter.

The result is same and does not lead to any conflict.

Here, Q is the data element.

But {T1R(Q), T2W(Q)} is not same as {T2W(Q), T1R(Q)}

If Ii and Ij are operations (instructions) of two different transactions on the same data item, and at least one of these instructions is a WRITE operation, then we say that Ii and Ij are conflicting operations.

Here, I stands for instruction, and i and j identify the transactions.


BITS Pilani, Hyderabad Campus
Hence it is evident that if we swap non-conflicting operations of a
concurrent schedule, it will not affect the final result.
Look at the following example.
S1 (concurrent schedule with T1 & T2 accessing data items A and B):
    T1: R(A), W(A); T2: R(A), W(A); T1: R(B), W(B); T2: R(B), W(B)

S2 (from S1, swap W(A) of T2 with R(B) of T1, because they are non-conflicting):
    T1: R(A), W(A); T2: R(A); T1: R(B); T2: W(A); T1: W(B); T2: R(B), W(B)

S3 (from S2, swap R(A) of T2 with R(B) of T1, and W(A) of T2 with W(B) of T1, since they are non-conflicting):
    T1: R(A), W(A), R(B); T2: R(A); T1: W(B); T2: W(A), R(B), W(B)

BITS Pilani, Hyderabad Campus


S4 (from S3, swap R(A) of T2 with W(B) of T1):
    T1: R(A), W(A), R(B), W(B); T2: R(A), W(A), R(B), W(B)

Now the final schedule S4 is a serial schedule.

BITS Pilani, Hyderabad Campus


Conflict Equivalent Schedules

If a schedule S can be transformed into a schedule S′ by a series of swaps of non-conflicting instructions, we say that S and S′ are conflict equivalent.

Further, we say that a schedule S is conflict serializable if it is conflict equivalent to a serial schedule.

In our example, S4 is a serial schedule and is conflict equivalent to S1. Hence S1 is a conflict serializable schedule.

BITS Pilani, Hyderabad Campus


Ex.: a schedule in which one transaction issues R( ) and W( ) on a data item, and the other transaction issues W( ) on the same item in between.
In this schedule we cannot perform any swap between the instructions of T1 and T2, since every pair of these operations conflicts. Hence it is not conflict serializable.

BITS Pilani, Hyderabad Campus


Test for Conflict Serializability

Let S be a schedule.

We construct a precedence graph.

Each transaction participating in the schedule will become a


vertex.

The set of edges consists of all edges Ti → Tj for which one of the following three conditions holds (a small sketch of this test follows the list):

Ti executes W(Q) before Tj executes R(Q)
Ti executes R(Q) before Tj executes W(Q)
Ti executes W(Q) before Tj executes W(Q)
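A compact sketch of this test (illustrative only; it assumes the schedule is given as a list of (transaction, operation, data item) triples in execution order):

def conflict_serializable(schedule):
    # schedule: list of (txn, op, item) with op in {'R', 'W'}.
    edges = set()
    for i, (ti, op_i, q_i) in enumerate(schedule):
        for tj, op_j, q_j in schedule[i + 1:]:
            # A conflicting pair: same item, different transactions, at least one write.
            if q_i == q_j and ti != tj and 'W' in (op_i, op_j):
                edges.add((ti, tj))            # ti precedes tj in the precedence graph
    return not has_cycle(edges)                # serializable iff the graph is acyclic

def has_cycle(edges):
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set())
    state = {v: 0 for v in graph}              # 0 = unvisited, 1 = on current path, 2 = done
    def dfs(v):
        state[v] = 1
        for w in graph[v]:
            if state[w] == 1 or (state[w] == 0 and dfs(w)):
                return True
        state[v] = 2
        return False
    return any(state[v] == 0 and dfs(v) for v in list(graph))

# Schedule S1 discussed earlier:
S1 = [('T1','R','A'), ('T1','W','A'), ('T2','R','A'), ('T2','W','A'),
      ('T1','R','B'), ('T1','W','B'), ('T2','R','B'), ('T2','W','B')]
print(conflict_serializable(S1))               # True (only the edge T1 -> T2, no cycle)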

BITS Pilani, Hyderabad Campus


Ex.: Consider a schedule over T0 and T1 in which T0 writes A before T1 writes A, and T1 reads B before T0 writes B. In fact, this schedule is not conflict serializable.

T1 writes A after T0 writes A, hence we draw an edge T0 → T1.
T1 reads B before T0 writes B, hence we draw an edge T1 → T0.

At any moment of time while developing the graph in the above manner, if we see a cycle then the schedule is not conflict serializable. If there are no cycles at the end, then it is conflict serializable.
Hence the above schedule (with the cycle T0 → T1 → T0) is not serializable.
BITS Pilani, Hyderabad Campus
Now let us consider the schedule S1 discussed earlier, which is conflict serializable:

    T1: R(A), W(A); T2: R(A), W(A); T1: R(B), W(B); T2: R(B), W(B)

Now let us draw a precedence graph for the above schedule. T1 writes A before T2 reads A; hence we have an edge T1 → T2 (every other conflicting pair also gives T1 → T2).

We have only one edge in this graph, and no cycles. Hence it is conflict serializable.

BITS Pilani, Hyderabad Campus


Serial schedule:
– A schedule S is serial if, for every transaction T
participating in the schedule, all the operations of
T are executed consecutively in the schedule.
– Otherwise, the schedule is called nonserial
schedule.
Serializable schedule:
– A schedule S is serializable if it is equivalent to
some serial schedule of the same n transactions.

BITS Pilani, Hyderabad Campus


Result equivalent:
– Two schedules are called result equivalent if they
produce the same final state of the database.
Conflict equivalent:
– Two schedules are said to be conflict equivalent if
the order of any two conflicting operations is the
same in both schedules.
Conflict serializable:
– A schedule S is said to be conflict serializable if it is
conflict equivalent to some serial schedule S’.
BITS Pilani, Hyderabad Campus
• Being serializable is not the same as being
serial
• Being serializable implies that the schedule is a
correct schedule.
– It will leave the database in a consistent state.
– The interleaving is appropriate and will result in a
state as if the transactions were serially executed,
yet will achieve efficiency due to concurrent
execution.

BITS Pilani, Hyderabad Campus


View Equivalent Schedules

Two schedules S and S (where same set of transactions participate in both


schedules), are said to be view equivalent if the following three conditions are met.

For each data item Q, if the transaction Ti reads the initial value of Q in S, then
transaction Ti must in schedule S, also read the initial value of Q.

For each data item Q, if transaction Ti executes read (Q) in S, and the value
produced by transaction Tj (if any) then transaction Ti must in schedule S also read
the value of Q that was produced by transaction Tj.

For each data item Q, the transaction if any that performs the final write(Q)
operation in schedule S must perform the final write(Q) operation in schedule S.

Now, we say that a schedule S is view serializable if it is view equivalent to a serial


schedule.

Note: Every conflict serializable schedule is view serializable. But not all view
serializable schedules are conflict serializable.
BITS Pilani, Hyderabad Campus
Summary
 What are concurrent Transactions
 What are Serial and Concurrent Schedules
 Why Concurrency Control needed in DBMS
 Conflict Equivalent Schedule and its importance
 Test for Conflict Serializability

BITS Pilani, Hyderabad Campus


Database Management Systems

BITS Pilani Dr.R.Gururaj


CS&IS Dept.
Hyderabad Campus
Lecture Session-17
Concurrency Control

Contents

 Introduction to Concurrency Control


 Implementing Serializability
 Lock-based protocols
 Deadlock condition
 Two-phase locking protocol
 Timestamp-based protocols

BITS Pilani, Hyderabad Campus


Introduction

 In a DBMS multiple transactions are executed concurrently.

 If the transactions are executed concurrently then the resources can


be utilized more efficiently hence more throughput is achieved.

 Here, for transactions we consider data items as resources


because transactions process data by accessing them.

 When multiple transactions access data elements in a concurrent


way, this may destroy the consistency of the database.

BITS Pilani, Hyderabad Campus


Implementing Serializabilty

One way to ensure serializability is to allow the transactions to access the


data items in a mutually exclusive manner.

This is to make sure that when one transaction access a data item no other
transaction can modify that data item.

The following techniques implement mutual exclusion and control


concurrency.

1. Lock-based protocols

2. Timestamp-based protocols

BITS Pilani, Hyderabad Campus


1. Concurrency Control Using Locks: A data item may be locked in
various modes.

i) Shared (denoted by S): if a transaction obtains a shared mode lock on


a data item Q, it can read Q but not modify Q.

(ii) Exclusive (denoted by X): if this lock is obtained, a transaction can


read or write the data item.

BITS Pilani, Hyderabad Campus


Lock Compatibility Matrix
        S       X
S     True    False
X     False   False

This says that if a transaction Ti obtains a lock on a data item in S-


mode, other transaction can get a lock on the same item in S-mode
but not in X-mode.

If a transaction obtains a lock in X-mode on a data item no other


transaction can obtain a lock on the same data item in any mode.

BITS Pilani, Hyderabad Campus


Deadlock

The Mutual exclusion mechanism leads to deadlock situation.

For example, if transaction Ti holds a lock on a data item (Q) in


X-mode and waits for a lock on another data item (P) which is
locked by another transaction Tj in X-mode, further to release
the lock on P, Tj must acquire a lock on Q, which is locked by
Ti. This is a circular wait condition and results in a deadlock
situation.

BITS Pilani, Hyderabad Campus


Wait-for Graph

Deadlock condition can be determined by a wait-for graph.

All transactions of the schedule become vertices.

And we have an edge between two transactions Ti and Tj. if Ti


is waiting for Tj to release a lock on a data item.
If the graph has a cycle then we can say that the schedule will
result in a deadlock.

BITS Pilani, Hyderabad Campus


Ex.: Consider transactions T1, T2, T3, T4.
(S(A): lock A in shared mode; X(B): lock B in exclusive mode; R(A): read A; W(B): write B.)

T1 requests S(B) after T2 holds X(B)  →  edge T1 → T2
T3 requests X(A) after T1 holds S(A)  →  edge T3 → T1
T2 requests X(C) after T3 holds S(C)  →  edge T2 → T3
T4 requests X(B) after T2 holds X(B)  →  edge T4 → T2

Wait-for graph edges: T1 → T2, T2 → T3, T3 → T1, T4 → T2.

BITS Pilani, Hyderabad Campus


In the above graph there exists a cycle hence this schedule
leads to deadlock.
If a transaction Ti requests a lock and transaction Tj holds a conflicting lock, the lock manager can use one of the following policies to prevent deadlocks.

Timestamp based:
Wait-Die: If Ti is older than Tj it is allowed to wait otherwise
aborted.
Wound-wait: If Ti older than Tj allowed to run by aborting Tj
else Ti will wait.

Priority based
Wait-Die: If Ti has higher priority than Tj it is allowed to wait
otherwise aborted.
Wound-wait: If Ti has higher priority, it is allowed to run by aborting Tj; else Ti will wait.
BITS Pilani, Hyderabad Campus
Two-phase locking protocol:

This protocol ensures serializability.

According to this protocol, each transaction issues lock and unlock requests in two phases:
(i) Growing phase: In this phase, a transaction may obtain locks but may not release any lock.
(ii) Shrinking phase: In this phase, a transaction may release locks but may not obtain any new locks.

The two-phase locking protocol ensures conflict serializability.

It does not ensure freedom from deadlock.
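
A minimal, hypothetical Python sketch of the two-phase discipline is shown below; the class TwoPhaseLockingTxn is illustrative and simply refuses new lock requests once the shrinking phase has begun (it does not model a real lock manager).

class TwoPhaseLockingTxn:
    def __init__(self, name):
        self.name = name
        self.growing = True          # still in the growing phase
        self.held = set()

    def lock(self, item, mode):
        if not self.growing:
            raise RuntimeError('2PL violation: lock requested in shrinking phase')
        self.held.add((item, mode))  # assume the lock manager grants the lock

    def unlock(self, item, mode):
        self.growing = False         # the first unlock starts the shrinking phase
        self.held.discard((item, mode))

t = TwoPhaseLockingTxn('T1')
t.lock('A', 'S')
t.lock('B', 'X')
t.unlock('A', 'S')                   # shrinking phase starts here
# t.lock('C', 'S')                   # would raise: violates two-phase locking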

BITS Pilani, Hyderabad Campus


2. Timestamp-based Concurrency Control

Maintaining an ordering between every pair of conflicting transactions is important.

If we decide this ordering in advance, we can achieve serializability. Timestamping is a method to fix the ordering.

Each transaction is assigned a unique fixed timestamp.


If TS(Ti) < TS(Tj), this implies that Ti should be executed before Tj.

BITS Pilani, Hyderabad Campus


The time-stamps determine the serializability order.
Each data item is associated with two timestamp values.

W-timestamp(Q) – the largest timestamp of any transaction that successfully executed Write(Q).

R-timestamp(Q) – the largest timestamp of any transaction that successfully executed Read(Q).

These values are updated whenever read(Q) or write(Q) is executed.

BITS Pilani, Hyderabad Campus


Timestamp ordering Protocol:

This protocol operates as follows:


(i) Suppose transaction Ti issues read(Q):
If TS(Ti) < W-timestamp(Q), Ti needs to read a value of Q that has already been overwritten.
Hence the read operation is rejected and Ti is rolled back.
If TS(Ti) ≥ W-timestamp(Q), the read operation is executed.

(ii) Suppose Ti issues write(Q):
If TS(Ti) < R-timestamp(Q), the value of Q being produced by Ti was needed earlier and the system assumed it would never be produced.
Hence Ti is rejected and rolled back.
If TS(Ti) < W-timestamp(Q), Ti is attempting to write an obsolete value of Q.
Hence Ti is rejected and rolled back.
Otherwise, the write operation is executed.

A sketch of these checks is given below.
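
A compact, hypothetical Python sketch of these checks, with dictionaries rts and wts standing in for R-timestamp(Q) and W-timestamp(Q), and initial timestamps assumed to be 0:

rts, wts = {}, {}                      # R-timestamp(Q) and W-timestamp(Q) per item

def read(ts, q):
    if ts < wts.get(q, 0):             # Q was already overwritten by a later txn
        return 'reject: roll back Ti'
    rts[q] = max(rts.get(q, 0), ts)    # record the successful read
    return 'read executed'

def write(ts, q):
    if ts < rts.get(q, 0):             # a later txn already read the old value of Q
        return 'reject: roll back Ti'
    if ts < wts.get(q, 0):             # Ti would write an obsolete value of Q
        return 'reject: roll back Ti'
    wts[q] = ts
    return 'write executed'

print(write(10, 'Q'))   # write executed; W-timestamp(Q) = 10
print(read(5, 'Q'))     # rejected: TS(Ti)=5 < W-timestamp(Q)=10
print(read(12, 'Q'))    # read executed; R-timestamp(Q) = 12
print(write(11, 'Q'))   # rejected: TS(Ti)=11 < R-timestamp(Q)=12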

BITS Pilani, Hyderabad Campus


Summary
 Concepts related to Concurrency Control
 Approaches for Implementing Serializability
 How lock-based protocols work
 Detecting the Deadlock condition and resolving
 Two-phase locking protocol
 How timestamp-based protocol works

BITS Pilani, Hyderabad Campus


Database Management Systems

BITS Pilani Dr.R.Gururaj


CS&IS Dept.
Hyderabad Campus
Lecture Session-18
Database Recovery

Content

 Introduction to Recovery
 Recovery strategies
 Log-based recovery
 Check-pointing
 Shadow paging

BITS Pilani, Hyderabad Campus


Introduction to Recovery

Before a transaction executes, the data is in a consistent state. During execution, the data may pass through inconsistent states, and difficulties arise when a failure occurs in the middle of this process. After the transaction completes, the database moves to the next consistent state.

As a result of the failure of a transaction, the state of the system may no longer reflect the state of the real world that the database is supposed to capture. We call such a state an inconsistent state.

When this happens, we must make sure that the database is restored to the consistent state which existed before the start of the failed transaction.

This process is known as the recovery process.
BITS Pilani, Hyderabad Campus
Recovery Techniques

If a transaction T performed multiple database


modifications, several output operations may be required
and a failure may occur after some of these modifications
have been made but before all of them are made.

In order to restore the most recent consistent state, we must first write the information describing the modifications to the system log, without modifying the database itself.

This helps us to remove the modifications done by a failed transaction.
Now we discuss some important recovery techniques.
BITS Pilani, Hyderabad Campus
Log-based Recovery

Database System Log:


Each log record describes a single database write operation
and contains the following details.

•Transaction name
•Data item name
•Old value
•New value

BITS Pilani, Hyderabad Campus


Types of log records

< Ti start> - indicates transaction Ti started

<Ti, Xj, V1, V2> - transaction Ti has performed a write operation on data item Xj; Xj had value V1 before the write and has value V2 after the write.
<Ti commit > - transaction Ti commits.

With these log records we have the ability to undo or redo a


modification that has already been output to the DB.

BITS Pilani, Hyderabad Campus


I. Deferred Database Modification

This technique ensures atomicity by recording all database modifications (updates) in the log, but deferring (postponing) the actual updates to the database until the transaction commits.

As no data item is written to the database before the commit record of the transaction, the log needs to hold only the new values. Hence only a redo operation is performed during recovery.

The redo(Ti) operation sets the value of all data items updated by transaction Ti to their new values.

All new values are found in the log records.


BITS Pilani, Hyderabad Campus
Redoing is needed when all modifications are recorded in the log but we are not certain that they were successfully written to the database.
Ex.
Log                      Database
<T1 start>
<T1, A, 900>
<T1, C, 800>
<T1 commit>
                         A = 900
                         C = 800
<T2 start>
<T2, B, 700>
<T2 commit>
                         B = 700

On failure, a transaction needs to be redone if and only if the log contains both its <start> and <commit> records.
Otherwise nothing needs to be done for it.
BITS Pilani, Hyderabad Campus
Log                      Database
<T1 start>
<T1, A, 900>
<T1, C, 800>
<T1 commit>
                         A = 900
                         C = 800
<T2 start>
<T2, B, 700>
<T2 commit>
                         B = 700
<T3 start>
<T3, C, 200>
//FAIL//

Here T3 has no <commit> record in the log, so on recovery nothing is done for T3, while T1 and T2 are redone (a recovery sketch follows).
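
A minimal, hypothetical Python sketch of redo-only recovery under deferred modification; the tuple-based log format is an assumption made for this illustration.

def recover_deferred(log, db):
    committed = {rec[1] for rec in log if rec[0] == 'commit'}
    for rec in log:
        if rec[0] == 'write' and rec[1] in committed:
            _, _, item, new_value = rec
            db[item] = new_value            # redo: reapply the new value
    return db

log = [('start', 'T1'), ('write', 'T1', 'A', 900), ('write', 'T1', 'C', 800),
       ('commit', 'T1'),
       ('start', 'T3'), ('write', 'T3', 'C', 200)]   # T3 never committed
# 1000 and 700 are arbitrary pre-crash values, not taken from the slides.
print(recover_deferred(log, {'A': 1000, 'C': 700}))  # {'A': 900, 'C': 800}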

BITS Pilani, Hyderabad Campus


II. Immediate Database Modification
In this technique, database modifications are output to the database while the transaction is still in the active state.
On failure, an undo operation is therefore needed for incomplete transactions, and a redo may be required for committed transactions.
System Log               Database
<T1 start>
<T1, A, 600, 900>
                         A = 900
<T1, C, 300, 800>
                         C = 800
<T1 commit>
<T2 start>
<T2, B, 400, 700>
                         B = 700
<T2 commit>
BITS Pilani, Hyderabad Campus
System Log               Database
<T1 start>
<T1, A, 600, 900>
                         A = 900
<T1, C, 300, 800>
                         C = 800
<T1 commit>
<T2 start>
<T2, B, 400, 700>
                         B = 700
<T2 commit>
<T3 start>
<T3, C, 100, 200>
                         C = 200
//FAIL//

Here T3 is incomplete at the time of failure, so undo(T3) restores C to 100, while the committed transactions T1 and T2 are redone (a recovery sketch follows).
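
A hypothetical Python sketch of undo/redo recovery under immediate modification, using the old and new values stored in the log; the log format here is again an assumption.

def recover_immediate(log, db):
    committed = {r[1] for r in log if r[0] == 'commit'}
    started = {r[1] for r in log if r[0] == 'start'}
    for rec in reversed(log):                         # undo incomplete transactions
        if rec[0] == 'write' and rec[1] in started - committed:
            _, _, item, old, new = rec
            db[item] = old
    for rec in log:                                   # redo committed transactions
        if rec[0] == 'write' and rec[1] in committed:
            _, _, item, old, new = rec
            db[item] = new
    return db

log = [('start', 'T1'), ('write', 'T1', 'A', 600, 900), ('commit', 'T1'),
       ('start', 'T3'), ('write', 'T3', 'C', 100, 200)]   # T3 failed before commit
print(recover_immediate(log, {'A': 900, 'C': 200}))        # {'A': 900, 'C': 100}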
BITS Pilani, Hyderabad Campus
Checkpointing

In case of failure, the log needs to be searched to


determine the transactions that need to be redone or
undone.

But this searching is time-consuming, and most of the time the algorithm will redo transactions that have already written their updates to the database; redoing them is a waste of time.

In order to reduce this overhead, checkpointing is used.

BITS Pilani, Hyderabad Campus


Log:
  <T1 start>
  <T1, D, 20>
  <T1 commit>
  [checkpoint]
  <T4 start>
  <T4, B, 12>
  <T4, A, 20>
  <T4 commit>
  <T2 start>
  <T2, B, 15>
  <T3 start>
  <T3, A, 35>
  <T2, D, 25>

On recovery:
  - T1 committed before the latest checkpoint, hence no action is needed for it.
  - T4 is redone, as its commit occurred after the latest checkpoint.
  - T2 and T3 are ignored because they did not reach their commit point.

BITS Pilani, Hyderabad Campus


Sequence of actions in checkpointing

 Output all log records currently in main memory onto stable storage

 Output all modified buffer blocks to the disk.

 Output log record <check point> on to stable storage.

 During the recovery process, redo/undo operations are considered only for transactions whose log records occur after (or span) the latest <checkpoint> record in the log. A sketch follows.
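
A hypothetical sketch of how the latest checkpoint limits the work done at recovery, redo-only and matching the deferred-modification example above; the log format is assumed.

# Only transactions whose <commit> record appears after the latest checkpoint
# are redone; transactions committed before it are already safely on disk,
# and uncommitted transactions are ignored (deferred modification).

def recover_with_checkpoint(log, db):
    last_cp = max((i for i, r in enumerate(log) if r[0] == 'checkpoint'),
                  default=-1)
    redo_set = {r[1] for r in log[last_cp + 1:] if r[0] == 'commit'}
    for rec in log:
        if rec[0] == 'write' and rec[1] in redo_set:
            _, _, item, new_value = rec
            db[item] = new_value
    return db

log = [('start', 'T1'), ('write', 'T1', 'D', 20), ('commit', 'T1'),
       ('checkpoint',),
       ('start', 'T4'), ('write', 'T4', 'B', 12), ('write', 'T4', 'A', 20),
       ('commit', 'T4'),
       ('start', 'T2'), ('write', 'T2', 'B', 15),
       ('start', 'T3'), ('write', 'T3', 'A', 35), ('write', 'T2', 'D', 25)]
print(recover_with_checkpoint(log, {}))   # {'B': 12, 'A': 20} -- only T4 redone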

BITS Pilani, Hyderabad Campus


Shadow paging

This technique is an alternative to the log-based recovery method. The database is partitioned into fixed-length blocks called pages. These pages need not be in any particular order on disk; a page table is used to find the location of the ith page.

In shadow paging, two page tables are used: the current page table and the shadow page table. When a transaction starts, both page tables are identical. The shadow page table is never changed during the execution of the transaction, while the current page table may change whenever the transaction performs a write operation. All input and output operations use the current page table. When a page is modified, it is written to a different location on disk; the old block, which contains the older values, still exists and can be accessed through the shadow page table.
This is sufficient to recover from a failure.
This technique does not require a log, and no redo/undo operations are needed. A sketch is given below.
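
A small, hypothetical Python sketch of the copy-on-write idea behind shadow paging; the in-memory dictionaries standing in for the disk and the two page tables are assumptions of this illustration.

disk = {0: 'A=100', 1: 'B=200'}               # disk page id -> page contents
shadow_page_table = {0: 0, 1: 1}              # logical page -> disk page
current_page_table = dict(shadow_page_table)  # copied at transaction start
next_free = 2                                 # next unused disk page

def write_page(logical_page, new_contents):
    """Write a modified page to a fresh disk location (copy-on-write);
    only the current page table is updated, never the shadow table."""
    global next_free
    disk[next_free] = new_contents
    current_page_table[logical_page] = next_free
    next_free += 1

write_page(0, 'A=900')
# On commit, the current page table would become the new shadow page table.
# On failure, the current page table is simply discarded:
print(disk[shadow_page_table[0]])    # 'A=100' - old value still reachable
print(disk[current_page_table[0]])   # 'A=900' - new value via current table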
BITS Pilani, Hyderabad Campus
Shadow paging

(Figure: the shadow page table and the current page table, each pointing to pages on disk.)

BITS Pilani, Hyderabad Campus


Summary

 The importance of the recovery mechanism in a DBMS


 Various recovery strategies
 Log-based recovery scheme
 How Deferred and Immediate modification techniques
work
 The concept of Checkpointing in recovery
 How Shadow paging recovery technique works

BITS Pilani, Hyderabad Campus


Database Management Systems

BITS Pilani Dr.R.Gururaj


CS&IS Dept.
Hyderabad Campus
Conclusion to DBMS course

1. Introduction and Overview of DBMS


 Introduction to database systems
 Advantages
 Three schema architecture
 Data Independence
 Architecture
 Database users

BITS Pilani, Hyderabad Campus


2. Conceptual Database Design
(ER Modeling)
 Database Design process
 ER constructs
 Notations
 Class hierarchies

BITS Pilani, Hyderabad Campus


3. Relational Data model and Constraints
 Relations, tuples, and keys
 Integrity Constraints

4. Mapping from ER to Relational Schemas


 Mapping Entities, Relationships, and Constraints
 Mapping Class hierarchies

BITS Pilani, Hyderabad Campus


5. Relational Algebra and Calculus
 Relational operators
 Join operation
 Grouping

6. SQL-99
 DDL
 DML
 Views in SQL

BITS Pilani, Hyderabad Campus


7. Functional Dependencies
 FDs
 Inference rules

8. Database Design and Normal Forms


 Rules for Normal forms
 Decomposition
 Lossless and Dependency preserving Decomposition

BITS Pilani, Hyderabad Campus


9. Storage and File structures
 Disk storage
 File and Record Organization

10. Hashing
 Internal Hashing
 Collision resolution
 Static and Dynamic External Hashing

BITS Pilani, Hyderabad Campus


11. Indexing
 Primary and Secondary Indexing
 Sparse and Dense Indexing
 Multilevel Indexing
 B+ Tree Indexing

12. Transaction Model


 Advantages
 States
 Transaction Schedules

BITS Pilani, Hyderabad Campus


13. Concurrent Transactions
 Concurrent Transactions and Schedules
 Advantages and Disadvantages
 Serial and Serializable Schedules
 Conflict Serializability

14. Concurrency Control


 Serializability
 Lock-based Protocols
 Timestamp-based protocols
 Deadlocks

BITS Pilani, Hyderabad Campus


15. Database Recovery
 Log-based Recovery
 Deferred and Immediate modification
techniques
 Checkpointing
 Shadow paging

BITS Pilani, Hyderabad Campus


BITS Pilani, Hyderabad Campus
Course No: SEWP ZC322
Course Title: Database
Management Systems
Database State for COMPANY
ER DIAGRAM – COMPANY DATABASE

Entity Types:
EMPLOYEE,
DEPARTMENT, PROJECT,
DEPENDENT

Relationship Types:
WORKS_FOR,
MANAGES,
WORKS_ON,
CONTROLS,
SUPERVISION,
DEPENDENTS_OF
Database State for BLOOD BANK
ER DIAGRAM – COMPANY DATABASE

IDENTIFY THE FOLLOWING:


• Entity
• Attributes
• Relationships
• Entity Integrity
• Referential Integrity
ER DIAGRAM – BLOOD BANK DATABASE
Database State for MERCHANT PAYMENT
PROCESSING
MERCHANTS: ID, NAME, CODE, TYPE, CANPROCESSSALE, CANPROCESSCREDIT, CUSTOMERID, ROWVERSION

CUSTOMERS: ID, NAME, ADDRESSLINE1, ADDRESSLINE2, CITY, STATE, ZIPCODE, CUNTRYCODE, CONTACTNAME, CONTACTEMAIL, CONTACTPHONE, SIZE, ROWVERSION

TRANSACTIONS: ID, REFERENCEID, TYPE, CARDNUMBER, CARDHOLDER, AMOUNT, REQUESTDATE, RESPONSEDATE, ISAPPROVED, RESPONSECODE, CUSTOMERID, MERCHANTID, TERMINALID, ROWVERSION

TERMINALS: ID, CODE, CANPROCESSSALE, CANPROCESSCREDIT, MERCHANTID, ROWVERSION

COUNRIES: CODE, NAME
ER DIAGRAM – MERCHANT PAYMENT
PROCESSING
Thank You
Course No: SEWP ZC322
Course Title: Database
Management Systems
BIOMETRIC SYSTEM
 A biometric system provides many benefits to organizations. It enables an employer to have full control over all employees' working hours. It helps control labor costs by reducing over-payments, which are often caused by transcription errors, interpretation errors and intentional errors. Manual processes are also eliminated, as well as the staff needed to maintain them.
 It is often difficult to comply with labor regulation, but a time and
attendance system is invaluable for ensuring compliance with labor
regulations regarding proof of attendance.
 Companies with large employee numbers might need to install
several time clock stations in order to speed up the process of
getting all employees to clock in or out quickly or to record activity
in dispersed locations. In the business world of today we all know
one simple truth…TIME IS MONEY! We work to keep the amount of
time it takes to complete even the simplest tasks down to the
minimum.
VEHICLE COMPARER FRAMEWORK
 Vehicle comparison is one of the most common comparisons people make when choosing a type of vehicle.
 The Vehicle Comparer Framework allows users to find the most economical models available, or to choose a specific vehicle and decide on purchasing it.
 The Vehicle Comparer Framework also allows you to compare performance, engine details and running costs.
Thank You
