
Compiled by W. Sithole

DATABASE SYSTEMS CONCEPTS & DESIGN

1. DATABASE ENVIRONMENT

Definition of a Database:
It is a shared collection of interrelated data designed to meet the varied information needs of an
organisation.

A database is integrated and shared:

Integrated - previously distinct data files have been logically organised to eliminate (or reduce) redundancy and to facilitate data access.

Shared - all qualified users in the organisation have access to the same data, for use in a variety of activities.

It is a structured collection of stored operational data used by all the application systems of an
organisation. It is independent of any individual application

It is a central source of data to be shared by many users for a variety of related applications.

Data as a Resource:
Information, which is the analysis and synthesis of data, has become one of the most vital corporate resources. It is:

Structured into models for planning and decision making
Incorporated into measures of performance and profitability
Integrated into product design and marketing methods

Information is recognised and treated as an asset

Acceptance of data resource management is demonstrated by:

Firm commitment to the database approach
Successful establishment of the data administration function

Database Concepts:
The two essential concepts are data models and data independence.
A data model is the logical structure of the data as it appears at a particular level of the database system. Each application that uses a database has its own data model.

Data Models
How data appears as viewed by different applications using the same database system.
E.g. the customer accounts file contains details about customers, while the stock file contains details about goods.
Data Independence
Data models are not affected by any changes in storage techniques. The central data model and the associated data models are distinct from the arrangement of data on any particular storage medium.

Reality, Data and Metadata


The real world itself will be referred to as reality. Data collected about people, places, or events in reality will eventually be stored in a file or database. In order to understand the form and structure of the data, information about the data itself is required. The information that describes the data is known as metadata. The relation between reality, data and metadata is pictured as follows:


REALITY              METADATA                   DATA
(real world)         (data definition)          (data occurrences)

OBJECTS, EVENTS      DATA DICTIONARY/           DATABASE
                     DIRECTORY

Entity class         Record definitions         Record occurrences
Attributes           Data item definitions      Data item occurrences

Entity
An object or event about which someone chooses to collect data is an entity. An entity may be a person or a place, for example a salesperson, a city or a product. An entity can also be an event or a unit of time, such as a machine breakdown, a sale, a month or a year.

Entity Class
It is a collection of entities with similar characteristics. It is known as Entity Sets/Entity Types. It
is grouped for convenience.


Attribute
An attribute is a property of a real-world entity rather than a data-oriented term. For example, the entity Customer may have the attributes:
Customer Number
Customer Name
Address
Telephone
Credit Limit
Balance

An attribute is a characteristic of an entity. There can be many attributes for each entity; for example, a patient can have many attributes, such as last name, first name, address, city and so on.

The term data item is also used in conjunction with an attribute. Data element is simply a synonym for data item.

Data items can have values. These values can be of fixed or variable length. They can be
alphabetic, numeric or alphanumeric. Sometimes a data item can be referred to as a field.

A field represents something physical, not logical; therefore many data items can be packed into one field. A field can be read and converted into a number of data items. A common example is storing the date in a single field as mm/dd/yyyy. In order to sort the file in date order, three separate data items are extracted from the field and sorted first by year, then by month, and finally by day.
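The date-sorting idea above can be sketched in Python; the record layout and order numbers here are invented for illustration, not taken from the notes.

```python
# Hypothetical order records whose single "date" field packs three data items
# (month, day, year) in mm/dd/yyyy form.
orders = [
    {"order_no": "A003", "date": "07/15/2023"},
    {"order_no": "A001", "date": "01/30/2024"},
    {"order_no": "A002", "date": "11/02/2022"},
]

def date_key(record):
    # Extract the three data items from the one physical field, then order
    # by year first, then month, then day, as the notes describe.
    month, day, year = record["date"].split("/")
    return (int(year), int(month), int(day))

orders_sorted = sorted(orders, key=date_key)
print([r["order_no"] for r in orders_sorted])  # oldest order first
```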

Typical values assigned to data items may be numbers, alphabetic characters, special characters,
and a combination of all three. These can be illustrated as follows: -

ENTITY        DATA ITEM            VALUE

Salesperson   Salesperson number   77865
              Name                 Thompson
              Company name         Ceata Enterprises
              Age                  40
              Address              42 Musasa Close
              Sales                $15,800.00

Package       Code                 A209
              Width                16
              Weight               32
              Mailing address      Box 1294, Harare
              Return address       Box A2098, Mutare

Order         Order number         53541 H
              Description          Shirts
              Quantity ordered     120
              Amount               $1,500.00
              Order placed by      Takura

Identifier
This is an attribute that uniquely distinguishes an entity from the rest, e.g. an EC number identifies an employee.

Association
An association forms a relationship between two or more entities.


Direct representation of associations between entities distinguishes the database approach from conventional file applications.

Relationships
These are associations between entities (sometimes they are referred to as data associations). They
imply that values for the associated data items are in some way dependent on each other.

Records
A record is a collection of data items that have something in common with the entity described.
Below is a diagram to illustrate the structure of a record

Order File

Order#   Description   Quantity   Amount
A001     Shirt         1200       35000
A002     Short         1000       16000
A003     Dress         2000       99000
A004     Trousers      1300       75000
A005     Vests         1100       12000

(Each row of the file is one record.)

Keys
A key is one of the data items in a record. When a key uniquely identifies a record, it is called a primary key; for example, order# can be a primary key because there is only one number assigned to each customer order. In this way a primary key identifies the real-world entity, that is, the customer order.

A key is called a secondary key if it cannot uniquely identify a record. Secondary keys can be used to select a group of records that belong to a set, for example all orders that come from the city of Mutare. When it is not possible to identify a record uniquely using one data item found in a record, a composite key can be constructed by choosing two or more data items and combining them.

When a data item is used as a key in a record, its description is underlined; therefore in the order record

(order#, description, quantity, amount) the key is order#.

If an attribute is a key in another file, it is underlined with a dashed line (_ _ _ _ _); it is a foreign key in this file.
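As a sketch of how primary and foreign keys behave in practice, the following uses SQLite through Python's sqlite3 module; the table and column names are assumptions for illustration, not part of the notes.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

# order_no is the primary key of the order file; customer_no is a foreign key
# referencing the customer file's primary key.
con.execute("CREATE TABLE customer (customer_no TEXT PRIMARY KEY, name TEXT)")
con.execute("""CREATE TABLE orders (
    order_no    TEXT PRIMARY KEY,
    customer_no TEXT REFERENCES customer(customer_no),
    amount      REAL)""")

con.execute("INSERT INTO customer VALUES ('C01', 'Takura')")
con.execute("INSERT INTO orders VALUES ('A001', 'C01', 1500.0)")

# An order whose foreign key matches no customer is rejected.
try:
    con.execute("INSERT INTO orders VALUES ('A002', 'C99', 10.0)")
except sqlite3.IntegrityError:
    print("foreign key violation rejected")
```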

Metadata

Metadata is data about the data in the file/database.


It describes the name given, type and the length assigned to each data item
It describes the length and composition of each of the records
It is kept in a Data Dictionary


Example
Data Item Data Type Length
Name Character 10
Surname Character 15
Date of Birth Date 10
Weight Numeric 2

Data item
This is a unit of fact: the smallest named unit of data in a database that has meaning to a user.
It is also known as a data element, field, or attribute.
Preferences:
Data item - a unit of data
Field - a physical rather than logical term that refers to the column position within a record where a data item is located.
Examples:
Employee-Name, Student#

Data Aggregate
It is a collection of data items that is named and referenced as a whole
Example:
NAME = Last-Name, First-Name, Initials
In COBOL, data aggregates are referred to as group items. In the data dictionary an aggregate's entry should include the data aggregate name, a description, and the names of the included data items.
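A loose Python analogue of a data aggregate (a COBOL group item) might look like this; the field values are invented for illustration.

```python
from dataclasses import dataclass, astuple

# NAME is a data aggregate: one name for the group, three included data items.
@dataclass
class Name:
    last_name: str
    first_name: str
    initials: str

n = Name(last_name="Sithole", first_name="W.", initials="W.S.")
print(astuple(n))  # the aggregate referenced as a whole
```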

TRADITIONAL FILE PROCESSING SYSTEMS


It is programming with files.

Each user defines and implements the files needed for a specific application.

Data records are physically organised on storage devices using either sequential or random file organisation, so that each application has its own separate data file or files and software programs.

Example:

User1: Grade reporting Officer


Keeps a file on students and their grades
Implements programs to print student transcripts and enter new grades into the file

User2: Accounting Officer


Keeps track of students' fees and payments

Although both users are interested in data about students, each maintains separate files and programs to manipulate these files, and each requires data not available from the other's files.

This results in redundancy in defining and storing data, which wastes storage space and duplicates the effort needed to keep common data up-to-date.

In the Database Approach, a single repository of data is maintained, defined once and accessed by
various users.


Advantages & Disadvantages


In a Traditional File Environment all the methods of file organisation are associated with
individual files and individual software programs.

What if the information required to solve a particular problem is located in more than one file?

Often extra programming and data manipulation will be required to obtain that information, for
example:-
Suppose you want to know all of the orders outstanding for a particular
customer. Some of the information is maintained in the order file, for an order
entry application. The rest of the information is maintained in a customer master
file. Thus the required information is stored in several files, each of which is
organised in a different way. To extract the required information, there is need
to sort both files until the records are arranged in the same order. Records from
these files will have to be matched, and the data items from the merging of both
files will have to be extracted and output.

- Obtaining this information requires additional programming and the creation of more files
- Most organisations have developed information systems one at a time, as the need arises, each with its own set of programs, files and users. After some time, these applications and files may reach a point where the organisation's information resources are out of control.
- Some symptoms of this crisis are:
Data redundancy (similar data in different files)
Program or Data Dependency
Data Confusion (inconsistency among different representations of the same data)
Excessive costs

1. Data Redundancy
Refers to the presence of duplicate data in multiple data files. The
same piece of data, such as employee name and address, will be maintained and stored in
several different files by several systems. Separate software programs must be developed
to update this information and keep it current in each file in which it appears.

2. Program/Data Dependency
Refers to the close relationships between data stored in files and specific software
programs required to update and maintain these files. Every computer program or
application must describe the location of the data it uses. In a traditional file
environment, any change to the format or structure of the data in the file necessitates a
change in all of the software programs that use the data.

3. Data Confusion (Inconsistency of data)


Refers to inconsistency among various representations of the same piece of data in different information systems and files. Over time, as different groups in a firm update their applications according to their own business rules, data in one system becomes inconsistent with the same data in another system. For example, the student names and addresses maintained in a college student enrolment system and in a separate system that generates mailing labels may not correspond exactly if each system is updated with different software programs, procedures and time frames.

4. Excessive Software Costs


Normally result from creating, documenting and keeping track of so many files and different applications, many of which contain redundant data.


These problems can be easily viewed or pictured or visualised through the following illustrations:

Illustration Of Traditional File System

DATA FILES                        APPLICATIONS                 USERS

Savings file:                     Savings Accounting System    Savings Account
  Cust Name, Social Security#,
  Savings A/C ID, A/C Balance

Loan file:                        Loan Accounting System       Loan Account
  Cust Name, Social Security#,
  Address, Loan A/C ID,
  Interest Rate, Loan Period,
  Loan Balance

Checking file:                    Checking Accounting System   Checking Account
  Cust Name, Social Security#,
  Address, Checking A/C ID,
  Account Balance

Note how Cust Name and Social Security# are stored redundantly in all three files.

Class discussion on advantages of the Traditional File Environment

What is the justification of a database for an organisation?

Advantages of the Traditional File Environment:
1. Easy to create and simple to use
2. Requires minimal overheads to access and use

Data Base Approach

Characteristics Of Database Approach Versus Traditional File Processing Approach:

In the conventional file processing, the user defines and implements files for specific applications.
In the database approach, a single repository of data is maintained and defined once and accessed
by various users.

Four characteristics most important in distinguishing a database system from a traditional file
processing system are:

i) Self-contained nature of database system:


The database system contains not only the database itself but also the complete definition
or description of the database. The definition (or metadata) is stored in a system catalog.

In traditional file processing, data definition is typically part of application programs

ii) Insulation between programs and data:


In traditional file processing systems, the structure of data files is embedded in access
programs. Hence, any change to the structure of a file may require changing all programs
that access the file.


In a database system, the DBMS access programs are written independently of any
specific files. The structure of data files is stored in the DBMS catalog separately from
the access programs. This is called program-data independence.

iii) Data abstraction:


A DBMS should provide users with a conceptual representation of data that does not
include many of the details of how it is stored. A data model is a type of data abstraction
that provides this conceptual representation. The data model uses logical concepts such
as objects, their properties, and their interrelationships, which may be easier for users to
understand than storage concepts.

iv) Support of multiple views of data:


A database typically has many users each with a different perspective or view of the
database. A view may be a subset of the database or it may contain virtual data that is
derived from the database but not explicitly stored.


DBMS CONCEPTS

Data Model, Schemas and Instances

A data model is the main tool for providing abstraction. It is a set of concepts used to describe the
structure of a database. It includes a set of operations for specifying retrievals and updates.

It is important to distinguish between the description of a database and the database itself. The description of a database is called a database schema. The data in the database at a particular moment is called a database instance.
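The distinction can be seen concretely in SQLite, where the schema lives in the catalog (sqlite_master) while the instance is the stored rows; the student table used here is an assumption for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE student (student_no TEXT PRIMARY KEY, name TEXT)")

# The schema: the stored description of the database, which changes rarely.
schema = con.execute(
    "SELECT sql FROM sqlite_master WHERE name = 'student'").fetchone()[0]
print(schema)

# The instance: the data at this particular moment, changed by every update.
con.execute("INSERT INTO student VALUES ('S1', 'Takura')")
print(con.execute("SELECT * FROM student").fetchall())
```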

DBMS ARCHITECTURE

Here we are looking at an architecture for database systems, called the three-level-schema
architecture.

Three-Level Schema Architecture

The goal of the three-level schema architecture is to separate the user applications from the
physical database. In this architecture, schemas can be defined at the following three levels:

The internal level has an internal schema, which describes the physical storage structure of the database. It is the level closest to physical storage, that is, the one concerned with the way the data is physically stored, and it is usually the view taken by systems programmers. The systems programmer is concerned with the actual physical organisation and placement of the data elements in the database; the internal view is the internal or hardware view of the database. The internal schema uses a physical data model and describes the complete details of data storage and access paths for the database. The systems programmer designs and implements this view by allocating cylinders, tracks and sectors for the various segments of the database, so that the various programs can run as smoothly and efficiently as possible.

The conceptual level has a conceptual schema, which describes the structure of the whole database
for a community of users. It is a logical view. It is how the Database appears to be organised to
the people who designed it. The conceptual schema is a global description of the database that
hides the details of physical storage structures and concentrates on describing entities, data types,
relationships and constraints. It is the view usually used by the Database Administrator. It includes
all the data elements in the Database and how these data elements logically relate to each other.

The external or view level includes a number of external schemas or user views. It is the one
concerned with the way the data is viewed by individual users, and is usually used by an
application programmer. Each external schema describes the database view of one group of
database users. Each view typically describes the part of the database that a particular user group
is interested in and hides the rest of the database from that user group.


The three-schema architecture is illustrated below

                         END USERS
                             |
EXTERNAL LEVEL     EXTERNAL VIEW 1  ...  EXTERNAL VIEW n
                             |
                 external/conceptual mapping
                             |
CONCEPTUAL LEVEL      CONCEPTUAL SCHEMA
                             |
                 conceptual/internal mapping
                             |
INTERNAL LEVEL         INTERNAL SCHEMA
                             |
                       STORED DATABASE


UserA1  UserA2             UserB1  UserB2  ...
      |                          |
External View A            External View B
      |                          |
External/Conceptual        External/Conceptual
Mapping A                  Mapping B
                  |
          Conceptual View        (DBMS)
                  |
          Conceptual/Internal Mapping
                  |
            Internal View
                  |
           Stored Database


Data Independence

The three-schema architecture can be used to explain the concept of data independence, which can
be defined as the capacity to change the schema at one level of a database system without having to
change the schema at the next higher level. There are two types of data independence:

Logical data independence is the capacity to change the conceptual schema without
having to change external schemas or application programs. We may change the
conceptual schema to expand the database by adding a new record type or data item, or
to reduce the database by removing a record type or data item, without having to change
existing external schemas.

Physical data independence is the capacity to change the internal schema without having
to change the conceptual (or external) schemas. Changes to the internal schema may be
needed because some physical files are reorganised for example, by creating additional
access structures to improve the performance of retrieval or update. If the same data as
before remains in the database, we should not have to change the conceptual schema.
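Physical data independence can be sketched with SQLite: adding an access structure (an index) is an internal-level change, and the query written against the higher levels is untouched. The table and data are invented for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (order_no TEXT, city TEXT)")
con.executemany("INSERT INTO orders VALUES (?, ?)",
                [("A001", "Harare"), ("A002", "Mutare"), ("A003", "Harare")])

# The query is expressed at the conceptual/external level.
query = "SELECT order_no FROM orders WHERE city = 'Harare' ORDER BY order_no"
before = con.execute(query).fetchall()

# Internal-schema change: a new access structure to improve retrieval.
con.execute("CREATE INDEX idx_city ON orders(city)")
after = con.execute(query).fetchall()

print(before == after)  # the query and its answer are unchanged
```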

Benefits/Advantages of Database Approach

1. Reduced Data Redundancy


A database minimises duplication of data from file to file; thus a student's name and address might appear in only one record in a university database rather than in the files of many departments. Only one copy of each data item is kept, so duplication of data is eliminated.

This improves the consistency of the data while reducing the storage space wasted through redundancy.

2. Data Independence
A database system keeps descriptions of data separate from the application programs that use the data, so changes to the data can occur without necessarily requiring changes in every application program.

This promotes data independence, which insulates application programs from modifications of the database.
3. Data Sharing
New data processing applications can be developed without having to create new data files, because data sharing lets data be accessed from a single central source by many users for different applications.
4. Permits centralised control over data standards, security restrictions, and integrity controls.
Uniform system of security monitoring via centralised system.

5. Encourages the use of powerful query languages by users without previous programming
   experience. The result can be a reduction in program maintenance cost, that is, the cost of
   upgrading application programs in response to changes in the file structure. Some databases
   also store data in ways that do not depend on the storage media used; thus if new disk drives
   are purchased, the data may not need to be reorganised to remain accessible to the
   application programs using them.

6. Increased application programmers and user productivity

Most DBMS offer application program development tools that help application programmers in writing program code. These tools can be very powerful, and they usually improve an application programmer's productivity substantially. Object-oriented databases provide developers with libraries of reusable code to speed up the development of applications. Users also increase their productivity when query languages and report generators allow them to produce reports from the database with little technical knowledge and without any help from programmers, thus avoiding the long periods that MIS departments typically take to develop new applications. The result is greater use of the corporate database for ad-hoc queries. Users also increase their productivity when they use microcomputer software designed to work with a mainframe database; this allows them to acquire and manipulate data with ease, without requiring the assistance of programmers.

7. Improved Data Integrity: Because data redundancy is minimised, the threat to data
   integrity is reduced. Data integrity ensures that the data in the database is accurate; updated
   values are available to all applications, which ensures data consistency across applications.

8. Reduced data complexity


Consolidated management of data, access and utilisation via the DBMS reduces complexity.

9. Eliminates data confusion


Data confusion can be eliminated because there is one and only one source and definition for
the data.

10. Setting up of new applications made easy


It is a matter of just extending the database and providing new interfaces because most of the
data is already available. This is time saving in that there is no need for starting from scratch.

Problems/Disadvantages of Databases

DBMS provide many opportunities and advantages, but these advantages may come at a price.
DBMS also pose problems such as:

1. Resource Problems
Characterised by a high initial investment and the possible need for additional hardware. A DBMS is a large software system that must be created and maintained, and it requires a fairly large computer to support it. A database system usually requires extra computing resources: the new database system programs must run; much more data must be stored on-line to answer queries, whose number we hope will increase; more terminals may be needed to put managers and other users on-line; additional hard disk capacity may be needed to put more data on-line and make it available to managers; communications devices may be needed to connect the extra terminals to the database; and it may even be necessary to increase the size or number of CPUs to run the extra software required by the database system.

Currently, PCs are becoming more powerful and DBMS software more compact, so this problem is becoming less serious. It is also being overcome by the availability of distributed relational databases.

2. Security Problems
A database must have sufficient controls to ensure that data is made available to
authorised personnel only and that adding, deleting and updating of data in the database is
accomplished by authorised personnel only. Access security means much more than
merely providing log in codes, account codes and passwords. Security considerations
should include some means of controlling physical access to terminals, tapes, and other
devices. Security considerations should also include the non-computerised procedures
associated with the database such as forms to control the updating or deletion of records
or files and procedures for storing source documents. In addition, access to employee,
vendor, and customer data should conform to various state regulations, such as the 1974
Privacy Act and the 1978 Right to Financial Privacy Act. The database should certainly
contain an archiving feature to copy all important files and programs, and there should be
procedures for the regular update and storage of these archival copies.

Failure of the database system, whether through hardware or software problems, malicious damage, or industrial action, can adversely affect the organisation, since all of its data processing is dependent on the database.

3. Ownership Problem
In file based systems employees who run application programs on application specific
files frequently feel that the data in these files are theirs and theirs alone. Users, such as
payroll department, personnel develop ownership of the files in the system. When a
database of such files is created, the data is owned by the entire company. Any user with
a need should be able to obtain the authority to read or otherwise access the data.
However, for a database to be successful the data must be viewed and treated as a
corporate resource, not as an individual's property.

Security and integrity may be compromised if DBA does not administer the database
properly.

The organisation also incurs an overhead cost for providing security, concurrency
control, recovery and integrity functions.

The generality with which the DBMS provides for defining and processing data can also
be problematic.

Justification of Data Base in an Organisation


Application needs constantly changing
High frequency of rapid access to answer ad hoc questions
Need to reduce long lead times and high development costs
Need to share data throughout the organisation
Need to cross functional and departmental boundaries in order to communicate and relate data
Need to improve quality and consistence of data
If substantial dedicated programming assistance is not normally available

COMPONENTS OF THE DATA BASE


User Group
DBMS
Database
Data Dictionary
User/System Interface
Data Base Administration & Hardware

Database System Elements:


Stored Data
Various Data Models
Software to maintain data (DBMS)
Person working with the database

The Database Management System is a layer of software which maintains the database and provides an interface to the data for the application programs that use it.

The DBMS (Database Management System) is the heart of the database.


It allows creation, accessing, modification and updating of the database and the retrieval of data
and the generation of the reports.

All transactions between users and database are through DBMS.

The DBA (Database Administrator) ensures that the database meets its objectives and is in charge of the overall running of the database system. The role requires both software and managerial skills.

Technical Responsibilities of a DBA:

To set up the database

To control and manage the Database

To identify the needs of an organisation and of the users

To define, implement and control the database storage including the structure of the database.

To define & control access to the database.

To coordinate the data resources of the whole enterprise using user and management
cooperation.

To ensure that policies and procedures are established to guarantee effective production,
control and use of data.

To define a strategy for backup storage and recovery from breakdown

To decide how data is to be stored

To decide on the information content of the database & structure of different data models.

ACCESSING THE DATABASE THROUGH THE DBMS

THE DATABASE SYSTEM

A database system is a computerised record-keeping system.


BUILDING BLOCKS OF A COMPUTER BASED ELECTRONIC DATABASE SYSTEM

BIT

BYTE/CHARACTER

DATA ELEMENT/FIELD

RECORD

FILE

DATABASE

A BIT is a binary digit, which is either a 0 or a 1

A BYTE is a collection of bits representing a character

DATA ELEMENT/FIELD is a collection of characters describing one attribute of an entity

An ENTITY is anything we can collect/store information on

An ATTRIBUTE is an element that makes up an entity. In a database, an attribute is known as a field.

A RECORD is a collection of related data elements describing an entity

A FILE is a collection of records describing entities of the same class

A DATABASE is a collection of related files
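The building blocks above can be sketched in Python terms; the customer data is invented for illustration.

```python
# BIT/BYTE: one character is stored as a byte made of eight bits.
char = "A"
bits = format(ord(char), "08b")
print(char, "->", bits)

# FIELD: a collection of characters describing one attribute of an entity.
name_field = "Takura"

# RECORD: related fields describing one entity.
record = {"name": name_field, "city": "Harare"}

# FILE: records describing entities of the same class.
customer_file = [record, {"name": "Thompson", "city": "Mutare"}]

# DATABASE: a collection of related files.
database = {"customer": customer_file, "orders": []}
print(len(database), "files,", len(database["customer"]), "customer records")
```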


THE DATABASE MANAGEMENT SYSTEMS

A DBMS is a collection of software programs that :-

1. Store data in a uniform way


2. Organise the data into records in a uniform way
3. Allow access to the data in a uniform way

- In a DBMS, applications do not obtain the data they need directly from the storage media
(database)
- They request the data from the DBMS
- The DBMS then retrieves the data from the storage media and provides them to the application
programs
- A DBMS operates between application programs and the data

The illustration below shows the relationship of Application Programs, the DBMS and the Database.

COMPONENTS OF A DBMS

DBMS system software is usually developed by commercial vendors and purchased by organisations.

The components of a particular DBMS vary from one vendor to another.

Some of these components are typically used by information specialists in the system, for example,
information systems specialists typically use the Data Dictionary, Data Languages, Teleprocessing Monitor,
Applications Development Systems, Security Software and archiving and recovery system components of
DBMS.

Other components such as Report Writers and Query Languages may be used by both programmers and
other non-specialists.

DATA DICTIONARY/DIRECTORY/DATABASE SCHEMA

Contains the names and description of every data element in the Database

It also has a description of how data elements relate to one another

Through the use of its data dictionary, a DBMS stores data in a consistent manner, thus reducing redundancy. For example, the data dictionary ensures that the data element representing the number of an inventory item (named, say, stocknum) will be of uniform length and have other uniform characteristics regardless of the application program that uses it.

Application developers use the data dictionary to create the records they need for the programs they are
developing

A data dictionary checks records that are being developed against the records that already exist in the
database and prevents redundancy in data element names

Because of the data dictionary an application program does not have to specify the characteristics of the
data it wants from the database. It merely requests the data from the DBMS


This may permit changing the characteristics of a data element in the data dictionary without changing it in
all the application programs that use the data element

Defines Metadata
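A toy sketch of how a data dictionary lets the system check data items: the names, types and lengths follow the earlier example, but the checking logic is an assumption for illustration, not how any particular DBMS is implemented.

```python
# Each entry records the metadata for one data item: its type and length.
data_dictionary = {
    "name":    {"type": str, "length": 10},
    "surname": {"type": str, "length": 15},
    "weight":  {"type": int, "length": 2},
}

def valid(record):
    # A record passes only if every field is defined in the dictionary,
    # has the declared type, and fits within the declared length.
    return all(
        field in data_dictionary
        and isinstance(value, data_dictionary[field]["type"])
        and len(str(value)) <= data_dictionary[field]["length"]
        for field, value in record.items()
    )

print(valid({"name": "Takura", "weight": 75}))   # passes the dictionary checks
print(valid({"name": "Takura", "height": 180}))  # fails: "height" is not defined
```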

DATA LANGUAGES

To place a data element into the Data Dictionary, a special language is used to describe the characteristics
of the data element.

This language is called a Data Description Language or DDL.

To ensure uniformity in accessing data from the database, a DBMS will require that standardised commands
be used in application programs.

These commands are part of a specialised language used by programmers to retrieve and process data from
the Database.

This language is called the Data Manipulation Language or DML

A DML usually consists of a series of commands such as FIND, GET, APPEND etc.

These commands are placed in an application program to instruct the DBMS to get the data the application
needs at the right time
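A toy illustration of the idea, with hypothetical find/get/append methods standing in for a real DML's FIND, GET and APPEND commands:

```python
class SimpleDML:
    """Toy record store exposing DML-style FIND / GET / APPEND commands."""

    def __init__(self):
        self.records = []
        self.current = None          # position set by the last FIND

    def append(self, record):
        self.records.append(record)  # APPEND: add a record to the store

    def find(self, **criteria):
        # FIND: position on the first record matching all criteria
        for i, rec in enumerate(self.records):
            if all(rec.get(k) == v for k, v in criteria.items()):
                self.current = i
                return True
        self.current = None
        return False

    def get(self):
        # GET: return the record located by the last FIND
        return self.records[self.current] if self.current is not None else None
```

The application asks for data by name; how the record is located is the store's concern, mirroring the way a DBMS hides retrieval details behind DML commands.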

SECURITY SOFTWARE

A security software package provides a variety of tools to shield the Database from unauthorised access.

ARCHIVING AND RECOVERY SYSTEM

Archiving programs provide the Database Manager with the tools to make copies of the database, which can
be used in case the original database records are damaged.

Restart or recovery systems are tools used to restart the database and to recover lost data in the event of a
failure

REPORT WRITERS

A Report Writer allows programmers, managers and other users to design output reports without writing
an application program in a programming language such as COBOL.

A Query Language is a set of commands for creating, updating and accessing data from a Database.

Query Languages allow users to ask ad-hoc questions of the database interactively, without the aid of
programmers

A form of a Query Language is SQL (Structured Query Language)

SQL is a set of English-like commands that has become a standard in the database industry and in database
development

Because SQL is used in many DBMSs, managers who understand SQL syntax are able to use the same set of
commands regardless of the DBMS.


This software must provide the manager with access to data in many Database Management Environments.
The basic form of an SQL command is:-

SELECT ........ FROM ....... WHERE ..........

After SELECT you list the fields you want to display

After FROM you list the name of the file or group of records that contain those fields
After WHERE you list any condition for the search of the records

Example:

If you wish to select all customer names from the customer database where the city in which the customer lives
is Harare

Solution:

SELECT *                                    (all fields)
FROM customer
WHERE city = "Harare"

OR

SELECT name, DOB, Credit_Limit, City        (specified fields only)
FROM customer
WHERE city = "Harare"

The result would be a list of all fields (or only the specified fields) of customers located in Harare
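The two queries above can be run as-is against SQLite from Python; the customer rows below are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (name TEXT, DOB TEXT, "
             "Credit_Limit REAL, city TEXT)")
conn.executemany("INSERT INTO customer VALUES (?, ?, ?, ?)", [
    ("Simba", "1990-01-01", 500.0, "Harare"),
    ("Tino",  "1985-06-15", 300.0, "Bulawayo"),
    ("Rudo",  "1992-03-09", 400.0, "Harare"),
])

# All fields of Harare customers
all_fields = conn.execute(
    "SELECT * FROM customer WHERE city = 'Harare'").fetchall()

# Specified fields only
names = [row[0] for row in conn.execute(
    "SELECT name FROM customer WHERE city = 'Harare'")]
```

Only the rows satisfying the WHERE condition come back; the DBMS decides how to find them.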

Some of the Query Languages use a natural language set of commands


These Query Languages are structured so that the commands used are as close to standard English as
possible. For example the following statement might be used:
PRINT names and address of all customers who live in Harare

Query Languages allow users to retrieve data from database without having detailed information about the
structure of the records or without being concerned about the processes the DBMS uses to retrieve the data.
Furthermore managers do not have to learn COBOL, BASIC etc

TELEPROCESSING MONITOR

It is a communications software package that manages communication between the database and remote
terminals

Teleprocessing monitors often handle order entry systems that have terminals located at remote sales
locations.

These may be developed by DBMS software firms and offered as companion packages to their database
products


EVOLUTION OF DATA MANAGEMENT SOFTWARE

Rudimentary input/     File storage           Structured DBMS        Relational DBMS
output software        (access method)

Program                Program                Program                Program
                       Access Method          Logical Schema         External Schema
                                              Physical Schema        Conceptual Schema
                                                                     Internal Schema

No independence        Storage independence   Physical data          Logical data
                                              independence           independence

Physical data          Storage units can      Physical and logical   External and conceptual
description in the     be changed             structures separated   structures separated
program

1. In the earliest data processing applications there was no formal data management software: all data
   descriptions and input/output instructions were coded in each application program. This resulted in no
   data independence; every change to a data file required modification or rewriting of the application
   program.
2. Access methods were the first formal data management software. An access method is a software routine
   that manages the details of accessing and retrieving records in a file, providing storage independence.
   Storage units can be changed (newer units replacing older units) without altering or modifying
   application programs.
3. The two-level schema (two-schema architecture) was employed by most early database management systems.
   A logical schema corresponds to an external or user view that describes the data as seen by
   each application program. A physical schema corresponds to the internal schema that describes the
   representation of data in computer facilities. This resulted in physical data independence: the
   data structures or methods of representing data in secondary storage could be altered without modifying
   application programs, e.g. to achieve efficiency, linked lists could be used instead of indexes without
   changing application programs. The two-level schema was characteristic of structured database
   management systems, such as those that use the hierarchical and network data models. It did not
   provide logical data independence.
4. The three-level schema is provided by contemporary relational DBMS. The conceptual schema provides an
   integrated view of the data resource for the entire organisation. This conceptual schema evolves over
   time: new data definitions are added to it as the database grows and matures. It provides both logical
   and physical independence. With logical data independence, the conceptual schema can grow and
   evolve over time without affecting the external schema, so existing application programs need not be
   modified as the database evolves.

A database management system that provides these three levels of data is said to follow a three-schema
architecture.

A schema is a logical model of a database. It captures the metadata that describe an organisation's data in a
language that can be understood by the computer.

Level of data independence      Examples of changes

Logical
  Data item format              Data item type, length, representation, or unit of measure
  Data item usage               How a data item is derived, used, edited, or protected
  Logical record structure      How data items are grouped into logical records
  Logical data structure        Overall logical structure or conceptual model

Physical
  Physical data organisation    How the data are organised into stored records
  Access method                 What search techniques and access strategies are used
  Physical data location        Where data are located on the storage devices
  Storage devices               Characteristics of the physical storage devices used

Physical data independence insulates a user from changes to the internal model
Logical data independence insulates a user from changes to the conceptual model.


FILE ORGANISATION

A file contains groups of records used to provide information for operations, planning, decision
making etc.
File organisation is a technique for physically arranging the records of a file on a secondary storage device.

Overview of basic file organisation:

File Organisation
  Sequential
  Indexed
    Nonsequential (full index)
    Sequential (block index)
      Hardware independent (VSAM)
      Hardware dependent (ISAM)
  Direct
    Relative-addressed
    Hash-addressed


Comparison of Basic File Organisation (illustrated with a file of video game titles):

a) Sequential: records stored one after another in the order written
   (Asteroids, Breakout, Combat, ..., Zaxxon)

b) Indexed sequential: a block index of key values (H, P, Z) points into
   groups of records stored in key sequence (A-H, K-M, P-Z; e.g. Asteroids,
   Defender, Megamania, Zaxxon)

c) Relative: each record occupies a numbered slot; the relative record
   number (1, 2, 3, 4, ..., n) gives its position
   (Chess, Combat, Defender, Faceoff, ..., Zaxxon)

d) Hashed: a hashing routine converts each key into a relative record
   number, scattering records through the file
   (Pitfall, Berserk, Odyssey, Donkey Kong, Space Invader)


a) Sequential organisation / sequential access:
   The physical order of records in the file is the same as the order in which
   records were written to the file, normally in ascending order of primary key.
   A record can be accessed only by first accessing all records that physically
   precede it.

b) Indexed sequential organisation / random or sequential access:
   Records are stored in physical sequence according to the primary key. The
   file management system, or access method, builds an index, separate from the
   data records, that contains key values and pointers to the data records
   themselves. Random access of individual records is possible without
   accessing other records; the entire file can also be accessed sequentially.

c) Relative organisation / relative access:
   Also known as direct file organisation. Records are often loaded in primary
   key sequence so that the file can be processed sequentially, but records can
   also be in random sequence. Each record can be retrieved by specifying its
   relative record number, which gives the position of the record relative to
   the beginning of the file. The user or application program has to specify
   the relative location of the desired record.

d) Hashed organisation / relative access:
   Also known as direct file organisation in which hash addressing is used.
   The primary key value for a record is converted by an algorithm (called a
   hashing routine) into a relative record number. Records are not in logical
   order; the hashing algorithm scatters records throughout the file, normally
   not in primary key sequence. A record is located by its relative record
   number, as for a relative organisation.

Basic Access Modes:


1. Sequential:
Record can be retrieved only by retrieving all the records that physically precede it.
Generally used for copying files and for sequential batch processing of records.
2. Random:
Record is accessed directly, without referencing other records in the file.
It follows no predefined pattern
Typically used for on-line updating and/or retrieval of records.

File organisation is rarely changed but record access mode can change each time the file is used.

Permissible File Organisation & Record Access Modes:

File Record Access Mode


Organisation Sequential Random
Sequential Yes No (Impractical)
Indexed Sequential Yes Yes
Direct-Relative Yes Yes
Direct-Hashed No (Impractical) Yes


There are several file organisation methods, namely:

1. Hashed file organisation
2. Clustering file organisation
3. Index file organisation
4. Compression file organisation

1. The Hashed File Organisation


Direct access devices also permit access to a given record by going directly to its address.

Since it is not feasible to reserve a physical address for each possible record, a method
called hashing is used. Hashing is the process of calculating an address from the record
key.

Suppose there were 500 employees in an organisation and we wanted to use the
Social Security Number as a key; it would be inefficient to reserve 999 999 999
addresses, one for each possible social security number.

Therefore, we could take the social security number and use it to derive the address of the
record. There are many hashing techniques; a common one is to divide the original
number by a prime number that approximates the number of storage locations and then use
the remainder as the address. This is known as the Division Method, as follows:

Begin with the Social Security Number 053-4689-42. Dividing 53468942 by 509
yields 105047 with a remainder of 19: note that 105047 multiplied by 509
equals 53468923, and the difference between the original number 53468942
and 53468923 is 19. So the record is stored at address 19.

The record for an employee whose Social Security Number is 472-3840-86
yields the same remainder.

When this occurs, the second person's record should be placed in a special
overflow area.
Example:
Qn. Given the number 472-3840-86 divided by 509
(a prime number), find the physical location.
Solution:
472384086 / 509 yields 928063
928063 x 509 = 472384067
472384086 - 472384067 = 19

Therefore the physical address is 19.
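In Python, the division method is simply the remainder operator, which confirms both results above:

```python
# Division-method hashing: the address is the remainder after dividing
# the key by a prime (509 in the example above).
def division_hash(key, prime=509):
    return key % prime

# Both social security numbers from the example collide on address 19.
addr1 = division_hash(53468942)    # SSN 053-4689-42
addr2 = division_hash(472384086)   # SSN 472-3840-86
```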


Converts the record key into a relative record number with a hashing algorithm.

Modular Arithmetic
Divide the key by the number of locations available for storage, and take the remainder.
For example, with 100 locations and the 4-digit key 1537:
1537 / 100 = 15 remainder 37
Therefore the storage location is 37
Alphanumeric keys need to be converted to numbers first, e.g. using base 36 or the ASCII code
for each character or digit

Folding
Divide the key into two or more parts and add them together. For example, 872377:
  872
+ 377
 1249
Then apply modular arithmetic to the result

Divide and remainder is a common hashing algorithm

Divide the record key by a number (ideally a prime) to determine the remainder
The divisor must be greater than the number of actual records
The divisor must contain an allotment for future file expansion.
Example: For a 10,000 product inventory system, the divisor
11,001 can be used, which allows 1,001 additional expansion positions
(about 10%)
Nonnumeric record keys:
Either strip off record key of its nonnumeric characters
Or convert them to numbers
Hashing algorithm for alphanumeric conversion is the Soundex System.
B,F,P, V are assigned 1
C, G, J, K, Q, S, X, Z are assigned 2
D, T are assigned 3
L is assigned 4
M, N are assigned 5
R is assigned 6
A, E, H, I, O, U, W, Y are assigned 0
Example:

a) Product # C-64744: strip off the nonnumeric characters

   Location = Remainder (Record Key divided by divisor)
            = 64744 / 11001
            = 5 remainder 9739
   Therefore the location is 9739

b) BURNS converts to 10652
   Hashing works like a one-way street: it cannot be worked backwards.
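A sketch of the digit-assignment table in Python. This applies only the digit codes listed above, not the full Soundex rules (such as keeping the first letter or collapsing repeated codes):

```python
# Soundex-style digit assignments from the table above.
SOUNDEX_DIGITS = {
    **dict.fromkeys("BFPV", "1"),
    **dict.fromkeys("CGJKQSXZ", "2"),
    **dict.fromkeys("DT", "3"),
    "L": "4",
    **dict.fromkeys("MN", "5"),
    "R": "6",
    **dict.fromkeys("AEHIOUWY", "0"),
}

def to_numeric(key):
    """Convert an alphanumeric key to digits: letters via the table,
    digits kept as-is, other characters (hyphens etc.) dropped."""
    return "".join(SOUNDEX_DIGITS.get(c, c)
                   for c in key.upper() if c.isalnum())
```

The numeric result can then be fed to the divide-and-remainder algorithm above.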

Advantages:
Supports applications demanding quick record retrieval because locating and reading
desired record into memory usually requires a single access to the disk.
Involves a single calculation for finding the record number
Permits both numeric and alphanumeric keys
Easily implemented with COBOL, C, PASCAL instructions

Disadvantages:
Hashing algorithms might result in collisions, that is, identical remainders, called
crashes (the colliding keys are synonyms). For example, product numbers C-64744 and F-42742 both
yield remainder 9739 when divided by 11001.
When collisions occur, an indicator is stored in the first record to warn a user of the
crash. The indicator reveals where the other record really resides.
Due to collisions, extra disk space is allotted for a record that would otherwise
collide with another
Due to random order of the file, a sorting step must occur before listing or otherwise
processing the file in sequence
When file becomes full, a programmer writes a one-time program to rebuild it with
expansion space.

NOTE:


The choice of sequential, direct, or indexed files depends on the user's needs:

For instant access to data, direct and indexed techniques apply
For batch environments, sequential techniques apply
HASHING

It is a process of designing an algorithm to calculate an address

HASH TABLE METHODS:

- This is a table scheme in which updates, searches and deletions can ideally be done in constant
time
- We seek a mathematical function which produces table addresses when supplied with the key
- Since there are many more possible key values than addresses, this becomes a many-to-one function
in which many different key values can lead to the same address
- Since we do not know in advance which keys will arise, it is possible that two keys with the same
address will arrive and a hash collision will occur
- Therefore, to design a good hash table we must find a solution to the following two problems:

a) Find a hash function that minimises the number of collisions by spreading arriving records
around the table as evenly as possible

b) Since any hash function is many-to-one, collisions are inevitable and therefore a good way of
resolving them is necessary

- There are basically four methods which are used to produce hash tables which are:- (mainly for
system software programming)

1) Truncation
2) Division
3) Midsquare
4) Partition/Folding

TRUNCATION

- This is a method where you normally take the last characters of the address
eg h(2467) = 467
h(12601) = 601
h(12467) = 467

- the advantage of truncation is that it is a fast method


- The disadvantage is that you must study the keys thoroughly to minimise collisions

DIVISION

- You take the key and MOD it by MAXSIZE, that is, you use the function:
key MOD MAXSIZE

e.g. 21 MOD 8 gives the address 5 (the remainder)

- This method is popular because it has a wide range of addresses
- Its main disadvantage is that the computer takes more time dividing
- It is an advantage to use a MAXSIZE which is a prime number, to spread addresses more evenly
so that we do not end up with a sequential search

MIDSQUARE


- It converts the key into its decimal equivalent, takes the middle digit and squares it to give the
address
e.g. h(49294): middle digit 2, address 2^2 = 4
h(24683): middle digit 6, address 6^2 = 36

PARTITION/FOLDING

- This method divides the number into groups and adds the individual groups to give the address
eg 510324: 51 + 03 + 24 = 78
Therefore h(510324) = 78
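The four methods can be sketched directly from the worked examples above (the digit counts, MAXSIZE and group sizes are illustrative parameters):

```python
def truncate_hash(key, digits=3):
    """Truncation: keep the last few digits of the key."""
    return int(str(key)[-digits:])

def division_hash(key, maxsize):
    """Division: key MOD MAXSIZE."""
    return key % maxsize

def midsquare_hash(key):
    """Midsquare (as described above): square the middle digit of the key."""
    s = str(key)
    return int(s[len(s) // 2]) ** 2

def folding_hash(key, group=2):
    """Partition/folding: split the key into groups and add them."""
    s = str(key)
    return sum(int(s[i:i + group]) for i in range(0, len(s), group))
```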

RESOLVING HASH COLLISIONS

- The technique of searching in a systematic and repetitive fashion for an alternative location is called
PROBING

(1) Linear Probing


Uses the following formula:-

inc(i) = (i + 1)MOD MAXITEMS

Incrementing function

- The incrementing function takes an address (i), not a key, and produces another hash address
- If the new location is occupied, we take that hash address and pass it again through the incrementing
function, and so on, until we find an open location; with luck we may be able to place the record in a
few probes
- Therefore we should have an indicator to tell whether a position is occupied or unoccupied, and as
such we say that using linear probing we first apply h(k) and then as many increments (i) as we
need

Disadvantages:
- Clearly linear probing results in clustering, where a number of synonyms will be adjacent to each other
and mixed with others; as the table fills, these clusters inevitably grow larger and larger,
making update, search and delete operations run more slowly

Advantages:
- It is suitable for small lists
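A minimal sketch of linear probing with the incrementing function above (MAXITEMS and the initial hash h are illustrative choices):

```python
MAXITEMS = 8
EMPTY = None
table = [EMPTY] * MAXITEMS

def h(key):
    return key % MAXITEMS           # initial hash address

def inc(i):
    return (i + 1) % MAXITEMS       # inc(i) = (i + 1) MOD MAXITEMS

def insert(key):
    """Probe successive addresses until an open location is found."""
    i = h(key)
    for _ in range(MAXITEMS):
        if table[i] is EMPTY:       # EMPTY acts as the occupied/unoccupied indicator
            table[i] = key
            return i
        i = inc(i)                  # pass the address through inc again
    raise RuntimeError("table full")
```

Inserting the synonyms 21, 29 and 13 (all hashing to 5) fills addresses 5, 6 and 7 in turn, showing the clustering that linear probing produces.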

(2) Non-Linear Probing

- Uses the following equation:

inc(i,p) = (i + ap + bp^2) MOD MAXITEMS

where p = number of probes and
a,b = +-1

(3) Bucket Hashing

- It establishes a bucket, or separate storage area, for all members of a given synonym class

- The hashing function is used to determine which bucket a new arrival belongs to
- In most cases linear lists are used as structures for the buckets

2. Clustering File Organisation Technique


The basic idea behind clustering is to try to store records that are logically related
physically close together on disk.

Physical data clustering is an extremely important factor in performance, as can easily be
seen from the following example:

Suppose the stored record most recently accessed is record R1, and suppose the next
stored record required is record R2. Suppose also that R1 is stored on page P1 and
R2 is stored on page P2. Then:
1. If P1 and P2 are one and the same, then the access to R2 will not require any
physical input or output at all, because the desired page, P2, will already
be in a buffer in main memory
2. If P1 and P2 are distinct but physically close together, in particular if they
are physically adjacent, then the access to record R2 will require a physical
input/output (unless of course page P2 also happens to be in a main
memory buffer), but the seek time involved in that input/output will be
small, because the read/write heads will already be close to the desired
position. In particular, the seek time will be 0 if P1 and P2 are in the same
cylinder.
Indexing
This is another file organisation method, which is divided into two areas, namely:
1. The Data Area
Contains all the records, with all values or entries organised
sequentially, which can be in ascending order

2. The Index Area
Contains the highest record key per given track number. The two areas are
linked, or joined, by pointers

The general structure of an indexed file is as follows:

CITY (INDEX) FILE SUPPLIER FILE


CITY S# NAME CITY
Harare 101 Simba Harare
Bulawayo 102 Tino Bulawayo
Mutare 103 Rue Harare
104 Rudo Mutare
105 Rufaro Harare
106 Takura Bulawayo
107 Rachel Mutare
The above supplier file is said to be indexed by city file.

The fundamental advantages of an indexed file are that:

It speeds up retrieval or access
It offers great flexibility, allowing both random and sequential access to data

But there are disadvantages too:

It slows down updates. For instance, every time a new stored record is added to the indexed
file, a new entry also has to be added to the index
It means extra work maintaining the different tables
Memory is needed to store the tables
Extra disk space is needed for the index and overflow areas


Indexes can be used in essentially two different ways:


1. They can be used for sequential access to the indexed files where sequential means in the
sequence defined by values of the indexed fields. The city index will allow records in the
supplier file to be accessed in city sequence.
2. Indexes can also be used for direct access to individual records in the indexed file on the
basis of a given value for the indexed field.

Dense and Non-dense Indexing


A dense index contains one index record for each data record in the indexed file (1:1). A fully
inverted file has an index on every field.

A non-dense index, sometimes called a sparse index, does not contain an entry for every stored
record in the indexed file (1:m). Less storage space is used: one index entry serves a number of records.

S# Index      Supplier file
              S1  Smith   London
S2            S2  Jones   Paris
              S3  Blake   Paris
S4            S4  Clarke  London
              S5  Adams   Athens
S6            S6  Brown   Paris

(one index entry per two-record block, holding the highest key in the block)
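A sketch of a non-dense lookup: the small index holds one entry per block (the highest key in the block), so a search reads the index and then only one block. The block size of two is illustrative:

```python
# Supplier file stored in blocks of two records each.
blocks = [
    [("S1", "Smith", "London"), ("S2", "Jones", "Paris")],
    [("S3", "Blake", "Paris"),  ("S4", "Clarke", "London")],
    [("S5", "Adams", "Athens"), ("S6", "Brown", "Paris")],
]

# Non-dense index: one entry per block - (highest key in block, block number).
sparse_index = [("S2", 0), ("S4", 1), ("S6", 2)]

def lookup(key):
    for highest, block_no in sparse_index:    # scan the small index
        if key <= highest:
            for record in blocks[block_no]:   # search only that block
                if record[0] == key:
                    return record
            return None
    return None
```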

3. Compression Techniques

This is a way of minimising the amount of storage needed for stored data by replacing the data with a more
compact representation.

There are three types of compression:

Front Compression
Rear Compression
Hierarchical Compression

Front Compression:

It replaces the front characters identical to those of the previous entry with a corresponding count.

In the examples, padding blanks are shown as b


Example:

The following 4 names appear in a stored table. The field length is 10 characters. Apply front
differential compression:
Farai
Farasiya
Farisai
Farikayi

Solution:

Farai 0 - Faraibbbbb
Farasiya 4 - siyabb
Farisai 3 - isaibbb
Farikayi 4 - kayibb
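Front compression can be sketched as storing, for each key, the count of leading characters shared with the previous key plus the remaining characters (padding blanks omitted here):

```python
def common_prefix(a, b):
    """Length of the shared leading characters of two strings."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n

def front_compress(keys):
    """Replace each key with (shared-prefix count, remaining characters)."""
    out = []
    prev = ""
    for k in keys:
        n = common_prefix(prev, k)
        out.append((n, k[n:]))
        prev = k
    return out
```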

Rear Compression:

It eliminates all trailing blanks, replacing them with an appropriate count

It drops all characters to the right not needed to differentiate the entry in question from its
immediate neighbours
The first number is as in front differential compression and the second is a count of the number of
characters recorded
This results in the loss of some information when the data is decompressed, but the full value is
available somewhere in the data file
Example:

The following names appear in a stored table. The field length is 15 characters. Apply rear
compression.
Abrahams,GK
Ackermann,LZ
Ackroyd,S
Adams,T
Adams,TR
Adamson,CR
Allen,S
Ayres,ST
Bailey,TE
Baileyman,D
Solution:
Expanded form
Abrahams,GK 0-2 Ab Ab
Ackermann,LZ 1-3 cke Acke
Ackroyd,S 3-1 r Ackr
Adams,T 1-6 dams,T Adams,T
Adams,TR 7-1 R Adams,TR
Adamson,CR 5-1 o Adamso
Allen,S 1-1 l Al
Ayres,ST 1-1 y Ay
Bailey,TE 0-7 Bailey Bailey
Baileyman,D 6-1 m Baileym
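A sketch reconstructed from the worked table: after removing the shared front, keep only enough characters to distinguish the key from both of its neighbours. (The rule for the first and last entries, which have only one neighbour, is an assumption inferred from the table.)

```python
def common_prefix(a, b):
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n

def rear_compress(keys):
    """Return (front count, stored characters) per key, dropping the rear."""
    out = []
    for i, k in enumerate(keys):
        front = common_prefix(keys[i - 1], k) if i > 0 else 0
        with_next = common_prefix(k, keys[i + 1]) if i + 1 < len(keys) else 0
        # keep just enough characters to differ from both neighbours
        expanded = min(len(k), max(front, with_next) + 1)
        out.append((front, k[front:expanded]))
    return out
```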


Hierarchical Compression:

A supplier stored file might be clustered by values of the city field; for example, all London
suppliers would be stored together, etc. The set of all supplier records for a given city might be
compressed into a single hierarchic stored record, in which the city value in question appears only
once, followed by all the other details for each supplier who happens to be in that city.

It consists of two parts:


Fixed part (city field)
Varying part (set of supplier entries). Varying in the sense that the number of entries
it contains (i.e. the number of suppliers in the city in question) varies from one
occurrence of the record to another, that is, a repeating group.

This is only possible if there is intra-file clustering.

Intra-file example (each city value stored once, followed by its suppliers and their status values):

Athens: S5 Adams 30
London: S1 Smith 20; S4 Clark 20
Paris:  S2 Jones 10; S3 Blake 30

Inter-file example (pages P1 and P2, each supplier record followed by its shipment entries,
part number and quantity):

Page P1: S1 Smith 20 London - P1 300; P2 200; P3 400; P5 100
Page P2: S2 Jones 10 Paris - P1 300; P2 400

Inter-file compression combines the supplier and shipment files into a single file and then applies
intra-file compression to that single file.


DATA MODELS TYPES

There are four types of database models:

1. Hierarchical
2. Network
3. Relational
4. Object-Oriented

The hierarchical and network models use standard files and provide structures that allow them to be cross-
referenced and integrated. They have been available since the early 1970s. The relational model uses tables to
store data. It provides the ability to cross-reference and manipulate the data, and it provides for data
integrity. The object-oriented model uses objects.

1. The Hierarchical Model


In the hierarchical database, data relationships follow hierarchies, or trees, which reflect either a one-to-one
relationship or a one-to-many relationship among record types

The uppermost record in the tree structure is called the root record. From there, data are organised into
groups of parent and child records. A parent record can have many child records (the children of a common
parent are called siblings), but each child record can have only one parent record. Parent records are higher
in the data structure than are child records; however, each child can become a parent and have its own child
records. Because relationships between data items follow defined paths, access to the data is fast. However,
any relationship between data items must be defined when the database is being created.

Parent-Child relationship

The hierarchy can be many levels deep

Properties of a Hierarchical Schema

1. One record type, called the root of the hierarchical schema, does not participate as a child record
   type in any Parent-Child Relationship (PCR) type
2. Every record type except the root participates as a child record type in exactly one PCR type
3. A record type can participate as parent record type in any number (zero or more) of PCR types
4. A record type that does not participate as parent record type in any PCR type is called a LEAF of
   the hierarchical schema
5. If a record type participates as parent in more than one PCR type, then its child record types are
   ordered. The order is displayed, by convention, from left to right in a hierarchical diagram.
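A hierarchical schema can be sketched as a mapping from each record type to its ordered child record types; the root and leaf properties above then fall out directly. The record type names here are invented for illustration:

```python
# Hypothetical hierarchical schema: each record type maps to its ordered
# list of child record types (property 2: each child has exactly one parent).
schema = {
    "DEPARTMENT": ["EMPLOYEE", "PROJECT"],   # the root (property 1)
    "EMPLOYEE":   ["DEPENDANT"],
    "PROJECT":    [],                        # a leaf (property 4)
    "DEPENDANT":  [],                        # a leaf
}

def roots(s):
    """Record types that never appear as a child: exactly one in a valid schema."""
    children = {c for kids in s.values() for c in kids}
    return [t for t in s if t not in children]

def leaves(s):
    """Record types that are never a parent."""
    return [t for t, kids in s.items() if not kids]
```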

2. The Network Model

A network database is similar to a hierarchical database except that each record can have more than one
parent, thus creating a many-to-many relationship among the records. For example, a customer may be
called on by more than one salesperson in the same company, and a single salesperson may call on more
than one customer. Within this structure, any record can be related to any other data element.

The main advantage of a network database is its ability to handle sophisticated relationships among various
records. Therefore more than one path can lead to a desired data level.

The network database structure is more versatile and flexible than is the hierarchical structure because the
route to data is not necessarily downwards, it can be in any direction.

In the network structure, again similar to the hierarchical structure, data access is fast because relationships
are defined during database design. However, network complexity limits users in their ability to
access the database without the help of programming staff.


Permits a record to belong to a number of parents.

3. The Relational Model

A relational database is composed of many tables in which data are stored, but a relational database
involves more than just the use of tables. Tables in a relational database must have unique rows, and the
cells (the intersections of a row and a column - equivalent to fields) must be single-valued (that is, each
cell must contain only one item of information, such as a name, address, or identification number).

It is built from tables of data elements known as relations.

A row is called a tuple and a column is called an attribute. The data type describing the types of values
that can appear in each column is called a domain.

Domain
Is a set of atomic values. An atomic value means that each value in the domain is indivisible as far as the
relational model is concerned.

Logical definition of domains:

Hre_phone_number  - The set of valid 6-digit phone numbers in Harare
Cell_phone_number - The set of valid 9-digit numbers within a particular supporting network
Employee_age      - Possible ages of employees of a company; a value between 18 and 65 years old

A domain has a name, a data type, and a format.

Relation
A relation schema is a set of attributes; it is used to describe a relation. The degree of a relation is the
number of attributes n of its relation schema.
A relation is defined as a set of tuples, and the tuples in a relation do not have any particular order. Values
within a tuple are ordered. Values in a tuple are atomic; therefore composite and multivalued attributes are
not allowed - this is the First Normal Form assumption.

Relations may represent facts about entities or about relationships

Example
MAJOR (StudentID, DeptCode) asserts that students major in academic departments.

Tuple
All tuples in a relation must be distinct - no two tuples can have the same combination of values for their
attributes. A superkey is a set of attributes whose values cannot be duplicated within a relation, e.g.
{StudentID, Name, Age}. A key is a minimal superkey: no attribute can be removed without losing
uniqueness. A relation may have more than one key - each of the keys is called a candidate key.

Example:
studentid and candidate_number

Example:
Relation of degree 7

STUDENT (name, ID_number, home_phone, address, cell_phone, age, GPA)

where GPA is Grade_Point_Average


This is illustrated below:


STUDENT
Name       ID#            Home_phone  Address      Cell_phone  Age  GPA
Ben Evans  15-041681C32   212279      2 Stoney Rd  011209876   30   3.9
                                      11 Park St   091239448   18   3.2
                                      2 Dunmow Rd  091332144   19   3.5
                                      10-88th Ave  023312541   25   2.9
                                      12 Wyatt Rd  023353659   28   3.9

A database management system that allows data to be readily created, maintained, manipulated, and
retrieved from a relational database is called Relational Database Management System (RDBMS). The
RDBMS, not the user, must ensure that all tables conform to the requirements. The RDBMS also must
contain features that address the structure, integrity and manipulation of the database.

In a relational database, data relationships do not have to be predefined. Hence users can query a relational
database and establish data relationships spontaneously by joining common fields. A database query
language is a helpful tool that acts as an interface between users and a relational DBMS. The language
helps the users of a relational database to easily manipulate, analyse and create reports from the data
contained in the database. It is composed of easy-to-use statements that allow people other than
programmers to use the database.

Relation CAR                 Relation ENGINE

Model Number  Name           Engine  Model Number
1             Fiesta         950     1
2             Escort         1100    1
3             Sierra         1100    2
                             1300    1
                             1300    2
                             1600    2
                             1600    3
                             2000    3
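The relationship between CAR and ENGINE is established by joining on the common Model Number field; run here through SQLite for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE car (model_number INTEGER, name TEXT)")
conn.execute("CREATE TABLE engine (engine INTEGER, model_number INTEGER)")
conn.executemany("INSERT INTO car VALUES (?, ?)",
                 [(1, "Fiesta"), (2, "Escort"), (3, "Sierra")])
conn.executemany("INSERT INTO engine VALUES (?, ?)",
                 [(950, 1), (1100, 1), (1100, 2), (1300, 1),
                  (1300, 2), (1600, 2), (1600, 3), (2000, 3)])

# Join on the common field to list the engines available for the Escort
escort_engines = [row[0] for row in conn.execute(
    "SELECT e.engine FROM engine e JOIN car c "
    "ON e.model_number = c.model_number "
    "WHERE c.name = 'Escort' ORDER BY e.engine")]
```

Because the relationship is established at query time through the common field, it did not have to be predefined when the tables were created.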

4. The Object-Oriented Model

While the relational model is well suited to the needs of storing and manipulating business data, it is not
well suited to handling the data needs of certain complex applications, such as computer-aided design
(CAD) and computer-aided software engineering (CASE).

Business data follow a defined data structure that the relational model handles well. However, applications
such as CAD and CASE deal with a variety of complex data types that cannot be easily expressed by
relational models. Such programs also require massive amounts of persistent data (data that outlive the
execution of the program that created them), and a database for them must be able to
evolve without affecting the data in memory that the application uses to operate.

An object-oriented database uses objects and messages to accommodate new types of data and provide for
advanced data handling. A database management system that allows objects to be readily created,
maintained, manipulated and retrieved from an object-oriented database is called an Object-Oriented
Database Management System (OODBMS)

An object-oriented database management system must still provide features that you would expect in any
other database management system, but there is still no clear standard for the object-oriented model.

35 dbase notes.doc
Compiled by W. Sithole

Logical Database Design

A logical database design is a detailed description of a database in terms of the ways in which the users will
use the data.

During this phase an analyst performs a detailed study of the data identifying how the data is grouped
together and how they relate to each other. An analyst must also determine which fields have multiple
occurrences of data, which fields will be keys or indexes and the size and type of each field.

A Schema is a complete description of the contents and structure of a database. It defines the database to
the system, including the record layout, the names, length and size of all fields and the data relationships.

A Subschema defines each user's view, or the specific parts of the database that a user can access. A
subschema restricts each user to certain records and fields within the database. Every database has one and
only one schema, but each user can have his or her own subschema.

STRUCTURED QUERY LANGUAGE (SQL)

In SQL, commands are given to define the structure of the database. Each database is identified by a name,
which is given in a CREATE DATABASE command.

The entities are defined as tables, with each attribute defined as a column in the table. A table then is given
a name, and each attribute declared by giving it a column name and stating its type. Supported data types
include:-

CHARACTER - Character (string) values
SMALLINT - A restricted range of integers
DECIMAL - Which allow a fixed number of decimal places
FLOAT - For floating point values
MONEY - Currency values
DATE - For dates

Each data type allows a certain set of possible values. A column may also hold an unknown value, called
NULL. When a column is specified, it is assumed to allow NULL values unless the phrase NOT NULL is
specified.

NULL values should not be allowed in any column which forms part of the primary key of the table.

BELOW IS AN SQL COMMAND USED TO DEFINE A DATABASE

The name Art.db is chosen for the database, while the tables are called painting, artist and gallery. The
data type MONEY has been used and so is assumed to be supported by the implementation. The only
column which allows a NULL value is Nationality in the artist table. A NULL value in this column of a
particular row would mean that the actual value is unknown.

CREATE DATABASE Art.db

CREATE TABLE Painting
(Title char(20) NOT NULL,
Artist-name char(20) NOT NULL,
Cost money NOT NULL,
Gallery-name char(15) NOT NULL)

CREATE TABLE Artist
(Artist-name char(20) NOT NULL,
Initial char(5) NOT NULL,
Nationality char(15))


CREATE TABLE Gallery
(Gallery-name char(15) NOT NULL,
Gallery-Add char(20) NOT NULL)

CREATE UNIQUE INDEX painting.idx ON painting (Title, artist-name)

CREATE UNIQUE INDEX artist.idx ON artist (artist-name)

CREATE UNIQUE INDEX gallery.idx ON gallery (gallery-name)

UNIQUE INDEXES are defined on the tables for the primary keys to prevent the system allowing rows in
the tables with duplicate values in the key.

Instead, an INDEX is created for the key and is specified as unique, so that any attempt to add rows with
same key will be trapped as an error. For the gallery and artist tables, the key has just one component
attribute, but the key for the painting table has two attributes and the index is created for the pair (title,
artist-name).

Indexes may be created for any number of columns in a table. Usually their purpose is to speed up access
to the data using the column values. Each index must be given a name, although the name is not used again
unless the index is to be deleted. The names used for the indexes in the illustration above are painting.idx,
artist.idx and gallery.idx.
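The effect of a unique index can be sketched with a runnable example. The snippet below is illustrative only: Python's sqlite3 stands in for the dialect used in these notes, so column widths and the MONEY type are omitted (SQLite does not support them), and the index is named painting_idx because SQLite index names may not contain a dot.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""CREATE TABLE painting (
    title        TEXT NOT NULL,
    artist_name  TEXT NOT NULL,
    cost         REAL NOT NULL,
    gallery_name TEXT NOT NULL)""")

# Unique index on the two-attribute primary key (title, artist_name)
cur.execute("CREATE UNIQUE INDEX painting_idx ON painting (title, artist_name)")

cur.execute("INSERT INTO painting VALUES ('Pool', 'Victor', 300, 'Chitambo')")

# A second row with the same key value is trapped as an error
try:
    cur.execute("INSERT INTO painting VALUES ('Pool', 'Victor', 999, 'Harare')")
except sqlite3.IntegrityError:
    print("duplicate key rejected")
```

The attempted duplicate never reaches the table: the unique index enforces the key exactly as described above.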

RETRIEVING DATA FROM ONE TABLE

The SQL SELECT statement is used to retrieve data from a table. It combines elements of the relational
algebra operations via its various options.

SELECTION

In its simplest form, a SELECT command will select all data from the table, as in the example:-

SELECT *
FROM Art

The asterisk (*) indicates that all the columns (fields) of the table Art are to be selected.

Using the WHERE clause will restrict the rows (records) which are selected to those satisfying the
condition for example:-

SELECT *
FROM Art
WHERE cost > 5000

In this form the SQL SELECT provides the functions of the SELECT statement of the relational algebra

A practical example of the two SELECT statements, used to view the contents of the Art table, is pictured
as follows:-

TABLE: Art


Title    Artist_Name   Cost   Gallery_Name
Pool     Victor         300   Chitambo
Peel     John          1000   Nyasha
Sony     Arthur        1500   Harare
Reelm    Tecla          800   Nyasha
Tito     Amon          4500   Mutare
Questions:

1. Write an SQL code to view all records from the Art table

2. Write an SQL code to view all records from the Art table where cost is less than $1500 and
Gallery_Name is equal to Nyasha

3. Write an SQL statement to list only the columns Title and Cost in the table Art

NB: Your statements should be supported by resulting tables.

Solution 1:

SELECT *
FROM Art

Resulting Table

Title    Artist_Name   Cost   Gallery_Name
Pool     Victor         300   Chitambo
Peel     John          1000   Nyasha
Sony     Arthur        1500   Harare
Reelm    Tecla          800   Nyasha
Tito     Amon          4500   Mutare
Solution 2:

SELECT *
FROM Art
WHERE cost < 1500 AND Gallery_Name = 'Nyasha'

Resulting Table

Title    Artist_Name   Cost   Gallery_Name
Peel     John          1000   Nyasha
Reelm    Tecla          800   Nyasha
Solution 3:

SELECT Title, Cost


FROM Art

Resulting Table

Title    Cost
Pool      300
Peel     1000
Sony     1500
Reelm     800
Tito     4500

PROJECTIONS

There is a provision in the SQL SELECT to cover the PROJECT operation of relational algebra.

The rows selected from a table can be projected into a list of their columns by including the column list
instead of the asterisk. The command:-

SELECT Title, Artist_Name, Gallery_Name


FROM Art
WHERE Cost > 1000

A table with 3 columns will be produced

This is obtained from the Art table by first retrieving the rows which satisfy the condition (Cost > 1000),
then projecting them onto the 3 columns; the cost values are omitted from the result.


The result of SELECT including a projection is structured as follows:-

Table Art

Title    Artist_Name   Gallery_Name
Sony     Arthur        Harare
Tito     Amon          Mutare
If the SELECT command specifies all the components of the primary key of the table as part of the column
list the resulting rows will also be identified by the key value

In particular there will be no duplicate rows in the table. However, if the list of columns does not contain
the key or primary key, there may be duplicate rows in the resulting table. An example is shown below,
which is the result of applying the command.

SELECT Gallery_Name
FROM Art
WHERE Cost > 700

Resulting Table

Gallery_Name
Nyasha
Harare
Nyasha
Mutare
A variation of the SELECT command can be used to ensure that duplicate rows are removed from the
result. It uses the DISTINCT keyword within the SELECT.

SELECT DISTINCT Gallery_Name


FROM Art
WHERE Cost > 700

The above code will remove all duplicate rows producing the following table:-

Gallery_Name
Nyasha
Harare
Mutare
* This is so, because we are projecting on Gallery_Name only but using a DISTINCT command
where we have to satisfy a condition given.

ORDERING THE ROWS


All of the SELECT commands mentioned previously produce tables as their results with the rows appearing
in the order in which they are found

It is possible to specify a particular order for the rows based on the selected column values by including an
'ORDER BY' clause

For example:
SELECT DISTINCT Gallery_Name
FROM Art
WHERE Cost > 700
ORDER BY Gallery_Name

This will produce the rows in ascending order of gallery name as shown on the table below

Gallery_Name
Harare
Mutare
Nyasha
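The DISTINCT and ORDER BY behaviour above can be reproduced with a short runnable sketch. SQLite is used here as a stand-in implementation, with the Art data from the earlier table; without an ORDER BY clause the row order is not guaranteed by the standard.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE art (title TEXT, artist_name TEXT, cost REAL, gallery_name TEXT)")
cur.executemany("INSERT INTO art VALUES (?, ?, ?, ?)", [
    ('Pool', 'Victor', 300, 'Chitambo'),
    ('Peel', 'John', 1000, 'Nyasha'),
    ('Sony', 'Arthur', 1500, 'Harare'),
    ('Reelm', 'Tecla', 800, 'Nyasha'),
    ('Tito', 'Amon', 4500, 'Mutare')])

# Without DISTINCT: one row per matching record, duplicates kept
cur.execute("SELECT gallery_name FROM art WHERE cost > 700")
print([r[0] for r in cur.fetchall()])   # duplicates such as 'Nyasha' appear twice

# With DISTINCT and ORDER BY: duplicates removed, ascending order
cur.execute("SELECT DISTINCT gallery_name FROM art WHERE cost > 700 ORDER BY gallery_name")
print([r[0] for r in cur.fetchall()])   # ['Harare', 'Mutare', 'Nyasha']
```

This matches the two result tables above: four rows without DISTINCT, three ordered rows with it.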

GROUPED DATA

There are additional clauses in the SELECT command which allow it to deal with groups of data rather
than individual rows. The GROUP BY clause combines records with identical values in the specified field
list into a single record.


The final result of the SELECT is formed by projecting values into the selected columns. For example,
consider the command: -

SELECT Gallery_Name
FROM Art
WHERE cost < 1000
GROUP BY Gallery_Name

It will produce a list of Gallery_Names which hold Art whose cost is < 1000. The GROUP BY clause
causes all the selected rows with the same Gallery_Name to be grouped into a single row.

The projection onto the Gallery_Name is then performed and the resulting table has no duplicate names.

In fact it is equivalent to the SELECT DISTINCT command. An added advantage of grouping data is that
there are standard functions which can be applied to groups, producing one value for the whole group.
They include:-

1. SUM FUNCTION - To sum values in one column

2. AVG FUNCTION - To calculate the average value in a column

3. MIN FUNCTION - To find the minimum value in a column

4. MAX FUNCTION - To find the maximum value in a column

5. COUNT FUNCTION - To count the number of values in a column

Example 1
Write an SQL command to calculate the SUM of ALL COST in table painting.
Solution:
SELECT SUM (cost)
FROM painting

The computer will then sum up all the cost figures in the table painting and display the total only.

Example 2
Write an SQL statement using table painting to display the following output:

Gallery-name Cost

Example 3
Write an SQL statement to find the total galleries in the table painting.

Solution:
SELECT DISTINCT(gallery-name)
FROM painting
ORDER BY gallery-name

OR

SELECT COUNT(gallery-name)
FROM painting
GROUP BY gallery-name

Example 4

Write an SQL command to find or to list the maximum cost value in the table painting.

Solution:
SELECT MAX(cost)
FROM painting
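The group functions above can be exercised against the Art data used earlier in these notes. This is a sketch only, with SQLite standing in for the SQL dialect of the examples.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE art (title TEXT, artist_name TEXT, cost REAL, gallery_name TEXT)")
cur.executemany("INSERT INTO art VALUES (?, ?, ?, ?)", [
    ('Pool', 'Victor', 300, 'Chitambo'),
    ('Peel', 'John', 1000, 'Nyasha'),
    ('Sony', 'Arthur', 1500, 'Harare'),
    ('Reelm', 'Tecla', 800, 'Nyasha'),
    ('Tito', 'Amon', 4500, 'Mutare')])

# The five standard functions, each producing one value for the whole table
cur.execute("SELECT SUM(cost), AVG(cost), MIN(cost), MAX(cost), COUNT(cost) FROM art")
print(cur.fetchone())   # (8100.0, 1620.0, 300.0, 4500.0, 5)

# Combined with GROUP BY: one value per group, e.g. total cost per gallery
cur.execute("SELECT gallery_name, SUM(cost) FROM art GROUP BY gallery_name")
print(cur.fetchall())   # includes ('Nyasha', 1800.0), the two Nyasha rows summed
```

Each aggregate collapses a group of rows to a single value, which is why the result has one row per group rather than one row per record.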

SUB QUERIES
The WHERE clause can express a complex condition. It can be used in what is called a SUBQUERY,
which makes use of another SELECT statement (a nested SELECT) as part of the condition.
Suppose we want to find all paintings by a particular artist; the following statement is issued:

SELECT artist-name
FROM artist
WHERE artist-name = 'John'

This produces a table of artist names equal to John. It can be used as part of the WHERE condition in the
SELECT statement which retrieves the tuples of painting.

The SQL statement is structured as follows:


SELECT *
FROM painting
WHERE artist-name IN (SELECT artist-name
FROM artist
WHERE artist-name = 'John')

It extracts rows from the table painting where the artist name appears in the result of the subquery. The IN
operator is used to perform this test on the result of the subquery. The IN operator and its
negation/complement NOT IN are not the only operators for use in subqueries:
SELECT *
FROM painting
WHERE artist-name NOT IN (SELECT artist-name
FROM artist
WHERE artist-name = 'John')

This will extract all other records but not John's.

ALL and ANY operators can be used with a relational operator such as >= to test the column value against
the result of the subquery. To select the titles of the most costly paintings we could use the following
command:
SELECT title
FROM painting
WHERE cost >= ALL (SELECT cost
FROM painting)
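A runnable sketch of these subqueries follows, again with SQLite as a stand-in. Note one assumption: SQLite does not support the ALL operator, so the "most costly painting" query is rewritten here with MAX, which gives the same result for this data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE art (title TEXT, artist_name TEXT, cost REAL, gallery_name TEXT)")
cur.executemany("INSERT INTO art VALUES (?, ?, ?, ?)", [
    ('Pool', 'Victor', 300, 'Chitambo'),
    ('Peel', 'John', 1000, 'Nyasha'),
    ('Sony', 'Arthur', 1500, 'Harare'),
    ('Reelm', 'Tecla', 800, 'Nyasha'),
    ('Tito', 'Amon', 4500, 'Mutare')])

# IN: titles of paintings whose artist appears in the subquery result
cur.execute("""SELECT title FROM art
               WHERE artist_name IN (SELECT artist_name FROM art
                                     WHERE artist_name = 'John')""")
print([r[0] for r in cur.fetchall()])   # ['Peel']

# NOT IN: every other record but not John's
cur.execute("""SELECT title FROM art
               WHERE artist_name NOT IN (SELECT artist_name FROM art
                                         WHERE artist_name = 'John')""")
print(len(cur.fetchall()))              # 4

# cost >= ALL (SELECT cost FROM art), rewritten with MAX for SQLite
cur.execute("SELECT title FROM art WHERE cost = (SELECT MAX(cost) FROM art)")
print(cur.fetchone()[0])                # 'Tito'
```

The inner SELECT runs first and its result feeds the outer WHERE test, exactly as the nested-SELECT description above states.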
CONSTRUCTING USER ACCESS

When a central database is used for a number of different users who have different requirements, it is
essential to be able to tailor the data to the different needs. In this case, there are two SQL features which
provide these facilities:

Defining views to limit what is seen

Granting access privileges to particular users


VIEWS
A view is a virtual table (it does not physically exist) obtained from the real tables by a SELECT statement.
Its main use is to tailor the data of a table to the needs of particular users, so that it omits details which are
of no interest or which the user should not see.

In the example of the table painting, it may be desired to let most users see all the data except for the cost.
A view can be created which omits the cost column as follows:

CREATE VIEW details AS
SELECT title, artist-name, gallery-name
FROM painting

This view is given the name details.

To the users it looks just like a table and can be treated as a table in most SQL commands. However, it is
not a real table. Its data is obtained from the painting table by performing the SELECT statement each time
it is accessed.

The statement:
SELECT *
FROM details
WHERE gallery-name = 'Chipangali'

uses the view as a table. It retrieves the data relating to the paintings in the Chipangali gallery, but does not
include the cost, since the cost column is ignored when the virtual table is formed and is not part of the
view. Views can be created for any SELECT statement, not just those which limit the columns of a table.

A virtual table of all paintings held at the gallery Chipangali would be created by the command:
CREATE VIEW Chipangali AS
SELECT *
FROM painting
WHERE gallery-name = 'Chipangali'

This would contain all of the 4 columns of the table painting, but only those rows relating to the gallery
Chipangali.

Once a view has been created, its definition as a SELECT statement will exist until a DROP VIEW
command is performed.

While it exists, it can be treated as a table although it is only a virtual table.
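The view mechanism can be demonstrated end to end. A minimal sketch in SQLite, using a few rows modelled on the earlier Art data (Nyasha stands in for Chipangali here, since the sample data has no Chipangali rows):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE painting (title TEXT, artist_name TEXT, cost REAL, gallery_name TEXT)")
cur.executemany("INSERT INTO painting VALUES (?, ?, ?, ?)", [
    ('Peel', 'John', 1000, 'Nyasha'),
    ('Reelm', 'Tecla', 800, 'Nyasha'),
    ('Tito', 'Amon', 4500, 'Mutare')])

# The view omits the cost column, tailoring what its users can see
cur.execute("""CREATE VIEW details AS
               SELECT title, artist_name, gallery_name FROM painting""")

# The view is virtual: its SELECT runs each time it is accessed
cur.execute("SELECT * FROM details WHERE gallery_name = 'Nyasha'")
print(cur.fetchall())   # three-column rows only; no cost anywhere

cur.execute("DROP VIEW details")   # the definition exists until dropped
```

A user querying details can never retrieve cost, because that column is simply not part of the virtual table.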


GRANTING PRIVILEGES
Users of a database are identified by a user name. Individual users can be granted privileges which give
them permission to use certain SQL commands on the database.
Permissions may also be granted to all users by using the keyword PUBLIC instead of a user name.
The GRANT CONNECT command is available to define passwords for a list of users. It has the form:
GRANT CONNECT TO <user list>
IDENTIFIED BY <passwords>

It can be used to set up the password(s) for the new users or to alter the existing user passwords. Some
implementations do not use this facility, but rely on the operating system to deal with passwords for users.
Specific privileges to permit the use of SQL statements on a table or view are allocated by further GRANT
command. They have the following form:
GRANT <privilege list>
ON <table or view>
TO <user list>

where <table or view> is the name of a table or view, <user list> is either a list of user names or the
keyword PUBLIC, and <privilege list> is a list of keywords for the privileges.

The privileges are any of the following:


SELECT
INSERT
DELETE
UPDATE <column list>
ALTER
ALL

And permit use of the corresponding SQL commands or statements

UPDATE may have a list of columns, stating those which are allowed to be updated. The default is to
allow all columns to be updated.

The ALL privilege permits all commands or privileges to be used or selected.

GRANT SELECT, UPDATE(cost, gallery-name)
ON painting
TO John, Nancy

would let the two named users use the SELECT command on the table painting and UPDATE the columns
cost and gallery-name only.

Since the privileges can be granted selectively, a considerable degree of control over user access to data is
available.

Class exercise:


Given the following table:

Student
Stud-ID Student-Name Town Course-Level Fee
HND1002 Chipo Harare HND 7500
ND2001 Edmore Mutare ND1 6500
ND200100 Takura Harare ND2 3000
ND2003 Simba Kwekwe ND1 6500
ND2008 Esther Bulawayo ND1 6500
HND1004 Rachel Mutare HND 7500
NC3001 James Gweru NC 3500
NC3007 Oscar Kwekwe NC 3500
ND2009 Linda Bulawayo ND1 6500

1. Create 3 views to the database Student so that


a) The Principal can only see the course-level and fee fields.
b) The Accountant can have access to all columns.
c) The Head of Department can only access student names, Stud-Ids and towns.
2. Write SQL statements, which can only permit the Accountant to SELECT, INSERT, DELETE and
UPDATE all columns of the student table.
3. Write SQL statements, which list only towns in their unique order where course fee is above $3500.
4. Write an SQL statement which list all students in both HND and NC course-level.

Question:
Given the following ERD, design a detailed database using SQL necessary for the illustration.


(ERD: the entity STUDENT (stud-#, stud-name, dob, town, nationality) ATTENDS the entity COURSE
(crs-id, crs-title, #-of-stud); the entity TEACHER (name, qualification, #-of-stud) TEACHES COURSE.)

CREATE DATABASE college.db

CREATE TABLE student


(stud-name char(20) NOT NULL,
stud-# smallint(5) NOT NULL,
dob date(8) NOT NULL,
town char(20) NOT NULL,
nationality char(15))

CREATE TABLE teacher

(name char(20) NOT NULL,
#-of-crs smallint(8) NOT NULL,
qualification char(30) NOT NULL,
#-of-stud smallint(2))

CREATE TABLE course


(crs-id smallint(6) NOT NULL,
crs-title char(20) NOT NULL,
#-of-stud smallint(2))

ALTERING THE DATABASE STRUCTURE


A database structure can be modified in a number of ways. Extra tables can be added using the create table
command, and extra indexes can be set up.

DROP TABLE <table-name> and
DROP INDEX <index-name>
can be used to remove tables and indexes, while
DROP DATABASE <database-name> can be used to remove the whole database.

ALTER TABLE <table-name>
ADD col-name1 char(20)

can be used to add a column to an existing table.
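These structure-altering commands can be tried in a sketch. SQLite is the stand-in here, with one caveat worth labelling: SQLite supports DROP TABLE, DROP INDEX and ALTER TABLE ... ADD, but it has no DROP DATABASE command, since a database there is simply a file.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE art (title TEXT, cost REAL)")

# Add an extra column to an existing table
cur.execute("ALTER TABLE art ADD COLUMN gallery_name TEXT")
cur.execute("SELECT * FROM art")
print([d[0] for d in cur.description])   # ['title', 'cost', 'gallery_name']

# Extra indexes can be set up and removed again
cur.execute("CREATE INDEX art_idx ON art (title)")
cur.execute("DROP INDEX art_idx")

# Removing the table removes its data and its remaining indexes
cur.execute("DROP TABLE art")
```

After the DROP TABLE, the database contains no tables at all, which is what the DROP commands are for.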

ADDING, DELETING and UPDATING DATA

1. Adding Data: SQL provides an INSERT command to add a single record to a table, for example:
INSERT INTO student VALUES
('HND1006', 'James Made', 'Mutare', 'HND', 7500)

This will add a row to the student table with all column values defined. The indexes associated
with the table are updated automatically, so that re-entering the same record will be rejected.

2. Deleting Data: The DELETE command is used to remove rows (records) from a table. In its
simplest form it will remove all rows, as in the command:
DELETE FROM <table-name> will remove all rows
DELETE FROM <table-name>
WHERE <condition> will remove only the rows meeting the set condition.

The WHERE clause is used in the DELETE command and in other commands. The conditions
can be quite complex, enabling the commands to be very selectively applied.

They allow:
(a) AND, OR and NOT to be used as logical connections
(b) Numerical and character data to be compared for either equality or inequality such as: >,
<, =, >=, <=
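INSERT and a selectively applied DELETE can be sketched together. The example below uses SQLite and a student table modelled on the class exercise above (column names adapted to underscores).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""CREATE TABLE student (
    stud_id TEXT, student_name TEXT, town TEXT, course_level TEXT, fee REAL)""")

# Adding single records with all column values defined
cur.execute("INSERT INTO student VALUES ('HND1006', 'James Made', 'Mutare', 'HND', 7500)")
cur.execute("INSERT INTO student VALUES ('NC3001', 'James', 'Gweru', 'NC', 3500)")
cur.execute("INSERT INTO student VALUES ('ND2003', 'Simba', 'Kwekwe', 'ND1', 6500)")

# A WHERE clause with the AND connective applies the DELETE selectively
cur.execute("DELETE FROM student WHERE course_level = 'NC' AND fee <= 3500")

# Without a WHERE clause, DELETE FROM student would remove every row
cur.execute("SELECT COUNT(*) FROM student")
print(cur.fetchone()[0])   # 2 rows remain
```

Only the row satisfying both conditions is removed, illustrating how the logical connectives make the command very selective.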


SYNONYM USAGE

A synonym (here spx) gives a table an alternative name so that it can be referred to more than once in the
same statement. For example, to find the part numbers of all parts supplied by more than one supplier:

SELECT DISTINCT p#
FROM sp spx
WHERE p# IN
(SELECT p#
FROM sp
WHERE s# <> spx.s#)

OR

SELECT p#
FROM sp
GROUP BY p#
HAVING COUNT (s#) > 1

Part #s for all parts supplied by more than one supplier

DICTIONARY
A collection of relations, i.e. catalog and columns.

Catalog - contains a row for each relation defined to the system


Table-name, (key), creator, # of columns etc.
Group of named schemas (consists of tables/views and definitions and
user defined specifications affecting physical placement of data on disk)
Library that contains ready to use functions
RDBMS data dictionary
Systems database that contains information concerning various objects
that are of interest to the system itself eg base tables, views, indexes,
users, access privileges.
Table in which DBMS maintains data about the database
Contains administrative information eg. Access permission

Columns - contains a row for every column of every relation defined to
the system
Table-name, column-name (composite key), data type (char, numeric
etc.), length etc.

Users can query this system, e.g.

SELECT tname
FROM columns
WHERE cname = 's#'
lists the table names for tables with a column s#.

Useful to a user who does not know all the fields of some tables but only an attribute.

CREATE SYNONYMS
Specifies an alternative name for a table/view; often used to define an abbreviation or to avoid
prefacing with the owner name of the table.

DROP SYNONYMS
Destroys a synonym declaration

COMMENTS USAGE
Provides an explanatory remark for table columns


COMMENTS STATEMENT
Provides an explanatory remark for table columns (stored as part of internal definition tables).
Used in updating a catalog together with DELETE, CREATE TABLE, ALTER, INSERT.

COMMENT ON TABLE s IS 'Each row represents one supplier'

There is no comment on an index.

COMMENT ON COLUMN p.city IS 'Location of unique warehouse storing this part';

Greatest interaction between DBA and user group


Results in a complete set of data definitions recorded in the DD

Conceptual - modelling the organisation's data
External - interaction with users and other system specialists
Internal
Integrity control

Initial load/creation - populating the database


Rights and duties of users that satisfy their responsibilities, since it is a corporate database, e.g.
updating rights, accessing rights.

STAGE - MAJOR FUNCTIONS

Planning
1. Develop entity charts
2. Analyse costs and benefits
3. Develop implementation plan
4. Evaluate and select software and hardware
5. Establish application priorities
6. Develop data standards (naming conventions and definitions, e.g. Customer: Prospective, Prior,
No Longer)

Requirements Formulation & Analysis
1. Define user requirements
2. Develop data definitions
3. Develop data dictionary

Design
1. Design conceptual model
2. Design external models (modelling the organisation's data; the DBA interacts with users and
other system specialists in data processing)
3. Design internal models (schemas)
4. Design integrity controls

Implementation
1. Specify database access policies (rights)
2. Develop standards for application programming (for consistency and correctness, to increase
programmers' productivity)
3. Establish security techniques (passwords, access tables, encryption)
4. Load databases (special programs to load from different files)
5. Specify test procedures
6. Establish procedures for backup and recovery
7. Conduct user training

Operation & Maintenance
1. Monitor database performance
2. Tune and reorganise databases
3. Enforce standards
4. Support users

Growth & Change
1. Implement change control procedures
2. Plan growth and change
Change in size: storage space utilisation; the DBA allocates additional space or reallocates
existing space
Change in content/structure: new application requests; alter logical and physical database
structure
Change in usage pattern: performance monitoring; assigning frequently accessed records to
faster devices; additional higher-performance hardware devices


DATABASE LIFE CYCLE


Managed by the Database Administrator. There are 6 stages:
1. Planning
2. Requirements Formulation & Analysis
3. Design
4. Implementation
5. Operation & Maintenance
6. Growth & Change

Planning

Growth & Change Requirements


Formulation &
Analysis

Operation & Design


Maintenance

Implementation

Planning:
Its purpose is to develop a strategic plan for database development that supports the overall
organisation's business plan.

Requirements Formulation & Analysis


Is concerned with identifying data elements currently used by the organisation, precisely defining
these elements & their relationship, and documenting the results in a form that is convenient to the
design that is to follow. In addition to identifying current data, requirements Formulation &
Analysis attempts to identify new data elements or changes in existing data elements that will be
required in the near future.

Design Stage
Its purpose is to develop a database architecture that will meet the information needs of the
organisation now and in the future. There are 3 stages in database design, that is, Conceptual,
Implementation & Physical design.

a) Conceptual Design: Its purpose is to synthesise the various user views and information
requirements into a global database design. The design is called Conceptual Schema/Data
Model and may be expressed in one of the several forms that is, entity relationship diagram,
semantic data model, normalise relation. The Conceptual Data Model describes entities,
attributes and relationships.
b) Implementation Design: Its purpose is to map the Conceptual Data Model into a logical
schema that can be processed by a particular DBMS. The conceptual data model is mapped
into hierarchical, network or relational data model.


c) Physical Design: Last stage of Database design concerned with designing stored record
formats, selecting access methods and deciding on physical factors such as record blocking.
Also concerned with database security, integrity and backup and recovery.

Implementation Stage:
Once the database is completed, the implementation process begins. The first step is the creation
or initial load of the database. Database administration manages the loading process and resolves
any inconsistencies that arise during this process.

Operation & Maintenance Stage


This is the ongoing process of updating the database to keep it current. Examples of updating
include adding a new employee record, changing a student address, deleting an invoice. Database
Administrator is responsible for developing procedures that ensure that the database is kept current
and that is protected during update operations. A Database Administrator must perform the
following functions:
a) Assigning responsibility for data collection, editing and verification
b) Establish appropriate update schedules
c) Establish an active quality assurance program, including procedures for protecting,
restoring and auditing the database.

Growth and Change Stage


The database is a model of the organisation itself. As a result it is not static but reflects dynamic
changes in the organisation and its environment. The Database Administrator must plan for
change, monitor the performance of the database (both efficiency and user satisfaction), and take
whatever actions are required to maintain a high level of system performance and success.

Functions of Database Administration


Summarised according to the stages of the database life cycle:

1. Planning:
Develop entity charts
Analyse costs and benefits
Develop implementation plan
Evaluate and select software or hardware
Establish application priorities
Develop data standards
2. Requirements Formulation & Analysis:
Define user requirements
Develop data definitions
Develop data dictionary
3. Database Design:
Design conceptual model
Design external models
Design internal models
Design integrity controls
4. Database Implementation:
Specify database access policies
Develop standards for application programming
Establish security techniques
Load database
Specify test procedures
Establish procedures for backup & recovery
Conduct user training
5. Operations & Maintenance:
Monitor database performance


Tune and reorganise database
Enforce standards
Support users
6. Growth & Change
Implement change control procedures
Plan growth & change

DATABASE IMPLEMENTATION
DBMS Functions:
Data storage, retrieval & update: since a database may be shared by many users, the DBMS must provide
multiple user views and allow users to store, retrieve and update their data easily and efficiently.

Data Dictionary/Directory
The DBMS must maintain a user accessible data dictionary

Recovery Services:
The DBMS must be able to restore the database or return it to a non-condition in the event of some system
failure. Sources of system failure include:
Operator error
Disk head crashes
Program error

Security mechanisms:
Data must be protected against accidental or intentional misuse or destruction. The DBMS must provide
mechanisms for controlling access to data and for defining what actions (read only, update) may be taken
by each user.


NORMALISATION
Normalisation is the analysis of functional dependencies between attributes (data items). The purpose of
normalisation is to reduce complex user views to a set of small, stable data structures. Normalised data
structures are more flexible, stable and easier to maintain than unnormalised structures.

Steps in Normalisation:

USER VIEWS

UNNORMALISED RELATION
    (Remove repeating groups)
1NF RELATIONS
    (Remove partial dependencies)
2NF RELATIONS
    (Remove transitive dependencies)
3NF RELATIONS
    (Remove overlapping candidate keys)
BCNF RELATIONS
    (Remove multivalued dependencies)
4NF RELATIONS
    (Remove join dependencies)
5NF RELATIONS


1. User views are identified.
2. Each user view is converted to the form of an unnormalised relation.
3. Any repeating groups are then removed from the unnormalised relations to produce a set of relations in
1NF.
4. Any partial dependencies are removed from these relations; the result is a set of relations in 2NF.
5. Any transitive dependencies are removed, creating a set of relations in 3NF.

Unnormalised Relation:
It is a relation that contains one or more repeating groups for example GRADE-REPORT:

GRADE-REPORT
Stud# Studname Major Course# Crs-title Lec-name L-office Grade
38214 Takura IS IS350 Dbase Chamanga 6 A
IS465 SAD Kamudyariwa 10 C
69173 Esther PM IS465 SAD Kamudyariwa 10 A
PM300 Proj-Mgt Kamudyariwa 10 B
QM400 OR Magadza 11 C

Stud# -> Studname (1:1)
Stud# -> Major (1:1)
Stud# -> Course# (1:M)
Stud# -> Crs-title (1:M)
Stud# -> Lec-name (1:M)
Stud# -> L-office (1:M)

There are multiple values at the intersection of certain rows and columns. Since each student takes more
than one course, the course data in the above relation constitutes a repeating group within the student data.
In an unnormalised relation, a single attribute cannot serve as a candidate or primary key. Suppose we take
student number as the primary key: there is a one-to-one relationship from student number to student name
and major. However, the relationship is one-to-many from student number to course and the remaining
attributes. The student number is therefore not a primary key, since it does not uniquely identify all the
attributes in this relation.

Disadvantages of Unnormalised Relations:


They contain redundant data, which may result in inconsistent data. For example, information pertaining
to course number IS465 is contained in 2 locations (2 tuples in the sample). Suppose that we want to
change the course title from SAD to ASAD; to make this change, we would have to search the entire
grade-report relation to locate all occurrences of course number IS465. If we fail to update all
occurrences, the data would be inconsistent.

Normalised Relations:
A normalised relation is one that contains only single values at the intersection of each row and
column. A normalised relation contains no repeating groups. To normalise a relation that contains a
single repeating group we remove the repeating group and form 2 relations. The 2 new relations
formed from the above example are as in Student(S) and Student-Course(SC). Student relation is
already in 3rd NF whereas Student-Course relation is in 1st NF.

Therefore stud# is not a candidate key because it does not uniquely identify all attributes in this relation.


Redundancy exists: for example, course# IS465 is contained in multiple rows.

Update anomaly: when one wants to change SAD to ASAD in crs-title there is a need to search the entire
relation, failure of which results in inconsistent data.

Notation for Unnormalised data:

Grade-Report(stud#, studname, major, {course#, crs-title, lec-name, l-office, grade})

where { } encloses the repeating group.

1NF

A relation with a single repeating group will form 2 relations by removing the repeating group.

S(student)
Stud# Studname Major
38214 Takura IS
69173 Esther PM

SC(student-course)
Stud# Course# Crs-title Lec-name L-office Grade
38214 IS350 Dbase Chamanga 6 A
38214 IS465 SAD Kamudyariwa 10 C
69173 IS465 SAD Kamudyariwa 10 A
69173 PM300 Proj-Mgt Kamudyariwa 10 B
69173 QM400 OR Magadza 11 C

1NF with primary key (stud#, course#), the course# attribute coming from the repeating group.
The primary key uniquely identifies a student's grade.

Student-Course still has data redundancy, which results in update anomalies when INSERTING,
DELETING and UPDATING data.

INSERT:
It is impossible to insert a new course that no student is yet taking, because that would result in a null
value for stud#, which is not allowed as part of the primary key.
DELETE:
Deleting a student record for a particular tuple results in losing the course title and lecturer details. Leaving
the course details would result in a NULL value for stud#, which is part of the key and is not allowed.

UPDATE:
To update a course title that appears a number of times (for example, SAD) there is a need to search through
every tuple. This is inefficient and might result in data inconsistencies if we fail to update all
occurrences.
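The UPDATE anomaly can be made concrete: renaming a course title in the 1NF relation means visiting every tuple. A minimal Python sketch, with rows abbreviated from the SC table:

```python
# Abbreviated Student-Course rows: the course title is stored redundantly.
student_course = [
    {"stud": 38214, "course": "IS465", "crs_title": "SAD"},
    {"stud": 69173, "course": "IS465", "crs_title": "SAD"},
]

# Every tuple must be scanned; missing even one occurrence would leave
# the relation inconsistent.
for row in student_course:
    if row["course"] == "IS465":
        row["crs_title"] = "ASAD"
```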

The above problems result from nonkey attributes that are dependent on only part of the key (course#):

(stud#, course#) → grade

course# → crs-title, Lec-name, L-office

Grade is fully dependent on (stud#, course#), whereas Crs-title, Lec-name and L-office partially depend on the
primary key (stud#, course#), as shown below.

Crs-title, Lec-name, L-office — partially dependent on the primary key (stud#, course#), via course# alone
Grade — fully functionally dependent on the primary key (stud#, course#)
2NF
By removing attributes which are partially dependent on the primary key we create 2 relations:
1. One with the attributes fully dependent on the whole primary key
2. One with the attributes dependent on only part of the primary key, keyed on that part (course#)

R(Registration)
Stud# Course# Grade
38214 IS350 A
38214 IS465 C
69173 IS465 A
69173 PM300 B
69173 QM400 C
3NF

CL(Course-Lecturer)
Course# Crs-Title Lec-Name L-Office
IS350 DBase Chamanga 6
IS465 SAD Kamudyariwa 10
PM300 Project Mgt Kamudyariwa 10
QM400 OR Magadza 11
2NF

Course title appears once in the course-lecturer relation, which solves the update anomaly. Course data can be
inserted and deleted without reference to student data.

course# → crs-title, Lec-name, L-office

Lec-name → L-office

This illustrates that there is a unique office for a lecturer; that is a transitive dependency: one nonkey
attribute is dependent on one or more other nonkey attributes.
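The 2NF split amounts to two projections of Student-Course, each with duplicates removed. A Python sketch, with sample rows abbreviated and attribute names chosen for illustration:

```python
def project(rows, columns):
    """Project a relation onto the given columns, removing duplicate tuples."""
    seen, out = set(), []
    for row in rows:
        t = tuple(row[c] for c in columns)
        if t not in seen:
            seen.add(t)
            out.append(t)
    return out

student_course = [
    {"stud": 38214, "course": "IS350", "crs_title": "DBase", "grade": "A"},
    {"stud": 38214, "course": "IS465", "crs_title": "SAD", "grade": "C"},
    {"stud": 69173, "course": "IS465", "crs_title": "SAD", "grade": "A"},
]

# Attributes fully dependent on (stud#, course#) stay together; attributes
# dependent on course# alone go into their own relation, deduplicated.
registration = project(student_course, ["stud", "course", "grade"])
course_title = project(student_course, ["course", "crs_title"])
```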


Course# → Lec-Name → L-Office

Transitive Dependency

Problems with 2NF

INSERT:
It is impossible to insert a new lecturer who is not yet assigned to teach at least one course, since the
lecturer data depends on course#. It is not possible, for example, to insert Ms Mvududu until one or more
courses have been assigned to her.

DELETE:
Deleting course data results in lecturer data being lost; for example, deleting course# IS350 results in the
loss of Chamanga's data.

UPDATE:
Lecturer data occurs many times, therefore changing the lecturer office for Kamudyariwa requires searching
every tuple; failure to do so results in data inconsistency, for example one tuple reading Rm 10 and another
Rm 12.

3NF
Removing attributes that participate in transitive dependency, for example, Lec-Name and L-Office results
in the following relations:
C(Course)
Course# Crs-Title Lec-Name
IS350 DBase Chamanga
IS465 SAD Kamudyariwa
PM300 Project-Mgt Kamudyariwa
QM400 OR Magadza
Primary Key (Course#) and Foreign Key (Lec-Name)

L(Lecturer)
Lec-Name L-Office
Chamanga 6
Kamudyariwa 10
Magadza 11
Primary Key (Lec-Name)

The assumption is that an L-Office can have more than one occupant; therefore Lec-Name becomes the primary
key and associates the 2 relations, Course and Lecturer.

In this 3NF design, insertion and deletion can be done without referencing other entities. Updates are also
possible because they are confined to a single tuple within a relation.
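The 3NF step is again a pair of projections, this time of Course-Lecturer, removing the transitive dependency course# → Lec-name → L-office. A sketch with sample rows from the notes (field names are illustrative):

```python
course_lecturer = [
    {"course": "IS350", "crs_title": "DBase", "lec": "Chamanga", "office": 6},
    {"course": "IS465", "crs_title": "SAD", "lec": "Kamudyariwa", "office": 10},
    {"course": "PM300", "crs_title": "Proj-Mgt", "lec": "Kamudyariwa", "office": 10},
]

# Project out the transitively dependent attributes: L-office now lives
# only in the Lecturer relation, keyed on Lec-name.
course = sorted({(r["course"], r["crs_title"], r["lec"]) for r in course_lecturer})
lecturer = sorted({(r["lec"], r["office"]) for r in course_lecturer})
```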


The whole Grade-Report View will be represented by the following relations:

C(Course)
Course# Crs-Title Lec-Name
IS350 DBase Chamanga
IS465 SAD Kamudyariwa
PM300 Project-Mgt Kamudyariwa
QM400 OR Magadza

L(Lecturer)
Lec-Name L-Office
Chamanga 6
Kamudyariwa 10
Magadza 11

R(Registration)
Stud# Course# Grade
38214 IS350 A
38214 IS465 C
69173 IS465 A
69173 PM300 B
69173 QM400 C

S(student)
Stud# Studname Major
38214 Takura IS
69173 Esther PM

Relations in 3NF are sufficient for most practical database design problems. When a relation has more than
one candidate key, problems may arise even if it is in 3NF, hence the further normal forms come in, for
example, BCNF, 4NF, 5NF, DKNF.

BCNF (Boyce Codd Normal Form)


When a relation has more than one candidate key anomalies may result even though the relation is in 3rd NF.

SMA(student-Major-Advisor)
Stud# Major Advisor
123 Physics Edwin
123 Music Chioniso
456 Biology Machuma
789 Physics Tawanda
999 Physics Edwin

The semantic rules of the above relation are as follows:


1. Each student may major in several subjects
2. For each major, a given student has only one advisor
3. Each major has several advisors
4. Each advisor advises only one major


A dependency diagram summarising the above rule:

(student#, major) → advisor

advisor → major

The relation is in 3rd NF since


1. There are no repeating groups
2. No partial dependencies
3. No transitive dependencies

There are still anomalies in the relation above. Suppose that student# 456 changes her major from Biology
to Maths: when the tuple of that student is updated, we lose the fact that Machuma advises Biology (update
anomaly)

Suppose we want to insert a tuple with the information that Gamu advises in Computers. This cannot be
done until at least one student majoring in Computers is assigned Gamu as an advisor (insertion anomaly)

In the above relation there are 2 candidate keys, (student#, major) and (student#, advisor). The type of
anomalies that exist in this relation can occur when there are 2 or more overlapping candidate keys.

BCNF definition
A relation is in BCNF if and only if every determinant is a candidate key.

A determinant is any attribute, simple or composite, on which some other attribute is fully functionally
dependent. For example, in the above relation the attribute advisor is a determinant, since major is fully
functionally dependent on advisor.

To put the above relation into BCNF, we make advisor the key of its own relation by projecting the original
3rd NF relation into 2 relations that are in BCNF.

SA(Student-Advisor)
Student# Advisor
123 Edwin
123 Chioniso
456 Machuma
789 Tawanda
999 Edwin

AM(Advisor-Major)
Advisor Major
Edwin Physics
Chioniso Music
Machuma Biology
Tawanda Physics
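One can check that the BCNF projection is lossless by joining the two relations back on advisor and comparing with the original SMA relation. A Python sketch, representing tuples as plain Python tuples:

```python
# Original SMA relation: (student#, major, advisor).
sma = {(123, "Physics", "Edwin"), (123, "Music", "Chioniso"),
       (456, "Biology", "Machuma"), (789, "Physics", "Tawanda"),
       (999, "Physics", "Edwin")}

# BCNF projections: SA(student#, advisor) and AM(advisor, major).
sa = {(s, a) for s, m, a in sma}
am = {(a, m) for s, m, a in sma}

# Natural join of SA and AM on advisor reconstructs the original.
joined = {(s, m, a) for s, a in sa for a2, m in am if a == a2}
```

Because each advisor advises only one major (advisor → major), the join reproduces SMA exactly — the decomposition is lossless.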


Fourth Normal Form (4NF)

Even when a relation is in BCNF it may still contain unwanted redundancy that may result in update
anomalies, for example, consider the following unnormalised relation

O(Offering)
Course Instructor Textbook
Mgt White Drucker
Black Peters
Green
Finance Gray Weston
Gilford

Assumptions:
1. Each course has one or more instructors
2. For each course, all of the textbooks indicated are used.

O(Offering)
Course Instructor Textbook
Management White Drucker
Management Green Drucker
Management Black Drucker
Management White Peters
Management Green Peters
Management Black Peters
Finance Gray Weston
Finance Gray Gilford
Normalised Relation

From the normalised relation Offering, for each course all possible combinations of instructor and
textbook appear in the resulting relation. The primary key of this relation consists of all 3 attributes,
and the relation is in BCNF. The relation still contains redundant data, which can lead to update anomalies:
suppose you want to add a third textbook to the Management course; this would require the addition of 3 new
rows to the relation, one for each instructor. From the above relation you can see that for each course there
is a well defined set of instructors and a well defined set of textbooks. However, the instructors and
textbooks are independent of each other. The relationship can be summarised as follows:

Multivalued dependency:

course →→ instructor
course →→ textbook

Multivalued Dependency
Exists when there are 3 attributes, for example a, b and c, and for each value of a there is a well defined
set of values of b and a well defined set of values of c; however, the set of values of b is independent of
the set of values of c, and vice-versa.

To remove the multivalued dependency from a relation, we project the relation into 2 relations each of
which contains one of the 2 independent attributes.


4NF
A relation is in 4NF if it is in BCNF and contains no multivalued dependencies.

The 2 new relations formed are as follows:

L(Lecturer)
Course Instructor
Mgt White
Mgt Black
Mgt Green
Finance Gray

T(Text)
Course Textbook
Mgt Drucker
Mgt Peters
Finance Weston
Finance Gilford
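Joining the two relations back on Course regenerates every instructor/textbook combination of the normalised Offering relation, confirming that nothing is lost. A Python sketch:

```python
# The two 4NF projections, one per independent multivalued fact.
lecturer = {("Mgt", "White"), ("Mgt", "Black"), ("Mgt", "Green"),
            ("Finance", "Gray")}
text = {("Mgt", "Drucker"), ("Mgt", "Peters"),
        ("Finance", "Weston"), ("Finance", "Gilford")}

# Natural join on course: one row per (course, instructor, textbook)
# combination, i.e. the normalised Offering relation.
offering = {(c, i, t) for c, i in lecturer for c2, t in text if c == c2}
```

Management contributes 3 × 2 = 6 rows and Finance 1 × 2 = 2, matching the 8 rows of the normalised relation.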

5NF
The fifth normal form is designed to cope with join dependency. A relation that has a join dependency
cannot be decomposed by projection into other relations.

5th NF: a relation is said to be in 5NF if it is in 4NF and all join dependencies are removed.

Limitations of Normalisation

Users may have to join several tables for retrieval, which requires additional computer time

Referential integrity is more difficult to enforce when a table is decomposed via normalisation

It ignores operational considerations

Objectives of Normalisation:
Reduce redundancy
Produce a stable data structure.

SECURITY AND INTEGRITY

SECURITY
Security refers to the protection of data against unauthorised access, alterations or destruction

INTEGRITY
Refers to the accuracy or validity of data

In other words, security involves ensuring that the users are allowed to do the things they are trying to do;
integrity involves ensuring that the things they are trying to do are correct.

In both cases the system needs to be aware of certain rules that users must not violate. These rules must be
specified (typically by the DBA) using a suitable language, and must be maintained in the system catalog or
dictionary; in both cases the DBA or DBMS must monitor user operations to ensure that the rules are
enforced.


GENERAL SECURITY CONSIDERATIONS

There are numerous aspects to the security problem, among them are the following:
1. The legal, social and ethical aspects: for example, does the person making a request, say for customer
credit, have a legal right to the requested information?
2. Physical control: is the computer or terminal room locked or otherwise guarded?
3. Policy questions: how does the enterprise owning the system decide who should be allowed access to what?
4. Operational problems: if a password scheme is used, how are the passwords kept secret and how are
they changed?
5. Hardware controls: does the processing unit provide any security features, such as storage protection
keys or a privileged operation mode?
6. Operating system security: does the operating system erase the contents of storage and data files when
they are finished with?

Modern DBMS typically support one or both of two approaches to data security: discretionary control and
mandatory control.

Discretionary Control: (User profile)


A given user will have different access rights (also known as privileges or authorities) on different objects;
further, different users will typically have different rights on the same object.
Discretionary schemes are thus very flexible: users can be granted exactly the rights they need on exactly
the objects they need.

Mandatory Control:
Each data object is tagged or labeled with a certain classification level, and each user is given a certain
clearance level.
A given data object can be accessed only by users with the appropriate clearance level. This is enforced by
the DBA.
Regardless of whether we are dealing with a discretionary or mandatory scheme, all decisions as to which
users may perform which operations on which objects are policy decisions, not technical ones.
All the DBMS can do is enforce those decisions once they are made.
It follows that the results of those policy decisions:
Must be made known to the system (by means of statements in some appropriate definition language),
and
Must be remembered by the system (by saving them in the catalog, in the form of security rules, also
known as authorisation rules)

There must be a means of checking a given access request against the applicable security rules (by access
requests here we mean the combination of requested operation plus requested object plus requested user, in
general).
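The check described above can be sketched as a lookup of the (user, operation, object) triple against stored rules, with a clearance test added for mandatory control. All of the names, levels and rules below are invented purely for illustration:

```python
# Discretionary rules: which users hold which privileges on which objects.
rules = {("John", "SELECT", "painting"), ("John", "UPDATE", "painting")}

# Mandatory control: classification levels on objects, clearance on users.
classification = {"painting": 2}
clearance = {"John": 3, "Peter": 1}

def allowed(user, operation, obj):
    # The request passes only if a discretionary rule grants it AND the
    # user's clearance dominates the object's classification level.
    return ((user, operation, obj) in rules
            and clearance.get(user, 0) >= classification.get(obj, 0))
```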

This checking is done by the DBMS security subsystem, also known as the authorisation subsystem.

In order to decide which security rules are applicable to a given access request, the subsystem must be able
to recognise the source of that request, that is, the requesting user. For that reason, when users sign in to
the system they are typically required to supply not only their user ID (to say who they are), but also a
password (to prove they are who they say they are).
The password is supposed to be known only to the system and to the legitimate users of the user ID
concerned.
Regarding this last point, incidentally note that any number of distinct users might be able to share the same
group User ID. In this way the system can support user groups, and can thus provide a way of allowing
everyone, for instance, in accounting department to share the same privileges.
The operations of adding individual users to or removing individual users from a given group can then be
performed independent of the operation of specifying the privileges that apply to that group.


Note, however, that the obvious place to keep a record of which users belong to which groups is, again, the
catalog.
To repeat from the previous section: most DBMS support discretionary control, and some systems support
mandatory control as well. Discretionary control is thus the more likely to be encountered in practice.
As already noted, there needs to be a language that supports the definition of security rules. We therefore
begin by describing a hypothetical example of such a language, shown as follows:

CREATE SECURITY RULE pr3
GRANT SELECT, UPDATE(cost)
ON painting
WHERE gallery-name = 'Chitombo'
TO John, Peter, Anna
ON ATTEMPT violation REJECT

The above example is meant to illustrate the point that security rules have 5 components as follows:

1. A name (pr3, painting rule 3). In the example the rule will be registered in the system catalog under the
name pr3. The name will probably also appear in any message or diagnostics produced by the system in
response to an attempted violation of the rule.
2. One or more privileges (SELECT and UPDATE in the example), specified by means of the GRANT
clause.
3. The scope to which the rule applies, specified by means of the ON clause. In the example the scope is
painting tuples or records where the gallery-name is Chitombo.
4. One or more users (more accurately, user IDs) who are to be granted the specified privileges over the
specified scope, specified by means of the TO clause.
5. A violation response, specified by the ON ATTEMPT violation clause, telling the system what to do if
a user attempts to violate the rule. In the example, the violation response is simply to REJECT the
attempt and provide suitable diagnostics. Such a response will surely be the one most often required in
practice, so it is taken to be the default response.
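The five components can be represented as a simple record, with the scope as a predicate over tuples. This is only a sketch of how a rule such as pr3 might be held in the catalog, not a real DBMS structure:

```python
# A security rule as a record: name, privileges, scope predicate,
# users, and violation response (all values mirror the pr3 example).
pr3 = {
    "name": "pr3",
    "privileges": {"SELECT", "UPDATE(cost)"},
    "scope": lambda row: row.get("gallery-name") == "Chitombo",
    "users": {"John", "Peter", "Anna"},
    "violation": "REJECT",
}

def permits(rule, user, privilege, row):
    # A request is permitted when the user is named, the privilege is
    # granted, and the tuple falls inside the rule's scope.
    return (user in rule["users"]
            and privilege in rule["privileges"]
            and rule["scope"](row))
```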

DESTROYING EXISTING RULES


General syntax:

DESTROY SECURITY RULE <security rule-name>

For example:
DESTROY SECURITY RULE pr3

For simplicity we assume that destroying a given named relation will automatically destroy any security
rules that apply to that relation.


AUDIT TRAILS

It is a special file or database in which the system keeps track automatically of all operations performed by
users on the regular database. A typical entry in the audit trail might contain the following information:

Requests (source text)


Terminal from which the operation was invoked
User who invoked the operation
Date and time of the operation
Basic relation(s), tuples and attributes affected
Old values
New values
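A sketch of how such an entry might be appended; the field names and sample values are illustrative:

```python
import datetime

audit_trail = []

def log_operation(user, terminal, request, affected, old, new):
    # Append-only record holding the fields listed above.
    audit_trail.append({
        "request": request,
        "terminal": terminal,
        "user": user,
        "when": datetime.datetime.now().isoformat(),
        "affected": affected,   # relation, tuple id, attribute
        "old": old,
        "new": new,
    })

log_operation("John", "T14", "UPDATE painting SET cost = 90",
              ("painting", 3, "cost"), 80, 90)
```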

RECOVERY
Recovery is the process of restoring the database back to its original state after a system, media or
transaction failure.

SYSTEM FAILURE
Shutdowns caused by hardware faults or bugs in the O/S or other system software will be referred to as a
system crash. When the system crashes, all transactions currently executing terminate.

The contents of internal memory (which include I/O buffers) are assumed lost. However, we assume that
external memory including disks on which the database resides are not affected by the system failure.

DATA SECURITY
The protection of data in the database against unauthorised disclosure, alteration or destruction.

Authorisation Mechanisms
a) Identification
b) Authentication

Identification - users have to identify themselves to the system before accessing the database, by supplying
an operator/username or using machine-readable cards
Authentication - the process of proving their identity, by providing passwords or PIN numbers, or answering
some questions from the system.

Access Control
For each user the system will maintain a user profile, generated from the user definition supplied by the
DBA.

The details of the appropriate identification and authentication procedures, and the operations a particular
user is allowed to perform, are given in the access controls. The DBMS will go through a series of tests to
determine whether to grant or deny access to the user. The tests may be arranged in a sequence of
increasing complexity, so that the system may reach its final decision as quickly as possible.

DATABASE INTEGRITY
Ensuring that the data is accurate at all times.

Database access control and lock procedures:


Used to ensure that a given operation is authorised
Ensure that integrity constraints are not violated.

Constraints
Each relation in the database will have a set of integrity constraints associated with it.


These constraints will be held in the data dictionary as part of the conceptual schema
They specify, for example, that values of a particular attribute in some relation are to be within certain
bounds, or that within each tuple of some relation the value of one attribute may not exceed that of
another.

Integrity Constraints and Enforcement

1. Primary Key possesses the property of uniqueness: no 2 tuples in the relation may have the same value
for this attribute or attribute combination
2. No component of a primary key value may be null

Enforcement
The DBMS must reject any attempt to generate a tuple whose key value is null or is a duplicate of the
one that already exists.
Bounds Entry
Values occurring in a particular attribute may be required to lie within certain bounds (eg values of
employee age: 15<age<60)
The constraints are specified by the Bounds Entry. The lower and upper limits have to be defined.

Values Entry
There may be a very small set of permitted values of some particular attribute combination eg permitted
values for primary colour are red, blue, green etc. In this case the permitted values could simply be
listed in a values entry for the relevant attribute or attribute combination

NB It might be desirable to list values or ranges of values that are not permissible for the attributes
concerned.

Format Entry
Values of a particular attribute may have to conform to a particular format. Eg the first character of a
supplier number must be the letter S.
The constraint is specified in a format entry for the relevant attribute.
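The bounds, values and format entries above all amount to per-attribute predicates. A Python sketch, using the examples from the text (employee age, primary colour, supplier number) with illustrative rule shapes:

```python
import re

# One predicate per constrained attribute: a bounds entry, a values
# entry, and a format entry respectively.
constraints = {
    "age": lambda v: 15 < v < 60,
    "colour": lambda v: v in {"red", "blue", "green"},
    "supplier": lambda v: re.fullmatch(r"S\d+", v) is not None,
}

def check(attribute, value):
    # Unconstrained attributes pass; constrained ones must satisfy
    # their predicate.
    rule = constraints.get(attribute)
    return rule is None or rule(value)
```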

Average Function
The set of values of a particular attribute in a relation may have to satisfy some statistical constraint, eg no
employee may earn a salary that is more than twice the average salary for the department.
The predicate defining this constraint will invoke the library function AVERAGE
To enforce it, the DBMS will have to monitor all storage operations against the employee relation

NB
All examples given above are of static constraints, that is, they specify conditions that must hold for
every given state of the database.
Another important type of constraint involves the transition from one state to another, eg when an
employee's salary is updated, the new value must be greater than the old value.
To specify such constraints we need to refer to the old and new values
The keywords OLD and NEW are reserved for this purpose.

A special case of transition is that from non-existence to existence (ie addition of a new tuple) or from
existence to non-existence (ie deletion of an existing tuple)

RECOVERY ROUTINES
Recovery routines are used to restore the database, or some portion of the database, to an earlier state after a
system failure (hardware or software) has caused the contents of the database in main storage to be lost.
They take as input a backup copy of the database (produced by the dump routines) together with the system
journal (which contains details of operations that have occurred since the dump was taken) and produce as
output a new copy of the data as it was before the failure occurred.


NB Any transactions that were in progress at the time of the failure will probably have to be restarted.

BACKUP ROUTINE
Dump routines
These are used to take backup copies of selected portions of the database, also usually on tape.
It is normal practice to dump the database regularly, say once a week
If the database is very large it may be more practical to dump one seventh of it every day
Each time a dump is taken, a new system journal may be started and the previous one erased or
archived
Backup is normally initiated automatically by the DBMS before the database commits its changes.

Checkpoint/Restart Routines
Backing up and rerunning a long transaction in its entirety can be a time-consuming process
Some systems permit transactions to take checkpoints at suitable points in their execution
The checkpoint routines will cause all changes made since the last checkpoint to be committed
The checkpoint facility allows a long transaction to be divided up into a sequence of short ones
The checkpoint routine may also record the values of specified program variables in a checkpoint entry
in the system journal

Audit Trail/System Journal/System Log


Used to record every operation on the database
For each operation the journal will typically include the following information:
(a) An identification of the transaction concerned
(b) A time stamp
(c) An identification of the terminal and user concerned
(d) The full text of the input change

And, in the case of an operation involving a change to the database, the type of change and the address of
the data changed, together with its before and after values

Encryption/scrambling
Used to protect the database against an infiltrator who attempts to bypass the system
An example of bypassing the system is a user who physically removes part of the database, for
example by stealing a disk pack
Apart from normal security measures to prevent unauthorised personnel from entering the computer
centre, the most important safeguard against physical removal of part of the database is the use of
scrambling techniques
Scrambling/encryption and privacy transformation techniques involve the following:
(a) Shuffling the characters of each tuple (or record or message) into a different order
(b) Replacement of each character (or group of characters) by a different character (or group
of characters), from the same alphabet or a different one
(c) Groups of characters are algebraically combined in some way with a special group of
characters (the privacy key) supplied by the owner of the data.
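Technique (c), combining the data with a privacy key, can be sketched as a repeating-key XOR. This is for illustration only — a real system would use a proper cipher rather than this toy transformation:

```python
def scramble(data: bytes, key: bytes) -> bytes:
    # Combine each byte of the data with the corresponding byte of the
    # repeating privacy key; applying the same key again unscrambles.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

cipher = scramble(b"account 462351", b"k3y")   # the key is illustrative
plain = scramble(cipher, b"k3y")
```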

TRANSACTIONS
A transaction is a unit of work with the property that the database is:
a) in a consistent state (state of integrity) both before it and after it, but
b) possibly not in such a state between these 2 times

In general, any changes made to the database during a transaction should not be visible to concurrent
transactions until such changes have been committed, in order to prevent those concurrent transactions
from seeing the database in an inconsistent state.


Any data changed by a given transaction, including data created or destroyed by that transaction, should
remain locked until that transaction terminates
The above discipline must be enforced by the DBMS
A transaction will be backed out if on completion it is found that the database is not in a state of
integrity
A transaction may also be backed out if the system detects a deadlock: a general strategy for such a
situation is to choose one of the deadlocked transactions, say the one most recently started or the one
that has made the fewest changes, and remove it from the system, thus freeing its locked resources for
use by other transactions.
The process of back out involves undoing all the changes that the transaction has made, releasing all
resources locked by the transaction and scheduling the transaction for re-execution.

Example of a Transaction
In a banking system a typical transaction might be: transfer amount X from account A to account B. This
would be viewed as a single operation, and a user would have to enter a command such as

Transfer X = 100 A = 462351 B = 90554 at a terminal

The above transaction requires several changes to be made to the underlying database.
Specifically, it involves updating the balance value in 2 distinct account tuples
Although the database is in a state of integrity before and after the sequence of changes, it may not be
throughout the entire transaction, ie some of the intermediate states (or transitions) may violate one or
more integrity constraints
It follows that there is a need to be able to specify that certain constraints should not be checked until the
end of the transaction. These are called deferred constraints
By contrast, constraints that are enforced continuously during the intermediate steps of the transaction
are called immediate

NB: The data sublanguages must include some means of signaling the end of the transaction, in order to
cause the DBMS to apply deferred checks

CONCURRENCY

In most systems, several users can access a database concurrently. The operating system switches execution
from one user program to another to minimise waiting for input or output operations
Within this approach transactions are often interleaved, that is, several steps are performed on transaction
A, then several steps on transaction B, followed by more steps on transaction A and so on.


Effects of concurrent updates:


The effects of concurrent update without concurrency control are illustrated below.

1. 2 users are in the process of updating the same record which represents a savings account record for
customer A
2. At present time customer A has a balance of $200 in her account
3. User 1 reads her record into the user work area, intending to post a customer withdrawal of $150
4. Next user 2 reads the same record into that user area, intending to post a customer deposit of $25
5. User 1 posts the withdrawal and stores the record, which now indicates a balance of $50
6. User 2 then posts the deposit (increasing the balance to $225) and stores this record on top of the one
stored by user 1
7. The record now indicates a balance of $225
8. In this case the update made by user 1 has been lost because of interference between transactions
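The lost update can be replayed step by step; a sketch of the scenario above with no concurrency control:

```python
balance = {"A": 200}            # customer A's savings account record

copy1 = balance["A"]            # user 1 reads the record into a work area
copy2 = balance["A"]            # user 2 reads the same record
balance["A"] = copy1 - 150      # user 1 posts the $150 withdrawal: 50
balance["A"] = copy2 + 25       # user 2 stores on top of it: 225

# The withdrawal has vanished; with locking, user 2's read would have
# waited until user 1's update was committed.
```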

INCONSISTENT ANALYSIS
This usually occurs in the traditional file approach: when the same data are stored in multiple locations,
inconsistencies in the data are inevitable. For example, several of the files below contain customer data

[Figure: a Billing program and a Sales Order Processing program, each maintaining its own Customer file,
together with Accounts Receivable and Inventory files.]

Suppose there is an address change for one of the customers


If the files are to be consistent, this change of address must be made simultaneously and correctly to
each of the files containing the customer address data item
Since the files are controlled by different users it is very likely that some files will reflect the old address
while others reflect the new address. Inconsistencies in stored data are one of the most common sources
of errors in computer applications; for example, the outdated customer address may lead to a customer
invoice being mailed to the wrong location. As a result, the invoice may be returned and the customer
payment delayed or lost.

A transaction is a logical unit of work.

TRANSACTION RECOVERY
The transaction begins with the successful execution of a BEGIN TRANSACTION statement and it ends
with the successful execution of the COMMIT or the ROLLBACK statements

COMMIT establishes what is known as a COMMIT point. This corresponds to the end of a logical unit
of work and to a point at which the database is, or should be, in a state of consistency

ROLLBACK rolls the database back to the state it was in at BEGIN TRANSACTION, which effectively
means back to the previous COMMIT point.

When a commit point is established, all updates made by the program since the previous commit point are
committed, that is, they are made permanent. Once committed, an update is guaranteed never to be undone.
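The behaviour of COMMIT and ROLLBACK can be demonstrated with SQLite through Python's built-in sqlite3 module; the table and values are illustrative:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER)")
con.execute("INSERT INTO account VALUES ('A', 200)")
con.commit()        # commit point: the insert is now permanent

con.execute("UPDATE account SET balance = balance - 150 WHERE id = 'A'")
con.rollback()      # the update is undone: back to the previous commit point

balance = con.execute(
    "SELECT balance FROM account WHERE id = 'A'").fetchone()[0]
```

After the rollback the committed balance of 200 is intact; only the uncommitted withdrawal was discarded.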


NOTE carefully that COMMIT and ROLLBACK terminate the transaction, not the program. In general, a
single program execution will consist of a sequence of several transactions running one after the other, as
illustrated below:

[Figure: program initiation; 1st transaction runs from BEGIN TRANSACTION to COMMIT; 2nd transaction
runs from BEGIN TRANSACTION to ROLLBACK; 3rd transaction runs from BEGIN TRANSACTION to
COMMIT, followed by program termination. A failure cancels the in-progress transaction and triggers
recovery.]

SYSTEM RECOVERY
The system must be prepared to recover not only from local failures, such as the occurrence of an overflow
condition within an individual transaction, but also from global failures, such as a power failure on the CPU.

A local failure by definition affects only the transaction in which the failure has occurred. A global failure
affects all the transactions in progress at the time of the failure. Global failures fall into two broad
categories.

a) System Failures (eg power failure)


These affect all transactions currently in progress but do not physically damage the database. A
system failure is sometimes called a SOFT CRASH
b) Media Failure (eg head crash on the disk)
This causes damage to the database, or to some portion of it, and affects at least those transactions
currently using that portion.

A media failure is sometimes called a HARD CRASH. A critical point regarding system failure
is that the contents of main memory are lost (in particular, the database buffers are lost).

The precise state of any transaction that was in progress at the time of the failure is therefore no
longer known; such a transaction can never be successfully completed and so must be
undone (ROLLED BACK) when the system restarts.

It is also necessary at restart time to redo certain transactions that did successfully complete prior
to the crash but did not manage to get their updates transferred from the database buffers (memory)
to the physical database or media.

The question arises: How does the system know at restart time which transactions to undo and
which to redo?

The answer is:


At certain prescribed intervals typically whenever some prescribed number of entries have been
written to the system. The system automatically takes a check point, that is physically writing the
contents of the database buffers.

Checkpointing: A special marker record periodically written to the transaction log (a special file
recording all changes made to the database). It allows long transactions to be divided into a
sequence of short ones.
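This restart logic can be sketched as a pass over the log after the last checkpoint: transactions still in progress at the crash go on the undo list, while transactions that committed after the checkpoint go on the redo list. A minimal Python illustration (the log format and all names here are invented for the example, not taken from any particular DBMS):

```python
# Sketch of restart classification: scan the log written after the last
# checkpoint and decide which transactions to undo and which to redo.

def classify_at_restart(checkpoint_active, log_after_checkpoint):
    """checkpoint_active: transactions in progress when the checkpoint was taken.
    log_after_checkpoint: list of (txn_id, action) records,
    action in {'start', 'commit'}."""
    undo = set(checkpoint_active)   # assumed active unless a commit is seen
    redo = set()
    for txn, action in log_after_checkpoint:
        if action == 'start':
            undo.add(txn)           # began after the checkpoint, fate unknown
        elif action == 'commit':
            undo.discard(txn)       # completed: its updates must be redone
            redo.add(txn)
    return undo, redo

undo, redo = classify_at_restart(
    checkpoint_active={'T1', 'T2'},
    log_after_checkpoint=[('T3', 'start'), ('T2', 'commit'), ('T4', 'start')])
# T1, T3, T4 were still in progress at the crash, so they are rolled back;
# T2 committed after the checkpoint, so its updates are redone.
```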

DATABASE INTEGRITY
Definition:
It is the maintenance of the correctness and consistency of the data.

Commercial DBMS have integrity subsystems for monitoring transactions which update the database and
detecting integrity violations. These integrity subsystems are rather primitive, and the problems of
maintaining the correctness of the database are largely left in the hands of the database implementers (DBA).

INTEGRITY RULES
Integrity rules are divided into two broad categories, namely:
1. Entity integrity constraint or rule
Entity integrity, also known as intra-relational integrity, is concerned with maintaining the
correctness of relations among fields within a relation, and one of the most important tasks of these
rules is key uniqueness: a primary key should never be null or partially null.
The set property of the relation guarantees that no 2 tuples or records in a relation have the same values
in all their components.

Implementation of key uniqueness requires that the system guarantee that no new tuple can be accepted
for insertion into a relation if it has the same values in all its prime attributes as some existing tuple in
the relation.

In addition we must also guarantee that no existing tuple in a relation is updated in such a way as to
change its prime field values to be the same as those of some other tuple in the same relation.

2. Referential integrity rules


It is concerned with the correctness of relationships between relations through the use of a foreign key

The requirement is that a foreign key must have either a null entry or an entry that matches the primary
key value in a table to which it is related.

The enforcement of the referential integrity rule makes it impossible to delete a row or tuple in a table
whose primary key has matching foreign key values in another table eg.

Entity Integrity: No NULL values in cust#


Customer-salesperson Relationship

Table: Customer
Cust# CustLName Cust-Fname CUST-DOB SalesP#
1008 Mutema John 08/12/37 37
1009 Dahwa Peter 11/10/43
10010 Musona Amon 02/05/86 14
10011 Asindadi Noah 12/12/32 21

Table: SalesPerson
SalesP# Area-Code Phone SP-Name SP-Sales-Amt


24 615 882344 Tembo 66000.66


37 901 773662 Hwahwa 86500.99
14 615 252205 Murape 700.00
35 615 217197 Matanga 175600.00
21 615 091353659 Gava 65779.00

Illustration of Integrity Rules
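The two rules illustrated above can be sketched in Python against the Customer/SalesPerson tables (the data structures and the helper name `insert_customer` are hypothetical, used only for illustration):

```python
# Sketch of entity and referential integrity checks on insertion,
# using the Customer/SalesPerson example from the text.

salespersons = {24, 37, 14, 35, 21}      # primary keys of SalesPerson
customers = {}                           # Cust# -> (name, SalesP#)

def insert_customer(cust_no, name, salesp_no):
    # Entity integrity: the primary key must be non-null and unique.
    if cust_no is None:
        raise ValueError("entity integrity: Cust# may not be NULL")
    if cust_no in customers:
        raise ValueError("entity integrity: duplicate Cust#")
    # Referential integrity: the foreign key is NULL or matches a
    # primary key value in the related SalesPerson table.
    if salesp_no is not None and salesp_no not in salespersons:
        raise ValueError("referential integrity: unknown SalesP#")
    customers[cust_no] = (name, salesp_no)

insert_customer(1008, "Mutema", 37)    # accepted
insert_customer(1009, "Dahwa", None)   # accepted: a NULL foreign key is allowed
# insert_customer(1008, "X", 21) would be rejected (duplicate key), and
# insert_customer(1012, "Y", 99) would be rejected (no such salesperson).
```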

Integrity Constraints and Enforcement

These constraints are kept in the DD as part of the Conceptual Schema

The definition of a Primary Key includes its uniqueness property: no duplicates and no NULL values. To
enforce it the DBMS rejects any attempt to input records with NULL or duplicate primary key values.
Functional Dependencies represent another form of integrity constraint.
Comparison expression eg qtyout value not to exceed qtyord value
Lower and Upper limit values specified
Valid/Permitted values for a certain attribute
Attribute values conforming to a particular format
Statistical constraint eg no employee may earn more than twice the average salary for the department

DEADLOCK

PRINCIPLES OF DEADLOCK
Occurs when each of the two transactions is waiting for the other to release an item

Solution
Deadlock prevention protocol
Every transaction locks all items it needs in advance

Deadlock detection
No locks are taken in advance, but the system periodically checks if it is in a state of deadlock

Wait-for graph
Abort some of the transactions if there is a deadlock

This is a permanent blocking of a set of processes or users that either compete for system resources or
communicate with each other.

Reusable resource
Is one that can be safely used by only one process or user at a time and is not depleted by that use.
Examples include, processors, input/output channels, main & secondary memory. Data structures such as
database and files are examples of reusable resources.

Consumable resource
Is one that can be created (produced) and destroyed (consumed). Examples include interrupts, signals,
messages, printer paper, etc

Conditions for a deadlock


1. Mutual exclusion
2. Hold and wait
3. No preemption (no release)
4. Circular wait


Only one process or user may use a resource (for example a tuple) at a time.

[Diagram: Tuple T1 (resource) is held by User A1; User A2 (process/user) requests T1 and must wait]

Mutual exclusion

[Diagram: Tuple T1 is held by User A1 while Users A2 ... An wait with requests for it]

Hold and Wait


A process or user may hold allocated resources while awaiting assignment of others

[Diagram: User A1 holds Tuple T1 and requests Tuple T2, which is held by User A2]


Circular wait

[Diagram: User A1 holds Tuple T1 and requests Tuple T2; User A2 holds Tuple T2 and requests Tuple T1]

NB: Deadlock can be described as an unresolvable circular wait.

No preemption
No resource can be forcibly removed from a process or user holding it

[Diagram: Tuple T1 is held by User A1; User A2's request cannot forcibly take it away]

A Related Problem:

Indefinite Postponement
Similar to deadlock, and also known as indefinite blocking or starvation.
In a system that keeps users waiting while it makes resource allocation and scheduling decisions, it is
possible to delay the scheduling of a user indefinitely while other users receive the system's attention.

Indefinite postponement may occur because of biases in a system's resource scheduling policies. When
resources are scheduled on a priority basis, it is possible for a given user to wait for a resource indefinitely
as users with a higher priority continue to arrive.

Solution to indefinite postponement


Indefinite postponement is prevented by allowing a user's priority to increase as he waits for a resource.

Eventually that user's priority will exceed the priority of incoming users and the waiting user will be
serviced. This process is called AGING.
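Aging can be sketched as follows; the priority formula, the weight, and all names are illustrative assumptions, not a prescribed scheme:

```python
# Sketch of AGING: a waiting user's effective priority grows with waiting
# time, so even a low-priority user is eventually serviced.

def next_to_serve(waiting, age_weight=1):
    """waiting: list of (user, base_priority, ticks_waited) tuples.
    Effective priority = base_priority + age_weight * ticks_waited."""
    return max(waiting, key=lambda w: w[1] + age_weight * w[2])[0]

queue = [("old_user", 1, 10),   # low base priority, has waited 10 ticks
         ("new_user", 5, 0)]    # high base priority, just arrived
selected = next_to_serve(queue)
# effective priorities: old_user 1 + 10 = 11, new_user 5 + 0 = 5,
# so the long-waiting user is served despite its lower base priority
```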


Deadlock Avoidance
Requires knowledge of future requests for resources, for example:

1. Do not start a user if his/her demands might lead to a deadlock
2. Do not grant an incremental resource request to a user if this allocation might lead to a deadlock.

Deadlock detection
The goal of deadlock detection is to determine if a deadlock has occurred and determine precisely those
users and resources in the deadlock. Once this is determined the deadlock can be cleared from the system.
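Detection can be sketched with the wait-for graph mentioned earlier: an edge from A to B means user A is waiting for a resource held by user B, and a cycle means deadlock. A minimal Python illustration (the graph representation is an assumption for the example):

```python
# Sketch of deadlock detection on a wait-for graph. Each user maps to the
# single user it is waiting for (or None if it is not waiting).

def find_deadlock(wait_for):
    """Returns the set of users on a wait-for cycle, or an empty set."""
    for start in wait_for:
        seen, node = [], start
        while node is not None and node not in seen:
            seen.append(node)
            node = wait_for.get(node)
        if node is not None:                    # revisited a node: cycle
            return set(seen[seen.index(node):])
    return set()

# A1 waits for A2 and A2 waits for A1 (the circular wait pictured earlier);
# A3 is not waiting. Aborting either A1 or A2 would clear the deadlock.
cycle = find_deadlock({"A1": "A2", "A2": "A1", "A3": None})
```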

Deadlock recovery
Methods used to clear deadlocks from a system so that it may proceed to operate free of the deadlock, and
so that the deadlocked users may complete their execution and free their resources.

CONCURRENCY

Data Sharing
There are several problems which can result from the sharing of access to the database, for example the
lost update problem. If 2 users are allowed to hold the same tuple concurrently, the first of the 2 subsequent
update operations will be nullified by the second, since the effect of the second is to overwrite the result of
the first.

Solution
1. Grant the user issuing the first hold an exclusive lock on the data held
2. No other user will be allowed to access the data while it is locked to the first user
3. The user issuing the second hold will have to wait until the first user releases the lock
4. The second user will in turn be granted an exclusive lock on the data
5. The effect of the second hold will be to retrieve the data as updated by the first user.

However, the exclusive locking technique leads in turn to other problems, namely deadlock and starvation
(discussed previously).
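The five steps above can be sketched as a small exclusive-lock protocol; the class and method names are invented for illustration, not a real DBMS interface:

```python
# Sketch of exclusive locking: the second user must wait until the first
# releases the lock, so neither update is lost.

class LockedAccount:
    def __init__(self, balance):
        self.balance = balance
        self.locked_by = None

    def hold(self, user):
        if self.locked_by not in (None, user):
            return False            # lock held by someone else: must wait
        self.locked_by = user       # grant an exclusive lock
        return True

    def update(self, user, delta):
        assert self.locked_by == user, "must hold the lock to update"
        self.balance += delta

    def release(self, user):
        if self.locked_by == user:
            self.locked_by = None

acct = LockedAccount(100)
acct.hold("user1")
assert not acct.hold("user2")   # user2 waits while user1 holds the lock
acct.update("user1", -150)      # withdrawal: balance is now -50
acct.release("user1")
acct.hold("user2")              # user2 now sees the updated balance
acct.update("user2", +25)       # deposit: balance is now -25
# Both updates are applied; nothing has been overwritten.
```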

DATA SECURITY
The protection of data in the database against unauthorised disclosure, alteration or destruction.

Authorisation Mechanisms
a) Identification
b) Authentication

Identification - Users have to identify themselves to the system before accessing the database, by supplying
an operator/user name or by using machine-readable cards.
Authentication - The process of proving their identity, by providing passwords, PIN numbers, or answering
some questions from the system.

Access Control
For each user the system will maintain a user profile, generated from the user definition supplied by the
DBA.

The user profile gives details of the appropriate identification and authentication procedures, together with
the operations the user is allowed to perform. The DBMS will go through a series of tests to determine
whether to grant or deny access to the user. The tests may be arranged in a sequence of increasing
complexity, so that the program may reach its final decision as quickly as possible.


DATABASE INTEGRITY
Ensuring that the data is accurate at all times.

Database access control lock procedure.


Used to ensure that a given operation is authorised
Ensures that integrity constraints are not violated.

Constraints
Each relation in the database will have a set of integrity constraints associated with it.
These constraints will be held in the data dictionary as part of the conceptual schema
They specify for example, that values of a particular attribute in some relation are to be within certain
boundary, or that within each tuple of some relation the values of one attribute may not exceed that of
another.

Integrity Constraints and Enforcements

1. Primary Key: possesses a property of uniqueness. No 2 tuples in the relation may have the same value
for this attribute or attribute combination
2. No component of a primary key value may be null

Enforcement
The DBMS must reject any attempt to generate a tuple whose key value is null or is a duplicate of one
that already exists.

Bounds Entry
Values occurring in a particular attribute may be required to lie within certain bounds (eg values of
employee age: 15<age<60)
The constraints are specified by the Bounds Entry. The lower and upper limits have to be defined.

Values Entry
There may be a very small set of permitted values of some particular attribute combination eg permitted
values for primary colour are red, blue, green etc. In this case the permitted values could simply be
listed in a values entry for the relevant attribute or attribute combination

NB: It might be desirable to list values or ranges of values that are not permissible for the attributes
concerned.

Format Entry
Values of a particular attribute may have to conform to a particular format. Eg the first character of a
supplier number must be the letter S.
The constraint is specified in a format entry for the relevant attribute.
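The bounds, values, and format entries above can be sketched as simple predicate checks applied before a tuple is accepted; the function names are hypothetical, but the rules are the examples from the text:

```python
# Sketch of three declarative constraint entries checked on insertion.

def check_bounds(age):
    # Bounds entry: employee age must satisfy 15 < age < 60.
    return 15 < age < 60

def check_values(colour):
    # Values entry: only the listed values are permitted.
    return colour in {"red", "blue", "green"}

def check_format(supplier_no):
    # Format entry: supplier numbers must begin with the letter S.
    return supplier_no.startswith("S")

assert check_bounds(30) and not check_bounds(64)
assert check_values("green") and not check_values("purple")
assert check_format("S1009") and not check_format("1009")
```

A DBMS would evaluate each relevant entry against a tuple and reject the insertion if any predicate fails.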

Average Function
The set of values of a particular attribute in a relation may have to satisfy some statistical constraint, eg no
employee may earn a salary that is more than twice the average salary for the department.
The predicate defining this constraint will invoke the library function AVERAGE.
To enforce it the DBMS will have to monitor all storage operations against the employee relation

NB
All examples given above are of static constraints, that is, they specify conditions that must hold for
every given state of the database.
Another important type of constraint involves transitions from one state to another, eg when an
employee's salary is updated, the new value must be greater than the old value.
To specify such constraints it is necessary to refer to both the old and new values.
The keywords OLD and NEW are reserved for this purpose.
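The salary example can be sketched as a transition-constraint predicate over the OLD and NEW values (the function name is illustrative):

```python
# Sketch of a transition (dynamic) constraint: a salary update is accepted
# only if the NEW value exceeds the OLD value.

def check_salary_transition(old, new):
    return new > old

assert check_salary_transition(old=5000, new=5500)      # update accepted
assert not check_salary_transition(old=5000, new=4800)  # update rejected
```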


A special case of transition is that from non-existence to existence (ie addition of a new tuple) or from
existence to non-existence (ie deletion of an existing tuple).

RECOVERY ROUTINES

Recovery routines are used to restore the database, or some portion of the database, to an earlier state after a
system failure (hardware or software) has caused the contents of the database in main storage to be lost.

They take as input a backup copy of the database (produced by the dump routines) together with the system
journal (which contains details of operations that have occurred since the dump was taken) and produce as
output a new copy of the data as it was before the failure occurred.

NB: Any transactions that were in progress at the time of the failure will probably have to be restarted.

BACKUP ROUTINE

Dump routines
These are used to take backup copies of selected portions of the database, also usually on tape.
It is normal practice to dump the database regularly say once a week
If the database is very large it may be more practical to dump one seventh of it every day
Each time a dump is taken, a new system journal may be started and the previous one erased or
archived
Backout is normally initiated automatically by the DBMS before the transaction has committed its changes.

Checkpoint/Restart Routines
Backing out and rerunning a long transaction in its entirety can be a time consuming process
Some systems permit transactions to take checkpoint at suitable points in their executions
The checkpoint routines will cause all changes made since the last checkpoint to be committed.
The checkpoint facility allows a long transaction to be divided up into a sequence of short ones
The checkpoint routine may also record values of specified program variables in a checkpoint entry in
the system journal

Audit Trail/System Journal/System Log


Used to record every operation on the database
For each operation the journal will typically include the following information:
(a) An identification of the transaction concerned
(b) A time stamp
(c) An identification of the terminal and user concerned
(d) The full text of the input message

And, in the case of an operation involving a change to the database, the type of change and the address of
the data changed, together with its before and after values.
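A journal entry with the fields listed above can be sketched as follows; the field names and values are illustrative, not taken from any particular DBMS:

```python
# Sketch of a system-journal (audit-trail) entry. The before/after fields
# are filled in only for operations that change the database.

import time

def journal_entry(txn_id, terminal, user, text,
                  change_type=None, address=None, before=None, after=None):
    entry = {"txn": txn_id,            # (a) transaction identification
             "timestamp": time.time(), # (b) time stamp
             "terminal": terminal,     # (c) terminal and user concerned
             "user": user,
             "input_text": text}       # (d) full text of the input message
    if change_type is not None:        # extra fields for database changes
        entry.update(change_type=change_type, address=address,
                     before=before, after=after)
    return entry

e = journal_entry("T7", "TRM04", "wsithole",
                  "UPDATE balance", "update", "acct/462351", 100, -50)
```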

Encryption/scrambling
Used to protect the database against an infiltrator who attempts to bypass the system
An example of bypassing the system is a user who physically removes part of the database, for
example by stealing a disk pack
Apart from normal security measures to prevent unauthorised personnel from entering the computer
centre, the most important safeguard against physical removal of part of the database is the use of
scrambling techniques
Scrambling/encryption and privacy transformation techniques involve the following:


(a) Shuffling the characters of each tuple (or record or message) into a different order
(b) Replacement of each character (or group of characters) by a different character (or group
of characters), from the same alphabet or different one
(c) Groups of characters are algebraically combined in some way with a special group of
characters (privacy key) supplied by the owner of the data.
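Technique (c) can be sketched with a simple XOR combination against a repeating privacy key; this is a toy illustration of the idea, not a secure cipher:

```python
# Sketch of scrambling by algebraic combination: each byte of the data is
# XOR-combined with a repeating privacy key supplied by the data's owner.

def scramble(text, key):
    return bytes(b ^ key[i % len(key)]
                 for i, b in enumerate(text.encode()))

def unscramble(data, key):
    # XOR is its own inverse, so applying the key again recovers the text.
    return bytes(b ^ key[i % len(key)]
                 for i, b in enumerate(data)).decode()

key = b"privacy"
hidden = scramble("Cust# 1008 Mutema", key)
assert hidden != b"Cust# 1008 Mutema"       # a stolen disk shows only this
assert unscramble(hidden, key) == "Cust# 1008 Mutema"
```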

TRANSACTIONS

A transaction is a unit of work with the property that the database is:
a) In a consistent state (state of integrity) both before it and after it, but
b) Possibly not in such a state between these 2 times

In general any changes made to the database during a transaction should not be visible to concurrent
transactions until such changes have been committed, in order to prevent those concurrent transactions
from seeing the database in an inconsistent state.
Any data changed by a given transaction including data created or destroyed by that transaction should
remain locked until that transaction terminates
The above discipline must be enforced by the DBMS
A transaction will be backed out if on completion it is found that the database is not in a state of
integrity
A transaction may also be backed out if the system detects a deadlock. A general strategy for such a
situation is to choose one of the deadlocked transactions, say the one most recently started or the one
that has made the fewest changes, and remove it from the system, thus freeing its locked resources for use
by other transactions.
The process of backout involves undoing all the changes that the transaction has made, releasing all
resources locked by the transaction, and scheduling the transaction for re-execution.

Example of Transaction
In a banking system a typical transaction might be:
Transfer amount X from account A to account B. This would be viewed as a single operation, and a user
would have to enter a command such as

Transfer X = 100 A = 462351 B = 90554 at a terminal

The above transaction requires several changes to be made to the underlying database.
Specifically it involves updating the balance value in 2 distinct account tuples
Although the database is in a state of integrity before and after the sequence of changes, it may not be so
throughout the entire transaction, ie some of the intermediate states (or transitions) may violate one or
more integrity constraints
It follows that there is need to be able to specify that certain constraints should not be checked until the
end of the transaction. These are called deferred constraints
By contrast, constraints that are enforced continuously during the intermediate steps of the transaction
are called immediate

NB: The data sublanguages must include some means of signaling the end of the transaction, in order to
cause the DBMS to apply deferred checks
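The transfer example with a deferred constraint can be sketched as follows; the constraint "total money across accounts is constant" is an assumed rule for illustration, violated mid-transaction and checked only at end-of-transaction:

```python
# Sketch of a deferred constraint: the integrity check runs at the end of
# the transaction, not after each intermediate update.

accounts = {"462351": 500, "90554": 200}
TOTAL = sum(accounts.values())          # invariant: total money is constant

def transfer(x, a, b):
    snapshot = dict(accounts)           # kept so the transaction can be backed out
    accounts[a] -= x                    # intermediate state: total is short by x
    accounts[b] += x                    # consistent again
    if sum(accounts.values()) != TOTAL: # deferred check at end of transaction
        accounts.clear()
        accounts.update(snapshot)       # back the transaction out
        raise RuntimeError("integrity violated; transaction backed out")

transfer(100, "462351", "90554")
# The deferred check passed: 400 + 300 still equals the original total.
```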


CONCURRENCY

In most systems, several users can access a database concurrently. The operating system switches execution
from one user program to another to minimise waiting for input or output operations
Within this approach transactions are often interleaved, that is, several steps are performed on transaction
A, then several steps on transaction B, followed by more steps on transaction A and so on.

Effects of concurrent updates:


The effects of concurrent update without concurrency control are illustrated below.

1. 2 users are in the process of updating the same record which represents a savings account record for
customer A
2. At present time customer A has a balance of $100 in her account
3. User 1 reads her record into the user work area, intending to post a customer withdrawal of $150
4. Next user 2 reads the same record into that user area, intending to post a customer deposit of $25
5. User 1 posts the withdrawal and stores the record, which now indicates a balance of $50
6. User 2 then posts the deposit (increasing the balance to $125) and stores this record on top of the one
stored by user 1
7. The record now indicates a balance of $125
8. In this case the update made by user 1 has been lost because of interference between transactions
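The eight steps above can be replayed as a short simulation with no concurrency control; the variables stand for the stored record and each user's work area:

```python
# Sketch of the lost update: two users read the same record, and the
# second store overwrites the first user's update.

balance = 100                 # stored savings record for customer A

copy1 = balance               # step 3: user 1 reads the record
copy2 = balance               # step 4: user 2 reads the same record
copy1 -= 150                  # step 5: user 1 posts a $150 withdrawal...
balance = copy1               # ...and stores the record (balance is -50)
copy2 += 25                   # step 6: user 2 posts a $25 deposit...
balance = copy2               # ...and stores on top of user 1's record
# The record shows 125, not the correct -25: the withdrawal has been lost.
```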


Database Security and Protection

Techniques for protecting the database authorisation to database access

Database Semantic Integrity

Techniques to keep database in a consistent state with respect to specified constraints on the
database

Both Database security and Protection, and Database semantic Integrity are stored in the
DBMS catalog.

SUPPORT ROUTINES

Journaling Routines:
Records every operation in system log/audit trail/system journal

Dump Routines:


Take back-up copies of the database; a new system log is started after every dump

Recovery Routines:
Used to restore the database or some portion of the database after a system failure (hardware or
software) has caused contents of the database buffers in main storage to be lost.

Backout Routines:
Initiated automatically by the DBMS before transaction changes are committed.

Checkpoint/Restart Routines:
Cause all changes made since the last checkpoint to be committed. Instead of restarting a long
transaction in its entirety, the restart resumes from the last checkpoint.

Detection Routines:
Detect any violations and back the transaction out of the system, with information on the list of
constraints violated and the offending tuples.

