
Database Security, Integrity and Recovery

Kyanganda S.

ICS 2415 Advanced Dbase Systems

Database Security and Integrity

Definitions
Threats to security
Threats to integrity
Resolution of Problems


Database Security
SECURITY
Protecting the database from unauthorised users
Ensures that users are allowed to do the things they
are trying to do


Database Security
INTEGRITY
Protecting the database from authorised users
Ensures that what users are trying to do is correct


Database Security
TYPES OF SYSTEM FAILURES
1. HARDWARE
DISK, CPU, NETWORK
2. SOFTWARE
SYSTEM, DATABASE, PROGRAM


Database Security
Important security features include:
Views
Authorisation & controls
User defined procedures
Encryption procedures


Authorisation Rules
An example: a person who can supply a particular
password may be authorised to read any record, but
cannot modify any of those records.
Authorisation Table for subjects, i.e. Salesperson

          Customer Records   Order Records
Read
Insert
Modify
Delete


Authorisation Rules
Authorisation Table for objects, i.e. Order Records

          Salesperson   Order Entry   Accounting
Password  (Zahra)       (Maina)       (Shirin)
Read
Insert
Modify
Delete
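The two authorisation tables are two views of one access matrix (subject, object, permitted actions). A minimal Python sketch of that matrix follows, assuming a closed-world rule in which anything not listed is denied. The Y/N entries of the slide tables are not reproduced here, so every permission in the sketch is hypothetical.

```python
# Hypothetical permissions: the Y/N entries of the slide tables are not
# reproduced here.
AUTHORISATION = {
    # (subject, object): actions the subject may perform on the object
    ("Salesperson", "Customer Records"): {"Read", "Insert", "Modify"},
    ("Salesperson", "Order Records"): {"Read", "Insert"},
    ("Order Entry", "Order Records"): {"Read", "Insert", "Modify"},
    ("Accounting", "Order Records"): {"Read", "Modify", "Delete"},
}

def is_authorised(subject: str, obj: str, action: str) -> bool:
    """Closed world: anything not listed in the matrix is denied."""
    return action in AUTHORISATION.get((subject, obj), set())

def subject_view(subject: str) -> dict:
    """Authorisation table for one subject: object -> permitted actions."""
    return {o: acts for (s, o), acts in AUTHORISATION.items() if s == subject}

def object_view(obj: str) -> dict:
    """Authorisation table for one object: subject -> permitted actions."""
    return {s: acts for (s, o), acts in AUTHORISATION.items() if o == obj}

print(is_authorised("Salesperson", "Order Records", "Modify"))  # False
```

Storing the matrix once and deriving both tables from it keeps the subject view and the object view consistent with each other.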


Database Integrity
CONSTRAINTS
Can be classed in 3 different ways:

1. Business constraints
2. Entity constraints
3. Referential constraints


Database Integrity
BUSINESS CONSTRAINTS
A value in one column may be constrained by the value of another
column, or by some calculation or formula.


Database Integrity
ENTITY CONSTRAINTS
Individual columns of a table may be constrained, e.g. NOT NULL

REFERENTIAL CONSTRAINTS
Sometimes referred to as key constraints, e.g. a foreign key
value in Table 2 must match a primary key value in Table 1


Database Integrity
create table account_dets
  (acc_id      char(6) primary key,
   acc_custid  char(6) references customer(cust_id),
   acc_odraft  number(4) check (acc_odraft <= 200),
   acc_type    char(2) constraint type_chk
                 check (acc_type in ('AB', 'CD', 'EF')),
   acc_crtdate date not null);
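To see these constraints enforced, the sketch below runs the same table definition in SQLite through Python's sqlite3 module. Assumptions: SQLite stands in for the Oracle-style DBMS the slide targets (it accepts the number(4) and date type names, but only enforces foreign keys after PRAGMA foreign_keys = ON), and a minimal hypothetical customer table exists for the foreign key to reference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("create table customer (cust_id char(6) primary key)")  # hypothetical
conn.execute("""
    create table account_dets
      (acc_id      char(6) primary key,
       acc_custid  char(6) references customer(cust_id),
       acc_odraft  number(4) check (acc_odraft <= 200),
       acc_type    char(2) constraint type_chk
                     check (acc_type in ('AB', 'CD', 'EF')),
       acc_crtdate date not null)""")
conn.execute("insert into customer values ('C00001')")
conn.commit()

def try_insert(row):
    """Return None if the row is accepted, else the DBMS's constraint error."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("insert into account_dets values (?, ?, ?, ?, ?)", row)
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)

ok = try_insert(("A00001", "C00001", 150, "AB", "2024-01-31"))
bad_odraft = try_insert(("A00002", "C00001", 500, "AB", "2024-01-31"))  # > 200
bad_type = try_insert(("A00003", "C00001", 100, "ZZ", "2024-01-31"))    # type_chk
bad_cust = try_insert(("A00004", "C99999", 100, "CD", "2024-01-31"))    # no such customer
no_date = try_insert(("A00005", "C00001", 100, "EF", None))             # not null
print(ok, bad_odraft, bad_type, bad_cust, no_date, sep="\n")
```

No application code checks the rules: each rejected row is refused by the database itself, which is the "cannot be circumvented" benefit on the next slide.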


Database Integrity
BENEFITS OF USING CONSTRAINTS
Guaranteed integrity and consistency
Defined as part of table definition
Applies across all applications
Cannot be circumvented
Application development productivity
Requires no special programming
Easy to specify and maintain (reduced coding)
Defined once only

Database Integrity
CONCURRENCY CONTROL
WHAT IS IT?

The co-ordination of simultaneous requests for the same data from multiple users


Database Integrity
CONCURRENCY CONTROL
WHY IS IT IMPORTANT?

Simultaneous execution of transactions over a shared
database may create several data integrity and
consistency problems


Database Integrity
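Time   Janet                          John
 |     1. Read balance (1000)         1. Read balance (1000)
 |     2. Withdraw 200 (800)
 |     3. Write balance (800)
 |                                    2. Withdraw 300 (700)
 v                                    3. Write balance (700)
ERROR: the final balance is 700 but should be 500 (1000 - 200 - 300).
John's write, based on his stale read of 1000, overwrites Janet's update (a lost update).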
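The timeline above can be replayed in a few lines of Python. This is a sketch, not a DBMS: the first function reproduces the interleaving with no concurrency control, and the second assumes an exclusive lock (here a threading.Lock) is held for each whole read-modify-write, which is the locking approach covered on the following slides.

```python
import threading

def interleaved_withdrawals(balance: int) -> int:
    """Replay the slide's interleaving with no concurrency control."""
    janet = balance        # Janet 1: read balance (1000)
    john = balance         # John 1: read balance (1000)
    janet -= 200           # Janet 2: withdraw 200 (800)
    balance = janet        # Janet 3: write balance (800)
    john -= 300            # John 2: withdraw 300 from his stale copy (700)
    balance = john         # John 3: write balance (700), Janet's update is lost
    return balance

class Account:
    def __init__(self, balance: int):
        self.balance = balance
        self._lock = threading.Lock()   # plays the role of an exclusive lock

    def withdraw(self, amount: int) -> None:
        # The lock is held for the whole read-modify-write, so the other
        # withdrawal cannot read the balance until this one has written it
        with self._lock:
            current = self.balance
            self.balance = current - amount

def locked_withdrawals(balance: int) -> int:
    account = Account(balance)
    users = [threading.Thread(target=account.withdraw, args=(amount,))
             for amount in (200, 300)]   # Janet and John
    for user in users:
        user.start()
    for user in users:
        user.join()
    return account.balance

print(interleaved_withdrawals(1000))  # 700 (wrong)
print(locked_withdrawals(1000))       # 500
```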


Database Integrity
The three main integrity problems are:
Lost updates
Uncommitted data
Inconsistent retrievals


Database Integrity
LOCKING
Two kinds of Locks:

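1. Shared Locks (allow read-only access: other transactions may
   also read the record, but none may update it)
2. Exclusive Locks (prevent other transactions from reading or
   updating the record)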
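The rules for the two lock types amount to a compatibility check: any number of shared locks may coexist on a record, but an exclusive lock excludes every other lock. Below is a minimal sketch of a record lock table, assuming a request that cannot be granted is simply refused (a real lock manager would queue it and make the transaction wait).

```python
from collections import defaultdict

class LockTable:
    """Minimal record lock table: 'S' = shared, 'X' = exclusive (no wait queue)."""

    def __init__(self):
        self.holders = defaultdict(dict)  # record -> {transaction: mode}

    def acquire(self, txn: str, record: str, mode: str) -> bool:
        others = {t: m for t, m in self.holders[record].items() if t != txn}
        if mode == "S":
            granted = all(m == "S" for m in others.values())  # readers may share
        elif mode == "X":
            granted = not others                               # no one else at all
        else:
            raise ValueError("mode must be 'S' or 'X'")
        if granted:
            current = self.holders[record].get(txn)
            # keep the stronger mode if the transaction already holds X
            self.holders[record][txn] = "X" if "X" in (mode, current) else "S"
        return granted

    def release(self, txn: str, record: str) -> None:
        self.holders[record].pop(txn, None)

locks = LockTable()
print(locks.acquire("T1", "acct", "S"), locks.acquire("T2", "acct", "S"))  # True True
print(locks.acquire("T3", "acct", "X"))  # False: readers hold shared locks
```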


Database Integrity
Time   User 1                       User 2
 |     1. Lock record X             1. Lock record Y
 |     2. Request record Y          2. Request record X
 v        (wait for Y)                 (wait for X)
DEADLOCK: each user waits for a record the other has locked, so neither can proceed.
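The slides do not say how a DBMS handles deadlock. One common approach, assumed here, is detection: build a wait-for graph with an edge from each waiting user to the user holding the record it needs, and look for a cycle. A minimal Python sketch using depth-first search:

```python
from __future__ import annotations

def find_deadlock(wait_for: dict[str, set[str]]) -> list[str] | None:
    """Return one cycle in the wait-for graph (a deadlock), or None."""
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited, on current path, finished
    colour = {t: WHITE for t in wait_for}
    path: list[str] = []

    def visit(t: str) -> list[str] | None:
        colour[t] = GREY
        path.append(t)
        for u in wait_for.get(t, ()):
            if colour.get(u, WHITE) == GREY:
                return path[path.index(u):]   # back edge: u is waiting on itself
            if colour.get(u, WHITE) == WHITE:
                cycle = visit(u)
                if cycle:
                    return cycle
        colour[t] = BLACK
        path.pop()
        return None

    for t in list(wait_for):
        if colour[t] == WHITE:
            cycle = visit(t)
            if cycle:
                return cycle
    return None

# User 1 waits for Y (held by User 2); User 2 waits for X (held by User 1)
print(find_deadlock({"User 1": {"User 2"}, "User 2": {"User 1"}}))  # ['User 1', 'User 2']
```

A DBMS that finds such a cycle would typically abort one of the transactions (the victim) and release its locks so that the other can proceed.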

Database Recovery
The process of restoring the database to a
correct state in the event of a failure, e.g.
System Crashes
Media Failures
Application Software Errors
Natural Physical Disasters
Carelessness
Sabotage

Database Recovery

Basic Recovery Facilities
Backup Facilities
Journaling Facilities
Checkpoint Facilities
Recovery Facilities


Transactions
Basic unit of recovery
Properties of Transaction (ACID)
Atomicity
Consistency
Isolation
Durability

The purpose of the recovery manager is to enforce
Atomicity and Durability
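Atomicity can be observed directly with SQLite, used here as a stand-in DBMS through Python's sqlite3 module. In this sketch (the account table and the transfer are hypothetical) a failure between the two updates of a transfer leaves the database as if the transfer had never started. It illustrates atomicity only; durability depends on the DBMS forcing committed data to stable storage.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table account (acc_id text primary key, balance integer)")
conn.executemany("insert into account values (?, ?)", [("A", 1000), ("B", 0)])
conn.commit()

def transfer(amount: int, fail_midway: bool = False) -> None:
    # 'with conn' commits if the block succeeds and rolls back if it raises,
    # so both updates take effect together or not at all
    with conn:
        conn.execute("update account set balance = balance - ? where acc_id = 'A'", (amount,))
        if fail_midway:
            raise RuntimeError("simulated failure between the two updates")
        conn.execute("update account set balance = balance + ? where acc_id = 'B'", (amount,))

def balances() -> dict:
    return dict(conn.execute("select acc_id, balance from account order by acc_id"))

try:
    transfer(300, fail_midway=True)
except RuntimeError:
    pass
print(balances())  # {'A': 1000, 'B': 0}: the half-finished transfer was undone
transfer(300)
print(balances())  # {'A': 700, 'B': 300}
```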

Staff Salary Update Example
Read Operations:
1. Find the address of the disk block that contains the record with primary key x
2. Transfer the block into a DB buffer in main memory
3. Copy the salary data from the DB buffer into the variable salary

Write Operations:
1-2. As steps 1 and 2 above
3. Copy the salary data from the variable salary into the DB buffer
4. Write the DB buffer back to disk
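The read and write steps can be modelled with a toy "disk" of blocks and an in-memory buffer. Everything here (block layout, staff numbers, salaries) is hypothetical, and the linear search stands in for whatever index a real DBMS would use to locate the block.

```python
# Toy model: "disk" is a list of blocks, each block a dict {primary_key: salary}
disk = [{"SL21": 30000, "SG37": 12000}, {"SG14": 18000, "SA9": 9000}]
buffer = {}   # block number -> in-memory copy of that block (the DB buffer)

def find_block(key):
    # Step 1: locate the disk block that holds the record
    for block_no, block in enumerate(disk):
        if key in block:
            return block_no
    raise KeyError(key)

def fetch(block_no):
    # Step 2: transfer the block into the DB buffer, unless it is already there
    if block_no not in buffer:
        buffer[block_no] = dict(disk[block_no])
    return buffer[block_no]

def read_salary(key):
    return fetch(find_block(key))[key]              # step 3: buffer -> variable

def write_salary(key, salary, flush=True):
    block_no = find_block(key)                       # steps 1-2 as for a read
    fetch(block_no)[key] = salary                    # step 3: variable -> buffer
    if flush:
        disk[block_no] = dict(buffer[block_no])      # step 4: buffer -> disk

salary = read_salary("SL21")                 # 30000
write_salary("SL21", salary + 1000)          # written through to disk
write_salary("SG14", 19000, flush=False)     # changed in the buffer only
```

With flush=False the new salary exists only in main memory, which is why a failure before the buffer is written back can lose an update that the program believes it has made.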


Storing Data
Buffer contents are flushed to secondary storage, where they become permanent,
e.g. when the buffer is full or on Commit
[Figure: Main Memory (Database Buffer) -> Secondary Storage]

Database Update Procedures


[Figure: Database (State 1) -> Update Trans1 -> Database (State 2) -> Update Trans2 ->
Database (State 3) -> Update Trans3 -> Database (State 4); a Database Backup copy
of State 2 is kept]

Back-up Facilities
DBMS provides a mechanism for taking backup
copies of the database and log file at regular
intervals.
A dump (copy or backup file) contains all or
part of the database.
Backups can be taken without having to stop the
system.


Journal Facilities
REDO LOGS
This is the main logging file. The file
contains two different types of
logging records.
AFTER IMAGES
BEFORE IMAGES


Journal Facilities
REDO LOGS - AFTER IMAGES
When any column of any row in any table in the
database is changed, the new values are written
not only to the database but also to the redo log.
The complete row is written to the log. If a row is
deleted, a notification is also written to the redo
log. After images are used in roll forward recovery.


Journal Facilities
REDO LOGS - BEFORE IMAGES
Before a row is updated, its data is copied to the
redo log. This is not a simple copy from the
database: a separate area of the database,
called the ROLLBACK SEGMENT, maintains the
immediate pre-update version of each updated
row, and the redo log takes its before image
copies from the rollback segment.

Sample Log File


Tid  Time   Operation   Object          Before Image  After Image  pPtr  nPtr
T1   10:12  START
T1   10:13  UPDATE      TENANT NO21     (old value)   (new value)
T2   10:14  START
T2   10:16  INSERT      TENANT NO37                   (new value)
T2   10:17  DELETE      TENANT NO9      (old value)
T2   10:17  UPDATE      PROPERTY PG16   (old value)   (new value)
T1   10:18  COMMIT
     10:19  CHECKPOINT  T2

Types of Recovery

Duplicate Databases
Rollback Recovery
Rollforward Recovery

Reprocessing Transactions

Duplicate Databases
Requires 2 copies of the database
Advantages
Fast Recovery (seconds)
Good for disk failures
Disadvantages
No protection against power failure
Expensive

Rollback Recovery
Changes made to the database are undone
(Backward Recovery).
Rollback enables the updating to be
undone to a predetermined point in the
database processing that provides a
consistent database state.


Rollback Recovery

[Figure: Database (with changes) + Before Images -> ROLLBACK -> Database (without changes)]
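Rollback applies before images in reverse order, newest first. The sketch below uses hypothetical log records shaped like the sample log (object, before image, after image), with hypothetical rent values, and assumes the transaction being undone never committed.

```python
# Hypothetical log records for one uncommitted transaction: (object, before, after).
# A before image of None means the object did not exist (an INSERT);
# an after image of None means it was deleted (a DELETE).
log = [
    ("TENANT NO37", None, {"rent": 350}),              # INSERT
    ("TENANT NO9", {"rent": 400}, None),               # DELETE
    ("PROPERTY PG16", {"rent": 450}, {"rent": 480}),   # UPDATE
]

def rollback(db: dict, records: list) -> dict:
    """Undo changes by restoring before images, newest record first."""
    db = dict(db)                      # work on a copy of the damaged database
    for obj, before, _after in reversed(records):
        if before is None:
            db.pop(obj, None)          # undo an INSERT
        else:
            db[obj] = before           # undo an UPDATE or a DELETE
    return db

after_failure = {"TENANT NO37": {"rent": 350}, "PROPERTY PG16": {"rent": 480}}
print(rollback(after_failure, log))
# {'PROPERTY PG16': {'rent': 450}, 'TENANT NO9': {'rent': 400}}
```

Working newest-first matters when the same object was changed more than once: the oldest before image, applied last, is the value the object had before the transaction started.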

Roll Forward Recovery


This recovery technique brings an out-of-date database (e.g. a
restored backup) up to the current processing position.
If the data is inconsistent, the database may need to be
rolled back to the previous consistent state.



Roll Forward Recovery

[Figure: Database (without changes) + After Images -> ROLL FORWARD -> Database (with changes)]
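Roll forward starts from the backup and re-applies after images in log order. In this sketch the log records, values and set of committed transactions are hypothetical. Skipping the after images of uncommitted transactions is an assumption: it is one way to avoid redoing work that would leave the database inconsistent.

```python
# Hypothetical after-image log in time order: (tid, object, after_image);
# an after image of None means the row was deleted.
log = [
    ("T1", "TENANT NO21", {"rent": 500}),
    ("T2", "TENANT NO37", {"rent": 350}),
    ("T2", "TENANT NO9", None),
]
committed = {"T1"}   # assumption: only T1 committed before the failure

def roll_forward(backup: dict, records: list, committed: set) -> dict:
    """Re-apply after images of committed transactions, oldest record first."""
    db = dict(backup)
    for tid, obj, after in records:
        if tid not in committed:
            continue                   # uncommitted work is not redone
        if after is None:
            db.pop(obj, None)          # redo a DELETE
        else:
            db[obj] = after            # redo an INSERT or UPDATE
    return db

backup = {"TENANT NO21": {"rent": 450}, "TENANT NO9": {"rent": 400}}
print(roll_forward(backup, log, committed))
# {'TENANT NO21': {'rent': 500}, 'TENANT NO9': {'rent': 400}}
```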

Reprocessing Transactions
Similar to Roll Forward Recovery, but re-runs the
original update transactions instead of applying
after images
ADVANTAGES
Simple
DISADVANTAGES
Slow
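Reprocessing replays the transactions themselves rather than their after images. In this sketch each hypothetical transaction is a Python function applied to a copy of the backup, and the optional skip argument shows how the update that created incorrect data could be left out, as in the recovery procedures table.

```python
# Hypothetical update transactions recorded since the backup, as plain functions
def add_tenant(db):
    db["TENANT NO37"] = {"rent": 350}

def raise_rent(db):
    db["PROPERTY PG16"]["rent"] += 30

transactions = [add_tenant, raise_rent]

def reprocess(backup: dict, transactions: list, skip=()) -> dict:
    db = {k: dict(v) for k, v in backup.items()}   # copy so the backup is untouched
    for txn in transactions:
        if txn not in skip:    # e.g. leave out the update that created incorrect data
            txn(db)            # re-execute the whole transaction logic (hence slow)
    return db

backup = {"PROPERTY PG16": {"rent": 450}}
print(reprocess(backup, transactions))
```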


Database Recovery Procedures


Problem                               Recovery Procedure
Storage Medium Destruction            *Duplicate Database
                                      Forward Recovery
                                      Reprocess Transactions
Transaction error or system failure   *Backward Recovery
                                      Forward Recovery or reprocess transactions,
                                      bringing forward to just before termination
Incorrect Data                        *Backward Recovery
                                      Reprocess Transactions (excluding those from
                                      the update that created incorrect data)


Summary
This lecture has looked at security and
recovery procedures
Ensuring that these two are administered
correctly cuts out the majority of problems
with database administration


Further Reading

Security
Connolly & Begg, chapter 19

Concurrency Control
Connolly & Begg, chapter 20?

Integrity and Recovery


Connolly & Begg, chapters 18 and 19?

Advanced Database Security


Contents
Definitions
Countermeasures
Security Controls
Data Protection and Privacy
Statistical Databases
Web Database Security Issues and Solutions
SQL Injection

Database Security Definition


Definition (revisited):
The protection of the database against intentional or
unintentional threats using computer-based or non-computer-based controls

Areas in which to reduce risk:
theft and fraud
loss of confidentiality
loss of privacy
loss of integrity
loss of availability


Countermeasures
Ways to reduce risk
Include
Computer Based Controls
Non-computer Based Controls


Computer Based Controls


Security of a DBMS is only as good as that of the operating system
Computer-based security controls available:
authorization and authentication
views
backup and recovery
integrity
encryption, within the database and during data transport
RAID for fault tolerance
associated procedures, e.g. backup, auditing, testing, upgrading, virus checking

Non-computer based Controls


Include:
Security policy and contingency plan
personnel controls
secure positioning of equipment
escrow agreements
maintenance agreements
physical access controls
Both internal and external


Data Security
Two (original) broad approaches to data security:
Discretionary access control
a given user has different access rights (privileges) on different objects
flexible, but limited as to which rights users can have on an object
privileges can be passed on at the user's discretion

Mandatory access control
each data object is labelled with a certain classification level
each user is given a certain clearance level
rigid, hierarchic
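Mandatory access control compares the user's clearance level with the object's classification level. The slide does not give the comparison rules; the sketch below assumes the Bell-LaPadula model's two properties (no read up, no write down) and a hypothetical four-level hierarchy.

```python
from enum import IntEnum

class Level(IntEnum):
    # Hypothetical hierarchy: a higher value means more sensitive
    UNCLASSIFIED = 0
    CONFIDENTIAL = 1
    SECRET = 2
    TOP_SECRET = 3

def can_read(clearance: Level, classification: Level) -> bool:
    """Simple security property: a user may not read above their clearance."""
    return clearance >= classification

def can_write(clearance: Level, classification: Level) -> bool:
    """*-property: no write down, so data cannot leak to a lower level."""
    return clearance <= classification

print(can_read(Level.SECRET, Level.CONFIDENTIAL))   # True
print(can_write(Level.SECRET, Level.CONFIDENTIAL))  # False
```

Unlike discretionary control, neither rule depends on who owns the object, so users cannot pass access on at their own discretion.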


Role Based Access Control


A role is a specific function within an organisation
Authorizations are granted to roles
instead of users
Users are made members of roles
Privileges cannot be passed on to other users
Simplifies authorization management
Supported in SQL
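A minimal role-based access control sketch in Python, with hypothetical roles, users and privileges. Privileges are attached only to roles, and a user's rights are the union of the rights of their roles. In SQL the same set-up would be written with CREATE ROLE, GRANT privilege ON table TO role, and GRANT role TO user.

```python
# Hypothetical roles and privileges: role -> {(action, table)}
ROLE_PRIVILEGES = {
    "clerk": {("SELECT", "staff")},
    "manager": {("SELECT", "staff"), ("UPDATE", "staff")},
}
USER_ROLES = {"alice": {"clerk"}, "bob": {"manager"}}   # hypothetical users

def has_privilege(user: str, action: str, table: str) -> bool:
    """A user holds a privilege only through some role they belong to."""
    return any((action, table) in ROLE_PRIVILEGES[role]
               for role in USER_ROLES.get(user, set()))

def assign_role(user: str, role: str) -> None:
    # Only an administrator changes membership; users cannot pass roles on
    if role not in ROLE_PRIVILEGES:
        raise ValueError(f"unknown role {role!r}")
    USER_ROLES.setdefault(user, set()).add(role)

print(has_privilege("alice", "UPDATE", "staff"))  # False
assign_role("alice", "manager")
print(has_privilege("alice", "UPDATE", "staff"))  # True
```

Changing what every clerk may do is now one edit to ROLE_PRIVILEGES rather than one grant per user, which is the management simplification the slide refers to.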


System R Authorization Model


One of the first authorization models for an RDBMS
Developed as part of the System R RDBMS

Based on the concept of protection objects
Tables and views

Access modes
SELECT
INSERT
DELETE
UPDATE

Not all are applicable to views



System R Authorization Model


Users can give access to other users through the use of
GRANT and REVOKE

REVOKE is recursive: privileges that the revoked user
granted to others are revoked as well

System R has a closed world policy
If there is no authorization, access is denied
However, authorization can be granted later

Negative authorization
Denials are expressed explicitly
Denials take precedence over grants
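Recursive revocation can be sketched as a graph problem: after one grant is removed, a user keeps the privilege only if some chain of grants still leads from the table's owner to them, and any grant made by a user who has lost it is dropped. This is a simplified sketch for one privilege on one table; System R's model also takes grant timestamps into account, which is omitted here, and the user names are hypothetical.

```python
def holders(owner: str, grants: set) -> set:
    """The owner plus everyone reachable from the owner through grants."""
    held = {owner}
    changed = True
    while changed:
        changed = False
        for grantor, grantee in grants:
            if grantor in held and grantee not in held:
                held.add(grantee)
                changed = True
    return held

def revoke(owner: str, grants: set, grantor: str, grantee: str) -> set:
    """Remove one grant, then every grant made by a user who no longer holds
    the privilege."""
    remaining = set(grants) - {(grantor, grantee)}
    held = holders(owner, remaining)
    return {(g, e) for (g, e) in remaining if g in held}

# alice owns the table; grants are (grantor, grantee) pairs
grants = {("alice", "bob"), ("bob", "carol"), ("carol", "dave"), ("alice", "erin")}
print(sorted(revoke("alice", grants, "alice", "bob")))  # [('alice', 'erin')]
```

Had erin also granted the privilege to carol, carol (and therefore dave) would keep it after bob's revocation, because a chain from the owner would still exist.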


SQL Facilities
SQL supports discretionary access control using the view
mechanism and the authorization system
e.g. CREATE VIEW S_NINE_TO_FIVE AS
       SELECT S.S#, S.SNAME, S.STATUS, S.CITY
       FROM S
       WHERE TO_CHAR(SYSDATE, 'HH24:MI:SS') >= '09:00:00'
         AND TO_CHAR(SYSDATE, 'HH24:MI:SS') <= '17:00:00';
     GRANT SELECT, UPDATE (STATUS)
       ON S_NINE_TO_FIVE
       TO Purchasing;

parameterised view

Also referential and entity integrity



Oracle Virtual Private Databases


Fine-grained access control based on tuple-level
access
Uses dynamic query modification
Users are given a specific policy
The policy returns a WHERE clause (predicate) that is
added to the user's query
e.g. SELECT * FROM prop_for_rent

becomes
SELECT * FROM prop_for_rent WHERE prop_type = 'F'
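Dynamic query modification can be imitated in a few lines: look up the user's policy and wrap the query so that the policy's predicate filters every row it returns. This is a sketch of the idea only, not Oracle's API (real VPD policies are registered through Oracle's DBMS_RLS package). The user name, policy table and rows are hypothetical, SQLite stands in for the database, and in this sketch a user with no policy is unrestricted.

```python
import sqlite3

# Hypothetical policies: user -> predicate added to every query they run
POLICIES = {"flat_agent": "prop_type = 'F'"}

def apply_policy(user: str, query: str) -> str:
    predicate = POLICIES.get(user)
    if predicate is None:
        return query
    # Wrap the original query so the predicate restricts every row it returns
    return f"SELECT * FROM ({query}) WHERE {predicate}"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prop_for_rent (prop_no TEXT, prop_type TEXT)")
conn.executemany("INSERT INTO prop_for_rent VALUES (?, ?)",
                 [("PG4", "F"), ("PG16", "H"), ("PA14", "F")])

query = "SELECT * FROM prop_for_rent"
print(apply_policy("flat_agent", query))
# SELECT * FROM (SELECT * FROM prop_for_rent) WHERE prop_type = 'F'
print(conn.execute(apply_policy("flat_agent", query)).fetchall())
```

Because the rewrite happens inside the database layer, the user issues the same query text as everyone else and never sees the predicate.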


Data Protection and Privacy


Privacy
concerns the right of an individual not to have personal
information collected, stored and disclosed either willfully
or indiscriminately

Data Protection Act


the protection of personal data from unlawful
acquisition, storage and disclosure, and the provision of
the necessary safeguards to avoid the destruction or
corruption of the legitimate data held

New Freedom of Information Act



Statistical Databases
A database that permits queries that derive
aggregated information (e.g. sums, averages)
but not queries that derive individual information

Tracking
It is possible to make inferences from legal queries to
deduce answers to illegal ones, e.g.
SELECT COUNT(*) FROM STATS X WHERE X.SEX = 'M' AND
X.OCCUPATION = 'Programmer';
SELECT SUM(X.SALARY) FROM STATS X WHERE X.SEX = 'M' AND
X.OCCUPATION = 'Programmer';
If the first query returns 1, the second reveals that one person's salary.
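The tracking attack above can be reproduced against a small hypothetical STATS table in SQLite. Both queries return only aggregates, yet together they disclose one individual's salary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table STATS (name text, sex text, occupation text, salary integer)")
conn.executemany("insert into STATS values (?, ?, ?, ?)", [   # hypothetical data
    ("Ann", "F", "Programmer", 52000),
    ("Bob", "M", "Programmer", 48000),
    ("Cara", "F", "Analyst", 45000),
    ("Dev", "M", "Analyst", 41000),
    ("Eve", "F", "Programmer", 50000),
])

where = "X.SEX = 'M' AND X.OCCUPATION = 'Programmer'"
(count,) = conn.execute(f"SELECT COUNT(*) FROM STATS X WHERE {where}").fetchone()
(total,) = conn.execute(f"SELECT SUM(X.SALARY) FROM STATS X WHERE {where}").fetchone()
if count == 1:
    # Both answers are "legal" aggregates, but together they identify one salary
    print(f"Only one male programmer, so his salary is {total}")  # 48000
```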


Statistical Databases
Various strategies can be used to minimize
problems
prevent queries from operating on only a few
database entries
swap attribute values among tuples
randomly add in additional entries
use only a random sample
maintain history of query results and reject queries
that use a high number of records identical to
previous queries
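The first strategy can be sketched as a wrapper that refuses any aggregate whose query set is too small, or so large that subtracting it from the whole table would isolate a few records. The threshold and data are hypothetical. A determined user can still defeat this single rule with more elaborate trackers, which is why the other strategies exist.

```python
import sqlite3

MIN_SET = 2   # hypothetical minimum query-set size

conn = sqlite3.connect(":memory:")
conn.execute("create table stats (name text, sex text, occupation text, salary integer)")
conn.executemany("insert into stats values (?, ?, ?, ?)", [   # hypothetical data
    ("Ann", "F", "Programmer", 52000), ("Bob", "M", "Programmer", 48000),
    ("Cara", "F", "Analyst", 45000), ("Dev", "M", "Analyst", 41000),
    ("Eve", "F", "Programmer", 50000),
])

def statistical_query(aggregate: str, where: str, params: tuple):
    """Answer an aggregate only if its query set is neither too small nor too
    large. 'aggregate' and 'where' must come from trusted code; user-supplied
    values go in 'params' as bound parameters."""
    (n,) = conn.execute(f"SELECT COUNT(*) FROM stats WHERE {where}", params).fetchone()
    (size,) = conn.execute("SELECT COUNT(*) FROM stats").fetchone()
    if n < MIN_SET or n > size - MIN_SET:
        raise PermissionError(f"query set of size {n} refused")
    return conn.execute(f"SELECT {aggregate} FROM stats WHERE {where}", params).fetchone()[0]

print(statistical_query("AVG(salary)", "occupation = ?", ("Programmer",)))  # 50000.0
```

The male-programmer query from the previous slide now has a query set of size 1 and is refused.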

Web Database Security Issues


Internet is an open network
traffic can easily be monitored, e.g. credit card numbers

Challenge is to ensure that information conforms to:
privacy, integrity, authenticity, non-fabrication, non-repudiation

Information also needs to be protected on the web server

Also need to protect against executable content


Web Database Security Solutions


Various methods can be used
proxy servers
improve performance and filter requests

firewalls
prevent unauthorised access to/from a private network

digital certificates
electronic attachments to messages, used to verify that a user is
authentic

Kerberos
centralised security server for all data and resources on the
network

Web Database Security Solutions


Secure Sockets Layer and Secure HTTP
SSL - secure connection between client and server
S-HTTP - individual messages transmitted securely

Secure Electronic Transactions
uses certificates and splits transactions so that only the relevant
information is provided to each party

Java - Java Virtual Machine (JVM)
class loader - checks that applications do not violate system
integrity by checking class hierarchies
bytecode verifier - verifies that code will not crash or violate
system integrity

ActiveX
uses digital signatures; the user is responsible for security


SQL Injection
A technique used to take advantage of non-validated input vulnerabilities to pass SQL
commands through a Web application for
execution by a backend database [1]
Can chain SQL commands
Embed SQL commands in a string
Ability to execute arbitrary SQL queries
[1] http://imperva.com/application_defense_center/glossary/sql_injection.html


SQL Injection: Example 1


Form asking for username and password
Original Query:
SQLQuery = "SELECT count(*) FROM users
  WHERE username = '" + $username + "'
  AND password = '" + $password + "';"

Specify username and password = ' OR '1' = '1

SELECT count(*) FROM users WHERE
username = '' OR '1' = '1' AND password =
'' OR '1' = '1';
Because AND binds more tightly than OR, the WHERE clause is always true and every row is counted.
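The attack can be reproduced with SQLite. The sketch deliberately builds the query by string concatenation exactly as on the slide, so it shows what not to do; the users table and its rows are hypothetical (and storing plain-text passwords is itself poor practice, kept only for brevity).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table users (username text, password text)")
conn.executemany("insert into users values (?, ?)",
                 [("zahra", "s3cret"), ("maina", "hunter2")])  # hypothetical rows

def login_count_unsafe(username: str, password: str) -> int:
    # UNSAFE: user input is pasted straight into the SQL text
    query = ("SELECT count(*) FROM users WHERE username = '" + username +
             "' AND password = '" + password + "';")
    return conn.execute(query).fetchone()[0]

attack = "' OR '1' = '1"
print(login_count_unsafe("zahra", "wrong"))  # 0: a normal failed login
print(login_count_unsafe(attack, attack))    # 2: every row matches, so the login "succeeds"
```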

SQL Injection : Example 2


SQLQuery = "SELECT * FROM staff WHERE staff_no = "
  + $name + ";"
Enter staff_no: 100 OR 1 = 1

Will give the query:
SELECT * FROM staff WHERE staff_no = 100 OR 1 = 1;
Even worse:
Enter staff_no: 100; DROP TABLE staff; SELECT * FROM
sys.user_tables
Enter staff_no: 100 UNION SELECT Username,
Password FROM Users

SQL Injection : Remedies


Can include:
Strip quotation marks and other spurious
characters from strings
Use stored procedures
Limit field lengths or even don't allow text entries
Restrict UNION
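The sketch below uses parameterised queries (bind variables), a widely recommended remedy that is not on the slide's list, together with a digits-only length check in the spirit of "limit field lengths". The users table, its rows and the six-digit limit are hypothetical.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table users (username text, password text)")
conn.executemany("insert into users values (?, ?)",
                 [("zahra", "s3cret"), ("maina", "hunter2")])  # hypothetical rows

def login_count(username: str, password: str) -> int:
    # The ? placeholders pass the values separately from the SQL text, so
    # quotes in the input are treated as data and cannot alter the query
    return conn.execute(
        "SELECT count(*) FROM users WHERE username = ? AND password = ?",
        (username, password),
    ).fetchone()[0]

def staff_no_from_input(text: str, max_len: int = 6) -> int:
    # "Limit field lengths": accept only 1..max_len ASCII digits
    if not re.fullmatch(rf"[0-9]{{1,{max_len}}}", text):
        raise ValueError("staff_no must be a short number")
    return int(text)

attack = "' OR '1' = '1"
print(login_count(attack, attack))      # 0: the attack string is just an odd username
print(login_count("zahra", "s3cret"))   # 1
```

With bind variables the quote-stripping remedy becomes unnecessary for these parameters, and input such as "100 OR 1 = 1" for staff_no is rejected before it reaches the database.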


Summary
Have looked at a number of issues and
solutions for database security
e.g. access controls, SQL features, etc.

Web security is an important problem


Need to consider security of data transmission, the
data server and the clients


Further Reading
Connolly and Begg, chapter 19

Date (7th edition), chapter 17
both Connolly and Date have general introductions to security
concepts, with mention of some advanced features

Bertino and Sandhu: Database Security Concepts, Approaches and Challenges,
IEEE Transactions on Dependable and Secure Computing, Vol. 2, No. 1, 2005

Oracle 8i Virtual Private Database White Paper:
http://www.oracle.com/technology/deploy/security/oracle8i/pdf/vpd_wp6.pdf

Client/Server, Distributed and Internet Databases


Contents
Client/Server Databases
Web Databases
Distributed Databases


Client/Server Architecture
In a file-server architecture, each client must run a
copy of the DBMS.
A better solution is to have a central database server
which performs all database commands sent to it from
client PCs.
Application programs on each client PC can then
concentrate on user interface functions.
Database recovery, security and concurrency control are
managed centrally on the server.

Client/Server Architecture
DATABASE SERVER
The SERVER portion of the client/server database
system which provides processing and shared access
functions.


Client/Server Architecture
CLIENT
Manages the user interface (controls the PC screen,
interprets data sent to it by the server and displays the
results of database queries)
The client forms queries in a specified language (usually
SQL) to retrieve data from the database. This query process
is usually transparent to the user.


Client/Server Architecture
CLIENT/SERVER ADVANTAGES
Allows companies to harness the benefits of
microcomputer technology such as low cost.
Processing can be performed close to the source of
the data - more speed.
Allows the use of GUI interfaces that are commonly
available on PCs and workstations.
Paves the way for truly open systems.


Client/Server Architecture
CLIENT/SERVER DESIGN ISSUES
The server must be upgradeable to allow for the
growth in clients.
Gateway software is normally required for accessing
databases held on a mainframe.
The server must have capabilities for backup,
recovery, security and UPS.


Client/Server Architecture
CLIENT/SERVER DESIGN ISSUES
Can be complex and so require specialised and expensive
tools such as database servers and APIs.
A lack of comprehensive standards.
Front-end GUI software often requires expensive client
workstations.


Traditional Client-Server Architecture
Traditional database systems are based on a
two-tier client-server architecture
Fat clients

Client
User interface
Main business and data processing logic

Database Server
Server-side validation
Database access

Web Architecture
Need for enterprise scalability causes problems which
can be solved by a three-tier architecture
Thin clients

Client
User interface

Application Server
Business logic
Data processing logic

Database Server
Server-side validation
Database access

Web as a Database Platform


Advantages

DBMS advantages
E.g. transactions, concurrency, synchronisation, security, integrity

Simplicity
HTML is a simple markup language; however, with new scripting languages
this simplicity is being lost

Platform independence
Web clients are mostly platform independent

Graphical User Interface
Users prefer a GUI to a text-based application

Standardization
HTML is a de facto standard


Advantages (cont.)
Cross-platform support
Users on all types of computer can access a machine with a web browser

Transparent network access
Access solely by URL

Scalable deployment
Applications upgraded on server only

Innovation
Organisations can provide new services and reach new customers


Web as a Database Platform


Disadvantages
Reliability
Internet is a slow and unreliable communication medium
No guarantee of delivery

Security
Data accessible on web
User authentication and secure data transmissions are critical

Cost
A report from Forrester Research claims that maintaining a commercial web
site costs $200 to $3.4 million

Scalability
Unreliable and potentially very large peak loads
Needs highly scalable server architectures


Disadvantages (cont.)
Limited HTML Functionality
Need to extend HTML with scripting languages
Adds a performance overhead

Statelessness
No concept of a database connection

Bandwidth
Internet is slow! 1.5 Mbps compared to 10-100 Mbps on a local area network

Performance
Many scripting languages are interpreted languages

Immaturity of development tools
This is improving!

Web Database Approaches


Traditional web pages are normally static
To run queries, need to be able to produce
dynamic HTML pages


Client Side vs. Server Side
To access a database and process information from it,
executable content is needed
It acts as a gateway between the Web and the
database server
This can run at either of two locations
Client Side
Server Side


Web Database Approaches


Approaches include:
CGI - Common Gateway Interface
HTTP Cookies - allow information to be stored on the client machine,
e.g. for user authentication
JavaScript - code which runs on the client machine
PHP - Hypertext Preprocessor
Active Server Pages - Microsoft's server-side technology for dynamic web pages


Database Connectivity
Client Side, 2 approaches:
Extend the browser using scripts, or add-ons or applets,
e.g. plug-ins, JavaScript, ActiveX, Java applets
Link browser to other (external) applications, e.g. legacy systems

Server Side, 2 approaches:


Embed scripts within web page source, e.g. PHP, Java servlets
Create programs which are executed when accessed by client, e.g.
CGI


Client Side
Advantages
Distribution of processing
Feedback speed
Web-page functionality

Disadvantages
Platform/environment dependent
Security and integrity
Download time
Programming limitations

Server Side
Advantages
Platform/browser independent
Security and integrity
Reduced download time
Fewer programming limitations, e.g. direct access to the database

Disadvantages
Lack of debugging tools
Lack of direct control over user interface


Distributed Databases
DECENTRALIZED DATABASE
stored on computers at multiple locations.
computers are not interconnected by a network.
users at the various sites cannot share data.

DISTRIBUTED DATABASE
Spread physically across computers in multiple locations that
are connected by a data communications link.


Distribution Types
Geographical Distribution: Several databases run
under the control of different CPUs at a variety of
different locations.
Platform Distribution: Databases exist on diverse
hardware platforms, and are 'brought together' by
the distributed database manager.
Architectural Distribution: Different database
architectures exist together, e.g. an object-oriented
database communicating with a relational database


Date's Rules
Distributed Database Requirements:
For a database to qualify as distributed, one
fundamental principle must be adhered to:
To the user, a distributed database should look exactly like
a non-distributed system

Local Autonomy:
Each site's operations and data maintenance are
controlled only by that site.


Date's Rules
No Reliance On A Central Site:
This follows on from the first objective and is self-explanatory

Continuous Operation:
A distributed approach leads to greater reliability
and availability. The database should still be able to
function, even if one of its sites is unavailable.


Date's Rules
Distributed Transaction Management:
Transaction processing is the key to the successful
usage of distributed databases.
Must cater for two core aspects of transaction
management, i.e. recovery control and
concurrency control.

Location Independence
Otherwise known as location transparency.

Date's Rules
Fragmentation Independence:
Horizontal Partitioning: different rows from the
same table are stored at different sites.
Vertical Partitioning: different columns from the
same table are maintained at different sites.
Replication Independence:
Replication occurs when a stored relation can be
represented by many distinct copies (replicas), stored at
many sites. As with fragmentation, users must not be aware
that the data is replicated.


Date's Rules
Distributed Query Processing:
Queries may retrieve information from several
sites. Therefore distributed queries must be
optimised.


Date's Rules
Hardware Independence:
Presenting a 'single-image' system to the end user
regardless of platform.

Operating System Independence:
Same as above, but based upon software.

Network Independence:
Support for a disparate variety of communication
networks.

DBMS Independence:
Supporting heterogeneous database management systems
via a common interface, e.g. the SQL language.


Distributed Databases
ADVANTAGES
Increased reliability and availability
Encourages local ownership of data
Modular growth
Lower communication costs
Faster response


Distributed Databases
DISADVANTAGES
Software complexity and cost
Processing overhead
Data integrity
Slow response


Distributed Databases
HOW SHOULD A DATABASE BE
DISTRIBUTED?

Four basic strategies
1. Data replication
2. Horizontal partitioning
3. Vertical partitioning
4. Combinations of the above

Data Replication

A separate copy of the database is stored at each of the
different sites.
Preferred for systems where:
Most transactions are read only
Data is relatively static, for example timetables or
catalogues.


Data Replication Advantages


Advantages
Reliability - If one site fails another copy of the
data can be found at a second site.
Fast response - Each site has a full copy of the
data therefore queries can be processed locally.


Distributed databases

Horizontal Partitioning:
The base table is split horizontally into several
different tables at different sites.
Selected rows from a table are put into tables at
different sites.
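A minimal sketch of horizontal partitioning in plain Python, assuming the hypothetical STAFF rows below are fragmented by branch, with one site per branch. Each site holds complete rows, and the full table is the union of the fragments.

```python
# Hypothetical STAFF rows; each branch's site stores only its own rows
staff = [
    {"staff_no": "SG37", "branch": "B003", "salary": 12000},
    {"staff_no": "SL21", "branch": "B005", "salary": 30000},
    {"staff_no": "SG14", "branch": "B003", "salary": 18000},
]

def horizontal_fragments(rows: list, key: str) -> dict:
    """Split rows into fragments, one per value of `key` (i.e. one per site)."""
    fragments = {}
    for row in rows:
        fragments.setdefault(row[key], []).append(row)
    return fragments

def reconstruct(fragments: dict) -> list:
    # The original table is the UNION of all the horizontal fragments
    return [row for fragment in fragments.values() for row in fragment]

sites = horizontal_fragments(staff, "branch")
print([r["staff_no"] for r in sites["B003"]])   # ['SG37', 'SG14']
```

A query about one branch touches one site only; a query over all staff must visit every site, which is the inconsistent access speed listed among the disadvantages.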


Distributed databases

Advantages
Efficiency - Data items are stored close to where they are
most often used, separate from data used by other applications.
Optimisation - Data optimised for local use
Security - Only relevant data is available


Distributed databases

Disadvantages
Inconsistent access speed - When data from
several different partitions are required, access
speed can vary significantly.
Backup vulnerability


Distributed databases
Vertical Partitioning
Some of the columns in a table are projected into
a table at one of the sites and other columns are
projected into a table at another site. The same
advantages and disadvantages as for horizontal
partitioning apply.
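The vertical counterpart, again with hypothetical STAFF rows. The sketch assumes the primary key is kept with every fragment (here as the dictionary key), which is what allows the original table to be rebuilt by joining the fragments.

```python
# Hypothetical STAFF rows
staff = [
    {"staff_no": "SG37", "name": "Ann", "salary": 12000},
    {"staff_no": "SL21", "name": "John", "salary": 30000},
]

def vertical_fragment(rows: list, key: str, columns: list) -> dict:
    """Project `columns` into a fragment for one site, keyed by the primary key."""
    return {row[key]: {c: row[c] for c in columns} for row in rows}

personnel_site = vertical_fragment(staff, "staff_no", ["name"])
payroll_site = vertical_fragment(staff, "staff_no", ["salary"])

def reconstruct(key: str, *fragments: dict) -> list:
    # Join the fragments on the primary key to rebuild complete rows
    rows = []
    for k in fragments[0]:
        row = {key: k}
        for fragment in fragments:
            row.update(fragment[k])
        rows.append(row)
    return rows

print(reconstruct("staff_no", personnel_site, payroll_site))
```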


Distributed databases
Combinations
To complicate matters even further it is possible
to have a strategy which is a combination of all
the above. Some data stored centrally, some
distributed both horizontally and vertically. It
could be a real challenge (or a nightmare).


Distributed databases
DISTRIBUTED DBMS
Determine the location from which data is to be
retrieved.
Translate requests from different nodes.
Provide functions such as security, recovery,
concurrency and optimisation.


Distributed databases
DISTRIBUTED DBMS
IT SHOULD ALSO OFFER:

Location transparency
Replication transparency
Failure transparency
Concurrency transparency
Commit protocol
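The slide does not name a commit protocol; the usual choice is two-phase commit, sketched here with hypothetical in-memory sites. In phase 1 the coordinator asks every site to prepare and vote; in phase 2 it tells all sites to commit only if every vote was yes, and otherwise to abort. A real implementation also writes its decisions to a log and handles sites that fail or time out.

```python
class Participant:
    """A site taking part in a distributed transaction (hypothetical)."""

    def __init__(self, name: str, can_commit: bool = True):
        self.name, self.can_commit, self.state = name, can_commit, "active"

    def prepare(self) -> bool:          # phase 1: vote yes or no
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def abort(self) -> None:
        self.state = "aborted"

def two_phase_commit(participants: list) -> str:
    """Coordinator: the transaction commits at every site or at none."""
    votes = [p.prepare() for p in participants]   # phase 1: collect every vote
    if all(votes):
        for p in participants:                     # phase 2: global commit
            p.commit()
        return "committed"
    for p in participants:                         # phase 2: global abort
        p.abort()
    return "aborted"

print(two_phase_commit([Participant("site A"), Participant("site B")]))  # committed
print(two_phase_commit([Participant("site A"),
                        Participant("site B", can_commit=False)]))       # aborted
```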


Further Reading
Distributed Databases
Connolly and Begg, chapter 22

Web Databases
Connolly and Begg, chapter 29
Sections 29.1 to 29.3


Object-Oriented Databases


Contents

Complex Applications
RDBMS Weaknesses
Next Generation Data Models
Object-Oriented Databases
Further Reading


Relational DBMS Suitability
Relational DBMS are suitable for certain types of applications:
simple data types, e.g. dates, strings
large numbers of instances, e.g. students, employees
well-defined relationships between data, e.g. student-course relationships, and use of joins
short transactions, e.g. simple queries
Most successful for business applications
On-line transaction processing


Complex Applications
RDBMS are inadequate for applications including:
CAD, CAM
CASE
Office Information Systems
Multimedia systems
GIS
Science and medicine


Complex Applications
CAD, CAM

complex objects
graphics
a large number of types but few instances of each type
hierarchical design not static

CASE
software development lifecycle
co-operative engineering
concurrent sharing of design, code and documentation


Complex Applications
Office Information and Multimedia Systems
e-mail support
documentation
SGML documents

Geographic Information Systems
spatial and temporal information, e.g. satellite/survey photos, maps
pattern recognition


RDBMS Weaknesses
Poor separation of real world entities
normalisation leads to entities that don't closely match the real world
joins are costly

Semantic overloading
all data held as relations
no mechanism to differentiate between entities and relationships


RDBMS Weaknesses
Poor support for integrity and enterprise constraints
relational systems good for supporting referential, entity and
simple business constraints
not good for more complex enterprise constraints

Homogeneous data structure
data pushed into rows and columns
not all real world data can be organised in this way


RDBMS Weaknesses
Limited operations
SQL does not allow new operations to be defined
e.g. select age from person; needs age to be stored, as an operation deriving age from date of birth cannot be added

Difficulty handling recursive queries
e.g. find all ancestors

Impedance mismatch
need to embed SQL to get computational completeness
data types in SQL and programming language don't match
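SQL-92 cannot express "find all ancestors" in a single query; SQL:1999 later added recursive common table expressions (WITH RECURSIVE). The sketch below uses Python's built-in sqlite3 module (SQLite supports WITH RECURSIVE from version 3.8.3); the parent table and the names in it are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parent (child TEXT, parent TEXT);
    INSERT INTO parent VALUES
        ('Ann', 'Bob'), ('Bob', 'Carol'), ('Carol', 'Dan'), ('Eve', 'Frank');
""")

def ancestors(person):
    """Return every ancestor of `person`, nearest first, via a recursive CTE."""
    rows = conn.execute("""
        WITH RECURSIVE anc(name, depth) AS (
            SELECT parent, 1 FROM parent WHERE child = ?
            UNION ALL
            SELECT p.parent, a.depth + 1
            FROM parent p JOIN anc a ON p.child = a.name
        )
        SELECT name FROM anc ORDER BY depth
    """, (person,)).fetchall()
    return [name for (name,) in rows]

print(ancestors("Ann"))  # ['Bob', 'Carol', 'Dan']
```

Without recursion, each extra generation would need another self-join written into the query, so the depth would have to be known in advance.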


RDBMS Weaknesses
Concurrency, schema changes and poor navigational
access
no support for long duration transactions
difficult to change schema, e.g. add columns to a table
RDBMS access is content-based (by attribute values)
not navigational (following links from one object to another)


Data Models
(Figure: generations of data models)
1st Generation: Hierarchical, Network
2nd Generation: Relational
3rd Generation: Object-Relational, Object-Oriented
Also shown: Entity-Relationship, Semantic

Object-Oriented Databases

Overview and Origins
OODB Strategies
OO Database System Manifesto
Advantages and Disadvantages
Object Database Standard
OQL
JDO


OO Databases Overview
Object-Oriented Database
e.g. ObjectStore, Objectivity, Jasmine, POET
based on object-oriented programming techniques
information is represented in the form of objects
Object: a uniquely identifiable entity that contains both the
attributes that describe the state of a real-world object
and the actions associated with it (Connolly & Begg, 2005)


OO Databases Overview
include concepts such as
user extensible type system, complex objects,
encapsulation, inheritance, polymorphism, dynamic
binding, object identity

ODMG standard defines a data model and a query language standard
also defines interoperability between ODMG compliant systems


Origins of OO Databases
Traditional Database Systems
persistence, sharing, transactions, concurrency control, recovery
control, security, integrity, querying

Semantic Data Models
generalisation, aggregation, navigational querying

Object-Oriented Programming
object identity, encapsulation, inheritance, types and classes,
methods, complex objects, polymorphism, extensibility

Special Requirements
versioning, schema evolution

OODBMS Development
Strategies
Various approaches:
Extend an existing OO-PL with database capabilities
Provide extensible OO DBMS libraries
Embed OO database language constructs in a
conventional host language
Extend an existing database language with OO
capabilities
Develop a novel data model/data language


Object-Oriented DB System
Manifesto
Developed by Atkinson et al. (1989)
Devised 13 mandatory features for an OODBMS
based on two criteria:
should be an OO system
should be a DBMS


Object-Oriented DB System
Manifesto
OO Criteria:
1. Complex objects must be supported
2. Object identity must be supported
3. Encapsulation must be supported
4. Types or classes must be supported
5. Types or classes must be able to inherit from their
ancestors
6. Dynamic binding must be supported
7. The DML must be computationally complete
8. The set of data types must be extensible

Object-Oriented DB System
Manifesto
DBMS Criteria:
9. Data persistence must be provided
10. The DBMS must be capable of managing very
large databases
11. The DBMS must support concurrent users
12. The DBMS must be capable of recovery from
hardware and software failures
13. The DBMS must provide a simple way of
querying data

OODB Advantages

Enriched modelling capabilities
Extensibility
Removal of impedance mismatch
More expressive query language
Support for schema evolution
Support for long duration transactions
Applicability to advanced database applications
Improved performance


OODB Disadvantages

Lack of universal data model
Lack of experience
Lack of standards
Query optimisation compromises encapsulation
Locking at object level may impact performance
Complexity
Lack of support for views
Lack of support for security


And most importantly
What about integrity?


Object Database Standard
Standard for an Object-Oriented Data Model proposed
by the Object Data Management Group (ODMG)
ODMG Object model is a superset of the OMG object
model
Consists of
Object model (OM)
Object Definition Language (ODL)
Object Interchange Format (OIF)
Object Query Language (OQL)
Language bindings: C++, Smalltalk, Java


Object Model
Basic Constructs
objects
literals
both characterised by types

objects have
attributes and relationships (state)
operations (behaviour)

Types
Interface Definition
defines abstract behaviour of an object type
e.g. interface Employee{..};

Class
defines abstract behaviour and abstract state
an extended interface with information for ODMS schema
definition
objects are class instances
e.g. class Person{..};

Literal
defines the abstract state of a literal type
e.g. struct Complex {float re; float im;};


Types (cont)
Inheritance
applied to both interfaces and classes
inheritance of behaviour between object types

Extend
applied to object types only
inheritance of state and behaviour

Extent
set of all instances of a class (its extension)
each instance in the extent can be uniquely identified by a key


Objects
Instances of a class
Have a unique object identifier
which remains for the lifetime of the object

Names
equivalent to global variables

Lifetime can be
transient
persistent
type and lifetime are independent
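The point that type and lifetime are independent can be illustrated in Python, using the standard shelve module as a stand-in for an ODBMS store (an assumption for illustration; shelve is not an object database). Both Person objects have the same type, but only the one bound to a name in the store outlives the program; the name "pat" behaves like the global names described above.

```python
import os
import shelve
import tempfile

class Person:
    def __init__(self, name, age):
        self.name, self.age = name, age

store_path = os.path.join(tempfile.mkdtemp(), "objects")   # hypothetical store file

transient = Person("Sam", 30)         # discarded when the program ends
persistent = Person("Pat", 41)        # same type, but made persistent below

with shelve.open(store_path) as db:   # the store acts as a persistent root
    db["pat"] = persistent            # bind a name to the object

with shelve.open(store_path) as db:   # later, e.g. in another run
    pat = db["pat"]                   # find the object again by its name
print(pat.name, pat.age)              # Pat 41
```

One difference from a real ODBMS: shelve stores a pickled copy, so the reloaded object has the same state but is not the same Python object, and no object identifier is preserved.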


Objects
Collections
Set, Bag, List, Array
Dictionary - unordered sequence of key-value pairs with no duplicate keys

Structured objects
Date, Interval, Time, Timestamp

Literals
Atomic, Collection, Structured


Example ODL Schema
class Student
(extent students)
{
  attribute short id;
  attribute string name;
  attribute string address;
  attribute date dob;
  relationship set<Module> takes
    inverse Module::takenby;
  short age();
};


Example ODL Schema
class Module
(extent modules)
{
  attribute string title;
  attribute short semester;
  relationship set<Student> takenby
    inverse Student::takes;
};
class Postgrad extends Student
(extent postgrads)
{
  attribute string thesis_title;
};
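A rough Python rendering of the Student/Module/Postgrad schema, assuming plain classes stand in for ODMG classes: the class-level extent sets play the role of the extents, and the hypothetical enrol method keeps takes and takenby consistent, which is what the inverse clauses ask the ODBMS to do automatically. Because Postgrad extends Student, every postgrad also appears in the students extent.

```python
from datetime import date

class Module:
    extent = set()                          # (extent modules)

    def __init__(self, title, semester):
        self.title, self.semester = title, semester
        self.takenby = set()                # relationship set<Student> takenby
        Module.extent.add(self)

class Student:
    extent = set()                          # (extent students)

    def __init__(self, student_id, name, address, dob):
        self.id, self.name, self.address, self.dob = student_id, name, address, dob
        self.takes = set()                  # relationship set<Module> takes
        Student.extent.add(self)

    def enrol(self, module):
        # Update both directions, as the inverse clauses require.
        self.takes.add(module)
        module.takenby.add(self)

    def age(self, today=None):
        """Operation short age(): whole years since dob."""
        today = today or date.today()
        had_birthday = (today.month, today.day) >= (self.dob.month, self.dob.day)
        return today.year - self.dob.year - (0 if had_birthday else 1)

class Postgrad(Student):
    extent = set()                          # (extent postgrads)

    def __init__(self, student_id, name, address, dob, thesis_title):
        super().__init__(student_id, name, address, dob)
        self.thesis_title = thesis_title
        Postgrad.extent.add(self)

dbs = Module("Databases", 1)
pat = Postgrad(7, "Pat", "Durham", date(1990, 5, 1), "Query optimisation")
pat.enrol(dbs)
print(pat in dbs.takenby, pat in Student.extent)   # True True
```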

Object Interchange Format
Used to dump/load the current state of an ODBMS to/from
a set of files
e.g.
Sarah Person{Name "Sarah",
  PersonAddress{Street "Willow Lane",
                City "Durham",
                Phone {CountryCode 44,
                       AreaCode 191,
                       PersonCode 1234}}}

Object Query Language
Similar to SQL92
extensions: complex objects, object identity, path
expressions, polymorphism, operation invocation, late
binding
e.g.
select distinct x.age
from Persons x
where x.name = "Pat";
Returns a literal of type set<struct>:
select distinct struct(a : x.age, s : x.sex)
from Persons x
where x.name = "Pat";

OQL Examples
Path Expressions
select c.address
from Persons p, p.children c
where p.address.street = "Main Street"
and count(p.children) >= 2
and c.address.city != p.address.city;

Methods
select max(select c.age from p.children c)
from Persons p
where p.name = "Paul";
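What the two queries compute can be seen by writing them as Python comprehensions over a hypothetical in-memory Persons collection. Each p.children c in the from clause becomes a nested loop, and the dotted paths follow object references rather than joins; the classes and data are illustrative only.

```python
from dataclasses import dataclass, field

@dataclass
class Address:
    street: str
    city: str

@dataclass
class Person:
    name: str
    age: int
    address: Address
    children: list = field(default_factory=list)

home = Address("Main Street", "Durham")
kids = [Person("Ann", 20, Address("High St", "York")),
        Person("Ben", 17, home),
        Person("Cal", 25, Address("Low Rd", "Leeds"))]
persons = [Person("Paul", 50, home, kids), *kids]

# select c.address from Persons p, p.children c
# where p.address.street = "Main Street" and count(p.children) >= 2
#   and c.address.city != p.address.city;
moved_out = [c.address
             for p in persons for c in p.children
             if p.address.street == "Main Street"
             and len(p.children) >= 2
             and c.address.city != p.address.city]

# select max(select c.age from p.children c) from Persons p where p.name = "Paul";
oldest_child = [max(c.age for c in p.children) for p in persons if p.name == "Paul"]

print([a.city for a in moved_out], oldest_child)   # ['York', 'Leeds'] [25]
```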

OQL Polymorphism Examples
Late Binding
select p.activities
from Persons p;

Class Indication
select ((Student)p).grade
from Persons p
where "course of study" in p.activities;
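A Python sketch of the two ideas, with hypothetical Person and Student classes: late binding means p.activities runs the method of each object's actual class, and the class indicator ((Student)p) is modelled by a run-time type check. Raising TypeError for a non-student is an assumption of this sketch; in the query above the where clause already ensures only students reach the cast.

```python
class Person:
    def __init__(self, name):
        self.name = name

    @property
    def activities(self):
        return {"work"}

class Student(Person):
    def __init__(self, name, grade):
        super().__init__(name)
        self.grade = grade

    @property
    def activities(self):                   # overrides Person.activities
        return {"course of study", "sport"}

def as_student(p):
    """Model of the class indicator ((Student)p), checked at run time."""
    if not isinstance(p, Student):
        raise TypeError(f"{p.name} is not a Student")
    return p

persons = [Person("Paul"), Student("Pat", "A"), Student("Sam", "B")]

# select p.activities from Persons p;  (each object's own method runs)
all_activities = [p.activities for p in persons]

# select ((Student)p).grade from Persons p where "course of study" in p.activities;
grades = [as_student(p).grade for p in persons if "course of study" in p.activities]
print(grades)   # ['A', 'B']
```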


Java Data Objects
ODMG disbanded in 2001
ODMG Java Data Binding superseded by JDO, which:
Provides transparent persistence
Scales from embedded to enterprise
Integrates with EJB and J2EE
Is being widely adopted in the database industry

More information at www.odmg.org


Further Reading
Connolly & Begg, chapters 25, 26 and 27
Date 7th ed., chapter on Object-Oriented databases
Atkinson et al., The Object-Oriented Database System
Manifesto, Proc. 1st Intl. Conference on Deductive and
Object-Oriented Databases (DOOD), Japan, 1989.
Cattell et al., The Object Data Standard: ODMG 3.0


Object-Relational Databases


Contents
Background
Extensions to Relational Model
Database World
Advantages and Disadvantages of ORDBMS

Third Generation Databases
Postgres
Oracle
SQL3 and SQL:2003
Comparison of OO/OR Models
Further Reading

Extensions to Relational Model
Advanced emerging database applications use:
user extensible type system, encapsulation, inheritance, polymorphism,
dynamic binding, complex objects, object identity

Extend relational model with OO features:
Extended Relational DBMS
Object-Relational DBMS
Universal Server

Standard based on SQL - SQL3 (started 1991!)


The Database World
Stonebraker proposed a four-quadrant view of the
database world:
(Figure: vertical axis = search capabilities/multi-user support;
horizontal axis = data complexity/extensibility)
  Relational DBMS   | Object-Relational DBMS
  File systems      | Object-Oriented DBMS
However, the distinction between OQL and SQL is becoming
less clear


Object-Relational
Advantages
Addresses the weaknesses of RDBMS described earlier
Reuse and Sharing
extending the DBMS server to perform standard
functionality centrally
functionality shared by all applications, e.g. spatial data
types

Evolutionary rather than revolutionary
SQL3 upwardly compatible with the earlier SQL standard (SQL-92)
Current standard SQL:2003


Object-Relational
Disadvantages
Complexity and Associated Increased Costs
simplicity and purity of relational model are lost
majority of applications do not achieve optimal performance

Semantic gap between object-oriented and relational
OO applications not as data centric as relational ones

Objectives of initial SQL standard were to minimise user
effort and be easy to learn


3rd Generation
Database Manifesto
Manifesto developed by Stonebraker et al. (1990)
1. A third generation DBMS must have a rich type system
2. Inheritance is a good idea
3. Functions, including database procedures and methods and
encapsulation, are a good idea
4. Unique identifiers for records should be assigned by the DBMS only if a
user-defined primary key is not available
5. Rules (triggers, constraints) will become a major feature in future
systems. They should not be associated with a specific function or
collection

3rd Generation DBMS
6. Essentially, all programmatic access to a database should be through a
non-procedural, high-level access language
7. There should be at least two ways to specify collections, one using
enumeration of members, and one using the query language to
specify membership
8. Updateable views are essential
9. Performance indicators have almost nothing to do with data models
and must not appear in them
10. 3rd generation DBMS must be accessible from multiple high-level
languages


3rd Generation DBMS
11. Persistent forms of a high-level language, for a variety of high-level
languages, are a good idea. They will all be supported
on top of a single DBMS by compiler extensions and a complex
runtime system
12. For better or worse, SQL is intergalactic dataspeak.
13. Queries and their resulting answers must be the lowest level of
communication between a client and a server


3rd Generation DBMS
Atkinson et al. published the OODB System Manifesto in 1989
Stonebraker et al. devised the 3rd Generation DB System
Manifesto in 1990
Darwen and Date published The Third Manifesto in 1995 in
defence of the relational data model
certain OO features are desirable, but should be orthogonal to the
relational model
relational model needs no extension, no correction, no
subsumption, no perversion
SQL is a perversion of the model
define a language D, but with a front-end layer that allows SQL to be used


SQL3 (aka SQL99, and SQL:1999!)
The ANSI/ISO SQL3 standard includes new
features such as:
row and reference type constructors
user defined types (UDTs)
can participate in supertype/subtype relationships
user defined procedures, functions and operators
type constructors for collection types
arrays, sets, lists, multisets
support for large objects
BLOBs and CLOBs
Superseded by SQL:2003

SQL:2003
Row types
a data type that can represent types of rows in tables
e.g.
CREATE TABLE branch (
  bno VARCHAR(3),
  address ROW(
    street VARCHAR(25),
    town VARCHAR(15),
    pcode ROW(
      city_id VARCHAR(4),
      subpart VARCHAR(4))));

INSERT INTO branch
VALUES ('B5', ROW('22 Deer Rd', 'Sidcup', ROW('SW1', '4EH')));

SQL:2003
User Defined Types (UDT)
abstract data types; two kinds: distinct and structured
structured types consist of one or more attribute and routine definitions
CREATE TYPE age_type AS INTEGER FINAL;

CREATE TYPE person_type AS (
  PRIVATE
    date_of_birth DATE CHECK(date_of_birth > DATE '1990-01-01'),
  PUBLIC
    fname VARCHAR(15) NOT NULL,
    lname VARCHAR(15) NOT NULL,
  FUNCTION get_age (P person_type) RETURNS age_type
    RETURN /* code to calc age */
  END; ...
END) NOT FINAL;


SQL:2003
User defined routines (UDR)
may be defined as part of a UDT or as part of a schema
can be a procedure, function or method
Can be written in SQL or in an external programming
language

Polymorphism
uses a generalised object model, i.e.
no two functions in the same schema may have the same
signature (number of arguments, argument data types and return type)
no two procedures may have the same name and number of
parameters


SQL:2003
Subtypes/supertypes
multiple inheritance is not supported
substitutability
when an instance of a supertype is expected, an instance
of the subtype can be used in place

Tables
A UDT instance can only persist if stored as a
column in a table
can use table inheritance
Completely independent from UDT facility


SQL:2003
Querying
uses SQL92 syntax with extensions to handle objects
e.g.
SELECT s.lname, s.get_age
FROM staff s
WHERE s.is_manager;
SELECT p.lname, p.address
FROM person p
WHERE p.get_age > 65;
SELECT p.lname, p.address
FROM ONLY (person) p
WHERE p.get_age > 65;
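With table inheritance, the second query also returns matching rows stored in subtables of person (e.g. a staff subtable), while FROM ONLY (person) excludes them. SQLite has no table inheritance, so this sketch models the semantics in plain Python; the tables, rows and the stored age column (used instead of get_age) are hypothetical.

```python
# Hypothetical typed tables: staff is declared UNDER person.
tables = {
    "person": {"parent": None, "rows": [{"lname": "Ford", "age": 70}]},
    "staff":  {"parent": "person", "rows": [{"lname": "White", "age": 66},
                                            {"lname": "Lee", "age": 30}]},
}

def subtables(name):
    """The table itself plus every table declared (directly or not) under it."""
    result = [name]
    for t, info in tables.items():
        if info["parent"] == name:
            result.extend(subtables(t))
    return result

def select_lname(table, only=False, min_age=65):
    """SELECT lname FROM [ONLY] (table) WHERE age > min_age."""
    sources = [table] if only else subtables(table)
    return [r["lname"] for t in sources for r in tables[t]["rows"] if r["age"] > min_age]

print(select_lname("person"))             # ['Ford', 'White']
print(select_lname("person", only=True))  # ['Ford']
```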


SQL:2003
Reference Types and OID
system generated, type REF
Reference types can be used to define relationships
between row types
reference types uniquely identify rows
allows rows to be shared across tables
complex joins can be replaced by simple path expressions
reference types do not provide referential integrity

Collection types
ARRAYs, LISTs, SETs, MULTISETs


SQL:2003
Persistent Stored Modules (SQL/PSM)
SQL:2003 now computationally complete
New statements added:
compound statements (BEGIN ... END blocks)
assignment
IF .. THEN .. ELSE .. END IF, and CASE
loops (LOOP, WHILE, REPEAT, FOR)
CALL and RETURN for invoking procedures
condition handling


SQL:2003
Triggers
An SQL statement that is automatically executed by the DBMS as
a side effect of a modification to a table
Triggering events include insertion, deletion and update of rows in
a table
Useful for:
Verifying input data
Maintaining complex integrity constraints
Generating alerts
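A small sketch using Python's sqlite3 module. SQLite's trigger syntax is close to, but not the same as, SQL:2003 (for example it signals errors with RAISE(ABORT, ...)); the staff and salary_audit tables and the salary rule are hypothetical. One trigger verifies input data and the other records each salary change, the kind of row an alerting process could watch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staff (staff_no TEXT PRIMARY KEY, salary INTEGER);
    CREATE TABLE salary_audit (staff_no TEXT, old_salary INTEGER, new_salary INTEGER);

    -- Verifying input data: refuse negative salaries.
    CREATE TRIGGER check_salary BEFORE INSERT ON staff
    WHEN NEW.salary < 0
    BEGIN
        SELECT RAISE(ABORT, 'salary must not be negative');
    END;

    -- Alert/audit: record every salary change.
    CREATE TRIGGER log_salary AFTER UPDATE OF salary ON staff
    BEGIN
        INSERT INTO salary_audit VALUES (OLD.staff_no, OLD.salary, NEW.salary);
    END;
""")

conn.execute("INSERT INTO staff VALUES ('SL21', 30000)")
conn.execute("UPDATE staff SET salary = 32000 WHERE staff_no = 'SL21'")
print(conn.execute("SELECT * FROM salary_audit").fetchall())  # [('SL21', 30000, 32000)]

rejected = False
try:
    conn.execute("INSERT INTO staff VALUES ('SG5', -1)")
except sqlite3.DatabaseError as exc:    # raised by RAISE(ABORT, ...)
    rejected = True
    print("rejected:", exc)
```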


Oracle 8
An object-relational extension to Oracle 7
Object types can be used to create object tables with object
identifiers
Does not support object hierarchies
Oracle 9 does support object hierarchies
New types:
object types with attributes and methods (normally written in PL/SQL)
VARRAYs and nested tables
REFs
LOBs
Oracle 9i, 10g further updates to Oracle 8
Examples in tutorial booklet

Comparison of ORDBMS v OODBMS
Feature                        | ORDBMS                     | OODBMS
OID                            | Supported (REF type)       | Supported
Encapsulation                  | Supported (UDT)            | Broken for queries
Inheritance                    | Supported                  | Supported
Polymorphism                   | Supported                  | Supported (OOPL)
Complex objects                | Supported (UDT)            | Supported
Relationships                  | Strongly supported         | Supported
Create/access persistent data  | Supported, not transparent | Supported, degree of transparency differs
Ad hoc query facility          | Strong support             | Supported in ODMG2
Navigation                     | Supported (REF type)       | Strong support
Integrity constraints          | Strong support             | No
Object server/page server      | Object server              | Either
Schema evolution               | Limited support            | Varying support
ACID transactions              | Strong support             | Supported
Recovery                       | Strong support             | Varying support
Advanced transaction models    | No support                 | Varying support
Security, integrity, views     | Strong support             | Limited support
(taken from Connolly and Begg)

Further Reading
Connolly and Begg, chapter 28
a very good discussion
Stonebraker, Object-Relational DBMSs: The Next Great Wave, 1996.
Third Manifesto
www.thirdmanifesto.com
OR and OO manifestos are available from citeseer
http://citeseer.ist.psu.edu/
Dietrich and Urban, Advanced Course in DB Systems
chapter 8 covers SQL:2003
Oracle Object-relational Tutorial from module


Database Performance


Contents
Database Performance
Denormalisation
Indexes
Clustering
Query Optimisation

Benchmarking
Wisconsin, TPC-C, OO7, BUCKY

Summary

Database Performance
Good query performance is essential to achieve
acceptable performance of an RDBMS
Various ways in which this can be achieved:
De-normalisation of data to reduce joins
Creating indexes on frequently retrieved attributes
Clustering tables to reduce the number of disk reads
Automatic optimisation of queries


Normalisation
Normalisation improves the logical database design
and prevents anomalies BUT
more tables mean more joins
joining > 3 tables is likely to be slow

De-normalisation reverses normalisation for
efficiency purposes


Database Performance
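Example:
Branch(BranchNo, street, city, postcode, mgrstaffno)
Since postcode determines city, normalising this gives:
Branch(BranchNo, street, postcode, mgrstaffno)
Postcode(Postcode, city)
Keeping the single Branch table is a de-normalisation that
avoids a join whenever a branch's city is needed.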
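The trade-off can be tried with Python's sqlite3 module: the normalised design needs a join with Postcode to find a branch's city, while the de-normalised Branch answers from one table at the cost of repeating city for every branch sharing a postcode. The rows are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalised (3NF): city depends on postcode, so it moves to its own table.
    CREATE TABLE postcode (postcode TEXT PRIMARY KEY, city TEXT);
    CREATE TABLE branch_3nf (branchno TEXT PRIMARY KEY, street TEXT,
                             postcode TEXT REFERENCES postcode, mgrstaffno TEXT);
    -- De-normalised: city stored (redundantly) with each branch.
    CREATE TABLE branch (branchno TEXT PRIMARY KEY, street TEXT, city TEXT,
                         postcode TEXT, mgrstaffno TEXT);

    INSERT INTO postcode VALUES ('SW1 4EH', 'London'), ('G11 9QX', 'Glasgow');
    INSERT INTO branch_3nf VALUES ('B005', '22 Deer Rd', 'SW1 4EH', 'SL21'),
                                  ('B003', '163 Main St', 'G11 9QX', 'SG5');
    INSERT INTO branch VALUES ('B005', '22 Deer Rd', 'London', 'SW1 4EH', 'SL21'),
                              ('B003', '163 Main St', 'Glasgow', 'G11 9QX', 'SG5');
""")

with_join = conn.execute("""
    SELECT b.branchno, p.city FROM branch_3nf b JOIN postcode p USING (postcode)
    ORDER BY b.branchno""").fetchall()
no_join = conn.execute(
    "SELECT branchno, city FROM branch ORDER BY branchno").fetchall()
print(with_join == no_join)   # True: same answer, but no join needed
```

The price of the de-normalised table is that changing a postcode's city must now update every branch row that repeats it.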


De-normalisation
Advantages:
Minimises need for joins
Reduces number of foreign keys in relations
Reduces number of indexes (saving storage space)
Reduces number of relations


De-normalisation
Disadvantages
Speed up retrievals, but may slow down updates
Increases application complexity
Relation size can increase
Sacrifices flexibility


Indexes
INDEXES
An index is a table or some other data structure that is used
to determine the location of a row within a table that
satisfies some condition.

Indexes may be defined on both primary key and non-key
attributes.
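Conceptually, an index is a sorted list of (key, row location) pairs that can be binary-searched instead of scanning every row. A minimal Python sketch using the standard bisect module, assuming list positions stand in for Oracle rowids; the TENANT surnames are hypothetical.

```python
import bisect

tenant = ["Okoth", "Achieng", "Mwangi", "Achieng", "Kamau"]   # position = rowid

# Non-unique index on surname: (surname, rowid) pairs kept in sorted order.
surname_idx = sorted((surname, rowid) for rowid, surname in enumerate(tenant))
index_keys = [surname for surname, _ in surname_idx]

def lookup(surname):
    """Return rowids of matching rows by binary search, without a table scan."""
    lo = bisect.bisect_left(index_keys, surname)
    hi = bisect.bisect_right(index_keys, surname)
    return [rowid for _, rowid in surname_idx[lo:hi]]

print(lookup("Achieng"))   # [1, 3]
```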


Indexes
Oracle allows faster access on any named table by using an
index.
each row within a table is given a unique value, its rowid
each rowid can be held in an index
an index can be created at any time
any column within a table can be indexed


When to create an Index?
Before any input of data for a unique index
After data input for a non-unique index


Creating Indexes
HOW DO YOU CREATE AN INDEX?
EXAMPLE:
(a) CREATE INDEX TENIDX ON TENANT(SURNAME);
(b) CREATE UNIQUE INDEX TENIDX ON TENANT(SURNAME);
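Both statements can be tried with Python's sqlite3 module, which accepts the same CREATE INDEX and CREATE UNIQUE INDEX syntax (the tenant rows are hypothetical). The sketch also shows why a unique index is created before data is loaded: once it exists, duplicate surnames are refused on insert, but once duplicates are present, the unique index can no longer be built.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tenant (tenant_no INTEGER PRIMARY KEY, surname TEXT)")

# (b) Unique index created before any data is loaded: duplicates are refused.
conn.execute("CREATE UNIQUE INDEX tenidx ON tenant(surname)")
conn.execute("INSERT INTO tenant (surname) VALUES ('Kamau')")
duplicate_refused = False
try:
    conn.execute("INSERT INTO tenant (surname) VALUES ('Kamau')")
except sqlite3.IntegrityError as exc:
    duplicate_refused = True
    print("duplicate refused:", exc)

# (a) Non-unique index: drop the unique one, load duplicates, then index.
conn.execute("DROP INDEX tenidx")
conn.execute("INSERT INTO tenant (surname) VALUES ('Kamau')")
conn.execute("CREATE INDEX tenidx ON tenant(surname)")

# A unique index can no longer be built once duplicate surnames exist.
late_unique_failed = False
try:
    conn.execute("CREATE UNIQUE INDEX tenidx_u ON tenant(surname)")
except sqlite3.IntegrityError as exc:
    late_unique_failed = True
    print("unique index refused:", exc)

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM tenant WHERE surname = 'Kamau'").fetchall()
print(plan[0][-1])   # the chosen plan should mention tenidx
```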


Index Guidelines
GUIDELINES FOR USE OF INDEXES
the table has > 200 rows
a column is frequently used in a where clause
specific columns are frequently used as join columns


Indexes
POINTS TO WATCH
avoid if possible > 3 indexes on any one table
avoid indexing a column with too few distinct values,
e.g. male/female
avoid indexing a column with too many distinct values
avoid if > 15% of rows will be retrieved


Clusters
A disk is arranged in blocks
Blocks are retrieved as a whole and buffered
Disk access time is slow compared with memory
access
Gains can be made if the number of block transfers
can be reduced


Database Performance
CLUSTERING
clusters physically arrange the data on disk so that
frequently retrieved info is stored together
allows 2 or more tables to be stored in the same physical
block
can greatly reduce access time for join operations
can also reduce storage space requirements


Database Performance
CLUSTER DEFINITION
clustering is transparent to the user
no queries have to be modified
no applications need to be changed
tables are queried in the same way whether clustered or not


Database Performance
DECIDING WHERE TO USE CLUSTERS
Each table can only reside in 1 cluster
At least one attribute in the cluster must be NOT NULL
Consider the query transactions in the system
How often is the query submitted?
How time critical is the query?
What is the amount of data retrieved?


Clustering Tables
Branch Table
Branch No | Street      | City    | Postcode
B005      | 22 Deer St  | London  | SW1 4EH
B003      | 163 Main St | Glasgow | G11 9QX

Staff Table
Staff No | First Name | Last Name | Position   | DOB | Salary
SL21     | John       | White     | Manager    |     | 30000
SL41     | Julie      | Lee       | Assistant  |     | 9000
SG37     | Ann        | Beech     | Assistant  |     | 12000
SG14     | David      | Ford      | Supervisor |     | 18000
SG5      | Susan      | Brand     | Manager    |     | 24000

(Figure: Tables Clustered on Common Column)

Database Performance
CLUSTERING EXERCISE
(Figure: STOCK links WAREHOUSE, with 3 rows, and PRODUCT, with 1000 rows)


Database Performance
To speed up access time to data in these three tables
(WAREHOUSE, PRODUCT, STOCK) it is necessary to cluster
either STOCK around WAREHOUSE, or STOCK around
PRODUCT.
How do we decide which will be the most efficient?
For the purpose of this exercise we will assume that each
block can hold 100 records.


Database Performance
If STOCK is clustered around PRODUCT
No. of products = 1000. There will be 1 record for each
PRODUCT in each WAREHOUSE. Therefore each product
would have 3 records.
Each block would contain the records of 100/3 products,
i.e. 33 products (99 records), so all the stock records for a
particular product can be retrieved by reading one block of data.
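The block arithmetic can be written as a small function, assuming (as the exercise does) 100 records per block and that each key's STOCK records are stored contiguously; cluster_layout is a hypothetical helper. It reproduces the product case above and can be reused to check the warehouse case.

```python
import math

RECORDS_PER_BLOCK = 100   # assumption stated in the exercise

def cluster_layout(records_per_key):
    """For STOCK clustered around a key with `records_per_key` STOCK records
    each, return (whole keys that fit in one block, blocks needed per key)."""
    keys_per_block = RECORDS_PER_BLOCK // records_per_key   # 0 if one key overflows a block
    blocks_per_key = math.ceil(records_per_key / RECORDS_PER_BLOCK)
    return keys_per_block, blocks_per_key

# Clustered around PRODUCT: 3 STOCK records per product (one per warehouse).
print(cluster_layout(3))   # (33, 1)
```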


Database Performance
If STOCK is clustered around WAREHOUSE
No of warehouses = _____. There will be ____ record for
each item of STOCK in each warehouse. Therefore each
warehouse would have ______ records. The records for each
warehouse would have to be stored across ______ blocks.
Access would therefore be faster if STOCK is clustered
around the product.


Database Performance
SQL OPTIMISATION
(Figure: a query such as "Select * from ...;" is submitted to the DBMS,
which optimises it and retrieves the required data from the DATA FILES)

Query Optimisation
Automatic query optimisation can dramatically
improve query execution time
e.g. Consider the simple SQL query
select s.student_no, s.student_name, c.course_name
from student s, course c
where s.course_id = c.course_id
and s.age > 25;

This query runs faster if the selections and
projections are performed before the join

Example
1000 students of which only 100 are over the age of
25, and there are 50 courses
Alternative 1: Join first
read the 1000 students, read all courses 1000 times (once
for each student), construct an intermediate table of
1000 records (which may be too large to fit in memory)
restrict the result to those over the age of 25 (100 rows at
most)
project the result over the required attributes

Example
Alternative 2: Restrict first
read 1000 tuples but restrict to those over the age of 25,
returning an intermediate table of only 100 rows - which
has a much better potential of being storable in main
memory
join the result with the course table, again returning an
intermediate table of only 100 rows
project the result over the required attributes
Obviously this version is BETTER!
Could be improved further by doing the projection before the join.
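The comparison can be made concrete by counting tuple reads, assuming a naive nested-loop join that reads all 50 courses once per outer student row, as in Alternative 1. plan_cost is a hypothetical helper, and a real optimiser's cost formulas also account for indexes, block sizes and buffering.

```python
N_STUDENTS, N_COURSES, N_OVER_25 = 1000, 50, 100

def plan_cost(restrict_first):
    """Return (tuples read, rows in the join result) for each plan."""
    if restrict_first:
        reads = N_STUDENTS + N_OVER_25 * N_COURSES    # scan students, join only 100
        intermediate = N_OVER_25                       # each student has one course
    else:
        reads = N_STUDENTS + N_STUDENTS * N_COURSES   # join every student first
        intermediate = N_STUDENTS                      # 1000-row join result
    return reads, intermediate

print(plan_cost(restrict_first=False))  # (51000, 1000)
print(plan_cost(restrict_first=True))   # (6000, 100)
```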

Query Processing Stages
Four stages in query processing
1. Cast the query into internal form
normally tree based (relational algebra)
2. Convert to canonical form
3. Choose candidate low-level procedures
using indexes, clustering, etc.
4. Generate query plans, then choose and run the optimal one
based on cost formulas and database statistics
Rule or cost based in Oracle


Query Cast into Internal Form
(Figure: query tree, evaluated bottom-up)
RESULT
  PROJECT over student_no, student_name, course_name
    RESTRICT where age > 25
      JOIN over course_id
        S        C

Canonical Form
Canonical form
given a set Q of queries and a notion of equivalence
between any two queries q1 and q2 in Q, a subset C of Q
is the set of canonical forms for Q if and only if every
query q in Q is equivalent to exactly one query c in C.
The query c is the canonical form of the query q

Uses expression transformation rules


Expression Transformation
Rules
Examples (not complete)
(A WHERE p1) WHERE p2 == A WHERE p1 and p2
(A PROJECT x,y) PROJECT y == A PROJECT y
(A UNION B) PROJECT x == (A PROJECT x) UNION (B PROJECT x)
(A JOIN B) PROJECT x == (A PROJECT x1) JOIN (B PROJECT x2)
A JOIN B == B JOIN A
(A JOIN B) JOIN C == A JOIN (B JOIN C)
(A JOIN B) PROJECT x = A PROJECT x
where x is FK from B to A
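Rules like the first two above are easy to mechanise. A minimal Python sketch, assuming queries are represented as nested tuples such as ('WHERE', ('WHERE', 'A', 'p1'), 'p2') and ('PROJECT', 'A', ('x', 'y')); the representation and the normalise function are hypothetical, and only those two rules for the unary WHERE and PROJECT operators are implemented.

```python
def normalise(expr):
    """Rewrite bottom-up with two rules:
    (A WHERE p1) WHERE p2       ==  A WHERE (p1) and (p2)
    (A PROJECT x, y) PROJECT y  ==  A PROJECT y
    """
    if not isinstance(expr, tuple):
        return expr                                   # a base relation name
    op, arg, param = expr
    arg = normalise(arg)                              # rewrite the operand first
    if op == "WHERE" and isinstance(arg, tuple) and arg[0] == "WHERE":
        # Merge two restrictions into one conjunctive restriction.
        return normalise(("WHERE", arg[1], f"({arg[2]}) and ({param})"))
    if (op == "PROJECT" and isinstance(arg, tuple) and arg[0] == "PROJECT"
            and set(param) <= set(arg[2])):
        # The outer projection alone decides the result attributes.
        return normalise(("PROJECT", arg[1], param))
    return (op, arg, param)

query = ("WHERE", ("WHERE", "S", "age > 25"), "age < 60")
print(normalise(query))   # ('WHERE', 'S', '(age > 25) and (age < 60)')
```

Applying such rules repeatedly until none fires is one way of driving different but equivalent queries towards the same canonical form.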


Choose Candidate Low-Level Procedures
Decide how to execute the query represented by the converted
form
Take into consideration:
Indexes
Other physical access paths
Distribution of data values
Clustering

Specify as a series of low-level operations
Each low-level operation has a set of predefined
implementation procedures

Generate Query Plans/Choose Cheapest
Each query plan from stage 3 will have a cost formula
generated from the cost formulas of its low-level
procedures
Oracle supports
Rule Based
Ranks queries according to algebra operations
15 rules in Oracle

Cost Based
The optimal rule-based plan may not in fact be optimal, due to the cost of
operating the query, e.g. join order
Need to gather statistics

Database Performance
OPTIMIZING PERFORMANCE
Performance can be regarded as a
balancing act between:
access performance
update performance
ease of use/modification


Benchmarking
Software and systems development projects include
performance evaluation work, but sometimes not
sufficient to prevent major performance problems
Benchmarking is a useful tool which can be used at
the prototyping stage to improve performance of
the DBMS application
There are many benchmarks available


Database Benchmarking
A tool for comparing the performance of DBMSs
summarises relative performance in a single figure

Usually measured in transactions per second (tps)
with a cost measure in terms of system cost over 5 years

Two principal uses of benchmarks
providing comparative performance measures
a source of data and queries that represent experimental
approximations to other problems


Data Generation Approaches
Artificial
generation of data with entirely artificial properties
designed to investigate particular aspects of a system, e.g. join
e.g. Wisconsin

Synthetic Workload
produce a simplified version of an application
use synthetically generated data with similar properties to the real
system, e.g. a banking application
e.g. Transaction Processing Performance Council (TPC)


Wisconsin Benchmark
First systematic benchmark definition
compares particular features of DBMS rather than a
simple overall performance metric

A single-user series of tests, comprising
selections and projections with varying selectivities on
clustered, indexed and non-indexed attributes
joins of varying selectivities
aggregate functions (e.g. min, max, sum)
updates/deletions involving key/non-key attributes

Wisconsin Benchmark
Strengths:
Straightforward to implement
Scalable, e.g. parallel architectures
Useful, readily understandable results
Weaknesses:
Lack of highly skewed attribute distribution
Simple join queries


TPC-C
Measures performance of a typical order entry
application
from initiation at a terminal until response arrives back
from server
benchmark encompasses time taken by server, network
and other system components
terminals emulated using a negative-exponential
transaction arrival distribution


TPC-C Schema

Taken from TPC-C Standard Specification: available at www.tpc.org


TPC-C
5 transactions covering
New-Order
Payment
Order-Status enquiry
Delivery
Stock-Level enquiry

10 terminals at each warehouse
All 5 transactions are available at each terminal
the mix is designed to produce an equal number of New-Order and Payment transactions,
and one Delivery, one Order-Status and one Stock-Level transaction for every ten New-Order transactions

Metric
number of New-Order transactions executed per minute (tpmC)
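The ratios above fix the transaction mix, which the short calculation below turns into exact shares; it also shows how the throughput metric follows from a measurement interval. The run figures (1,200,000 New-Order transactions in 120 minutes) are hypothetical. The TPC-C specification itself expresses the mix as minimum percentages for the non-New-Order transactions, so real test drivers may use a slightly different mix.

```python
from fractions import Fraction

# Relative frequencies implied by the text: per 10 New-Order transactions there are
# 10 Payment and 1 each of Delivery, Order-Status and Stock-Level.
RATIOS = {"New-Order": 10, "Payment": 10, "Delivery": 1, "Order-Status": 1, "Stock-Level": 1}

def mix(ratios):
    """Return each transaction type's exact share of the workload."""
    total = sum(ratios.values())
    return {name: Fraction(count, total) for name, count in ratios.items()}

def tpm_c(new_order_count, interval_minutes):
    """Throughput metric: New-Order transactions completed per minute."""
    return new_order_count / interval_minutes

for name, share in mix(RATIOS).items():
    print(f"{name:13} {float(share):6.1%}")
print(tpm_c(new_order_count=1_200_000, interval_minutes=120))   # 10000.0
```

Only New-Order transactions are counted in the metric, even though they make up well under half of the work the system performs.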


Other TPC Benchmarks


TPC-H
Ad-hoc decision support environments

TPC-App
Application server and web services benchmark

Further info on
www.tpc.org


OO7
A benchmark for object database systems
Examines the performance characteristics of different types
of retrieval/traversal, object creation/deletion and updates,
and of the query processor
A number of sample implementations are provided
Based on a complex parts hierarchy
Further info
ftp.cs.wisc.edu


OO7 Tests
Test 1 (traversals):
raw traversal speed, traversal with updates, operations
on manuals
tests run with and without a full cache

Test 2 (queries):
exact matches, range searches, path lookup, scan, make,
join

Test 3 (structural modifications):
insert/update a group of composite parts
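A raw traversal in the spirit of Test 1 can be sketched over a much smaller parts hierarchy: complex assemblies contain sub-assemblies, base assemblies reference (possibly shared) composite parts, and each composite part is a graph of atomic parts visited depth-first. The class names, sizes and ring-shaped connections between atomic parts are simplifying assumptions; the real OO7 database is far larger and defines its connection structure differently.

```python
import itertools
from dataclasses import dataclass, field

@dataclass(eq=False)
class AtomicPart:
    id: int
    connections: list = field(default_factory=list)   # links to other atomic parts

@dataclass(eq=False)
class CompositePart:
    id: int
    root: AtomicPart                                    # traversal starts here

@dataclass(eq=False)
class Assembly:
    children: list = field(default_factory=list)        # sub-assemblies (complex assembly)
    components: list = field(default_factory=list)      # composite parts (base assembly)

def build_composite(cid, n_atomic=20, fanout=3):
    """Build a composite part whose atomic parts form a connected graph."""
    parts = [AtomicPart(cid * 1000 + i) for i in range(n_atomic)]
    for i, part in enumerate(parts):
        for k in range(1, fanout + 1):
            part.connections.append(parts[(i + k) % n_atomic])
    return CompositePart(cid, parts[0])

def build_assembly(levels, fan, composites, counter):
    """Build a tree of `levels` levels; leaves are base assemblies sharing composites."""
    if levels == 1:
        return Assembly(components=[composites[next(counter) % len(composites)]
                                    for _ in range(fan)])
    return Assembly(children=[build_assembly(levels - 1, fan, composites, counter)
                              for _ in range(fan)])

def traverse(assembly):
    """Visit every atomic part of every composite part reachable from `assembly`.

    Returns the number of atomic-part visits; a shared composite part is
    traversed once per base assembly that references it.
    """
    visits = 0
    for child in assembly.children:
        visits += traverse(child)
    for composite in assembly.components:
        seen, stack = set(), [composite.root]
        while stack:                                    # iterative depth-first search
            part = stack.pop()
            if part.id in seen:
                continue
            seen.add(part.id)
            visits += 1
            stack.extend(part.connections)
    return visits

composites = [build_composite(cid) for cid in range(10)]
root = build_assembly(levels=3, fan=3, composites=composites, counter=itertools.count())
print(traverse(root))   # 27 composite-part references x 20 atomic parts = 540
```

Timing `traverse` on a freshly loaded database versus a second, repeated run corresponds roughly to the with/without full cache variants listed above.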

BUCKY
An Object-Relational Benchmark
Objective
to test the key features that add the "object" to object-relational
database systems, as defined by Stonebraker
inheritance, complex objects, ADTs
but not triggers

Based on a university database schema

Exists as both an object-relational and a relational schema
so the performance trade-offs of using the object features
of a DBMS can be compared with a purely relational design

BUCKY Schema

Taken from: The BUCKY Object-Relational Benchmark, Carey et al.


BUCKY Queries
Aim is to test various object queries, involving
1. row types with inheritance
2. inter-object references
3. set-valued attributes
4. methods of row objects
5. ADT attributes and their methods

Two BUCKY performance metrics


O-R Efficiency Index, for comparing O-R and relational
implementations
O-R Power Rating, for comparing O-R systems

Summary
Database application performance can be improved by

Indexes
Clustering
De-normalisation
Query Optimisation

But it is important at the design stage to ensure that the
application is designed with optimal performance in mind
Benchmarking is a tool which can help with this

It's a black art!


Further Reading
Connolly and Begg, chapters 18 and 21
Also information in the OODB chapter on benchmarking

Date: chapter on Query Optimisation

Bitton, DeWitt and Turbyfill, Benchmarking Database Systems: A Systematic Approach,
Proc. 9th VLDB, 1983.
Carey et al., The BUCKY Object-Relational Benchmark,
http://www.cs.wisc.edu/~naughton/bucky.html

Data Warehousing and Data Mining


Contents

Data Warehousing
OLAP
Data Mining
Further Reading


Data Warehousing
OLTP (online transaction processing) systems
range in size from megabytes to terabytes
provide high transaction throughput

Decision makers require access to all data
historical and current

'A data warehouse is a subject-oriented, integrated,
time-variant and non-volatile collection of data in
support of management's decision-making process'
(Inmon 1993)


Benefits
Potential high returns on investment
90% of companies surveyed in 1996 reported a return on investment
(over 3 years) of more than 40%

Competitive advantage
data can reveal previously unknown, unavailable and
untapped information

Increased productivity of corporate decision-makers
integration allows more substantive, accurate and
consistent analysis


Comparison
OLTP                                       | Data warehouse
Holds current data                         | Holds historic data
Stores detailed data                       | Stores detailed, lightly and highly summarised data
Data is dynamic                            | Data is largely static
Repetitive processing                      | Ad hoc querying, unstructured and heuristic processing
High transaction throughput                | Medium to low transaction throughput
Predictable usage patterns                 | Unpredictable usage patterns
Transaction driven                         | Analysis driven
Application oriented                       | Subject oriented
Supports day-to-day decisions              | Supports strategic decisions
Large number of clerical/operational users | Lower number of managerial users

Source: Connolly and Begg p1153

Typical Architecture
Figure: typical data warehouse architecture. Operational data (mainframe, departmental RDBMS, private and external data, on various networks and hardware) is loaded by the load manager into the warehouse DBMS, where the warehouse manager maintains detailed data, lightly and highly summarised data, meta-data and archive/backup data. The query manager serves reporting, query, application development and EIS tools, OLAP tools and data-mining tools.
Source: Connolly and Begg p1157



Data Warehouses
Types of Data
Detailed
Summarised
Meta-data
Archive/Back-up


Information Flows
Figure: information flows in a data warehouse. Inflow runs from operational data sources 1 to n through the load manager; upflow and downflow run within the warehouse (detailed, lightly and highly summarised data, archive/backup) under the warehouse manager; outflow runs through the query manager to reporting, query, application development and EIS tools, OLAP tools and data-mining tools; meta-flow connects to the metadata.
Source: Connolly and Begg p1162



Information Flow Processes


Five primary information flows
Inflow: extraction, cleansing and loading of data from
source systems into the warehouse
Upflow: adding value to data in the warehouse through
summarising, packaging and distributing data
Downflow: archiving and backing up data in the warehouse
Outflow: making data available to end users
Metaflow: managing the metadata
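The inflow and upflow processes can be sketched in a few lines, using an in-memory SQLite database as a stand-in warehouse. The source records, table names and cleansing rules are hypothetical; production ETL tools do far more (validation, surrogate key assignment, incremental loading).

```python
import sqlite3

# Hypothetical extracts from two operational sources with inconsistent formats.
SOURCE_1 = [("B005", "2024-01-15", "650"), ("B005", "2024-01-20", " 800 ")]
SOURCE_2 = [("b003", "15/01/2024", "720"), ("b003", "15/01/2024", "720")]  # duplicate

def cleanse(branch, date, rent):
    """Standardise branch codes, dates (to ISO yyyy-mm-dd) and rent values."""
    branch = branch.strip().upper()
    if "/" in date:                                  # dd/mm/yyyy -> yyyy-mm-dd
        day, month, year = date.split("/")
        date = f"{year}-{month}-{day}"
    return branch, date, float(rent.strip())

def inflow(conn, *sources):
    """Extract rows from each source, cleanse them, drop duplicates and load them."""
    rows = {cleanse(*row) for source in sources for row in source}
    conn.executemany("INSERT INTO rental_fact VALUES (?, ?, ?)", sorted(rows))

def upflow(conn):
    """Add value by summarising detailed rows into a per-branch summary table."""
    conn.execute("""CREATE TABLE branch_summary AS
                    SELECT branch, COUNT(*) AS rentals, AVG(rent) AS avg_rent
                    FROM rental_fact GROUP BY branch""")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rental_fact (branch TEXT, rent_date TEXT, rent REAL)")
inflow(conn, SOURCE_1, SOURCE_2)
upflow(conn)
print(conn.execute("SELECT * FROM branch_summary ORDER BY branch").fetchall())
# [('B003', 1, 720.0), ('B005', 2, 725.0)]
```

Cleansing before de-duplication matters here: the two B003 rows only collapse into one once their formats have been standardised.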


Problems of Data Warehousing
1. Underestimation of resources for data loading
2. Hidden problems with source systems
3. Required data not captured
4. Increased end-user demands
5. Data homogenization
6. High demand for resources
7. Data ownership
8. High maintenance
9. Long-duration projects
10. Complexity of integration

Data Warehouse Design


The database must be designed to allow ad-hoc queries to be
answered within acceptable performance constraints
Queries usually require access to factual data
generated by business transactions
e.g. find the average number of properties rented out with a
monthly rent greater than 700 at each branch office over
the last six months

Uses Dimensionality Modelling


Dimensionality Modelling
Similar to E-R modelling but with constraints
composed of one fact table with a composite primary key
dimension tables have a simple primary key which corresponds
exactly to one foreign key in the fact table
uses surrogate keys based on integer values
Can efficiently and easily support ad-hoc end-user queries


Star Schemas
The most common dimensional model
A fact table surrounded by dimension tables
Fact tables
contain a foreign key for each dimension table
are large relative to dimension tables
are read-only

Dimension tables
hold reference data
query performance can be improved by denormalising reference data into a
single dimension table
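A minimal star schema for the rental query in the design slide can be sketched in SQLite: one fact table with a composite primary key made of integer surrogate keys, each of which references one dimension table. The table names, sample rows and fixed six-month window are assumptions, and the query interprets "average number of properties rented out" as the number of rentals over 700 divided by the six months.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE branch_dim   (branch_key   INTEGER PRIMARY KEY,  -- integer surrogate key
                           branch_no    TEXT, city TEXT);
CREATE TABLE time_dim     (time_key     INTEGER PRIMARY KEY,
                           month        TEXT);                 -- 'YYYY-MM'
CREATE TABLE property_dim (property_key INTEGER PRIMARY KEY,
                           property_no  TEXT, type TEXT);
CREATE TABLE rent_fact    (branch_key   INTEGER REFERENCES branch_dim,
                           time_key     INTEGER REFERENCES time_dim,
                           property_key INTEGER REFERENCES property_dim,
                           monthly_rent REAL,
                           PRIMARY KEY (branch_key, time_key, property_key));
""")
# Hypothetical dimension and fact rows.
conn.executemany("INSERT INTO branch_dim VALUES (?, ?, ?)",
                 [(1, "B003", "Glasgow"), (2, "B005", "London")])
conn.executemany("INSERT INTO time_dim VALUES (?, ?)",
                 [(k, f"2024-{k:02d}") for k in range(1, 7)])
conn.executemany("INSERT INTO property_dim VALUES (?, ?, ?)",
                 [(1, "PG4", "Flat"), (2, "PG16", "Flat"), (3, "PA14", "House")])
conn.executemany("INSERT INTO rent_fact VALUES (?, ?, ?, ?)", [
    (1, 1, 1, 650), (1, 2, 2, 750), (1, 3, 2, 750),
    (2, 1, 3, 900), (2, 4, 3, 950), (2, 5, 1, 600), (2, 6, 2, 800),
])

# Rentals over 700 per branch across the six months, and the average per month.
QUERY = """
SELECT b.branch_no,
       COUNT(*)       AS rentals_over_700,
       COUNT(*) / 6.0 AS avg_per_month
FROM rent_fact f
JOIN branch_dim b ON b.branch_key = f.branch_key
JOIN time_dim   t ON t.time_key   = f.time_key
WHERE f.monthly_rent > 700
  AND t.month BETWEEN '2024-01' AND '2024-06'
GROUP BY b.branch_no
ORDER BY b.branch_no
"""
for row in conn.execute(QUERY):
    print(row)
```

Every dimension is exactly one join away from the fact table, which is what keeps such ad-hoc queries simple to write and to optimise.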

E-R Model Example

Source: Connolly and Begg


Star Schema Example

Source: Connolly and Begg


Other Schemas
Snowflake schemas
variant of star schema
each dimension can have its own dimensions

Starflake schemas
hybrid structure
contains a mixture of (denormalised) star and (normalised)
snowflake schemas


OLAP
Online Analytical Processing
the dynamic synthesis, analysis and consolidation of large
volumes of multi-dimensional data
normally implemented using a specialised multi-dimensional
DBMS
a method of visualising and manipulating data with many interrelationships

Supports common analytical operations such as
consolidation
drill-down
slicing and dicing
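The three analytical operations can be illustrated on a toy in-memory cube. This is a sketch only: the dimensions, branch codes and revenue figures are hypothetical, and a real OLAP server would precompute and index such aggregates rather than scan a list.

```python
from collections import defaultdict

# Hypothetical revenue facts: (city, branch, month, property type, revenue)
FACTS = [
    ("Glasgow", "B003", "2024-01", "Flat", 1400),
    ("Glasgow", "B003", "2024-02", "House", 900),
    ("Glasgow", "B007", "2024-01", "Flat", 500),
    ("London", "B005", "2024-01", "House", 2000),
    ("London", "B005", "2024-02", "Flat", 1200),
]
DIMS = ("city", "branch", "month", "ptype")

def aggregate(facts, by):
    """Sum revenue grouped by the named dimensions."""
    positions = [DIMS.index(d) for d in by]
    totals = defaultdict(int)
    for fact in facts:
        totals[tuple(fact[p] for p in positions)] += fact[-1]
    return dict(totals)

def consolidate(facts):
    """Roll up from branch level to city level."""
    return aggregate(facts, ["city"])

def drill_down(facts, city):
    """Go from one city's total down to the branches within it."""
    return aggregate([f for f in facts if f[0] == city], ["branch"])

def slice_cube(facts, month):
    """Fix one dimension (month) to a single value."""
    return [f for f in facts if f[2] == month]

def dice(facts, cities, ptypes):
    """Select a sub-cube by restricting several dimensions at once."""
    return [f for f in facts if f[0] in cities and f[3] in ptypes]

print(consolidate(FACTS))                                  # {('Glasgow',): 2800, ('London',): 3200}
print(drill_down(FACTS, "Glasgow"))                        # {('B003',): 2300, ('B007',): 500}
print(aggregate(slice_cube(FACTS, "2024-01"), ["city"]))   # {('Glasgow',): 1900, ('London',): 2000}
print(dice(FACTS, {"London"}, {"Flat"}))
```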


Codd's OLAP Rules


1. Multi-dimensional conceptual view
2. Transparency
3. Accessibility
4. Consistent reporting performance
5. Client-server architecture
6. Generic dimensionality
7. Dynamic sparse matrix handling
8. Multi-user support
9. Unrestricted cross-dimensional operations
10. Intuitive data manipulation
11. Flexible reporting
12. Unlimited dimensions and aggregation levels


OLAP Tools
Categorised according to the architecture of the underlying
database
Multi-dimensional OLAP (MOLAP)
data typically aggregated and stored according to predicted usage
uses array technology

Relational OLAP (ROLAP)
uses a relational meta-data layer with enhanced SQL

Managed Query Environment (MQE)
delivers data directly from the DBMS or MOLAP server to the desktop in the form
of a data cube


MOLAP
Figure: MOLAP architecture. Data is loaded from the RDB server into the MOLAP server (database/application logic layer); the presentation layer sends requests to the MOLAP server and receives results.

ROLAP
Figure: ROLAP architecture. The presentation layer sends requests to the ROLAP server (application logic layer), which issues SQL to the RDB server (database layer); results are returned through the ROLAP server to the presentation layer.

MQE
Figure: MQE architecture. End-user tools either send SQL directly to the RDB server or send requests to a MOLAP server loaded from the RDB server, and receive results from each.

Data Mining
The process of extracting valid, previously unknown,
comprehensible and actionable information from
large databases and using it to make crucial business
decisions (Simoudis, 1996)
focus is to reveal information which is hidden or unexpected
patterns and relationships are identified by examining the
underlying rules and features of the data
tends to work from the data up
requires large volumes of data


Example Data Mining Applications
Retail/Marketing
Identifying buying patterns of customers
Finding associations among customer demographic
characteristics
Predicting response to mailing campaigns
Market basket analysis


Example Data Mining Applications
Banking
Detecting patterns of fraudulent credit card use
Identifying loyal customers
Predicting customers likely to change their credit card
affiliation
Determining credit card spending by customer groups


Data Mining Techniques


Four main techniques
Predictive Modelling
Database Segmentation
Link Analysis
Deviation Detection


Data Mining Techniques


Predictive Modelling
using observations to form a model of the important
characteristics of some phenomenon

Techniques:
Classification
Value Prediction


Classification Example: Tree Induction

Customer renting property > 2 years?
  No:  rent property
  Yes: customer age > 25 years?
         No:  rent property
         Yes: buy property

Source: Connolly and Begg
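Tree induction learns a tree like this one from labelled examples. The sketch below uses ID3-style information gain over two boolean attributes; the training records are hypothetical and were chosen so that the induced tree matches the figure. ID3 is one induction algorithm among several (C4.5 and CART are others) and is not necessarily the one behind the source figure.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def induce(rows, attrs):
    """ID3-style induction over boolean attributes.

    rows: list of (features dict, label). Returns either a label (leaf) or a
    tuple (attribute, {True: subtree, False: subtree}).
    """
    labels = [lab for _, lab in rows]
    if len(set(labels)) == 1 or not attrs:
        return majority(labels)

    def gain(attr):
        remainder = 0.0
        for value in (True, False):
            subset = [lab for feats, lab in rows if feats[attr] == value]
            if subset:
                remainder += len(subset) / len(rows) * entropy(subset)
        return entropy(labels) - remainder

    best = max(attrs, key=gain)                 # split with the highest information gain
    rest = [a for a in attrs if a != best]
    branches = {}
    for value in (True, False):
        subset = [(feats, lab) for feats, lab in rows if feats[best] == value]
        branches[value] = induce(subset, rest) if subset else majority(labels)
    return (best, branches)

def classify(tree, features):
    """Follow the tree's branches until a leaf label is reached."""
    while isinstance(tree, tuple):
        attr, branches = tree
        tree = branches[features[attr]]
    return tree

def rec(renting, older, outcome):
    """Build one hypothetical training record."""
    return ({"renting_gt_2y": renting, "age_gt_25": older}, outcome)

TRAINING = [rec(True, True, "buy"), rec(True, True, "buy"), rec(True, False, "rent"),
            rec(False, True, "rent"), rec(False, True, "rent"), rec(False, False, "rent")]

tree = induce(TRAINING, ["renting_gt_2y", "age_gt_25"])
print(tree)
# ('renting_gt_2y', {True: ('age_gt_25', {True: 'buy', False: 'rent'}), False: 'rent'})
```

The renting attribute is chosen first because splitting on it leaves less uncertainty (lower weighted entropy) than splitting on age.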


Data Mining Techniques


Database Segmentation:
to partition a database into an unknown number of
segments (or clusters) of records which share a number of
properties

Techniques:
Demographic clustering
Neural clustering
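Neither demographic nor neural clustering fits in a few lines, so the sketch below swaps in a simpler technique, single-pass leader (threshold) clustering, which shares the key property described above: the number of segments is discovered rather than fixed in advance. The customer records and the distance threshold are hypothetical.

```python
from math import dist

def leader_cluster(records, threshold):
    """Single-pass leader clustering.

    Each record joins the first cluster whose leader lies within `threshold`
    (Euclidean distance); otherwise it becomes the leader of a new cluster.
    """
    leaders, clusters = [], []
    for record in records:
        for i, leader in enumerate(leaders):
            if dist(record, leader) <= threshold:
                clusters[i].append(record)
                break
        else:                               # no existing leader was close enough
            leaders.append(record)
            clusters.append([record])
    return clusters

# Hypothetical customer records: (age in years, annual income in thousands)
CUSTOMERS = [(23, 18), (25, 20), (24, 19), (45, 60), (47, 62), (60, 30), (62, 28)]
for cluster in leader_cluster(CUSTOMERS, threshold=5):
    print(cluster)
```

The result depends on record order and on the units of each attribute, so in practice attributes are usually rescaled before distances are compared.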


Segmentation: Scatterplot Example

Source: Connolly and Begg


Data Mining Techniques


Link Analysis
establishes associations between individual records (or sets
of records) in a database
e.g. when a customer rents property for more than two years
and is more than 25 years old, then in 40% of cases the
customer will buy the property

Techniques:
Association discovery
Sequential pattern discovery
Similar time sequence discovery
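The 40% figure in the link-analysis example is the confidence of an association rule. The sketch below computes support and confidence, the two measures commonly used in association discovery, over ten hypothetical customer records constructed to reproduce that figure.

```python
# Hypothetical customer records, each a set of observed facts.
CUSTOMERS = [
    {"renting_gt_2y", "age_gt_25", "buys"},
    {"renting_gt_2y", "age_gt_25", "buys"},
    {"renting_gt_2y", "age_gt_25"},
    {"renting_gt_2y", "age_gt_25"},
    {"renting_gt_2y", "age_gt_25"},
    {"renting_gt_2y"},
    {"age_gt_25", "buys"},
    {"age_gt_25"},
    set(),
    {"buys"},
]

def count(records, itemset):
    """Number of records containing every item in `itemset`."""
    return sum(1 for r in records if itemset <= r)

def support(records, itemset):
    """Fraction of all records that contain the itemset."""
    return count(records, itemset) / len(records)

def confidence(records, antecedent, consequent):
    """Of the records matching the antecedent, the fraction that also match the consequent."""
    return count(records, antecedent | consequent) / count(records, antecedent)

rule_if = {"renting_gt_2y", "age_gt_25"}
rule_then = {"buys"}
print(support(CUSTOMERS, rule_if | rule_then))     # 0.2
print(confidence(CUSTOMERS, rule_if, rule_then))   # 0.4
```

Support says how often the whole pattern occurs; confidence says how reliable the "if ... then" direction is, which is what the 40% statement expresses.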

Data Mining Techniques


Deviation Detection
identifies outliers: something which deviates from some
known expectation or norm
Techniques:
Statistics
Visualisation
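A simple statistical form of deviation detection flags values that lie many standard deviations from the mean (a z-score test). The rents and the threshold of 2.0 below are hypothetical choices for illustration.

```python
from statistics import mean, stdev

def z_score_outliers(values, threshold=2.0):
    """Return the values lying more than `threshold` sample standard deviations from the mean."""
    mu, sigma = mean(values), stdev(values)
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical monthly rents; the last one deviates strongly from the norm.
RENTS = [650, 700, 720, 680, 690, 710, 2500]
print(z_score_outliers(RENTS))   # [2500]
```

The outlier itself inflates the standard deviation: with only seven values the largest attainable z-score is about 2.27, so a conventional threshold of 3 would flag nothing here. Robust measures such as the median absolute deviation are often preferred for this reason.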


Deviation Detection: Visualisation Example

Source: Connolly and Begg


Mining and Warehousing

Data mining needs a single, separate, clean,
integrated and self-consistent source of data
A data warehouse is well equipped for this:
it is populated with clean, consistent data
it contains data from multiple sources
mining can utilise its query capabilities
it provides the capability to go back to the data source


Further Reading
Connolly and Begg, chapters 31 to 34.
Inmon W H, Building the Data Warehouse, New York, Wiley and Sons, 1993.
Beynon-Davies P, Database Systems (2nd ed), Macmillan Press, 2000, ch 34, 35 & 36.


Interoperability and XML


Objectives
To investigate issues surrounding interoperability
To gain a basic understanding of XML and its
developments related to database systems
To gain a basic understanding of the use of XML
towards achieving interoperability


Interoperability
IEEE (1990) Definition:
the ability of two or more systems or components to
exchange information and to use the information that
has been exchanged
Source: IEEE Standard Computer Dictionary: A Compilation of IEEE Standard Computer Glossaries

Current simple solutions


mediation
transformation

Features

Exchange of messages and requests


Use of each other's functionality
Client-server abilities
Distribution
Operate multiple systems as single unit
Communication despite incompatibilities
Extensibility and evolution


The Problems and Difficulties


Different data models
There can be major semantic differences even within
the same data model
Properties may be called by different names
Different data types may be used
What about recreating locally defined functions?


The Problems and Difficulties


All this assumes that we know where the data sources are and
that we have a physical means of reaching them
Databases are by their nature protectors of data;
they do not share easily
Many systems (particularly legacy systems) do not have
any form of web interface
Most databases are security protected
Databases do not advertise their services on the
web

EBCDIC
EBCDIC: /eb's*-dik/, /eb'see`dik/, or /eb'k*-dik/ n.

[abbreviation, Extended Binary Coded Decimal Interchange Code]


A character set used on early IBM computers. It exists in at least six mutually
incompatible versions, all featuring such delights as non-contiguous letter sequences
and the absence of several punctuation characters fairly important for modern
computer languages (exactly which characters are absent varies according to which
version of EBCDIC you're looking at). IBM adapted EBCDIC from punched card code
in the early 1960s and promulgated it as a customer-control tactic, spurning the
already established ASCII standard. Today, IBM claims to be an open-systems
company, but IBM's own description of the EBCDIC variants and how to convert
between them is still an internally classified top-secret.

EBCDIC is the most common alternate character code but there are others.
http://www.cheverus.org/advanced/data/EBCDIC.html
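The incompatibility described in the quotation is easy to demonstrate with Python's built-in codecs, where cp037 (US/Canada) and cp500 (International) are two EBCDIC code pages. The sketch encodes the same characters under ASCII and both EBCDIC variants; the sample string is arbitrary.

```python
text = "IJ[]"

ascii_bytes = text.encode("ascii")
ebcdic_037 = text.encode("cp037")
ebcdic_500 = text.encode("cp500")

print(ascii_bytes.hex(" "))   # 49 4a 5b 5d  (I and J are adjacent in ASCII)
print(ebcdic_037.hex(" "))    # I and J are not adjacent in EBCDIC
print(ebcdic_500.hex(" "))    # the brackets differ from cp037

# Decoding with the wrong EBCDIC variant silently corrupts the text instead of failing.
print(ebcdic_500.decode("cp037"))
```

The last line is the practical danger for interoperability: the bytes are valid in both code pages, so nothing signals that the conversion went wrong.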


Some Simple Problems 1

Differing schemas
SCHEMA 1                      SCHEMA 2
author char(50)               author_surname char(50)
                              author_inits char(10)
title varchar(300)            title varchar(200)
keyword set(char(30))         keywd array(8) (char(30))

Both designs can be expressed in SQL:2003, which supports
ARRAY and MULTISET collection types

Author values also vary in format: A.N.Other, A N Other, Other N A, ...
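Mapping SCHEMA 1 records onto SCHEMA 2 needs rules that the schemas themselves do not supply. The minimal sketch below assumes a heuristic (single-letter tokens are initials, the one longer token is the surname) and checks the SCHEMA 2 limits; the function names and the sample record are hypothetical.

```python
import re

def split_author(author):
    """Heuristically split a SCHEMA 1 author string into (surname, initials).

    Single-letter tokens are treated as initials and the one longer token as
    the surname; anything else is rejected rather than guessed.
    """
    tokens = [t for t in re.split(r"[.\s]+", author.strip()) if t]
    initials = [t for t in tokens if len(t) == 1]
    surnames = [t for t in tokens if len(t) > 1]
    if len(surnames) != 1:
        raise ValueError(f"cannot map author {author!r} unambiguously")
    return surnames[0], "".join(initials)

def to_schema2(record):
    """Map a SCHEMA 1 record (a dict) onto SCHEMA 2 column names and limits."""
    surname, inits = split_author(record["author"])
    if len(record["title"]) > 200:
        raise ValueError("title exceeds varchar(200)")
    keywords = sorted(record["keyword"])        # a set has no order; an array does
    if len(keywords) > 8:
        raise ValueError("more than 8 keywords will not fit array(8)")
    return {"author_surname": surname, "author_inits": inits,
            "title": record["title"], "keywd": keywords}

for name in ("A.N.Other", "A N Other", "Other N A"):
    print(name, "->", split_author(name))
```

The third case returns the initials as "NA": the heuristic cannot tell whether "Other N A" means A. N. Other with reordered initials, which is exactly why such mappings need human-supplied rules.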


Some Simple Problems 2

Homogeneous models
the same information may be held as an attribute name,
a relation name or a value in different databases
e.g. library fines
as a dedicated relation: Fine(amount, borrowed_id)
as an attribute: Loan(id, isbn, date_out, fine)
or as a value: Charge(1.25, fine)

Complex Problems
Heterogeneous models
Need to relate model constructs to one another, for
example
relate classes in an object-oriented database to user-defined types in an object-relational one
or, even more problematic, to tables in a relational database

All problems are magnified at this level!


Extensible Markup Language (XML)
A simplified version of SGML, designed specifically for Web
documents
a meta-language used to create customised tags which provide
functionality not available in HTML
links can point to multiple documents
links can be bi-directional
links to relative objects

broken into:
stylesheet (XSL standard)
document type definition (DTD), against which documents can be validated
document data


Sample XML Database

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<?xml-stylesheet type="text/xsl" href="staff_list.xsl"?>
<!DOCTYPE STAFFLIST SYSTEM "staff_list.dtd">
<STAFFLIST>
  <STAFF branchNo="B005">
    <STAFFNO>SL21</STAFFNO>
    <NAME>
      <FNAME>John</FNAME><LNAME>White</LNAME>
    </NAME>
    <POSITION>Manager</POSITION>
    ...
  </STAFF>
  <STAFF branchNo="B003">
    ...
  </STAFF>
</STAFFLIST>

Notes:
encoding="UTF-8": the document is Unicode
STAFFLIST is the root element (only 1 per document)
branchNo is an attribute
elements are ordered; attributes are unordered


Sample DTD
<!ELEMENT STAFFLIST (STAFF)*>
<!ELEMENT STAFF (STAFFNO, NAME, POSITION, DOB?, SALARY)>
<!ELEMENT STAFFNO (#PCDATA)>
<!ELEMENT NAME (FNAME, LNAME)>
<!ELEMENT FNAME (#PCDATA)>
<!ELEMENT LNAME (#PCDATA)>
<!ELEMENT POSITION (#PCDATA)>
<!ELEMENT DOB (#PCDATA)>
<!ELEMENT SALARY (#PCDATA)>
<!ATTLIST STAFF branchNo CDATA #IMPLIED>


Sample StyleSheet
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<html><body>
<center><h2>DreamHome Estate Agents</h2></center>
<table border="1" bgcolor="#ffffff">
<tr>
<th>staffNo</th>
<!-- repeat for other column headings -->
</tr>
<xsl:for-each select="STAFFLIST/STAFF">
<tr>
<td><xsl:value-of select="STAFFNO"/></td>
<td><xsl:value-of select="NAME/FNAME"/></td>
</tr>
</xsl:for-each>
</table></body></html>
</xsl:template>
</xsl:stylesheet>

Benefits of XML

Simplicity
Open standard and platform/vendor-independent
Extensibility
Reuse
Separation of content and presentation
Improved load balancing
Due to client side processing


Benefits of XML
Support for integration of data from multiple
sources
Ability to describe data from a wide variety of
applications
More advanced search engines
XQuery

New opportunities


XML Schema

<xsd:group name="STAFFTYPE">
  <xsd:sequence>
    <xsd:element name="STAFF">
      <xsd:complexType>
        <xsd:sequence>
          <xsd:element name="STAFFNO" type="STAFFNOTYPE"/>
          <xsd:element name="NAME">
            <xsd:complexType>
              <xsd:sequence>
                <xsd:element name="FNAME" type="xsd:string"/>
                <xsd:element name="LNAME" type="xsd:string"/>
              </xsd:sequence>
            </xsd:complexType>
          </xsd:element>
          ...

XQuery
A query language for XML
e.g. list the staff numbers of staff at branch B005 with a salary
greater than 15000
for $S in doc("staff_list.xml")//STAFF
where $S/SALARY > 15000 and
$S/@branchNo = "B005"
return $S/STAFFNO
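Python's standard library has no XQuery processor, but the same selection can be expressed with xml.etree.ElementTree, which helps show what each clause of the query does. The sample document is hypothetical and includes a SALARY element, which the query assumes exists.

```python
import xml.etree.ElementTree as ET

# Hypothetical staff_list.xml content (values are illustrative).
STAFF_XML = """<STAFFLIST>
  <STAFF branchNo="B005"><STAFFNO>SL21</STAFFNO><SALARY>30000</SALARY></STAFF>
  <STAFF branchNo="B005"><STAFFNO>SL41</STAFFNO><SALARY>9000</SALARY></STAFF>
  <STAFF branchNo="B003"><STAFFNO>SG14</STAFFNO><SALARY>18000</SALARY></STAFF>
</STAFFLIST>"""

def staff_numbers(root, branch, min_salary):
    """Mirror of the query: for each STAFF anywhere in the tree (//STAFF),
    keep those at `branch` earning more than `min_salary` and return STAFFNO."""
    return [staff.findtext("STAFFNO")
            for staff in root.iter("STAFF")                          # for $S in ...//STAFF
            if staff.get("branchNo") == branch                       # $S/@branchNo = ...
            and float(staff.findtext("SALARY", default="0")) > min_salary]   # $S/SALARY > ...

root = ET.fromstring(STAFF_XML)
print(staff_numbers(root, "B005", 15000))   # ['SL21']
```

The salary text is converted to a number before comparing, much as XQuery treats untyped element content as a double in a comparison with a numeric literal.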

Storing XML in a Relational Database
Three approaches
Fine Grained
Coarse Grained
Medium Grained


Fine Grained Approach
Figure: the document is shredded into separate tables for Document, Element (+ parent), Child, Attribute and CharData.
- Good for queries which need to inspect/manipulate specific elements in the XML document
- Not good for queries which manipulate (e.g. retrieve/store) the entire document
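The fine grained approach can be sketched by shredding a document into SQLite tables. As a simplification, the separate Child table is folded into parent_id and pos columns of the element table; the table and column names are assumptions, and mixed content (text after child elements) is ignored.

```python
import sqlite3
import xml.etree.ElementTree as ET

SCHEMA = """
CREATE TABLE document  (doc_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE element   (elem_id INTEGER PRIMARY KEY, doc_id INTEGER,
                        parent_id INTEGER, tag TEXT, pos INTEGER);
CREATE TABLE attribute (elem_id INTEGER, name TEXT, value TEXT);
CREATE TABLE chardata  (elem_id INTEGER, text TEXT);
"""

def shred(conn, doc_id, name, xml_text):
    """Store each element, attribute and text node of one document as separate rows."""
    conn.execute("INSERT INTO document VALUES (?, ?)", (doc_id, name))

    def visit(el, parent_id, pos):
        cur = conn.execute(
            "INSERT INTO element (doc_id, parent_id, tag, pos) VALUES (?, ?, ?, ?)",
            (doc_id, parent_id, el.tag, pos))
        elem_id = cur.lastrowid
        for attr_name, value in el.attrib.items():
            conn.execute("INSERT INTO attribute VALUES (?, ?, ?)", (elem_id, attr_name, value))
        if el.text and el.text.strip():
            conn.execute("INSERT INTO chardata VALUES (?, ?)", (elem_id, el.text.strip()))
        for child_pos, child in enumerate(el):      # parent/child link via parent_id + pos
            visit(child, elem_id, child_pos)

    visit(ET.fromstring(xml_text), None, 0)

DOC = ('<STAFFLIST><STAFF branchNo="B005"><STAFFNO>SL21</STAFFNO>'
       '<NAME><FNAME>John</FNAME><LNAME>White</LNAME></NAME></STAFF></STAFFLIST>')

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
shred(conn, 1, "staff_list.xml", DOC)

# Element-level query: cheap, because each element is its own row.
print(conn.execute("""SELECT c.text FROM element e
                      JOIN chardata c ON c.elem_id = e.elem_id
                      WHERE e.tag = 'LNAME'""").fetchall())   # [('White',)]
```

Retrieving the whole document, by contrast, means reassembling it from every element, attribute and text row in parent/position order, which is why this approach suits element-level queries but not whole-document ones.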

Coarse Grained Approach
One table: (DocId, Name, Body)

Best for queries which manipulate the whole document
e.g. retrieve/store a document

Worst for queries which manipulate elements
e.g. retrieve the children of a tag


Medium Grained Approach

A compromise between fine and coarse grained
Slice the document tree up into sections
Store the sub-sections using a coarse grained approach

Reasonably good for both types of query


XML RDF
Resource Description Framework
XML Schema defines only a grammar
so it inherits all the problems shown previously (e.g.
differing names)
RDF provides a way to encode domain models
an infrastructure that enables the encoding, exchange and reuse
of structured meta-data (W3C)
Defines semantics, syntax and structure

this is what we need for interoperable systems



RDF Data Model

The RDF data model consists of three object types
Resource
anything that can be identified by a URI

Property
a specific attribute which is used to describe a resource

Statement
a combination of a resource, a property and a value
usually known as the subject, predicate and object
e.g. The Author of http://www.dreamhome.co.uk/staff_list.xml
is John White
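Statements in this model are triples, so a small graph can be represented as a set of (subject, predicate, object) tuples and queried by pattern matching. In this sketch the bare property names mirror the slides' simplification; real RDF identifies properties by URIs from a vocabulary and would normally be handled by a dedicated RDF library. The triples come from the statement above and the example that follows it.

```python
from typing import NamedTuple, Optional

class Triple(NamedTuple):
    subject: str     # the resource, identified by a URI
    predicate: str   # the property
    obj: str         # the value (the "object")

GRAPH = {
    Triple("http://www.dreamhome.co.uk/staff_list.xml", "author", "John White"),
    Triple("http://www.myhome.net/staff_list.xml", "author", "Fred Smith"),
    Triple("http://www.myhome.net/staff_list.xml", "created", "25 May 2006"),
}

def match(graph, subject: Optional[str] = None, predicate: Optional[str] = None,
          obj: Optional[str] = None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(t for t in graph
                  if (subject is None or t.subject == subject)
                  and (predicate is None or t.predicate == predicate)
                  and (obj is None or t.obj == obj))

for t in match(GRAPH, predicate="author"):
    print(f"The {t.predicate} of {t.subject} is {t.obj}")
```

Because every fact has the same three-part shape, data from different sources can be merged simply by taking the union of their triple sets, which is part of RDF's appeal for interoperability.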


RDF Example
A statement of this kind (here about a different document and author)
would be defined in RDF (simplified) as:
<?xml version="1.0"?>
<RDF>
<Description about="http://www.myhome.net/staff_list.xml">
<author>Fred Smith</author>
<created>25 May 2006</created>
</Description>
</RDF>

Summary
XML is being increasingly used in data models, data
transmission and data integration
Interoperability is the key issue and a major research
area in database systems
XML and RDF have the potential to act as stepping stones
towards achieving this


Further Reading
Connolly and Begg
chapter 30 (sections 30.1, 30.2 and 30.3) discusses XML and its
related technologies

Graves
Designing XML Databases, Prentice Hall.

XML Tutorial
www.w3schools.com/xml

RDF introduction (Idiot's Guide!)
http://archive.dstc.edu.au/RDU/reports/RDF-Idiot/
