and Recovery
Kyanganda S.
Database Security
and Integrity
Definitions
Threats to security
Threats to integrity
Resolution of Problems
Database Security
SECURITY
Protecting the database from unauthorised users
Ensures that users are allowed to do the things they
are trying to do
Database Security
INTEGRITY
Protecting the database from authorised users
Ensures that what users are trying to do is correct
Database Security
TYPES OF SYSTEM FAILURES
1. HARDWARE: DISK, CPU, NETWORK
2. SOFTWARE
Database Security
Important security features include:
Views
Authorisation & controls
User defined procedures
Encryption procedures
Authorisation Rules
An example: a person who can supply a particular
password may be authorised to read any record, but
cannot modify any of those records.
Authorisation Table for subjects, i.e. Salesperson
          Customer Records   Order Records
Read
Insert
Modify
Delete
Authorisation Rules
Authorisation Table for Objects, i.e. Order Records
            Salesperson   Order Entry   Accounting
Password    (Zahra)       (Maina)       (Shirin)
Read
Insert
Modify
Delete
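An authorisation table of this kind can be sketched as a lookup from subject and object to the set of permitted actions. This is only a minimal illustration: the granted actions below are invented, since the slides do not show the cell values, and AUTH_TABLE and is_authorised are hypothetical names.

```python
# Hypothetical authorisation table for one subject (Salesperson).
# The granted actions are invented for illustration only.
AUTH_TABLE = {
    "Salesperson": {
        "Customer Records": {"read"},
        "Order Records": {"read", "insert", "modify"},
    },
}


def is_authorised(subject: str, obj: str, action: str) -> bool:
    """Return True only if the table explicitly grants the action."""
    return action in AUTH_TABLE.get(subject, {}).get(obj, set())


print(is_authorised("Salesperson", "Order Records", "insert"))
print(is_authorised("Salesperson", "Customer Records", "delete"))
```

Anything not listed is refused, so an unknown subject or object is treated the same as a missing permission.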
Database Integrity
CONSTRAINTS
Can be classed in 3 different ways:
1. Business constraints
2. Entity constraints
3. Referential constraints
Database Integrity
BUSINESS CONSTRAINTS
A value in one column may be constrained by the value
of another, or by some calculation
or formula.
Database Integrity
ENTITY CONSTRAINTS
Individual columns of a table may be constrained, e.g. NOT NULL
REFERENTIAL CONSTRAINTS
Sometimes referred to as key constraints, e.g.
Table 2 depends on Table 1
Database Integrity
create table account_dets
(acc_id char(6) primary key,
acc_custid char(6) references customer(cust_id),
acc_odraft number(4) check (acc_odraft <= 200),
acc_type char(2) constraint type_chk
check (acc_type in ('AB', 'CD', 'EF')),
acc_crtdate date not null);
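To see these constraints rejecting bad rows, the table can be approximated in SQLite from Python. This is a sketch under assumptions: SQLite types replace the Oracle ones, the customer table and all row values are invented, and try_insert is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces REFERENCES when enabled
conn.execute("CREATE TABLE customer (cust_id TEXT PRIMARY KEY)")
conn.execute("""
    CREATE TABLE account_dets (
        acc_id      TEXT PRIMARY KEY,
        acc_custid  TEXT REFERENCES customer(cust_id),
        acc_odraft  INTEGER CHECK (acc_odraft <= 200),
        acc_type    TEXT CONSTRAINT type_chk CHECK (acc_type IN ('AB', 'CD', 'EF')),
        acc_crtdate TEXT NOT NULL
    )
""")
conn.execute("INSERT INTO customer VALUES ('C00001')")  # invented customer
conn.commit()


def try_insert(row):
    """Attempt one insert and report whether the DBMS accepted it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO account_dets VALUES (?, ?, ?, ?, ?)", row)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"


rows = [
    ("A00001", "C00001", 100, "AB", "2024-01-01"),  # satisfies every constraint
    ("A00002", "C00001", 500, "AB", "2024-01-01"),  # overdraft above 200
    ("A00003", "C99999", 100, "AB", "2024-01-01"),  # no such customer
    ("A00004", "C00001", 100, "ZZ", "2024-01-01"),  # type not in the allowed list
    ("A00005", "C00001", 100, "CD", None),          # creation date missing
]
results = [try_insert(r) for r in rows]
print(results)
```

Only the first row is stored; the entity, business and referential constraints each reject one of the others without any application code.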
Database Integrity
BENEFITS OF USING CONSTRAINTS
Guaranteed integrity and consistency
Database Integrity
CONCURRENCY CONTROL
WHAT IS IT?
Database Integrity
CONCURRENCY CONTROL
WHY IS IT IMPORTANT?
Database Integrity
Time   Janet                    John
1      Read balance (1000)      Read balance (1000)
Database Integrity
The three main integrity problems are:
Lost updates
Uncommitted data
Inconsistent retrievals
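The lost update problem can be reproduced with two users who both read a balance of 1000 before either writes. The sketch below is illustrative only; the amounts withdrawn and deposited are invented.

```python
balance = 1000  # shared account balance

# Both users read the same starting balance before either writes.
janet_read = balance
john_read = balance

# Invented amounts: Janet withdraws 200, John deposits 500.
balance = janet_read - 200  # Janet writes 800
balance = john_read + 500   # John writes 1500, overwriting Janet's update

lost_update_result = balance
serial_result = 1000 - 200 + 500  # what one-after-the-other execution would give
print(lost_update_result, serial_result)
```

Janet's withdrawal has vanished from the final balance, which is why concurrent access needs a control mechanism such as locking.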
Database Integrity
LOCKING
Two kinds of Locks:
Database Integrity
Time   User 1                          User 2
1      Lock record X                   Lock record Y
2      Request record Y (wait for Y)   Request record X (wait for X)
       DEADLOCK
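One common way to detect a situation like this is to look for a cycle in a "waits-for" relationship between users. The sketch below assumes each user waits for at most one lock at a time; find_deadlock and the data layout are hypothetical, not taken from any particular DBMS.

```python
def find_deadlock(waits_for):
    """Return the users forming a waiting cycle, or None if there is none.

    waits_for maps each waiting user to the user holding the lock it needs.
    """
    for start in waits_for:
        seen = []
        current = start
        while current in waits_for and current not in seen:
            seen.append(current)
            current = waits_for[current]
        if current in seen:
            return seen[seen.index(current):]
    return None


locks = {"X": "User 1", "Y": "User 2"}         # record -> current holder
requests = [("User 1", "Y"), ("User 2", "X")]  # (user, record wanted)

# A user waits for whoever currently holds the record it has requested.
waits_for = {
    user: locks[record]
    for user, record in requests
    if record in locks and locks[record] != user
}
deadlock = find_deadlock(waits_for)
print(deadlock)
```

Once a cycle is found, a DBMS typically breaks it by aborting one of the transactions involved so that the other can proceed.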
Database Recovery
The process of restoring the database to a
correct state in the event of a failure, e.g.
System Crashes
Media Failures
Application Software Errors
Natural Physical Disasters
Carelessness
Sabotage
Transactions
Basic unit of recovery
Properties of Transaction (ACID)
Atomicity
Consistency
Isolation
Durability
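Atomicity is the easiest of these properties to demonstrate: either every update in a transaction takes effect or none does. The sketch below uses SQLite from Python; the account table, balances and transfer function are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()


def transfer(src, dst, amount):
    """Move money as one transaction: both updates happen or neither does."""
    try:
        with conn:  # commits on success, rolls back on any exception
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer("A", "B", 500)  # would overdraw A, so the CHECK constraint fails
balances = dict(conn.execute("SELECT id, balance FROM account"))
print(ok, balances)
```

The credit to B had already been applied when the debit failed, yet the rollback removed it, leaving the database in its previous consistent state.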
Staff Salary Update Example
Read Operations:
1. Find the address of the disk block that contains the record with primary key x
2. Transfer the block into a DB buffer in main memory
3. Copy salary data from the DB buffer into variable salary
Write Operations:
1. As steps 1 & 2 above
2. Copy salary data from variable salary into the DB buffer
3. Write the DB buffer back to disk
Storing Data
Buffer contents are flushed to secondary storage (made permanent), e.g. when the buffer is full
[Figure: database buffer in main memory and the database on secondary storage]
Commit
[Figure: database states 2, 3 and 4, with a database backup taken at state 2]
Back-up Facilities
The DBMS provides a mechanism for taking backup
copies of the database and log file at regular
intervals.
A dump, copy or backup file contains all or
part of the database.
Backups can be taken without having to stop the
system.
Journal Facilities
REDO LOGS
This is the main logging file. The file
contains two different types of
logging records.
AFTER IMAGES
BEFORE IMAGES
Journal Facilities
REDO LOGS - AFTER IMAGES
After any column of any row on any table in the
database is changed, then the new values are
not only written to the database but also to the
redo log. The complete row is written to the log.
If a row is deleted then notification is also put on
to the redo log. After images are used in roll
forward recovery.
Journal Facilities
REDO LOGS - BEFORE IMAGES
Before a row is updated the data is copied to
the redo log. It is not a simple copy from the
database because a separate area of the
database maintains the immediate pre-update
version of each row updated in the database.
The extra area is called the ROLLBACK
SEGMENT. The redo log takes before image
copies from the rollback segment in the
database.
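The roles of before images and after images can be sketched with a small in-memory log. This is a simplified model under assumptions: the database is a dictionary, every name and value is invented, deletions are not modelled, and no real DBMS log format is implied.

```python
database = {"TENANT NO21": "old address"}  # invented starting contents
log = []  # each record: (transaction, object, before_image, after_image)


def update(txn, key, new_value):
    """Record both images in the log, then change the database."""
    log.append((txn, key, database.get(key), new_value))
    database[key] = new_value


def rollback(db, txn):
    """Undo one transaction by applying its before images in reverse order."""
    for t, key, before, _after in reversed(log):
        if t == txn:
            if before is None:
                db.pop(key, None)  # the row did not exist before the transaction
            else:
                db[key] = before


def roll_forward(backup, committed):
    """Rebuild from a backup by re-applying after images of committed work."""
    db = dict(backup)
    for t, key, _before, after in log:
        if t in committed:
            db[key] = after
    return db


backup = dict(database)                    # backup copy taken before any updates
update("T1", "TENANT NO21", "new address")
update("T2", "TENANT NO37", "inserted row")

rollback(database, "T2")                   # backward recovery for T2
restored = roll_forward(backup, {"T1"})    # forward recovery from the backup
print(database, restored)
```

Both routes arrive at the same state: rollback removes the uncommitted work from the live database, while roll forward rebuilds the committed work on top of the backup.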
Txn  Time   Operation   Object         Before Image  After Image  pPtr  nPtr
T1   10:12  START
T1   10:13  UPDATE      TENANT NO21    (old value)   (new value)
T2   10:14  START
T2   10:16  INSERT      TENANT NO37                  (new value)
T2   10:17  DELETE      TENANT NO9     (old value)
T2   10:17  UPDATE      PROPERTY PG16  (old value)   (new value)
T1   10:18  COMMIT
     10:19  CHECKPOINT  T2
Types of Recovery
Duplicate Databases
Rollback Recovery
Rollforward Recovery
Reprocessing Transactions
Duplicate Databases
Requires 2 copies of the database
Advantages
Fast Recovery (seconds)
Good for disk failures
Disadvantages
No protection against power failure
Expensive
Rollback Recovery
Changes made to the database are
undone
(Backward Recovery )
Rollback enables the updating to be
undone to a predetermined point in the
database processing that provides a
consistent database state.
[Figure: database states 1 to 4, with a database backup taken at state 2]
Rollback Recovery
[Figure: ROLLBACK applies before images to the database with changes, giving the database without changes]
[Figure: ROLL FORWARD applies after images to the database without changes, giving the database with changes]
Reprocessing Transactions
Similar to Forward Recovery
Uses update transactions instead of after
images
ADVANTAGES
Simple
DISADVANTAGES
Slow
Recovery Procedure
Type of failure                       Recovery technique
Storage medium destruction            *Duplicate database
                                      Forward recovery
                                      Reprocess transactions
Transaction error or system failure   *Backward recovery
                                      Forward recovery or reprocess transactions - bring forward to just before termination
Incorrect data                        *Backward recovery
                                      Reprocess transactions (excluding those from the update that created incorrect data)
Summary
This lecture has looked at security and
recovery procedures
Ensuring that these two are administered
correctly cuts out the majority of problems
with database administration
Further Reading
Security
Connolly & Begg, chapter 19
Concurrency Control
Connolly & Begg, chapter 20?
Contents
Definitions
Countermeasures
Security Controls
Data Protection and Privacy
Statistical Databases
Web Database Security Issues and Solutions
SQL Injection
Countermeasures
Ways to reduce risk
Include
Computer Based Controls
Non-computer Based Controls
Data Security
Two (original) broad approaches to data security:
Discretionary access control
a given user has different access rights (privileges) on different objects
flexible, but limited in which rights users can have on an object
privileges can be passed on at the user's discretion
Access modes
SELECT
INSERT
DELETE
UPDATE
Negative authorization
Denials are expressed
Denials take precedence
SQL Facilities
SQL supports discretionary access control using the view
mechanism and the authorization system
e.g. CREATE VIEW S_NINE_TO_FIVE AS
SELECT S.S#, S.SNAME, S.STATUS, S.CITY
FROM S
WHERE to_char(SYSDATE, 'HH24:MI:SS') >= '09:00:00'
AND to_char(SYSDATE, 'HH24:MI:SS') <= '17:00:00';
GRANT SELECT, UPDATE (STATUS)
ON S_NINE_TO_FIVE
TO Purchasing;
parameterised view
Becomes
SELECT * FROM prop_for_rent WHERE prop_type = 'F'
Statistical Databases
A database that permits queries that derive
aggregated information (e.g. sums, averages)
but not queries that derive individual information
Tracking
possible to make inferences from legal queries to
deduce answers to illegal ones
SELECT COUNT(*) FROM STATS X WHERE X.SEX = 'M' AND
X.OCCUPATION = 'Programmer';
SELECT SUM(X.SALARY) FROM STATS X WHERE X.SEX = 'M' AND
X.OCCUPATION = 'Programmer';
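The inference works whenever the COUNT query shows that only one person matches: the SUM over the same condition is then that person's salary. A minimal sketch using SQLite from Python, with an invented stats table and invented rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stats (name TEXT, sex TEXT, occupation TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO stats VALUES (?, ?, ?, ?)",
    [
        ("Alf", "M", "Programmer", 50000),  # invented rows
        ("Bea", "F", "Programmer", 48000),
        ("Cal", "M", "Lawyer", 60000),
    ],
)

condition = "WHERE x.sex = 'M' AND x.occupation = 'Programmer'"
count = conn.execute(f"SELECT COUNT(*) FROM stats x {condition}").fetchone()[0]
total = conn.execute(f"SELECT SUM(x.salary) FROM stats x {condition}").fetchone()[0]

# Two legal aggregate queries have revealed one individual's salary.
print(count, total)
```

Neither query asks for an individual record, yet together they disclose one, which is why statistical databases need further safeguards.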
Statistical Databases
Various strategies can be used to minimize
problems
prevent queries from operating on only a few
database entries
swap attribute values among tuples
randomly add in additional entries
use only a random sample
maintain history of query results and reject queries
that use a high number of records identical to
previous queries
firewalls
prevents unauthorised access to/from a private network
digital certificates
electronic message attachments to verify that user is
authentic
Kerberos
centralised security server for all data and resources on
network
Active-X
SQL Injection
a technique used to take advantage of non-validated input vulnerabilities to pass SQL
commands through a Web application for
execution by a backend database [1]
Can chain SQL commands
Embed SQL commands in a string
Ability to execute arbitrary SQL queries
[1] http://imperva.com/application_defense_center/glossary/sql_injection.html
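The difference between embedding user input in the SQL string and passing it as a parameter can be shown in a few lines. This is a sketch using SQLite from Python; the users table, its rows and the malicious input are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "s3cret"), ("bob", "hunter2")],  # invented accounts
)

user_input = "nobody' OR '1'='1"  # invented attacker-supplied value

# Unsafe: the input is pasted into the statement and changes its meaning.
unsafe_sql = "SELECT name FROM users WHERE name = '" + user_input + "'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safer: the input is bound as a parameter and treated purely as data.
safe_rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (user_input,)
).fetchall()

print(len(unsafe_rows), len(safe_rows))
```

The concatenated query returns every user because the injected OR condition is always true, while the parameterised query simply finds no user with that odd name.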
Restrict UNION
Summary
Have looked at a number of issues and
solutions for database security
e.g. access controls, SQL features, etc.
Further Reading
Connolly and Begg, chapter 19
http://www.oracle.com/technology/deploy/security/oracle8i/pdf/vpd_wp6.pdf
Contents
Client/Server Databases
Web Databases
Distributed Databases
Client/Server Architecture
In a file-server architecture each client must run a
copy of the DBMS.
A better solution is to have a central database server
which performs all database commands sent to it from
client PCs.
Application programs on each client PC can then
concentrate on user interface functions.
Database recovery, security and concurrency control are
managed centrally on the server.
Client/Server Architecture
DATABASE SERVER
The SERVER portion of the client/server database
system which provides processing and shared access
functions.
Client/Server Architecture
CLIENT
Manages the user interface (controls the PC screen,
interprets data sent to it by the server and displays the
results of database queries)
The client forms queries in a specified language (usually
SQL) to retrieve data from the database. This query process
is usually transparent to the user.
Client/Server Architecture
CLIENT/SERVER ADVANTAGES
Allows companies to harness the benefits of
microcomputer technology such as low cost.
Processing can be performed close to the source of
the data - more speed.
Allows the use of GUI interfaces that are commonly
available on PCs and workstations.
Paves the way for truly open systems.
Client/Server Architecture
CLIENT/SERVER DESIGN ISSUES
The server must be upgradeable to allow for the
growth in clients.
Gateway software is normally required for accessing
databases held on a mainframe.
The server must have capabilities for backup,
recovery, security and UPS.
Client/Server Architecture
CLIENT/SERVER DESIGN ISSUES
Can be complex and so require specialised and expensive
tools such as database servers and APIs.
A lack of comprehensive standards.
Front-end GUI software often requires expensive client
workstations.
Traditional Client-Server Architecture
Traditional database systems are based on a two-tier client-server architecture
Fat clients
Client: user interface, main business and data processing logic
Database server: server-side validation, database access
Web Architecture
The need for enterprise scalability causes problems which can be solved by a three-tier architecture
Thin clients
Client: user interface
Application server: business logic, data processing logic
Database server: server-side validation, database access
DBMS advantages
E.g. transactions, concurrency, synchronisation, security, integrity
Simplicity
HTML is a simple markup language, however with new scripting languages
this simplicity is being lost
Platform independence
Web clients are mostly platform independent
Standardization
HTML is a de facto standard
Advantages (cont).
Cross-platform support
Users on all types of computer can access a machine with a web browser
Scalable deployment
Applications upgraded on server only
Innovation
Organisations can provide new services and reach new customers
Security
Data accessible on web
User authentication and secure data transmissions are critical
Cost
A report from Forrester Research claims that maintaining a commercial web
site costs $200 to $3.4 million
Scalability
Unreliable and potentially very large peak loads
Needs highly scalable server architectures
Disadvantages (cont.)
Limited HTML Functionality
Need to extend HTML with scripting languages
Adds a performance overhead
Statelessness
No concept of a database connection
Bandwidth
Internet is slow! 1.5 Mbps compared to 10-100 Mbps
Performance
Many scripting languages are interpreted languages
Database Connectivity
Client Side, 2 approaches:
Extend the browser using scripts, or add-ons or applets,
e.g. plug-ins, JavaScript, ActiveX, Java applets
Link browser to other (external) applications, e.g. legacy systems
Client Side
Advantages
Distribution of processing
Feedback speed
Web-page functionality
Disadvantages
Platform/environment dependent
Security and integrity
Download time
Programming limitations
Server Side
Advantages
Platform/browser independent
Security and integrity
Download time
Programming limitations direct access to database
Disadvantages
Lack of debugging tools
Lack of direct control over user interface
Distributed Databases
DECENTRALIZED DATABASE
stored on computers at multiple locations.
computers are not interconnected by a network.
users at the various sites cannot share data.
DISTRIBUTED DATABASE
Spread physically across computers in multiple locations that
are connected by a data communications link.
Distribution Types
Geographical Distribution: Several databases run
under the control of different CPUs at a variety of
different locations.
Platform Distribution: Databases exist on diverse
hardware platforms, and are 'brought together' by
the distributed database manager.
Architectural Distribution: Different database
architectures exist together, e.g. an object-oriented
database communicating with a relational database
Date's Rules
Distributed Database Requirements:
For a distributed database to be as such, a
fundamental principle must be adhered to:
To the user, a distributed database should look exactly like
a non-distributed system
Local Autonomy:
All operational controls and data maintenance are
controlled only by that site.
Date's Rules
No Reliance On A Central Site:
This follows on from the first objective and is self-explanatory
Continuous Operation:
A distributed approach leads to greater reliability
and availability. The database should still be able to
function, even if one of its sites is unavailable.
Date's Rules
Distributed Transaction Management:
Transaction processing is the key to the successful
usage of distributed databases.
Must cater for two core aspects of transaction
management i.e. recovery control and
concurrency.
Location Independence
Otherwise known as Transparency.
Date's Rules
Fragmentation Independence:
Horizontal Partitioning: different rows from the
same table are stored at different sites.
Vertical Partitioning: different columns from the
same table are maintained at different sites.
Replication Independence:
Replication occurs when a stored relation can be
represented by many distinct copies (replicas), stored at
many sites. As with fragmentation, users must not be aware
that the data is replicated.
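Horizontal and vertical partitioning can be pictured as two ways of splitting one table, each with its own way of putting the table back together. The sketch below uses invented staff rows and hypothetical helper names; a real distributed DBMS does this transparently.

```python
# Invented staff rows for illustration.
staff = [
    {"staff_no": "S1", "name": "Ann", "site": "London", "salary": 12000},
    {"staff_no": "S2", "name": "Bob", "site": "Glasgow", "salary": 18000},
    {"staff_no": "S3", "name": "Cat", "site": "London", "salary": 24000},
]

# Horizontal partitioning: whole rows are stored at the site they belong to.
horizontal = {}
for row in staff:
    horizontal.setdefault(row["site"], []).append(row)

# Vertical partitioning: columns are split, with the key kept in both fragments.
personal = [{k: r[k] for k in ("staff_no", "name", "site")} for r in staff]
payroll = [{k: r[k] for k in ("staff_no", "salary")} for r in staff]


def rebuild_from_horizontal(fragments):
    """Union the row fragments from every site."""
    rows = [r for frag in fragments.values() for r in frag]
    return sorted(rows, key=lambda r: r["staff_no"])


def rebuild_from_vertical(left, right):
    """Join the column fragments on the shared key."""
    right_by_key = {r["staff_no"]: r for r in right}
    return [{**l, **right_by_key[l["staff_no"]]} for l in left]


print(rebuild_from_horizontal(horizontal) == staff)
print(rebuild_from_vertical(personal, payroll) == staff)
```

Fragmentation independence means the user queries the original table and never has to perform the union or join by hand.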
Date's Rules
Distributed Query Processing:
Queries may retrieve information from several
sites. Therefore distributed queries must be
optimised.
Date's Rules
Hardware Independence:
Network Independence:
Support for a disparate variety of communication
networks.
DBMS Independence:
Distributed Databases
ADVANTAGES
Increased reliability and availability
Encourages local ownership of data
Modular growth
Lower communication costs
Faster response
Distributed Databases
DISADVANTAGES
Software complexity and cost
Processing overhead
Data integrity
Slow response
Distributed Databases
HOW SHOULD A DATABASE BE
DISTRIBUTED ?
Data Replication
Distributed databases
Horizontal Partitioning:
The base table is split horizontally into several
different tables at different sites.
Selected rows from a table are put into tables at
different sites.
Distributed databases
Advantages
Efficiency - Data items are stored where they are
most often used away from other applications.
Optimisation - Data optimised for local use
Security - Only relevant data is available
Distributed databases
Disadvantages
Inconsistent access speed - When data from
several different partitions are required, access
speed can vary significantly.
Backup vulnerability
Distributed databases
VERTICAL PARTITIONING
Some of the columns in a table are projected into
a table at one of the sites and other columns are
projected into a table at another site. The same
advantages and disadvantages as for horizontal
partitioning apply.
Distributed databases
Combinations
To complicate matters even further it is possible
to have a strategy which is a combination of all
the above. Some data stored centrally, some
distributed both horizontally and vertically. It
could be a real challenge (or a nightmare).
Distributed databases
DISTRIBUTED DBMS
Determine the location from which data is to be
retrieved.
Translate requests from different nodes.
Provide functions such as security, recovery,
concurrency and optimisation.
Distributed databases
DISTRIBUTED DBMS
IT SHOULD ALSO OFFER:
Location transparency
Replication transparency
Failure transparency
Concurrency transparency
Commit protocol
Further Reading
Distributed Databases
Connolly and Begg, chapter 22
Web Databases
Connolly and Begg, chapter 29
Sections 29.1 to 29.3
Object-Oriented Databases
Contents
Complex Applications
RDBMS Weaknesses
Next Generation Data Models
Object-Oriented Databases
Further Reading
Complex Applications
RDBMS are inadequate for applications including:
CAD, CAM
CASE
Office Information Systems
Multimedia systems
GIS
Science and medicine
Complex Applications
CAD, CAM
complex objects
graphics
a large number of types but few instances of each type
hierarchical design not static
CASE
Complex Applications
Office Information and Multimedia Systems
e-mail support
documentation
SGML documents
RDBMS Weaknesses
Poor separation of real world entities
normalisation leads to entities that don't closely match
the real world
joins costly
Semantic overloading
all data held as relationships
no mechanism for differentiation between entities and
relationships
RDBMS Weaknesses
Poor support for integrity and enterprise constraints
relational systems good for supporting referential, entity and
simple business constraints
not good for more complex enterprise constraints
RDBMS Weaknesses
Limited operations
SQL does not allow new operations to be defined
e.g. select age from person;
Impedance mismatch
need to embed SQL to get computational completeness
data types in SQL and the programming language don't match
RDBMS Weaknesses
Concurrency, schema changes and poor navigational
access
no support for long duration transactions
difficult to change schema, e.g. add columns to a table
RDBMS based on content based access
Not navigational
Data Models
1st Generation
Hierarchical
Network
Relational
2nd Generation
Entity-Relationship
3rd Generation
Semantic
Object-Relational
Object-Oriented
Object-Oriented Databases
OO Databases Overview
Object-Oriented Database
OO Databases Overview
include concepts such as
user extensible type system, complex objects,
encapsulation, inheritance, polymorphism, dynamic
binding, object identity
Origins of OO Databases
Traditional Database Systems
persistence, sharing, transactions, concurrency control, recovery
control, security, integrity, querying
Object-Oriented Programming
object identity, encapsulation, inheritance, types and classes,
methods, complex objects, polymorphism, extensibility
Special Requirements
versioning, schema evolution
OODBMS Development
Strategies
Various approaches:
Extend an existing OO-PL with database capabilities
Provide extensible OO DBMS libraries
Embed OO database language constructs in a
conventional host language
Extend an existing database language with OO
capabilities
Develop a novel data model/data language
Object-Oriented DB System
Manifesto
OO Criteria:
1. Complex objects must be supported
2. Object identity must be supported
3. Encapsulation must be supported
4. Types or classes must be supported
5. Types or classes must be able to inherit from their
ancestors
6. Dynamic binding must be supported
7. The DML must be computationally complete
8. The set of data types must be extensible
Object-Oriented DB System
Manifesto
DBMS Criteria:
9. Data persistence must be provided
10. The DBMS must be capable of managing very
large databases
11. The DBMS must support concurrent users
12. The DBMS must be capable of recovery from
hardware and software failures
13. The DBMS must provide a simple way of
querying data
OODB Advantages
OODB Disadvantages
Object Model
Basic Constructs
object
literals
Types
Interface Definition
Class
defines abstract behaviour and abstract state
extended interface with information for ODMS schema
definition
objects are class instances
e.g. class Person{..};
Literal
Types (cont)
Inheritance
applied to both interfaces and classes
inheritance of behaviour between object types
Extend
applied to object types only
inheritance of state and behaviour
Extent
set of all instances of a class
extension
must have a unique key
Objects
Instances of a class
Have a unique object identifier
remains for lifetime of object
Names
equivalent to global variables
Lifetime can be
transient
persistent
type and lifetime are independent
Objects
Collections
Set, Bag, List, Array
Dictionary - sequenced key-value pairs
Structured objects
Date, Interval, Time, Timestamp
Literals
Atomic, Collection, Structured
OQL Examples
Path Expressions
select c.address
from Persons p, p.children c
where p.address.street = "Main Street"
and count(p.children) >= 2
and c.address.city != p.address.city;
Methods
select max(select c.age from p.children c)
from Persons p
where p.name = "Paul";
Class Indication
select ((Student)p).grade
from Persons p
where "course of study" in p.activities;
Further Reading
Connolly & Begg, chapters 25, 26 and 27
Date 7th ed., chapter on Object-Oriented databases
Atkinson et al., Object-Oriented Database System
Manifesto, Proc. 1st Intl. Conference on DOOD, Japan,
1989.
Cattell et al., The Object Data Standard: ODMG 3.0
Object-Relational Databases
Contents
Background
Extensions to Relational Model
Database World
Advantages and Disadvantages of ORDBMS
Extensions to Relational
Model
Advanced Emerging Database Applications use:
user extensible type system, encapsulation, inheritance, polymorphism,
dynamic binding, complex objects, object identity
[Figure: classification of systems by search capabilities/multi-user support against data complexity/extensibility, including file systems, object-relational DBMS and object-oriented DBMS]
Object-Relational
Advantages
Weaknesses of RDBMS given last time
Reuse and Sharing
extending the DBMS server to perform standard
functionality centrally
functionality shared by all applications, e.g. spatial data
types
Object-Relational
Disadvantages
Complexity and Associated Increased Costs
simplicity and purity of relational model is lost
majority of applications do not achieve optimal performance
3rd Generation
Database Manifesto
Manifesto developed by Stonebraker et al. (1990)
1. A third generation DBMS must have a rich type system
2. Inheritance is a good idea
3. Functions, including database procedures and methods and
encapsulation, are a good idea
4. Unique identifiers for records should be assigned by the DBMS only if a
user-defined primary key is not available
5. Rules (triggers, constraints) will become a major feature in future
systems. They should not be associated with a specific function or
collection
Superseded by SQL:2003
SQL:2003
Row types
a data type that can represent types of rows in tables
e.g.
CREATE TABLE branch(
    bno     VARCHAR(3),
    address ROW(
        street VARCHAR(25),
        town   VARCHAR(15),
        pcode  ROW(
            city_id VARCHAR(4),
            subpart VARCHAR(4))));
SQL:2003
User Defined Types (UDT)
abstract data types of two kinds: distinct and structured
structured types consist of one or more attribute and routine definitions
CREATE TYPE age_type as INTEGER FINAL;
SQL:2003
User defined routines (UDR)
may be defined as part of a UDT or as part of a schema
can be a procedure, function or method
Can be written in SQL or in an external programming
language
Polymorphism
uses a generalised object model, i.e.
No two functions in the same schema allowed to have same
signature (no. of arguments, same data types, same return type)
No two procedures allowed to have same name and number of
parameters
SQL:2003
Subtypes/supertypes
multiple inheritance is not supported
substitutability
when an instance of a supertype is expected, an instance
of the subtype can be used in place
Tables
A UDT instance can only persist if stored as a
column in a table
can use table inheritance
Completely independent from UDT facility
SQL:2003
Querying
uses SQL92 syntax with extensions to handle objects
e.g.
SELECT s.lname, s.get_age
FROM staff s
WHERE s.is_manager;
SELECT p.lname, p.address
FROM person p
WHERE p.get_age > 65;
SELECT p.lname, p.address
FROM ONLY (person) p
WHERE p.get_age > 65;
SQL:2003
Reference Types and OID
system generated, type REF
Reference types can be used to define relationships
between row types
reference types uniquely identify rows
allows rows to be shared across tables
complex joins can be replaced by simple path expressions
reference types do not provide referential integrity
Collection types
ARRAYs, LISTs, SETs, MULTISETs
SQL:2003
Persistent Stored Modules (SQL/PSM)
SQL:2003 now computationally complete
New statements added:
blocks
Assignment
IF .. THEN .. ELSE .. END IF, and CASE
REPEAT blocks
CALL and RETURN for invoking procedures
Condition handling
SQL:2003
Triggers
An SQL statement that is automatically executed by the DBMS as
a side effect of a modification to a table
Triggering events include insertion, deletion and update of rows in
a table
Useful for:
Verifying input data
Maintaining complex integrity constraints
alerts
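A trigger used for auditing can be tried out in SQLite from Python. Note the assumptions: this is SQLite's trigger syntax rather than the exact SQL:2003 form, and the staff table, the salary_audit table and the trigger name are all invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staff (staff_no TEXT PRIMARY KEY, salary INTEGER);
    CREATE TABLE salary_audit (staff_no TEXT, old_salary INTEGER, new_salary INTEGER);

    -- Fires automatically after any update to the salary column.
    CREATE TRIGGER log_salary_change
    AFTER UPDATE OF salary ON staff
    BEGIN
        INSERT INTO salary_audit VALUES (OLD.staff_no, OLD.salary, NEW.salary);
    END;

    INSERT INTO staff VALUES ('S1', 12000);
    UPDATE staff SET salary = 13000 WHERE staff_no = 'S1';
""")

audit = conn.execute("SELECT * FROM salary_audit").fetchall()
print(audit)
```

The application only issued an UPDATE; the audit row appeared as a side effect, which is the behaviour the definition above describes.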
Oracle 8
New types:
attributes
Methods (normally written in PL/SQL)
Comparison of ORDBMS v OODBMS
Feature                         ORDBMS                       OODBMS
OID                             Supported (REF type)         Supported
Encapsulation                   Supported (UDT)              Broken for queries
Inheritance                     Supported                    Supported
Polymorphism                    Supported                    Supported (OOPL)
Complex Objects                 Supported (UDT)              Supported
Relationships                   Strongly supported           Supported
Create/Access persistent data   Supported, not transparent   Supported, degree of transparency differs
Ad hoc query facility           Strong support               Supported in ODMG2
Navigation                      Supported (REF type)         Strong support
Integrity Constraints           Strongly supported           No
Object server/page server       Object server                Either
Schema evolution                Limited support              Varying support
ACID transactions               Strong support               Supported
Recovery                        Strong support               Varying support
(feature not named in source)   No support                   Varying support
Security, Integrity, Views      Strong support               Limited support
Further Reading
Connolly and Begg, chapter 28
a very good discussion
Database Performance
Contents
Database Performance
Denormalisation
Indexes
Clustering
Query Optimisation
Benchmarking
Wisconsin, TPC-C, OO7, BUCKY
Summary
Database Performance
Good query performance is necessary to achieve
acceptable performance of an RDBMS
Various ways in which this can be achieved:
De-normalisation of data to reduce joins
Creating indexes on frequently retrieved attributes
Clustering tables to reduce the number of disk reads
Automatic optimisation of queries
Normalisation
Normalisation improves the logical database design
and prevents anomalies BUT
More tables mean more joins
Joining > 3 tables is likely to be slow
Database Performance
Example:
Branch(BranchNo, street, city, postcode, mgrstaffno)
Could also be:
De-normalisation
Advantages:
Minimises need for joins
Reduces number of foreign keys in relations
Reduces number of indexes
Saves storage space
De-normalisation
Disadvantages
Speed up retrievals, but may slow down updates
Increases application complexity
Relation size can increase
Sacrifices flexibility
Indexes
INDEXES
An index is a table or some other data structure that is used
to determine the location of a row within a table that
satisfies some condition.
Indexes
Oracle allows faster access on any named table by using an
index.
Creating Indexes
HOW DO YOU CREATE AN INDEX ?
EXAMPLE:
(a) CREATE INDEX TENIDX ON TENANT(SURNAME);
(b) CREATE UNIQUE INDEX TENIDX ON TENANT(SURNAME);
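The effect of the two statements can be checked in SQLite from Python: the plain index is picked up by the query planner, and the unique index additionally rejects duplicate surnames. This is a sketch with an invented tenant table and rows; the slides target Oracle, whose plan output differs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tenant (tenant_no TEXT PRIMARY KEY, surname TEXT)")
conn.executemany(
    "INSERT INTO tenant VALUES (?, ?)", [("T1", "Lee"), ("T2", "Ford")]  # invented rows
)

# (a) Ordinary index: ask the planner how it would run a search on surname.
conn.execute("CREATE INDEX tenidx ON tenant(surname)")
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM tenant WHERE surname = 'Lee'"
).fetchall()
plan_text = " ".join(row[-1] for row in plan)  # last column is the readable detail
uses_index = "tenidx" in plan_text

# (b) Unique index: same name, so the ordinary one is dropped first.
conn.execute("DROP INDEX tenidx")
conn.execute("CREATE UNIQUE INDEX tenidx ON tenant(surname)")
try:
    conn.execute("INSERT INTO tenant VALUES ('T3', 'Lee')")
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False

print(uses_index, duplicate_allowed)
```

A unique index on a surname is rarely what an application wants, so form (b) is better suited to columns whose values really must not repeat.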
Index Guidelines
GUIDELINES FOR USE OF INDEXES
> 200 rows in a table
Indexes
POINTS TO WATCH
avoid if possible > 3 indexes on any one table
avoid indexing a column with too few distinct values
For example:- male/female
avoid indexing a column with too many distinct values
avoid if > 15% of rows will be retrieved
Clusters
A disk is arranged in blocks
Blocks are retrieved as a whole and buffered
Disk Access time is slow compared with Memory
access
Gains can be made if the number of block transfers
can be reduced
Database Performance
CLUSTERING
clusters physically arrange the data on disk so that
frequently retrieved info is stored together
allows 2 or more tables to be stored in the same physical
block
can greatly reduce access time for join operations
can also reduce storage space requirements
Database Performance
CLUSTER DEFINITION
clustering is transparent to the user
no queries have to be modified
no applications need to be changed
tables are queried in the same way whether clustered or not
Database Performance
DECIDING WHERE TO USE CLUSTERS
Each table can only reside in 1 cluster
At least one attribute in the cluster must be NOT NULL
Consider the query transactions in the system
How often is the query submitted?
How time critical is the query?
What is the amount of data retrieved?
Clustering Tables
Branch Table
BranchNo  Street       City     Postcode
B005      22 Deer St   London   SW1 4EH
B003      163 Main St  Glasgow  G11 9QX

Staff Table
StaffNo  First Name  Last name  Position    DOB  Salary
SL21     John        White      Manager          310000
SL41     Julie       Lee        Assistant        9000
SG37     Ann         Beech      Assistant        12000
SG14     David       Ford       Supervisor       18000
SG5      Susan       Brand      Manager          24000
Database Performance
CLUSTERING EXERCISE
[Diagram: STOCK related to WAREHOUSE (3) and PRODUCT (1000)]
Database Performance
To speed up access time to data in these three tables
(WAREHOUSE, PRODUCT, STOCK) it is necessary to cluster
either STOCK around WAREHOUSE, or STOCK around
PRODUCT.
How do we decide which will be the most efficient?
For the purpose of this exercise we will assume that each
block can hold 100 records.
Database Performance
If STOCK is clustered around PRODUCT
No of products = 1000. There will be 1 record for each
PRODUCT in each WAREHOUSE. Therefore each product
would have 3 records
Each block would contain 100/3 products, i.e. 33 products.
There would therefore be a 1 in 3 chance of accessing a
particular stock item by reading one block of data.
Database Performance
If STOCK is clustered around WAREHOUSE
No of warehouses = _____. There will be ____ record for
each item of STOCK in each warehouse. Therefore each
warehouse would have ______ records. The records for each
warehouse would have to be stored across ______ blocks.
Access would therefore be faster if STOCK is clustered
around the product.
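The block arithmetic behind this exercise can be checked with a few lines of Python. This is only a sketch: it assumes exactly one STOCK record per (product, warehouse) pair, and the variable names are made up for illustration.

```python
import math

RECORDS_PER_BLOCK = 100  # stated in the exercise
NUM_PRODUCTS = 1000      # stated in the exercise
NUM_WAREHOUSES = 3       # stated in the exercise

# Assumption: one STOCK record for every product in every warehouse.
stock_records = NUM_PRODUCTS * NUM_WAREHOUSES
total_blocks = math.ceil(stock_records / RECORDS_PER_BLOCK)

# STOCK clustered around PRODUCT: the records of one product sit together.
records_per_product = NUM_WAREHOUSES
products_per_block = RECORDS_PER_BLOCK // records_per_product

# STOCK clustered around WAREHOUSE: the records of one warehouse sit together.
records_per_warehouse = NUM_PRODUCTS
blocks_per_warehouse = math.ceil(records_per_warehouse / RECORDS_PER_BLOCK)

print("stock records:", stock_records, "in", total_blocks, "blocks")
print("whole products per block:", products_per_block)
print("blocks needed per warehouse:", blocks_per_warehouse)
```

Under these assumptions all the stock records for one product fit in a single block, whereas the records for one warehouse are spread over several blocks.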
Database Performance
SQL OPTIMISATION
Select *
from ...;
[Diagram: SQL query, DBMS, DATA FILES]
Query Optimisation
Automatic query optimisation can dramatically
improve query execution time
e.g. Consider the simple SQL query
select s.student_no, s.student_name, c.course_name
from student s, course c
where s.course_id = c.course_id
and s.age > 25;
Example
1000 students of which only 100 are over the age of
25, and there are 50 courses
Alternative 1: Join first
read the 1000 students, read all courses 1000 times (once
for each student), construct an intermediate table of
1000 records (which may be too large to fit in memory)
restrict the result to those over the age of 25 (100 rows at
most)
project the result over the required attributes
Example
Alternative 2: Restrict first
read 1000 tuples but restrict to those over the age of 25,
returning an intermediate table of only 100 rows - which
has a much better potential of being storable in main
memory
join the result with the course table, again returning an
intermediate table of only 100 rows
project the result over the required attributes
Obviously this version is BETTER!
Could be improved further by doing the projection before the join.
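The two alternatives can be compared with a small simulation. The sketch below is not how a real optimiser executes a query; it uses in-memory Python lists and a naive nested-loop join, and the sample data is generated so that 100 of the 1000 students are over 25, as in the example.

```python
# Made-up data matching the example: 1000 students, 100 over 25, 50 courses.
students = [
    {"student_no": n, "student_name": f"S{n}", "course_id": n % 50,
     "age": 30 if n < 100 else 20}
    for n in range(1000)
]
courses = [{"course_id": c, "course_name": f"Course {c}"} for c in range(50)]

def nested_loop_join(left, right):
    """Join on course_id, scanning all of `right` once per row of `left`."""
    rows, course_reads = [], 0
    for s in left:
        for c in right:
            course_reads += 1
            if s["course_id"] == c["course_id"]:
                rows.append({**s, **c})
    return rows, course_reads

def project(rows):
    return [(r["student_no"], r["student_name"], r["course_name"]) for r in rows]

# Alternative 1: join first, then restrict.
joined, reads_join_first = nested_loop_join(students, courses)
result_1 = project([r for r in joined if r["age"] > 25])

# Alternative 2: restrict first, then join.
over_25 = [s for s in students if s["age"] > 25]
restricted_join, reads_restrict_first = nested_loop_join(over_25, courses)
result_2 = project(restricted_join)

print("intermediate rows:", len(joined), "vs", len(restricted_join))
print("course rows read: ", reads_join_first, "vs", reads_restrict_first)
```

Both alternatives return the same rows, but restricting first builds a much smaller intermediate table and reads far fewer course rows.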
ICS 2415 Advanced Dbase Systems
Canonical Form
Canonical form
given a set Q of queries, and a notion of equivalence
between any two queries q1 and q2 in Q, a subset C of Q
is a set of canonical forms for Q if and only if every
query q in Q is equivalent to exactly one query c in C.
The query c is the canonical form of the query q
Expression Transformation
Rules
Examples (not complete)
(A WHERE p1) WHERE p2 == A WHERE p1 and p2
(A PROJECT x,y) PROJECT y == A PROJECT y
(A UNION B) PROJECT x == (A PROJECT x) UNION (B PROJECT x)
(A JOIN B) PROJECT x == (A PROJECT x1) JOIN (B PROJECT x2)
A JOIN B == B JOIN A
(A JOIN B) JOIN C == A JOIN (B JOIN C)
(A JOIN B) PROJECT x == A PROJECT x
where x is FK from B to A
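The first two rules can be checked on a toy relation. The sketch below models a relation as a list of Python dictionaries; the relation A, the predicates p1 and p2, and the helper functions where and project are all made up for illustration.

```python
# A small made-up relation with attributes x, y, z.
A = [
    {"x": 1, "y": "a", "z": 10},
    {"x": 2, "y": "b", "z": 20},
    {"x": 3, "y": "a", "z": 30},
    {"x": 4, "y": "b", "z": 40},
]

def where(rel, pred):
    """Restriction: keep the rows that satisfy the predicate."""
    return [row for row in rel if pred(row)]

def project(rel, *attrs):
    """Projection with duplicate elimination, preserving first-seen order."""
    seen, out = set(), []
    for row in rel:
        key = tuple(row[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out

p1 = lambda r: r["z"] > 10
p2 = lambda r: r["y"] == "b"

# (A WHERE p1) WHERE p2 == A WHERE p1 and p2
lhs_restrict = where(where(A, p1), p2)
rhs_restrict = where(A, lambda r: p1(r) and p2(r))

# (A PROJECT x,y) PROJECT y == A PROJECT y
lhs_project = project(project(A, "x", "y"), "y")
rhs_project = project(A, "y")

print(lhs_restrict == rhs_restrict, lhs_project == rhs_project)
```

An optimiser uses equivalences like these to rewrite a query into a cheaper form without changing its result.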
Indexes
Other physical access paths
Distribution of data values
Clustering
Cost Based
A query plan chosen by rules alone may not in fact be optimal once the
actual cost of executing the query is considered, e.g. join order
Need to gather statistics
Database Performance
OPTIMIZING PERFORMANCE
Performance can be regarded as a balancing act between:
access performance
update performance
ease of use/modification
Benchmarking
Software and systems development projects include
performance evaluation work, but this is sometimes not
sufficient to prevent major performance problems
Benchmarking is a useful tool which can be used at
the prototyping stage to improve performance of
the DBMS application
There are many benchmarks available
Database Benchmarking
A tool for comparing the performance of DBMS
summarise relative performance in a single figure
Synthetic Workload
produce a simplified version of an application
use synthetically generated data with similar properties to real
system, e.g. a banking application
e.g. Transaction Processing Performance Council (TPC)
Wisconsin Benchmark
First systematic benchmark definition
compares particular features of DBMS rather than a
simple overall performance metric
Wisconsin Benchmark
Advantages:
Straightforward to implement
Scalable, e.g. parallel architectures
Useful, readily understandable results
Disadvantages:
Lack of highly skewed attribute distribution
Simple join queries
TPC-C
Measures performance of a typical order entry
application
from initiation at a terminal until response arrives back
from server
benchmark encompasses time taken by server, network
and other system components
terminals emulated using a negative-exponential
transaction arrival distribution
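A negative-exponential arrival pattern can be generated with Python's standard random module. This is only an illustration of the distribution, not the procedure laid down in the TPC-C specification; the function name arrival_times, the rate and the seed are assumptions.

```python
import random

def arrival_times(rate_per_second, count, seed=0):
    """Arrival times whose gaps follow a negative-exponential distribution."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    t, times = 0.0, []
    for _ in range(count):
        t += rng.expovariate(rate_per_second)  # gap until the next transaction
        times.append(t)
    return times

# Hypothetical emulated terminal submitting 2 transactions per second on average.
times = arrival_times(rate_per_second=2.0, count=10_000)
mean_gap = times[-1] / len(times)
print("mean gap between transactions (seconds):", round(mean_gap, 3))
```

With an average rate of 2 per second the mean gap comes out close to 0.5 seconds, while individual gaps vary widely, which is what makes the emulated load irregular.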
TPC-C Schema
TPC-C
5 transactions covering
New order
A payment
Order status enquiry
A delivery
A stock level inquiry
Metric
number of New-Order transactions executed per minute
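The metric itself is simple arithmetic over a transaction log. The sketch below ignores the rules a real TPC-C run must also satisfy (transaction mix, response-time limits); the log format and the function name are hypothetical.

```python
from collections import Counter

def new_orders_per_minute(log, duration_seconds):
    """Count completed New-Order transactions and scale to a per-minute rate."""
    counts = Counter(kind for kind, _finished_at in log)
    return counts["new_order"] * 60.0 / duration_seconds

# Made-up log of (transaction type, completion time in seconds).
log = [
    ("new_order", 1.0), ("payment", 2.0), ("new_order", 30.0),
    ("order_status", 45.0), ("new_order", 100.0),
    ("delivery", 110.0), ("stock_level", 119.0),
]
rate = new_orders_per_minute(log, duration_seconds=120)
print("New-Order transactions per minute:", rate)
```

Only New-Order transactions count towards the figure, although the other four transaction types still contribute to the load on the system.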
TPC-App
Application server and web services benchmark
Further info on
www.tpc.org
OO7
A benchmark for Object Database Systems
Examines the performance characteristics of different types
of retrieval/traversal, object creation/deletion and updates
and query processor
A number of sample implementations are provided
Based on a complex parts hierarchy
Further info
ftp.cs.wisc.edu
OO7 Tests
Test 1:
Raw traversal speed, traversal with updates, operations
on manuals
Tests with/without full cache
Test 2:
Exact matches, range searches, path lookup, scan, make,
join
Test 3:
Insert/update a group of composite parts
BUCKY
An Object-Relational Benchmark
Objective
To test the key features that add the object to object-relational
database systems, as defined by Stonebraker
Inheritance, Complex Objects, ADTs
Not triggers
BUCKY Schema
BUCKY Queries
Aim is to test various object queries, involving
1. row types with inheritance
2. inter-object references
3. set-valued attributes
4. methods of row objects
5. ADT attributes and their methods
Summary
Database application performance can be improved by
Indexes
Clustering
De-normalisation
Query Optimisation
Further Reading
Connolly and Begg, chapters 18 and 21
Also information in OODB chapter on benchmarking
Contents
Data Warehousing
OLAP
Data Mining
Further Reading
Data Warehousing
OLTP (online transaction processing) systems
range in size from megabytes to terabytes
high transaction throughput
Benefits
Potential high returns on investment
90% of companies in 1996 reported a return on investment
(over 3 years) of > 40%
Competitive advantage
Data can reveal previously unknown, unavailable and
untapped information
Comparison
OLTP:
Data is dynamic
Repetitive processing
Transaction driven
Application oriented
Large number of clerical/operational users
Data Warehouse:
Detailed, lightly/highly summarised data
Analysis driven
Subject oriented
Strategic decisions
Source: Connolly and Begg, p1153
Typical Architecture
[Figure: typical data warehouse architecture. Components shown: operational data sources (mainframe operational n/w, h/w data; departmental RDBMS data; private data; external data), load manager, warehouse manager, DBMS holding meta-data, highly summarized data, lightly summarized data and detailed data, archive/backup, query manager, and end-user access tools (reporting, query, application development and EIS tools; OLAP tools; data-mining tools)]
Data Warehouses
Types of Data
Detailed
Summarised
Meta-data
Archive/Back-up
Information Flows
[Figure: information flows in a data warehouse. Flows shown: inflow, upflow, downflow, outflow and meta-flow, between operational data sources 1 to n, load manager, warehouse manager, DBMS (metadata, lightly summarised data, detailed data), archive/backup, query manager, OLAP tools and data-mining tools]
Dimensionality Modelling
Similar to E-R modelling but with constraints
composed of one fact table with a composite primary key
dimension tables have a simple primary key which corresponds
exactly to one foreign key in the fact table
uses surrogate keys based on integer values
Can efficiently and easily support ad-hoc end-user queries
Star Schemas
The most common dimensional model
A fact table surrounded by dimension tables
Fact tables
contains FK for each dimension table
large relative to dimension tables
read-only
Dimension tables
reference data
query performance speeded up by denormalising into a
single dimension table
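A minimal star schema can be built with Python's sqlite3 module. The sketch below uses one fact table with a composite primary key made of the foreign keys to two dimension tables; all table names, column names and figures are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: simple surrogate integer keys plus reference data.
CREATE TABLE branch_dim (branch_key INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE time_dim   (time_key   INTEGER PRIMARY KEY, year INTEGER);

-- Fact table: one FK per dimension, together forming the composite PK.
CREATE TABLE sale_fact (
    branch_key INTEGER REFERENCES branch_dim(branch_key),
    time_key   INTEGER REFERENCES time_dim(time_key),
    amount     INTEGER,
    PRIMARY KEY (branch_key, time_key)
);

INSERT INTO branch_dim VALUES (1, 'London'), (2, 'Glasgow');
INSERT INTO time_dim   VALUES (1, 2005), (2, 2006);
INSERT INTO sale_fact  VALUES (1, 1, 100), (1, 2, 150), (2, 1, 80), (2, 2, 120);
""")

# A typical ad-hoc query: total sales per city.
totals = conn.execute("""
    SELECT b.city, SUM(f.amount)
    FROM sale_fact f JOIN branch_dim b ON f.branch_key = b.branch_key
    GROUP BY b.city
    ORDER BY b.city
""").fetchall()
print(totals)
```

Every end-user query follows the same pattern: join the fact table to the dimensions of interest, then group and aggregate.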
Other Schemas
Snowflake schemas
variant of star schema
each dimension can have its own dimensions
Starflake schemas
hybrid structure
contains mixture of (denormalised) star and (normalised)
snowflake schemas
OLAP
Online Analytical Processing
dynamic synthesis, analysis and consolidation of large
volumes of multi-dimensional data
normally implemented using specialized multi-dimensional
DBMS
a method of visualising and manipulating data with many interrelationships
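Consolidation along dimensions can be sketched in plain Python. The facts below are made-up (city, year, property type, amount) tuples, and roll_up is a hypothetical helper; real OLAP servers use specialised storage rather than dictionaries.

```python
from collections import defaultdict

# Made-up facts: (city, year, property type, amount).
facts = [
    ("London", 2005, "Flat", 100),
    ("London", 2005, "House", 50),
    ("London", 2006, "Flat", 70),
    ("Glasgow", 2005, "Flat", 30),
    ("Glasgow", 2006, "House", 90),
]

def roll_up(facts, keep):
    """Sum the amounts, keeping only the dimensions whose positions are in `keep`."""
    totals = defaultdict(int)
    for *dims, amount in facts:
        key = tuple(dims[i] for i in keep)
        totals[key] += amount
    return dict(totals)

by_city = roll_up(facts, keep=(0,))
by_city_and_year = roll_up(facts, keep=(0, 1))
print(by_city)
print(by_city_and_year)
```

Dropping a dimension from the key consolidates the data to a coarser level; adding one back drills down again.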
OLAP Tools
Categorised according to architecture of underlying
database
Multi-dimensional OLAP
data typically aggregated and stored according to predicted usage
use array technology
Relational OLAP
use of relational meta-data layer with enhanced SQL
MOLAP
[Figure: MOLAP architecture. The RDB server loads data into the MOLAP server (database/application logic layer); the presentation layer sends requests to the MOLAP server and receives results]
ROLAP
[Figure: ROLAP architecture. The presentation layer sends requests to the ROLAP server (application logic layer), which sends SQL to the RDB server (database layer); results are passed back the same way]
MQE
[Figure: MQE architecture. End-user tools send SQL to the RDB server and requests to the MOLAP server, and receive results from both; the RDB server loads data into the MOLAP server]
Data Mining
The process of extracting valid, previously unknown,
Techniques:
Classification
Value Prediction
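[Figure: classification decision tree. Recoverable nodes: the test "Customer age > 25 years?" with No leading to Rent property and Yes leading to Buy property; a further branch also leads to Rent property]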
Techniques:
Demographic clustering
Neural clustering
Segmentation: Scatterplot
Example
Techniques
Association discovery
Sequential pattern discovery
Similar time sequence discovery
Deviation Detection:
Visualisation Example
Further Reading
Connolly and Begg, chapters 31 to 34.
W H Inmon, Building the Data Warehouse, New
York, Wiley and Sons, 1993.
Beynon-Davies P, Database Systems (2nd ed),
Macmillan Press, 2000, ch 34, 35 & 36.
Objectives
To investigate issues surrounding interoperability
To gain a basic understanding of XML and its
developments related to database systems
To gain a basic understanding of the use of XML
towards achieving interoperability
Interoperability
IEEE (1990) Definition:
the ability of two or more systems or components to
exchange information and to use the information that
has been exchanged
IEEE Standard Computer Dictionary: A Compilation of IEEE
Features
EBCDIC
EBCDIC: /eb's*-dik/, /eb'see`dik/, or /eb'k*-dik/ n.
EBCDIC is the most common alternate character code but there are others.
http://www.cheverus.org/advanced/data/EBCDIC.html
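The difference between character codes is easy to see in Python, which ships codecs for several EBCDIC code pages. The sketch below uses cp037, chosen here only as one example of an EBCDIC code page; the sample text is made up.

```python
text = "STAFF 21"  # made-up sample text

ascii_bytes = text.encode("ascii")
ebcdic_bytes = text.encode("cp037")  # cp037: one EBCDIC code page among several

print("ASCII :", ascii_bytes.hex())
print("EBCDIC:", ebcdic_bytes.hex())

# The same bytes read with the wrong character code give nonsense,
# so both systems must agree on (or convert between) encodings.
misread = ebcdic_bytes.decode("latin-1")
round_trip = ebcdic_bytes.decode("cp037")
print("round trip ok:", round_trip == text, "| misread ok:", misread == text)
```

The letter A, for instance, is byte 0x41 in ASCII but 0xC1 in this EBCDIC code page, so data exchanged between such systems has to be converted explicitly.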
SCHEMA 2
author_surname char(50)
author_inits char(10)
title varchar(300)
title varchar(200)
keyword set(char(30))
Fine(amount, borrowed_id)
Loan(id, isbn, date_out, fine)
Charge(1.25, fine)
Complex Problems
Heterogeneous models
Need to relate model constructions to one another, for
example
relate classes in object-oriented to user-defined types in object-relational
Or even more problematic, to tables in a relational database
broken into:
stylesheet (XSL standard)
document type definition (DTD) for validating documents
document data
</STAFFLIST>
Sample DTD
<!ELEMENT STAFFLIST (STAFF)*>
<!ELEMENT STAFF (NAME, POSITION, DOB?, SALARY)>
<!ELEMENT NAME (FNAME, LNAME)>
<!ELEMENT FNAME (#PCDATA)>
<!ELEMENT LNAME (#PCDATA)>
<!ELEMENT POSITION (#PCDATA)>
<!ELEMENT DOB (#PCDATA)>
<!ELEMENT SALARY (#PCDATA)>
<!ATTLIST STAFF branchNo CDATA #IMPLIED>
Sample StyleSheet
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl">
<xsl:template match="/">
<html><body>
<center><h2>DreamHome Estate agents</h2></center>
<table border="1" bgcolor="#ffffff">
<tr>
<th>staffNo</th>
<!-- repeat for other column headings -->
</tr>
<xsl:for-each select="STAFFLIST/STAFF">
<tr>
<td><xsl:value-of select="STAFFNO"/></td>
<td><xsl:value-of select="NAME/FNAME"/></td>
</tr>
</xsl:for-each>
</table></body></html>
</xsl:template>
</xsl:stylesheet>
Benefits of XML
Simplicity
Open standard and platform/vendor-independent
Extensibility
Reuse
Separation of content and presentation
Improved load balancing
Due to client side processing
Benefits of XML
Support for integration of data from multiple
sources
Ability to describe data from a wide variety of
applications
More advanced search engines
XQuery
New opportunities
XML Schema
<xsd:group name="STAFFTYPE">
<xsd:element name="STAFF">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="STAFFNO" type="STAFFNOTYPE"/>
<xsd:element name="NAME">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="FNAME" type="xsd:string"/>
<xsd:element name="LNAME" type="xsd:string"/>
</xsd:sequence>
</xsd:complexType>
...
XQuery
A query language for XML
e.g. List the staff at branch B005 with a salary
greater than 15000
FOR $S IN document("staff_list.xml")//STAFF
WHERE $S/SALARY > 15000 AND
$S/@branchNo = "B005"
RETURN $S/STAFFNO
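The same selection can be expressed in Python for comparison. This sketch is not XQuery; it uses the standard xml.etree.ElementTree module on a small made-up document, so the branch numbers and salaries below are sample values only.

```python
import xml.etree.ElementTree as ET

# Made-up sample data in the shape of the STAFFLIST document.
SAMPLE = """
<STAFFLIST>
  <STAFF branchNo="B005">
    <STAFFNO>SL21</STAFFNO><SALARY>30000</SALARY>
  </STAFF>
  <STAFF branchNo="B005">
    <STAFFNO>SL41</STAFFNO><SALARY>9000</SALARY>
  </STAFF>
  <STAFF branchNo="B003">
    <STAFFNO>SG5</STAFFNO><SALARY>24000</SALARY>
  </STAFF>
</STAFFLIST>
"""

root = ET.fromstring(SAMPLE.strip())

# FOR each STAFF element, WHERE salary > 15000 AND branchNo = B005,
# RETURN the staff number.
result = [
    staff.findtext("STAFFNO")
    for staff in root.iter("STAFF")
    if staff.get("branchNo") == "B005" and int(staff.findtext("SALARY")) > 15000
]
print(result)
```

The list comprehension mirrors the FOR, WHERE and RETURN clauses of the query one for one.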
[Diagram: Document, Element (+ parent), Child, Attribute, CharData]
Coarse Grained
Approach
One table:
DocId
Name
Body
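The one-table approach can be tried out with sqlite3. In the sketch below the whole document is stored as text in the Body column; the table name xml_doc and the sample document are made up for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
# One table holding whole documents: DocId, Name, Body.
conn.execute(
    "CREATE TABLE xml_doc (doc_id INTEGER PRIMARY KEY, name TEXT, body TEXT)"
)

doc = "<STAFFLIST><STAFF branchNo='B005'><STAFFNO>SL21</STAFFNO></STAFF></STAFFLIST>"
conn.execute(
    "INSERT INTO xml_doc (name, body) VALUES (?, ?)", ("staff_list.xml", doc)
)

# The database can only hand back the whole document by name;
# looking inside it means parsing the text again in the application.
(body,) = conn.execute(
    "SELECT body FROM xml_doc WHERE name = ?", ("staff_list.xml",)
).fetchone()
staff_numbers = [e.text for e in ET.fromstring(body).iter("STAFFNO")]
print(staff_numbers)
```

Storing and retrieving complete documents is simple, but SQL cannot see the individual elements, which is the trade-off of this approach.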
XML RDF
Resource Description Framework
XML Schema defines a grammar
therefore we have all the problems shown previously (e.g.
names)
RDF provides a way to encode domain models
an infrastructure that enables the encoding, exchange and reuse
of structured meta-data (W3C)
Defines semantics, syntax and structure
Property
a specific attribute which is used to describe a resource
Statement
a combination of a resource, a property and a value
usually known as the subject, predicate and object
e.g. The Author of http://www.dreamhome.co.uk/staff_list.xml
is John White
RDF Example
The statement would be defined in RDF (simplified)
as:
<?xml version="1.0"?>
<RDF>
<Description about="http://www.myhome.net/staff_list.xml">
<author>Fred Smith</author>
<created>25 May 2006</created>
</Description>
</RDF>
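The simplified RDF above can be read back as (subject, predicate, object) statements. The sketch below uses the standard xml.etree.ElementTree module and handles only this simplified, namespace-free form; a real RDF document would need a proper RDF parser.

```python
import xml.etree.ElementTree as ET

RDF_XML = """<?xml version="1.0"?>
<RDF>
<Description about="http://www.myhome.net/staff_list.xml">
<author>Fred Smith</author>
<created>25 May 2006</created>
</Description>
</RDF>"""

root = ET.fromstring(RDF_XML)

# Each child of a Description is a property of the resource named in 'about'.
triples = [
    (desc.get("about").strip(), prop.tag, prop.text)
    for desc in root.iter("Description")
    for prop in desc
]
for subject, predicate, obj in triples:
    print(subject, "|", predicate, "|", obj)
```

Each printed line is one statement: the resource, the property describing it, and the property's value.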
Summary
XML is being increasingly used in data models, data
transmission and data integration
Interoperability is the key issue and a major research
area in database systems
XML and RDF have the potential as a stepping stone to
achieving this
Further Reading
Connolly and Begg
chapter 30 (sections 30.1, 30.2 and 30.3) discusses XML and its
related technologies
Graves
Designing XML Databases, Prentice Hall.
XML Tutorial
www.w3cschools.com/xml