Database
It is often said that we live in an information society and that information is a very
valuable resource (or, as some people say, information is power). In this information
society, the term database has become rather common, although its meaning
seems to have become somewhat vague as the importance of database systems has
grown. Some people use the term database of an organization to mean all the data in
the organization (whether computerized or not). Others use the term to mean the
software that manages the data. We will use it to mean a collection of computerized
information that is available to many people for various uses.
“A database is a well-organized collection of data that are related in a meaningful way,
can be accessed in different logical orders, yet are stored only once. The data in the
database are therefore integrated, structured, and shared.”
For example, it is easy to find a book in a library because books are arranged in an
organized manner. Databases are, likewise, tools that help us store data in an
organized manner.
The phone book is a good example of a simple database. For example, to find the
phone number of Mr. Jonathan Smith, living at No. 5, 4th Avenue, Apt 15, New York, NY,
one may use the following process: get the phone book for the city of New York; flip to
the section where the last name is ‘Smith’; more than one Mr. Smith may be listed, so
look for the Mr. Smith whose first name is Jonathan; finally, look for the Jonathan Smith
living at No. 5, 4th Avenue, Apt 15 to get the corresponding phone number. The phone
number can be found because the information in the phone book is organized in a
certain manner.
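The lookup process above can be sketched in code. The following is a minimal illustration (the names and numbers are invented): keeping entries sorted by last and first name is what lets us "flip to the right section" directly, here with a binary search.

```python
import bisect

# A toy phone book, sorted by (last name, first name); entries invented.
# Binary search mimics flipping straight to the right section.
phone_book = sorted([
    (("Smith", "Adam"), "212-555-0101"),
    (("Smith", "Jonathan"), "212-555-0199"),
    (("Jones", "Mary"), "212-555-0150"),
])
keys = [entry[0] for entry in phone_book]

def lookup(last, first):
    """Return the number for (last, first), or None if absent."""
    i = bisect.bisect_left(keys, (last, first))
    if i < len(phone_book) and keys[i] == (last, first):
        return phone_book[i][1]
    return None

print(lookup("Smith", "Jonathan"))  # 212-555-0199
```

An unsorted book would force a scan of every entry; the organization is what makes the lookup fast.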
In the early days of computing, computers were mainly used for solving numerical
problems. Even in those days, it was observed that some tasks were common to many
of the problems being solved. It was therefore considered desirable to build
special subroutines to perform frequently occurring computing tasks such as
computing Sin(x). This led to the development of mathematical subroutine libraries that
are now considered an integral part of any computer system.
By the late fifties, storage, maintenance and retrieval of non-numeric data had become
very important. Again, it was observed that some data processing tasks occurred
quite frequently, e.g. sorting. This led to the development of generalized routines for
some of the most common and frequently used tasks. A general routine by its nature
tends to be somewhat less efficient than a routine designed for a specific problem. In
the late fifties and early sixties, when hardware costs were high, the use of general
routines was not very popular, since it became a matter of trade-off between hardware
and software costs. In the last two decades, hardware costs have gone down
dramatically while the cost of building software has gone up. It is no longer a trade-off
between hardware and software costs, since hardware is relatively cheap and building
reliable software is expensive.
Database management systems evolved from generalized routines for file processing
as the users demanded more extensive and more flexible facilities for managing data.
Database technology underwent major changes during the 1970s. An interesting
history of database systems is presented by Fry and Sibley (1976).
Types of Databases
There are many techniques and technologies for managing data. The main database
models, each with a representative product, are:
HDBMS (Hierarchical DBMS), e.g. IMS
NDBMS (Network DBMS), e.g. IDMS
RDBMS (Relational DBMS), e.g. Oracle
DBMS: - Database Management Systems have been used for quite a while. Software
packages like dBase, Clipper, etc. had facilities for managing data. The user had to take
care of two aspects: specify what data was to be retrieved, and how to go and get the
data.
NDBMS: - Managed data according to the network model. IDMS (from Cullinet, later
Computer Associates) is a network database. It too runs on mainframe computers and,
like the HDBMS (IMS), is still used in a few installations. Over time, the disadvantages of
network database technology prompted the search for an alternative. With more powerful
personal computer systems and the inherent limitations of the hierarchical and network
database models, the popularity of the relational database model increased.
RDBMS: - Was invented by a team led by Dr. Edgar F. Codd at IBM in the early 1970s.
The relational model is based on the principles of relational algebra. This model,
implemented as the Relational Database Management System, is very popular and is in
use by a majority of the Fortune 500 companies. This model will be used in great detail
in this course. Many vendors sell products that conform to this model, among them
Oracle, Microsoft (SQL Server and Access), Sybase, Informix, Ingres, Gupta SQL, and
IBM (DB2).
ORDBMS: - This model has been proposed and is presently being implemented by
the Oracle Corporation. This model addresses the shortcomings of the RDBMS model.
The Object Relational Database Management System is not purely object oriented.
However, it is implemented via the same relational engine that drives the Relational
Database Management System.
Server: - This is usually a high powered computer. Typically it could consist of several
processors (2, 3 or 4), a very large amount of RAM (2GB or 4GB), a very large amount of hard
disk storage space (possibly a few hundred GB), high speed network data transfer rates, etc.
The server need not be from a specific vendor. It could be purchased from any vendor. The
server platform may be UNIX, Windows NT, AIX, Sun Solaris, Novell, etc.
Client: - These are usually personal computers. They have processing power of their own.
The client operating system may be Windows 95, Windows 98, Windows 2000, Windows NT
Workstation, Sun Sparc Station, Mac OS, etc.
• The server is powered on and is ready to receive connections / requests. The network
administrator configures the server.
• Each client is powered on and is required to provide user identification and authentication
information (such as a user ID and a password) to connect to the server. In a multiple server
environment, the client computer may also be required to specify the location of the server to
which it must connect. This setup creates the infrastructure for other programs to be used on
this configuration. In most environments, databases, e-mail servers, and web servers run on a
server computer. Clients use the resources offered by these databases, websites, and other services.
DBMS
(Database Management System)
As discussed earlier, a database is a well organized collection of data. To be able to carry out
operations like insertion, deletion and retrieval, the database needs to be managed by a
substantial package of software. This software is usually called a Database Management
System (DBMS). The primary purpose of a DBMS is to allow a user to store, update and
retrieve data in abstract terms and thus make it easy to maintain and retrieve information from a
database. A DBMS relieves the user from having to know about exact physical representations
of data and having to specify detailed algorithms for storing, updating and retrieving data.
A DBMS is usually a very large software package that carries out many different tasks, including
the provision of facilities that enable the user to access and modify information in the database.
The DBMS is an intermediate link between the physical database, the computer and the
operating system on the one hand, and the users on the other. To provide its various facilities to
different types of users, a DBMS normally provides one or more specialized programming
languages, often called database languages. Different DBMSs provide different database
languages, although a language called SQL has taken on the role of a de facto standard.
Database languages come in different forms. One language is needed to describe the database
to the DBMS, as well as to provide facilities for changing the database and for defining and
changing physical data structures. Another language is needed for manipulating and retrieving
data stored in the DBMS. These languages are called Data Description Languages (DDL) and
Data Manipulation Languages (DML) respectively.
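As a minimal sketch of the DDL/DML distinction (table and column names are invented, and SQLite stands in for a full DBMS), the following shows a DDL statement describing a table, followed by DML statements that manipulate and retrieve its data:

```python
import sqlite3

# SQLite as a stand-in DBMS; table and column names are invented.
# CREATE TABLE is DDL: it describes the database's structure.
# INSERT and SELECT are DML: they manipulate and retrieve the data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Name VARCHAR(128), Salary DECIMAL(10,2))")  # DDL
conn.execute("INSERT INTO Employee VALUES ('Brown', 42000.00)")                  # DML
rows = conn.execute("SELECT Name, Salary FROM Employee").fetchall()              # DML
print(rows)
```

In most modern systems, SQL serves as both the DDL and the DML.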
Some DBMS packages are marketed by computer manufacturers and run only on that manufacturer's
machines (e.g. IBM's IMS), but increasingly independent software houses design and sell DBMS
software that runs on several different types of machines (e.g. ORACLE).
Hierarchical Model
The hierarchical model is the basis of the oldest database management systems, which grew
out of early attempts to organize data for the U.S. space program. Since these databases were
ad-hoc solutions to immediate problems, they were created without the strong theoretical
foundations that later systems had. Their designers were familiar with file organizations and
data structures, and used these concepts to solve the immediate data representation problems
of users. The hierarchical model uses the tree as its basic data structure. Nodes of the trees in
the hierarchical model represent data records or record segments, which are portions of the
data records. Relationships are represented as links or pointers between nodes.
Network Model
The network model uses a network, or plex, structure, which is a data structure consisting of
nodes and branches. Unlike a tree, a plex structure allows a node to have more than one parent.
The nodes of the network represent records of various types. Relationships between records are
represented as links, which become pointers in the implementation.
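The structural difference between the two models can be sketched with plain dictionaries (the record names here are invented for illustration):

```python
# In a tree (hierarchical model) each node has one parent;
# a plex (network model) allows a node several parents.
tree = {                 # hierarchical: child record -> its single parent
    "Order1": "CustomerA",
    "Order2": "CustomerA",
}
plex = {                 # network: child record -> multiple parents, e.g.
    "Order1": ["CustomerA", "ProductX"],  # an order linked to both its
    "Order2": ["CustomerA", "ProductY"],  # customer and its product
}
# Navigating a relationship means chasing these pointers:
print(plex["Order1"])  # ['CustomerA', 'ProductX']
```

In both models, queries are answered by following such pointers record by record.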
Relational Model
The relational model was proposed by Codd in 1970 and continues to be the subject of much
research. It is now widely used by both mainframe and microcomputer-based DBMSs because
of its power and its simplicity from the user's point of view. The relational model takes the
theory of relations from mathematics and adapts it for use in database theory. In the relational
model, both entities and relationships are represented by relations, which are physically
represented as tables (two-dimensional arrays), with attributes as the columns of those tables.
Before going into the details of the Relational and Object Relational Database Management
Systems, we should have some knowledge of relational design.
RDBMS
(Relational Database Management System)
In 1970 an IBM researcher named Ted Codd wrote a paper that described a new approach to
the management of “large shared data banks.” In his paper Codd identified two objectives for
managing shared data. The first is data independence, which dictates that applications
using a database be maintained independently of the physical details by which the database
organizes itself. Second, he described a series of rules to ensure that the shared data
are consistent, by eliminating any redundancy in the database’s design. Codd’s paper deliberately
ignored any consideration of how his model might be implemented. He was attempting to define
an abstraction of the problem of information management: a general framework for thinking
about and working with information. The relational model Codd described had three parts: a
data model, a means of expressing operations in a high-level language, and a set of design
principles that ensured the elimination of certain kinds of redundant-data problems. Codd’s
relational model views data as being stored in tables containing a variable number of rows (or
records), each of which has a fixed number of columns. Something like a telephone directory,
or a registry of births, deaths, and marriages, is a good analogy for a table. Each entry contains
different information but is identical in its form to all other entries of the same kind. The relational model
also describes a number of logical operations that could be performed over the data. These
operations included a means of getting a subset of rows (all names in the telephone directory
with the surname “Brown”), a subset of columns (just the name and number), or data from a
combination of tables (a person who is married to a person with a particular phone number). By
borrowing mathematical techniques from predicate logic, Codd was able to derive a series of
design principles that could be used to guarantee that the database’s structure was free of the
kinds of redundancy so problematic in other systems. Greatly expanded by later writers, these
ideas formed the basis of the theory of normal forms. Properly applied, the system of normal
form rules can ensure that the database’s logical design is free of redundancy and, therefore,
any possibility of anomalies in the data.
The relational model is built from relations, their attributes, and the domains from which those
attributes draw their values. Codd described a data storage system that possessed three
characteristics that were sorely needed at that time:
1. Data independence: - Applications using the database would be insulated from the
physical details by which the data is stored.
2. Consistency: - Redundancy in the database's design would be eliminated, keeping the
shared data consistent.
3. Ad hoc query: - This characteristic would enable the user to tell the database which
data to retrieve without indicating how to accomplish the task.
Some time passed before a commercial product actually implemented some of the relational database
features that Codd described. Today many vendors sell Relational Database Management
Systems; some of the more well-known vendors are Oracle, Sybase, IBM, Informix, Microsoft, and
Computer Associates. Of these vendors, Oracle has emerged as the leader. The Oracle RDBMS engine
has been ported to more platforms than any other database product. Because of Oracle's multiplatform
support, many application software vendors have chosen Oracle as their database platform. And now,
Oracle Corporation has ported the same RDBMS engine to the desktop environment with its release of
Personal Oracle.
Relational Database Implementation
Today's relational databases implement a number of extremely useful features that
Codd did not mention in his original article. However, as of this writing, no commercially
available database is a full implementation of Codd's rules for relational databases.
During the early 1970s, several parallel research projects set out to implement a working
RDBMS. This turned out to be very hard. It wasn’t until the late 1980s that RDBMS products
worked acceptably in the kinds of high-end, online transaction processing applications served
so well by earlier technologies. Despite these technical shortcomings, RDBMS technology
exploded in popularity because even the earliest products made it cheaper, quicker, and easier
to build information systems. For an increasing number of applications, economics favored
spending more on hardware and less on people. RDBMS technology made it possible to
develop information systems that, while desirable from a business management point of view,
had been deemed too expensive. To emphasize the difference between the relational and pre-
relational approaches, a four-hundred-line C program can be replaced by the SQL-92
expression in the listing below.

CREATE TABLE Employee (
    Name VARCHAR(128),
    DOB DATE,
    Salary DECIMAL(10,2),
    Address VARCHAR(128)
);
The listing implements considerably more functionality than the C program because RDBMSs
provide transactional guarantees for data changes. They automatically employ locking, logging,
and backup and recovery facilities to guarantee the integrity of the data they store. RDBMSs
also provide elaborate security features: different tables in the same database can be made
accessible to different groups of users. All of this built-in functionality means that developers
can focus more of their effort on their system’s functionality and less on complex
technical details.
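A rough sketch of a transactional guarantee, with SQLite as a stand-in and invented table names: if a failure interrupts a transfer, a rollback undoes the partial change, so the data never show a half-done update.

```python
import sqlite3

# SQLite as a stand-in DBMS; table names invented. Either both
# updates of a transfer commit, or neither does.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Account (Owner TEXT, Balance INTEGER)")
conn.execute("INSERT INTO Account VALUES ('A', 100), ('B', 100)")
conn.commit()

def transfer(amount, fail=False):
    try:
        conn.execute("UPDATE Account SET Balance = Balance - ? WHERE Owner = 'A'", (amount,))
        if fail:
            raise RuntimeError("simulated failure mid-transfer")
        conn.execute("UPDATE Account SET Balance = Balance + ? WHERE Owner = 'B'", (amount,))
        conn.commit()
    except RuntimeError:
        conn.rollback()  # the half-done transfer is undone

transfer(50, fail=True)
balances = dict(conn.execute("SELECT Owner, Balance FROM Account"))
print(balances)  # {'A': 100, 'B': 100}: the failed transfer left no trace
```

A hand-written C program would have to implement this undo logic itself.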
Architecture of RDBMS
There are three standard levels in the architecture of a relational database management system:
External level
Conceptual level
Internal level
External Level
The external level consists of many different external views, or external models, of the database. Each
user has a model of the real world represented in a form that is suitable for that user. A particular user
interacts with only certain entities in the real world and is interested in only some of their attributes and
relationships. Therefore, that user’s view contains only information about those aspects of the real world.
Conceptual Level
The conceptual level consists of base tables, which are physically stored tables. These tables
are created by the Database Administrator using a CREATE TABLE command. Once a table is
created, the DBA can create views for users. A view may be a subset of a single base table, or
may be created by combining base tables or performing other operations on them.
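As a small sketch of this (SQLite shown; the table, column, and view names are invented), a view defined over a base table exposes only part of it:

```python
import sqlite3

# A DBA-created base table and a view over it (names invented).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Name TEXT, Dept TEXT, Salary INTEGER)")
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?)",
                 [("Art", "Sales", 40000), ("Bob", "IT", 55000)])
# The view is a subset of the base table's rows and columns:
conn.execute("CREATE VIEW SalesStaff AS SELECT Name FROM Employee WHERE Dept = 'Sales'")
rows = conn.execute("SELECT * FROM SalesStaff").fetchall()
print(rows)  # [('Art',)]
```

Each user can be given a view that matches their own external model of the data.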
Internal Level
The internal level covers the physical implementation of the database. It includes the data
structures and file organization used to store data on physical storage devices. The internal
schema, written in a DDL, is a complete description of the internal model. It includes such items
as how data are represented, how records are sequenced, what indexes exist, and what pointers
exist. An internal record is a single stored record; it is the unit that is passed up from the internal level.
Primary Key
Every entity has a set of attributes that uniquely define an instance of that entity. This set of attributes is
called the primary key. The primary key may be composed of a single attribute or of several attributes
together (a composite key).
A basic tenet of relational theory is that no part of the primary key can be null. If you think about that idea
for a moment, it seems intuitive: the primary key must uniquely identify each row in an entity; therefore, if
the primary key (or a part of it) is null, it wouldn't be able to identify anything.
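A brief sketch of the DBMS enforcing this rule (SQLite shown; the table and column names are invented): a duplicate primary key value is rejected by the engine itself, not by application code.

```python
import sqlite3

# SQLite as a stand-in DBMS; names invented. The DBMS rejects a
# duplicate primary key value with an integrity error.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Part (PartNo INTEGER PRIMARY KEY NOT NULL, Name TEXT)")
conn.execute("INSERT INTO Part VALUES (1, 'bolt')")

duplicate_rejected = False
try:
    conn.execute("INSERT INTO Part VALUES (1, 'nut')")  # same key as 'bolt'
except sqlite3.IntegrityError:
    duplicate_rejected = True  # UNIQUE constraint failed: Part.PartNo
print(duplicate_rejected)  # True
```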
Data Integrity
According to relational theory, every entity has a set of attributes that uniquely identify each row in that
entity. Relational theory also states that no duplicate rows can exist in a table.
Referential Integrity
Tables are related to one another through foreign keys. A foreign key is a column in one table for which
the set of possible values is found in the primary key of a second table. Referential integrity is achieved
when the set of values in a foreign key column is restricted to the primary key values that it references, or
to the null value. Once the database designer declares primary and foreign keys, enforcing data and
referential integrity is the responsibility of the DBMS.
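A sketch of the DBMS enforcing referential integrity (SQLite shown; the table names are invented, and note that SQLite only enforces foreign keys after an explicit PRAGMA):

```python
import sqlite3

# SQLite as a stand-in DBMS; names invented.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enable enforcement
conn.execute("CREATE TABLE Warehouse (WhID INTEGER PRIMARY KEY, Address TEXT)")
conn.execute("CREATE TABLE Part (PartNo INTEGER PRIMARY KEY, "
             "WhID INTEGER REFERENCES Warehouse(WhID))")
conn.execute("INSERT INTO Warehouse VALUES (1, '123 Main St.')")
conn.execute("INSERT INTO Part VALUES (10, 1)")       # OK: warehouse 1 exists
rejected = False
try:
    conn.execute("INSERT INTO Part VALUES (11, 99)")  # no warehouse 99
except sqlite3.IntegrityError:
    rejected = True  # FOREIGN KEY constraint failed
print(rejected)  # True
```

Once the keys are declared, no application can insert a part referring to a warehouse that does not exist.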
INTRODUCTION
The normal forms defined in relational database theory represent guidelines for record design. The
guidelines corresponding to first through fifth normal forms are presented here, in terms that do not
require an understanding of relational theory. The design guidelines are meaningful even if one is not
using a relational database system. We present the guidelines without referring to the concepts of the
relational model in order to emphasize their generality, and also to make them easier to understand. Our
presentation conveys an intuitive sense of the intended constraints on record design, although in its
informality it may be imprecise in some technical details. A comprehensive treatment of the subject is
provided by Date.
The normalization rules are designed to prevent update anomalies and data inconsistencies. With respect
to performance tradeoffs, these guidelines are biased toward the assumption that all non-key fields will be
updated frequently. They tend to penalize retrieval, since data which may have been retrievable from one
record in an un-normalized design may have to be retrieved from several records in the normalized form.
There is no obligation to fully normalize all records when actual performance requirements are taken into
account.
FIRST NORMAL FORM
First normal form deals with the "shape" of a record type. Under first normal form, all occurrences of a
record type must contain the same number of fields. First normal form excludes variable repeating fields
and groups. This is not so much a design guideline as a matter of definition. Relational database theory
doesn't deal with records having a variable number of fields.
Second and third normal forms deal with the relationship between non-key and key fields. Under second
and third normal forms, a non-key field must provide a fact about the key, the whole key, and nothing
but the key. In addition, the record must satisfy first normal form. We deal now only with "single-valued"
facts. The fact could be a one-to-many relationship, such as the department of an employee, or a one-to-
one relationship, such as the spouse of an employee. Thus the phrase "Y is a fact about X" signifies a
one-to-one or one-to-many relationship between Y and X. In the general case, Y might consist of one or
more fields, and so might X. In the following example, QUANTITY is a fact about the combination of
PART and WAREHOUSE.
Second Normal Form
Second normal form is violated when a non-key field is a fact about a subset of a key. It is only relevant
when the key is composite, i.e., consists of several fields. Consider the following inventory record:
---------------------------------------------------
| PART | WAREHOUSE | QUANTITY | WAREHOUSE-ADDRESS |
====================-------------------------------
The key here consists of the PART and WAREHOUSE fields together, but WAREHOUSE-ADDRESS is a
fact about the WAREHOUSE alone. The basic problems with this design are:
The warehouse address is repeated in every record that refers to a part stored in that warehouse.
If the address of the warehouse changes, every record referring to a part stored in that
warehouse must be updated.
Because of the redundancy, the data might become inconsistent, with different records showing
different addresses for the same warehouse.
If at some point in time there are no parts stored in the warehouse, there may be no record in
which to keep the warehouse's address.
To satisfy second normal form, the record shown above should be decomposed into (replaced by) the two
records:
-------------------------------  ---------------------------------
| PART | WAREHOUSE | QUANTITY |  | WAREHOUSE | WAREHOUSE-ADDRESS |
====================-----------  =============--------------------
When a data design is changed in this way, replacing unnormalized records with normalized records, the
process is referred to as normalization. The term "normalization" is sometimes used relative to a
particular normal form. Thus a set of records may be normalized with respect to second normal form but
not with respect to third. The normalized design enhances the integrity of the data, by minimizing
redundancy and inconsistency, but at some possible performance cost for certain retrieval applications.
Consider an application that wants the addresses of all warehouses stocking a certain part. In the
unnormalized form, the application searches one record type. With the normalized design, the application
has to search two record types, and connect the appropriate pairs.
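The retrieval cost described above can be sketched as follows (SQLite shown; the data are invented but follow the PART/WAREHOUSE example): after normalization, the warehouse address comes from a second table via a join.

```python
import sqlite3

# SQLite as a stand-in DBMS; data invented, schema follows the
# normalized PART/WAREHOUSE design above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Inventory (Part TEXT, Warehouse TEXT, Quantity INTEGER)")
conn.execute("CREATE TABLE Warehouse (Warehouse TEXT PRIMARY KEY, Address TEXT)")
conn.executemany("INSERT INTO Inventory VALUES (?, ?, ?)",
                 [("bolt", "WH1", 100), ("nut", "WH1", 50), ("bolt", "WH2", 75)])
conn.executemany("INSERT INTO Warehouse VALUES (?, ?)",
                 [("WH1", "123 Main St."), ("WH2", "9 Dock Rd.")])
# Addresses of all warehouses stocking a certain part: two record
# types must be searched and the appropriate pairs connected.
rows = conn.execute(
    "SELECT w.Address FROM Inventory i "
    "JOIN Warehouse w ON i.Warehouse = w.Warehouse "
    "WHERE i.Part = 'bolt'").fetchall()
print(rows)
```

Each address is stored once, so the update anomalies disappear, at the price of this join.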
Third Normal Form
Third normal form is violated when a non-key field is a fact about another non-key field, as in
------------------------------------
| EMPLOYEE | DEPARTMENT | LOCATION |
============------------------------
The EMPLOYEE field is the key. If each department is located in one place, then the LOCATION field is a
fact about the DEPARTMENT -- in addition to being a fact about the EMPLOYEE. The problems with this
design are the same as those caused by violations of second normal form:
The department's location is repeated in the record of every employee assigned to that
department.
If the location of the department changes, every such record must be updated.
Because of the redundancy, the data might become inconsistent, with different records showing
different locations for the same department.
If a department has no employees, there may be no record in which to keep the department's
location.
To satisfy third normal form, the record shown above should be decomposed into the two records:
------------------------- -------------------------
| EMPLOYEE | DEPARTMENT | | DEPARTMENT | LOCATION |
============------------- ==============-----------
To summarize, a record is in second and third normal forms if every field is either part of the key or
provides a (single-valued) fact about exactly the whole key and nothing else.
Functional Dependencies
In relational database theory, second and third normal forms are defined in terms of functional
dependencies, which correspond approximately to our single-valued facts. A field Y is "functionally
dependent" on a field (or fields) X if it is invalid to have two records with the same X-value but different Y-
values. That is, a given X-value must always occur with the same Y-value. When X is a key, then all fields
are by definition functionally dependent on X in a trivial way, since there can't be two records having the
same X value.
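This definition translates directly into a small check (the field names follow the earlier examples; the function is illustrative, not part of any standard library):

```python
# X -> Y holds if no two records share an X-value but differ on Y.
def holds_fd(records, x, y):
    seen = {}
    for rec in records:
        if rec[x] in seen and seen[rec[x]] != rec[y]:
            return False  # same X-value, different Y-values
        seen[rec[x]] = rec[y]
    return True

rows = [
    {"EMPLOYEE": "Art", "DEPARTMENT": "D1", "LOCATION": "NY"},
    {"EMPLOYEE": "Bob", "DEPARTMENT": "D1", "LOCATION": "NY"},
    {"EMPLOYEE": "Cal", "DEPARTMENT": "D2", "LOCATION": "SF"},
]
print(holds_fd(rows, "DEPARTMENT", "LOCATION"))  # True: DEPARTMENT -> LOCATION
```

If a fourth record placed department D1 in SF, the dependency would no longer hold.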
There is a slight technical difference between functional dependencies and single-valued facts as we
have presented them. Functional dependencies only exist when the things involved have unique and
singular identifiers (representations). For example, suppose a person's address is a single-valued fact,
i.e., a person has only one address. If we don't provide unique identifiers for people, then there will not be
a functional dependency in the data:
----------------------------------------------
| PERSON | ADDRESS |
-------------+--------------------------------
| John Smith | 123 Main St., New York |
| John Smith | 321 Center St., San Francisco |
----------------------------------------------
Although each person has a unique address, a given name can appear with several different addresses.
Hence we do not have a functional dependency corresponding to our single-valued fact.
Similarly, the address has to be spelled identically in each occurrence in order to have a functional
dependency. In the following case the same person appears to be living at two different addresses, again
precluding a functional dependency.
---------------------------------------
| PERSON | ADDRESS |
-------------+-------------------------
| John Smith | 123 Main St., New York |
| John Smith | 123 Main Street, NYC |
---------------------------------------
We are not defending the use of non-unique or non-singular representations. Such practices often lead to
data maintenance problems of their own. We do wish to point out, however, that functional dependencies
and the various normal forms are really only defined for situations in which there are unique and singular
identifiers. Thus the design guidelines as we present them are a bit stronger than those implied by the
formal definitions of the normal forms.
For instance, we as designers know that in the following example there is a single-valued fact about a
non-key field, and hence the design is susceptible to all the update anomalies mentioned earlier.
---------------------------------------------------------
| EMPLOYEE  | FATHER     | FATHER'S-ADDRESS              |
============-------------+-------------------------------
| Art Smith | John Smith | 123 Main St., New York        |
| Bob Smith | John Smith | 123 Main Street, NYC          |
| Cal Smith | John Smith | 321 Center St., San Francisco |
---------------------------------------------------------
However, in formal terms, there is no functional dependency here between FATHER'S-ADDRESS and
FATHER, and hence no violation of third normal form.
Fourth and fifth normal forms deal with multi-valued facts. The multi-valued fact may correspond to a
many-to-many relationship, as with employees and skills, or to a many-to-one relationship, as with the
children of an employee (assuming only one parent is an employee). By "many-to-many" we mean that an
employee may have several skills, and a skill may belong to several employees.
Note that we look at the many-to-one relationship between children and fathers as a single-valued fact
about a child but a multi-valued fact about a father.
In a sense, fourth and fifth normal forms are also about composite keys. These normal forms attempt to
minimize the number of fields involved in a composite key, as suggested by the examples to follow.
Fourth Normal Form
Under fourth normal form, a record type should not contain two or more independent multi-valued facts
about an entity. In addition, the record must satisfy third normal form.
Consider employees, skills, and languages, where an employee may have several skills and several
languages. We have here two many-to-many relationships, one between employees and skills, and one
between employees and languages. Under fourth normal form, these two relationships should not be
represented in a single record such as
-------------------------------
| EMPLOYEE | SKILL | LANGUAGE |
===============================
Instead, they should be represented in the two records:
-------------------- -----------------------
| EMPLOYEE | SKILL | | EMPLOYEE | LANGUAGE |
==================== =======================
Note that other fields, not involving multi-valued facts, are permitted to occur in the record, as in the case
of the QUANTITY field in the earlier PART/WAREHOUSE example.
The main problem with violating fourth normal form is that it leads to uncertainties in the maintenance
policies. Several policies are possible for maintaining two independent multi-valued facts in one record:
(1) A disjoint format, in which a record contains either a skill or a language, but not both:
-------------------------------
| EMPLOYEE | SKILL | LANGUAGE |
|----------+-------+----------|
| Smith | cook | |
| Smith | type | |
| Smith | | French |
| Smith | | German |
| Smith | | Greek |
-------------------------------
This is not much different from maintaining two separate record types. (We note in passing that such a
format also leads to ambiguities regarding the meanings of blank fields. A blank SKILL could mean the
person has no skill, or the field is not applicable to this employee, or the data is unknown, or, as in this
case, the data may be found in another record.)
(2) A minimal number of records, with repetitions:
-------------------------------
| EMPLOYEE | SKILL | LANGUAGE |
|----------+-------+----------|
| Smith | cook | French |
| Smith | type | German |
| Smith | | Greek |
-------------------------------
(3) Unrestricted:
-------------------------------
| EMPLOYEE | SKILL | LANGUAGE |
|----------+-------+----------|
| Smith | cook | French |
| Smith | type | |
| Smith | | German |
| Smith | type | Greek |
-------------------------------
(4) A "cross-product" form, where for each employee, there must be a record for every possible pairing of
one of his skills with one of his languages:
-------------------------------
| EMPLOYEE | SKILL | LANGUAGE |
|----------+-------+----------|
| Smith | cook | French |
| Smith | cook | German |
| Smith | cook | Greek |
| Smith | type | French |
| Smith | type | German |
| Smith | type | Greek |
-------------------------------
Other problems caused by violating fourth normal form are similar in spirit to those mentioned earlier for
violations of second or third normal form. They take different forms depending on the chosen
maintenance policy:
If there are repetitions, then updates have to be done in multiple records, and they could become
inconsistent.
Insertion of a new skill may involve looking for a record with a blank skill, or inserting a new
record with a possibly blank language, or inserting multiple records pairing the new skill with
some or all of the languages.
Deletion of a skill may involve blanking out the skill field in one or more records (perhaps with a
check that this doesn't leave two records with the same language and a blank skill), or deleting
one or more records, coupled with a check that the last mention of some language hasn't also
been deleted.
1 Independence
We mentioned independent multi-valued facts earlier, and we now illustrate what we mean in terms of the
example. The two many-to-many relationships, employee: skill and employee: language, are
"independent" in that there is no direct connection between skills and languages. There is only an indirect
connection because they belong to some common employee. That is, it does not matter which skill is
paired with which language in a record; the pairing does not convey any information. That's precisely why
all the maintenance policies mentioned earlier can be allowed.
In contrast, suppose that an employee could only exercise certain skills in certain languages. Perhaps
Smith can cook French cuisine only, but can type in French, German, and Greek. Then the pairings of
skills and languages become meaningful, and there is no longer an ambiguity of maintenance policies. In
the present case, only the following form is correct:
-------------------------------
| EMPLOYEE | SKILL | LANGUAGE |
|----------+-------+----------|
| Smith | cook | French |
| Smith | type | French |
| Smith | type | German |
| Smith | type | Greek |
-------------------------------
Thus the employee: skill and employee: language relationships are no longer independent. These records
do not violate fourth normal form. When there is interdependence among the relationships, then it is
acceptable to represent them in a single record.
2 Multivalued Dependencies
For readers interested in pursuing the technical background of fourth normal form a bit further, we
mention that fourth normal form is defined in terms of multivalued dependencies, which correspond to our
independent multi-valued facts. Multivalued dependencies, in turn, are defined essentially as relationships
which accept the "cross-product" maintenance policy mentioned above. That is, for our example, every
one of an employee's skills must appear paired with every one of his languages. It may or may not be
obvious to the reader that this is equivalent to our notion of independence: since every possible pairing
must be present, there is no "information" in the pairings. Such pairings convey information only if some
of them can be absent, that is, only if it is possible that some employee cannot perform some skill in some
language. If all pairings are always present, then the relationships are really independent.
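This definition can be checked mechanically: a record set satisfies the multivalued dependency (i.e., the facts really are independent) exactly when, for each employee, it equals the cross-product of its skill and language projections. A small Python sketch, using hypothetical data in the shape of the example:

```python
def satisfies_mvd(rows):
    """rows: set of (employee, skill, language) tuples.
    True iff, for each employee, every skill appears paired
    with every language (the 'cross-product' policy)."""
    employees = {e for e, _, _ in rows}
    for e in employees:
        skills = {s for emp, s, _ in rows if emp == e}
        langs = {l for emp, _, l in rows if emp == e}
        expected = {(e, s, l) for s in skills for l in langs}
        if {r for r in rows if r[0] == e} != expected:
            return False
    return True

full = {("Smith", "cook", "French"), ("Smith", "cook", "German"),
        ("Smith", "type", "French"), ("Smith", "type", "German")}
partial = full - {("Smith", "cook", "German")}

print(satisfies_mvd(full))     # True: the pairings carry no information
print(satisfies_mvd(partial))  # False: the missing pairing is informative
```

When the check fails, some pairing is absent, so the pairings convey information and the relationships are dependent, exactly as the text argues.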
We should also point out that multivalued dependencies and fourth normal form apply as well to
relationships involving more than two fields; the earlier example could, for instance, be extended to
include projects.
Fifth Normal Form
Fifth normal form deals with cases where information can be reconstructed from smaller pieces of
information that can be maintained with less redundancy. Second, third, and fourth normal forms also
serve this purpose, but fifth normal form generalizes to cases not covered by the others.
We will not attempt a comprehensive exposition of fifth normal form, but illustrate the central concept with
a commonly used example, namely one involving agents, companies, and products. If agents represent
companies, companies make products, and agents sell products, then we might want to keep a record of
which agent sells which product for which company. This information could be kept in one record type
with three fields:
-----------------------------
| AGENT | COMPANY | PRODUCT |
|-------+---------+---------|
| Smith | Ford | car |
| Smith | GM | truck |
-----------------------------
This form is necessary in the general case. For example, although agent Smith sells cars made by Ford
and trucks made by GM, he does not sell Ford trucks or GM cars. Thus we need the combination of three
fields to know which combinations are valid and which are not.
But suppose that a certain rule was in effect: if an agent sells a certain product, and he represents a
company making that product, then he sells that product for that company. In the presence of this rule,
the following set of records satisfies the constraint:
-----------------------------
| AGENT | COMPANY | PRODUCT |
|-------+---------+---------|
| Smith | Ford | car |
| Smith | Ford | truck |
| Smith | GM | car |
| Smith | GM | truck |
| Jones | Ford | car |
-----------------------------
In this case, it turns out that we can reconstruct all the true facts from a normalized form consisting of
three separate record types, each containing two fields:
---------------------
|  AGENT  | COMPANY |
|---------+---------|
| Smith   | Ford    |
| Smith   | GM      |
| Jones   | Ford    |
---------------------
---------------------
| COMPANY | PRODUCT |
|---------+---------|
| Ford    | car     |
| Ford    | truck   |
| GM      | car     |
| GM      | truck   |
---------------------
---------------------
|  AGENT  | PRODUCT |
|---------+---------|
| Smith   | car     |
| Smith   | truck   |
| Jones   | car     |
---------------------
These three record types are in fifth normal form, whereas the corresponding three-field record shown
previously is not.
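The reconstruction amounts to a three-way join of the two-field projections. A minimal Python sketch, using the agent, company, and product data from the example, shows that the join recovers exactly the original five-row table:

```python
# The three two-field record types (projections of the original table).
agent_company = {("Smith", "Ford"), ("Smith", "GM"), ("Jones", "Ford")}
company_product = {("Ford", "car"), ("Ford", "truck"),
                   ("GM", "car"), ("GM", "truck")}
agent_product = {("Smith", "car"), ("Smith", "truck"), ("Jones", "car")}

# A triple (agent, company, product) is a true fact only if all three
# pairwise facts hold -- this is the three-way join.
reconstructed = {(a, c, p)
                 for (a, c) in agent_company
                 for (c2, p) in company_product
                 if c2 == c and (a, p) in agent_product}

for row in sorted(reconstructed):
    print(row)
# Five rows: the original table, with no spurious facts
```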
Roughly speaking, we may say that a record type is in fifth normal form when its information content
cannot be reconstructed from several smaller record types, i.e., from record types each having fewer
fields than the original record. The case where all the smaller records have the same key is excluded. If a
record type can only be decomposed into smaller records which all have the same key, then the record
type is considered to be in fifth normal form without decomposition. A record type in fifth normal form is
also in fourth, third, second, and first normal forms.
Fifth normal form does not differ from fourth normal form unless there exists a symmetric constraint such
as the rule about agents, companies, and products. In the absence of such a constraint, a record type in
fourth normal form is always in fifth normal form.
One advantage of fifth normal form is that certain redundancies can be eliminated. In the normalized
form, the fact that Smith sells cars is recorded only once; in the un-normalized form it may be repeated
many times.
It should be observed that although the normalized form involves more record types, there may be fewer
total record occurrences. This is not apparent when there are only a few facts to record, as in the example
shown above. The advantage is realized as more facts are recorded, since the size of the normalized files
increases in an additive fashion, while the size of the un-normalized file increases in a multiplicative
fashion. For example, if we add a new agent who sells x products for y companies, where each of these
companies makes each of these products, we have to add x+y new records to the normalized form, but
xy new records to the un-normalized form.
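The additive-versus-multiplicative growth is easy to verify with a quick calculation; the values of x and y here are arbitrary:

```python
# A new agent sells x products for y companies, where each of these
# companies makes each of these products.
x, y = 5, 4

# Normalized: x new AGENT/PRODUCT rows plus y new AGENT/COMPANY rows.
normalized_new_records = x + y
# Un-normalized: one new row per (company, product) pairing.
unnormalized_new_records = x * y

print(normalized_new_records)    # 9
print(unnormalized_new_records)  # 20
```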
It should be noted that all three record types are required in the normalized form in order to reconstruct
the same information. From the first two record types shown above we learn that Jones represents Ford
and that Ford makes trucks. But we can't determine whether Jones sells Ford trucks until we look at the
third record type to determine whether Jones sells trucks at all.
The following example illustrates a case in which the rule about agents, companies, and products is
satisfied, and which clearly requires all three record types in the normalized form. Any two of the record
types taken alone will imply something untrue.
-----------------------------
| AGENT | COMPANY | PRODUCT |
|-------+---------+---------|
| Smith | Ford | car |
| Smith | Ford | truck |
| Smith | GM | car |
| Smith | GM | truck |
| Jones | Ford | car |
| Jones | Ford | truck |
| Brown | Ford | car |
| Brown | GM | car |
| Brown | Toyota | car |
| Brown | Toyota | bus |
-----------------------------
---------------------
|  AGENT  | COMPANY |
|---------+---------|
| Smith   | Ford    |
| Smith   | GM      |
| Jones   | Ford    |
| Brown   | Ford    |
| Brown   | GM      |
| Brown   | Toyota  |
---------------------
---------------------
| COMPANY | PRODUCT |
|---------+---------|
| Ford    | car     |
| Ford    | truck   |
| GM      | car     |
| GM      | truck   |
| Toyota  | car     |
| Toyota  | bus     |
---------------------
---------------------
|  AGENT  | PRODUCT |
|---------+---------|
| Smith   | car     |
| Smith   | truck   |
| Jones   | car     |
| Jones   | truck   |
| Brown   | car     |
| Brown   | bus     |
---------------------
Observe that:
Jones sells cars and GM makes cars, but Jones does not represent GM.
Brown represents Ford and Ford makes trucks, but Brown does not sell trucks.
Brown represents Ford and Brown sells buses, but Ford does not make buses.
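The claim that any two record types taken alone imply something untrue can be checked in Python. Joining only the AGENT/COMPANY and COMPANY/PRODUCT projections of the table above produces spurious facts that are not in the original table:

```python
agent_company = {("Smith", "Ford"), ("Smith", "GM"), ("Jones", "Ford"),
                 ("Brown", "Ford"), ("Brown", "GM"), ("Brown", "Toyota")}
company_product = {("Ford", "car"), ("Ford", "truck"), ("GM", "car"),
                   ("GM", "truck"), ("Toyota", "car"), ("Toyota", "bus")}
true_facts = {("Smith", "Ford", "car"), ("Smith", "Ford", "truck"),
              ("Smith", "GM", "car"), ("Smith", "GM", "truck"),
              ("Jones", "Ford", "car"), ("Jones", "Ford", "truck"),
              ("Brown", "Ford", "car"), ("Brown", "GM", "car"),
              ("Brown", "Toyota", "car"), ("Brown", "Toyota", "bus")}

# Join only two of the three projections.
pairwise_join = {(a, c, p)
                 for (a, c) in agent_company
                 for (c2, p) in company_product if c2 == c}

spurious = pairwise_join - true_facts
print(sorted(spurious))
# The two-way join claims Brown sells trucks, which is untrue;
# the third record type (AGENT/PRODUCT) is needed to rule this out.
```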
Fourth and fifth normal forms both deal with combinations of multivalued facts. One difference is that the
facts dealt with under fifth normal form are not independent, in the sense discussed earlier. Another
difference is that, although fourth normal form can deal with more than two multivalued facts, it only
recognizes them in pairwise groups. We can best explain this in terms of the normalization process
implied by fourth normal form. If a record violates fourth normal form, the associated normalization
process decomposes it into two records, each containing fewer fields than the original record. Any
resulting record that still violates fourth normal form is again decomposed into two records, and so on until the resulting records
are all in fourth normal form. At each stage, the set of records after decomposition contains exactly the
same information as the set of records before decomposition.
In the present example, no pairwise decomposition is possible. There is no combination of two smaller
records which contains the same total information as the original record. All three of the smaller records
are needed. Hence an information-preserving pairwise decomposition is not possible, and the original
record is not in violation of fourth normal form. Fifth normal form is needed in order to deal with the
redundancies in this case.
UNAVOIDABLE REDUNDANCIES
Normalization certainly doesn't remove all redundancies. Certain redundancies seem to be unavoidable,
particularly when several multivalued facts are dependent rather than independent. In the example
shown, it seems unavoidable that we record the fact that "Smith can type" several times. Also, when the
rule about agents, companies, and products is not in effect, it seems unavoidable that we record the fact
that "Smith sells cars" several times.
INTER-RECORD REDUNDANCY
The normal forms discussed here deal only with redundancies occurring within a single record type. Fifth
normal form is considered to be the "ultimate" normal form with respect to such redundancies.
Other redundancies can occur across multiple record types. For the example concerning employees,
departments, and locations, the following records are in third normal form in spite of the obvious
redundancy:
-------------------------  -------------------------
| EMPLOYEE | DEPARTMENT |  | DEPARTMENT | LOCATION |
-------------------------  -------------------------
------------------------
| EMPLOYEE | LOCATION  |
------------------------
In fact, two copies of the same record type would constitute the ultimate in this kind of undetected
redundancy.
Inter-record redundancy has been recognized for some time, and has recently been addressed in terms
of normal forms and normalization.
CONCLUSION
While we have tried to present the normal forms in a simple and understandable way, we are by no
means suggesting that the data design process is correspondingly simple. The design process involves
many complexities which are quite beyond the scope of this paper. In the first place, an initial set of data
elements and records has to be developed, as candidates for normalization. Then the factors affecting
normalization have to be assessed:
Single-valued vs. multi-valued facts.
Dependency on the entire key.
Independent vs. dependent facts.
The presence of mutual constraints.
The presence of non-unique or non-singular representations.
And, finally, the desirability of normalization has to be assessed, in terms of its performance impact on
retrieval applications.
You communicate with Personal Oracle through Oracle's version of the Structured Query Language
(SQL, usually pronounced "sequel"). SQL is a nonprocedural language: unlike C or COBOL, in which you
must describe exactly how to access and manipulate data, in SQL you specify only what to do, and
Oracle internally determines how to perform the request. SQL exists as an ANSI standard as well as an industry standard.
Oracle's implementation of SQL adheres to Level 2 of the ANSI X3.135-1989/ISO 9075-1989 standard
with full implementation of the Integrity Enhancement Feature. Oracle (as well as other database vendors)
provides many extensions to ANSI SQL.
The Oracle DBMS provides a number of different data types for the storage of the
different forms of data in a manner that is most suitable to the manipulations that are
likely to be performed. As part of the Data Modeling process it is important that the most
appropriate data types are identified. Where there is any doubt (e.g. storing a year as a
number or as a date), the data type chosen should reflect the way the data will be used
most often. It is possible to inter-convert between data types, but this will reduce the
efficiency of queries. The use of these data types in table design is covered in "Manipulating Oracle
Tables".
Of the available Oracle data types, the most widely used are these:
VARCHAR2
CHAR
NUMBER
DATE
LONG
VARCHAR2
This data type should be used for any columns which may contain characters. This
includes alphabetic letters, symbols such as _, -, !, ? and +, and numbers held as a
character representation (i.e. they look like numbers when displayed, but they cannot be
subjected to arithmetic manipulation). Similarly, a greater than (>) or less than (<)
operation will not give the arithmetic result: VARCHAR2 values are compared character
by character up to the first character that differs, and whichever value has the "greater"
character in that position is considered "greater". Characters are compared via the
database character set, in which the character with the largest code value is considered
greatest. VARCHAR2 columns may be up to 2000 characters wide.
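The character-by-character comparison rule means that numbers stored as text do not sort numerically. Python string comparison follows the same rule, so it can illustrate the pitfall:

```python
# Compared as character strings, '9' > '10': the first differing
# character is compared ('9' vs '1'), and '9' has the larger code value.
print("9" > "10")   # True as strings
print(9 > 10)       # False as numbers

# Sorting text that merely looks numeric gives character order, not numeric order.
print(sorted(["10", "9", "2", "100"]))  # ['10', '100', '2', '9']
```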
CHAR
This data type is now effectively replaced by VARCHAR2 - CHAR should no longer be
used, although is still recognized and valid. The CHAR data type is fixed length up to a
maximum of 255 characters, whereas VARCHAR2 is of variable length up to 2000
characters. The variable length of VARCHAR2 gives it significant storage and
performance advantages over CHAR.
NUMBER
Any kind of number; positive, negative, integer or real. Up to 22 digits may be entered.
For comparison, a larger value is considered greater than a smaller value, with all
negative values smaller than all positive values.
DATE
A special data type with some of the characteristics of both Character and Number
data types. It is used for the storage of date and time information. The operators =, >,
< and <> may be applied to DATE values; the date value, on the right, must appear in
quotes. For comparison, a later date is considered greater than an earlier date. Oracle
dates must lie in the range 1st January, 4712 BC to 31st December, 4712 AD. The
default format is a 2-digit day, followed by a 3-letter month, followed by a 2-digit
year (e.g. 01-JAN-99).
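The "later is greater" comparison rule behaves like ordinary date arithmetic. A tiny Python sketch with the standard datetime module (the dates themselves are made up):

```python
from datetime import date

hired = date(1999, 3, 1)
promoted = date(2001, 7, 15)

# A later date compares greater than an earlier one.
print(promoted > hired)   # True
print(min(hired, promoted))  # the earlier of the two dates
```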
LONG
Importantly, there are significant restrictions on the use of character information stored in LONG data
types. While the INSERT and UPDATE statements can be used to insert and modify LONG values,
and the SELECT statement to retrieve this data, no functions can be used to manipulate a retrieved
LONG value, and a LONG column can never be used in a WHERE clause.
When retrieving LONG data types the display length of the column is set to 80 characters, and anything
stored in the column beyond this will be truncated. The following command is required to extend the
display length of a LONG field, within SQL*Plus:
SET LONG <n>
Above are the general-purpose data types in Oracle. Once familiar with these data types, the next step
in learning an RDBMS is creating tables in Oracle.
DBMS Vs RDBMS
The characteristic that differentiates a DBMS from an RDBMS is that the RDBMS provides a set-oriented
database language. For most RDBMSs, this set-oriented database language is SQL. Set oriented means
that SQL processes sets of data in groups. Two standards organizations, the American National
Standards Institute (ANSI) and the International Organization for Standardization (ISO), currently promote SQL
standards to industry. The ANSI-92 standard is the standard for the SQL used throughout this book.
Although these standard-making bodies prepare standards for database system designers to follow, all
database products differ from the ANSI standard to some degree. In addition, most systems provide some
proprietary extensions to SQL that extend the language into a true procedural language.
Starting in the late 1980s, several deficiencies in relational DBMS products began receiving a lot of
attention. The first deficiency is that the dominant relational language, SQL-92, is limiting in several
important respects. For instance, SQL-92 supports a restricted set of built-in types that accommodate
only numbers and strings, but many database applications began to deal with complex objects
such as geographic points, text, and digital signal data. A related problem concerns how this data is used.
Conceptually simple questions involving complex data structures turn into lengthy SQL-92 queries.
Object-Oriented DBMS
Object-Oriented Database Management Systems (OODBMS) are an extension of OO
programming language techniques into the field of persistent data management. For
many applications, the performance, flexibility, and development cost of OODBMS are
significantly better than RDBMS or ORDBMS. The chief advantage of OODBMS lies in
the way they can achieve a tighter integration between OO languages and the DBMS.
Indeed, the main standards body for OODBMS, the Object Database Management
Group (ODMG) defines an OODBMS as a system that integrates database capabilities
with object-oriented programming language capabilities. The idea behind this is that so
far as an application developer is concerned, it would be useful to ignore not only
questions of how an object is implemented behind its interface, but also how it is stored
and retrieved.
Regrettably, much of the considerable energy of the OODBMS community has been
expended relearning the lessons of twenty years ago. First, OODBMS vendors have
rediscovered the difficulties of tying database design too closely to application design.
Maintaining and evolving an OODBMS-based information system is an arduous
undertaking. Second, they relearned that declarative languages such as SQL-92 bring
such tremendous productivity gains that organizations will pay for the additional
computational resources they require. You can always buy hardware, but not time.
Third, they re-discovered the fact that a lack of a standard data model leads to design
errors and inconsistencies. In spite of these shortcomings, OODBMS technology
provides effective solutions to a range of data management problems. Many ideas
pioneered by OODBMS have proven themselves to be very useful and are also found in
ORDBMS. Object-relational systems include features such as complex objects,
extensibility, encapsulation, inheritance, and better interfaces to OO languages.
ORDBMS
(Object Relational Database Management System)
ORDBMS Evolution from RDBMS
One of the most popular sort algorithms is called insertion sort. Insertion sort is an
O(N²) algorithm. Roughly speaking, the time it takes to sort an array of records increases
with the square of the number of records involved, so it should only be used for record
sets containing fewer than about 25 rows. But insertion sort is extremely efficient when
the input is almost in sorted order, such as when you are sorting the result of an index
scan. The listing below presents an implementation of the basic insertion sort algorithm
for an array of integers.
InsertSort( integer arTypeInput[] )
{
    integer vTypeTemp, nSizeArray, nOuter, nInner;
    nSizeArray = arTypeInput[].getSize();
    for ( nOuter = 1; nOuter < nSizeArray; nOuter++ )
    {
        vTypeTemp = arTypeInput[nOuter];
        nInner = nOuter - 1;
        while (( nInner >= 0 ) && ( arTypeInput[nInner] > vTypeTemp ))
        {
            arTypeInput[nInner+1] = arTypeInput[nInner];
            nInner--;
        }
        arTypeInput[nInner+1] = vTypeTemp;
    }
}
Sorting algorithms such as this can be generalized to make them work with any data
type. A generalized version of this insertion sort algorithm appears in the listing below.
All this algorithm requires is logic to compare two type instances. If one value is greater,
the algorithm swaps the two values. In the generalized version of the algorithm,
a function pointer is passed into the sort as an argument. A function pointer is simply a
reference to a memory address where the Compare() function’s actual implementation
can be found. At the critical point, when the algorithm decides whether to swap two
values, it passes the data to this function and makes its decision based on the function’s
return result.
InsertSort( Type arTypeInput[], Function Compare )
{
    Type vTypeTemp;
    integer nSizeArray, nOuter, nInner;
    nSizeArray = arTypeInput[].getSize();
    for ( nOuter = 1; nOuter < nSizeArray; nOuter++ )
    {
        vTypeTemp = arTypeInput[nOuter];
        nInner = nOuter - 1;
        while (( nInner >= 0 ) &&
               ( Compare(arTypeInput[nInner], vTypeTemp) > 0 ))
        {
            Swap(arTypeInput[nInner+1], arTypeInput[nInner]);
            nInner--;
        }
        arTypeInput[nInner+1] = vTypeTemp;
    }
}
Note how the functionality of the Swap() operation is something that can be handled by
IDS without requiring a type-specific routine. The ORDBMS knows how big the object
being swapped is, and whether the object is in memory or is written out to disk. To use
the generalized algorithm in the listing, all that IDS needs is the Compare() logic. The
ORDBMS handles the looping, branching, and exception handling. Almost all sorting
algorithms involve looping and branching around a Compare(), as do B-Tree indexing
and aggregate algorithms such as MIN() and MAX(). Part of the process of extending
the ORDBMS framework with new data types involves creating functions such as
Compare() that IDS can use to manage instances of the type. All data management
operations implemented in the ORDBMS are generalized in this way. In an RDBMS, the
number of data types was small enough that the Compare() routines for each could be
hard-coded within the engine.
switch( pRecKey->typenum )
{
    case IFX_INT_TYPE:
        InsertSort(parRecords, IntCompare);
        break;
    case IFX_CHAR_TYPE:
        InsertSort(parRecords, StringCompare);
        break;
    case IFX_FLOAT_TYPE:
        InsertSort(parRecords, DoubleCompare);
        break;
    default:
        raise_error("unknown type: cannot sort");
        break;
}
To turn an RDBMS into an ORDBMS, you need to modify the code shown in the listing to
allow the engine to access Compare() routines other than the ones it has built in. If the
data type passed as the second argument is one of the SQL-92 types, the sorting
function proceeds as it did before. But if the data type is not one of the SQL-92 types,
the ORDBMS assumes it is being asked to sort an extended type. Every user-defined
function embedded within the ORDBMS is recorded in its system catalogs, which are
tables that the DBMS uses to store information about databases. When asked to sort a
pile of records using a user-defined type, the ORDBMS looks in these system catalogs
to find a user-defined function called Compare() that takes two instances of the type and
returns an INTEGER. If such a function exists, the ORDBMS uses it in place of the built-in
logic. If the function is not found, IDS generates an exception. The listing below shows
the modified sorting facility.
switch( pRecKey->typenum )
{
    case IFX_INT_TYPE:
        InsertSort(parRecords, IntCompare);
        break;
    case IFX_CHAR_TYPE:
        InsertSort(parRecords, StringCompare);
        break;
    case IFX_FLOAT_TYPE:
        InsertSort(parRecords, DoubleCompare);
        break;
    default:
        /* Extended type: look up its user-defined Compare() in the catalogs. */
        if (( pUDRCompare = udr_lookup(pRecKey->typenum, "Compare") ) == NULL)
            raise_error("no Compare() function found for type");
        else
            InsertSort(parRecords, pUDRCompare);
        break;
}
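The lookup-then-dispatch scheme can be mimicked in runnable Python: built-in types use hard-wired comparison logic, and anything else is resolved through a catalog of user-defined Compare() functions. This is only a sketch of the idea; the names (udr_catalog, the "point" type, and so on) are illustrative, not IDS code:

```python
from functools import cmp_to_key

# Hard-wired Compare() routines for the built-in (SQL-92-like) types.
builtin_compares = {
    "int":   lambda a, b: a - b,
    "float": lambda a, b: (a > b) - (a < b),
}

# A stand-in for the system catalogs, keyed by (type name, function name).
udr_catalog = {
    ("point", "Compare"):
        lambda a, b: (a[0]**2 + a[1]**2) - (b[0]**2 + b[1]**2),
}

def lookup_compare(typename):
    if typename in builtin_compares:           # built-in type: proceed as before
        return builtin_compares[typename]
    udr = udr_catalog.get((typename, "Compare"))   # extended type: catalog lookup
    if udr is None:                            # no Compare() registered: exception
        raise LookupError(f"no Compare() function for type {typename!r}")
    return udr

# Sorting user-defined 'point' values by distance from the origin:
points = [(3, 4), (0, 1), (1, 1)]
points.sort(key=cmp_to_key(lookup_compare("point")))
print(points)   # [(0, 1), (1, 1), (3, 4)]
```

Registering a new type is just a matter of adding one Compare() entry to the catalog; the sort machinery itself never changes.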
Using the ORDBMS generalization of the sort utility has several implications:
When you take a database application developed to use the previous generation
of RDBMS technology and upgrade to an ORDBMS, you should see no changes
at all. The size of the ORDBMS executable is slightly increased, and there are
some new directories and install options, but if all you do is to run the RDBMS
application on the ORDBMS, the new code and facilities are never invoked.
This scheme implies extensive modifications to the RDBMS code. You not only
need to find every place in the engine that such modifications are necessary, but
also need to provide the infrastructure in the engine to allow the extensions to
execute within it.
Finally, you should know how general such an extensibility mechanism is. As long
as you can specify the structure of your data type, and define an algorithm to
compare one instance of the type with another, you can use the engine’s facilities
to sort or index instances of the type and you can use OR-SQL to specify how
you want this done. In practice, renovating an RDBMS in this manner is an
incredibly complex and demanding undertaking.
Features of ORDBMS
ORDBMS synthesize the features of RDBMS with the best ideas of OODBMS.
Although ORDBMS reuse the relational model as SQL interprets it, the OR data
model is opened up in novel ways. New data types and functions can be
implemented using general-purpose languages such as C and Java. In other
words, ORDBMS allow developers to embed new classes of data objects into the
relational data model abstraction.
An object-relational table is structurally very similar to its relational counterpart and the
same data integrity and physical organization rules can be enforced over it. The
difference between object-relational and relational tables can be seen in the section
stipulating column types. In the object-relational table, readers familiar with RDBMS
should recognize the DATE type, but the other column types are completely new. From
an object-oriented point of view, these types correspond to class names, which are
software modules that encapsulate state and (as we shall see) behavior.
SELECT E.Pager_Number,
       E.Pass_Code,
       Print(E.Name) || ' ... job'
  FROM Temporary_Employees E
 WHERE ...( ...( ..., '60 miles')), E.LivesAt)
   AND ...( ..., 'Administrator')
   AND ...( ..., E.Booked);
In a SQL-92 DBMS, SendPage could be only a table. The effect of this query would
then be to insert some rows into the SendPage table. However, in an ORDBMS,
SendPage might actually be an active table, which is an interface to the
communications system used to send electronic messages. The effect of this query
would then be to communicate with the matching temporary workers!
Motivation
The relational model is not as good for other kinds of data (e.g., multimedia, networks, CAD),
and it offers no inheritance.
Object-Relational Complex Types
setof(foo)
arrayof(foo)
listof(foo)
These type constructors can be nested.
Even these types have simple methods associated with them (math, LIKE, etc.)
ORDBMS: the user can define new atomic types (and methods) if a type cannot be naturally
defined in terms of the built-in types. Such types need input and output methods, e.g., to
convert from a text form to the internal type and back.
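The input/output requirement can be sketched for a hypothetical point type; the class and method names here are illustrative, not from any particular ORDBMS:

```python
class Point:
    """A user-defined atomic type with text input/output methods."""
    def __init__(self, x, y):
        self.x, self.y = x, y

    @classmethod
    def input(cls, text):
        # External text form '(x, y)' -> internal representation.
        x, y = text.strip("() ").split(",")
        return cls(float(x), float(y))

    def output(self):
        # Internal representation -> external text form.
        return f"({self.x}, {self.y})"

p = Point.input("(1.5, 2.0)")
print(p.output())   # (1.5, 2.0)
```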
In most ORDBMS, every object has an OID, so one can "point" to objects via reference types,
e.g., ref(theater_t): a column declared as theater ref(theater_t) rather than embedding the full
row (tno integer, name text, address text, phone integer).
Query processing in an ORDBMS involves the usual stages: parsing, query rewriting (e.g.,
unions over the collection hierarchy), and optimization.
ORDBMS Advantages
ORDBMS technology improves upon what came before it in four ways. The first
improvement is that it can enhance a system's overall performance. The IDS product can
achieve higher levels of data throughput or better response times than is possible using
RDBMS technology and external logic because ORDBMS extensibility makes it possible
to move logic to where the data resides, instead of always moving data to where the
logic is running. This effect is particularly pronounced for data intensive applications
such as decision support systems and in situations in which the objects in your
application are large, such as digital signal or time series data. But in general, any
application can benefit. The second improvement relates to the way that integrating
functional aspects of the information system within the framework provided by the
ORDBMS improves the flexibility of the overall system. Multiple unrelated object
definitions can be combined within a single ORDBMS database. At runtime, they can be
mingled within a query expression created to answer some high-level question. Such
flexibility is very important because it reduces the costs associated with information
system development and ongoing maintenance. The third benefit of an ORDBMS
relates to the way information systems are built and managed. An ORDBMS system
catalogs become a metadata repository that records information about the modules of
programming logic integrated into the ORDBMS. Over time, as new functionality is
added to the application and as the programming staff changes, the system’s catalogs
can be used to determine the extent of the current system’s functionality and how it all
fits together. The fourth benefit is that the IDS product’s features make it possible to
incorporate into the database data sets that are stored outside it. This allows you to
build federated databases. From within single servers, you can access data distributed
over multiple places.
Explaining flexibility is more difficult because the advantages of flexibility are harder to
quantify. But it is probably more valuable than performance over the long run. To
understand why, let’s continue with the financial company example.
As it turned out, the more profound result of the integration effort undertaken by our
financial firm was that the CalcValue() operation was liberated from its procedural
setting. Before, developers had to write and compile (and debug) a procedural C
program every time they wanted to use CalcValue() to answer a business question.
With the ORDBMS, they found that they could simply write a new OR-SQL query
instead. For example, the query listed below shows a join involving a table that, while in
the database, was beyond the scope of the original (portfolio) problem: "What is the
SUM() of the values of instruments in our portfolio, grouped by region and sector?"
SELECT IS.Country,
IS.Region,
With the addition of a small Web front end, end users could use the database to find out
which was the most valuable instrument in their portfolio, or which issuer's instruments
performed most poorly, and so on. The system had the data necessary to answer all
these questions before. The problem was that the cost of answering them using C and
SQL-92 was simply too high.
Maintainability
After some investigation, the developers discovered that, over time, there had been
several versions of the CalcValue() function. Also, once the CalcValue() algorithm was
explained to end users, they had suggestions that might improve it. With the previous
system, such speculative changes were extremely difficult to implement. They required
a recode, recompile, relink, and redistribute cycle. But with the ORDBMS, the alternative
algorithms could be integrated with ease. In fact, none of the components of the overall
information system had to be brought down. The developers simply wrote the
alternative function in C, compiled it into a shared library, and dynamically linked it into
the engine with another name. What all of this demonstrates is that the ORDBMS
permitted the financial services company to react to changing business requirements far
more rapidly than it could before. By embedding the most valuable modules of logic into
a declarative language, they reduced the amount of code they had to write and the
amount of time required to implement new system functionality. None of this was
interesting when viewed through the narrow perspective of system performance. But it
made a tremendous difference to the business's operational efficiency.
Text and documents: - Simple cases permit you to find all documents that include
some word or phrase. More complex uses would include creating a network that
reflected similarity between documents.
Digital asset management: - The ORDBMS can manage digital media such as
video, audio, and still images. In this context, manage means more than store and
retrieve. It also means “convert format,” “detect scene changes in video and extract first
frame from new scene,” and even “What MP3 audio tracks do I have that include this
sound?”
Geographic data: - For traditional applications, this might involve “Show me the
lat/long coordinates corresponding to this street address.” This might be extended to
answer requests such as “Show me all house and contents policy holders within a
quarter mile of a tidal water body.” For next-generation applications, with a GPS device
integrated with a cellular phone, it might even be able to answer the perpetual puzzler
“Car 54, where are you?”
Bio-medical: - Modern medicine gathers lots of digital signal data such as CAT scans
and ultrasound imagery. In the simplest case, you can use these images to filter out “all
cell cultures with probable abnormality.” In the more advanced uses, you can also
answer questions such as “show me all the cardiograms which are ‘like’ this
cardiogram.”
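Queries like the geographic ones above depend on domain-specific routines evaluated inside the engine. As a rough sketch of that idea (SQLite is not an ORDBMS, and the table, column names, and data here are invented), a user-defined distance function can be registered with the engine and then used directly in a WHERE clause:

```python
import math
import sqlite3

def distance_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance via the haversine formula, in miles."""
    r = 3959.0  # mean Earth radius in miles
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

conn = sqlite3.connect(":memory:")
# Register the routine so the engine can evaluate it inside queries,
# loosely mimicking ORDBMS routine extensibility.
conn.create_function("distance_miles", 4, distance_miles)
conn.execute("CREATE TABLE policy_holders (name TEXT, lat REAL, lon REAL)")
conn.executemany("INSERT INTO policy_holders VALUES (?, ?, ?)",
                 [("Ann", 40.71, -74.01), ("Bob", 42.36, -71.06)])

# "Show me all policy holders within a quarter mile of this point."
rows = conn.execute(
    "SELECT name FROM policy_holders "
    "WHERE distance_miles(lat, lon, 40.712, -74.008) < 0.25").fetchall()
print(rows)
```

The point of the sketch is only that the predicate runs where the data lives, instead of shipping every row to the client for filtering.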
Data Model
A data model is a way of thinking about data, and the object-relational data model
amounts to objects in a relational framework. An ORDBMS’s chief task is to provide a
flexible framework for organizing and manipulating software objects corresponding to
real-world phenomena.
The object-relational data model can be broken into three areas:
Structural Features. This aspect of the data model deals with how a database’s
data can be structured or organized.
Manipulation. Because a single data set often needs to support multiple user
views, and because data values need to be continually updated to reflect the
state of the world, the data model provides a means to manipulate data.
Integrity and Security. A DBMS data model allows the developers to declare rules
that ensure the correctness of the data values in the database. In the first two
chapters of this book, we introduce and describe the features of an ORDBMS
that developers use to build information systems.
An OR database consists of a group of tables made up of rows. All rows in a table are
structurally identical in that they all consist of a fixed number of values of specific data
types stored in columns that are named as part of the table’s definition. The most
important distinction between SQL-92 tables and object-relational database tables is
that ORDBMS columns are not limited to a standardized set of data types. The figure
illustrates what an object-relational table looks like. The first thing to note about this
table is that its column headings consist of
both a name and a data type. Second, note how several columns have internal
structure. In a SQL-92 DBMS, such structure would be broken up into several separate
columns, and operations over a data value such as Employee’s Name would need to list
every component column. Third, this table contains several instances of unconventional
data types. Lives At is a geographic point, which is a latitude/longitude pair that
describes a position on the globe. Resume contains documents, which is a kind of
Binary Large Object (BLOB). In addition to defining the structure of a table, you can
include integrity constraints in its definition. Tables should all have a key, which is a
subset of attributes whose data values can never be repeated in the table. Keys are not
absolutely required as part of the table’s definition, but they are a very good idea. A
table can have several keys, but only one of these is granted the title of primary key. In
our example table, the combination of the Name and DOB columns contains data
values that are unique within the table. On balance, it is far more likely that an end user
made a data entry mistake than that two employees share the same name and date of birth.
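The row structure described above can be sketched with plain Python classes (this is not any vendor’s API; the type and column names are invented to mirror the example table):

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class PersonName:
    """A column with internal structure, unlike a flat SQL-92 column."""
    family: str
    given: str

@dataclass(frozen=True)
class GeoPoint:
    """A latitude/longitude pair, as in the Lives At column."""
    lat: float
    lon: float

@dataclass
class Employee:
    name: PersonName
    dob: date
    lives_at: GeoPoint
    resume: bytes          # stands in for a document BLOB

# The primary key is (Name, DOB): no two rows may repeat this pair.
table = [
    Employee(PersonName("Smith", "Jonathan"), date(1970, 3, 1),
             GeoPoint(40.71, -74.01), b"..."),
    Employee(PersonName("Jones", "Mary"), date(1982, 7, 15),
             GeoPoint(41.88, -87.63), b"..."),
]
keys = [(e.name, e.dob) for e in table]
assert len(keys) == len(set(keys)), "primary key violated"
print(len(table), "rows; key OK")
```

The key check illustrates the integrity constraint in the text: the (Name, DOB) pair must never repeat within the table.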
Another difference between relational DBMS and ORDBMS is the way in which an
object-relational database schema supports features co-opted from object-oriented
approaches to software engineering. We have already seen that an object-relational
table can contain exotic data types. In addition, object-relational tables can be
organized into new kinds of relationships, and a table’s columns can contain sets of
data objects. In an ORDBMS, tables can be typed; that is, developers can create a table
with a record structure that corresponds to the definition of a data type. The type system
includes a notion of inheritance in which data types can be organized into hierarchies.
This naturally supplies a mechanism whereby tables can be arranged into hierarchies
too. The figure illustrates how the Employees table might look as part of such a
hierarchy. In most object-oriented development environments, the concept of
inheritance is limited to the structure and behavior of object classes. However, in an
object-relational database, queries can address data values through the hierarchy.
When you write an OR-SQL statement that addresses a table, all the records in its
subtables become involved in the query too.
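The table-hierarchy semantics just described can be sketched with ordinary Python classes (class names and columns are invented for illustration): a query addressed to the supertable also scans the rows of its subtables.

```python
class Employee:
    """Rows of the Employees table."""
    def __init__(self, name, salary):
        self.name, self.salary = name, salary

class Manager(Employee):
    """Rows of a Managers subtable under Employees."""
    def __init__(self, name, salary, reports):
        super().__init__(name, salary)
        self.reports = reports

employees = [Employee("Ann", 50_000)]       # rows stored in Employees
managers = [Manager("Bob", 90_000, 4)]      # rows stored in Managers

def query(table_hierarchy, predicate):
    """A SELECT over a typed table: subtable rows participate too."""
    return [row for rows in table_hierarchy for row in rows if predicate(row)]

# A query over Employees sees Bob as well, because Managers is a
# subtable of Employees in the hierarchy.
result = query([employees, managers], lambda e: e.salary > 40_000)
print(sorted(r.name for r in result))
```

The essential behavior being modeled is that addressing the supertable implicitly includes every record in its subtables.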
Almost all RDBMS allow you to create database procedures that implement business
processes. This allows developers to move considerable portions of an information
system’s total functionality into the DBMS. Although centralizing CPU and memory
requirements on a single machine can limit scalability, in many situations it can improve
the system’s overall throughput and simplify its management. By implementing
application objects within the server, using Java, for example, it becomes possible,
though not always desirable, to push code implementing one of an application-level
object’s behaviors into the ORDBMS. The interface in the external program simply
passes the work back into the IDS engine. Figure represents the contrasting
approaches. An important point to remember is that with Java, the same logic can be
deployed either within the ORDBMS or within
an external program without changing the code in any way, or even recompiling it. The
novel idea is that the ORDBMS can be used to implement many of the operational
features of certain kinds of middleware. Routine extensibility, and particularly the way it
can provide the kind of functionality illustrated in the figure, is a practical application of
these ideas. But making such systems scalable requires using other features of the
ORDBMS: the distributed database functionality, commercially available gateways, and
the open storage manager (introduced below). Combining these facilities provides the
kind of location transparency necessary for the development of distributed information
systems.
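The deployment flexibility described above can be sketched as follows (again using SQLite as a stand-in for the engine; the table, function, and rule are invented): the same routine runs either inside the engine, invoked from SQL, or in the external program, without changing its code.

```python
import sqlite3

def risk_score(balance, late_payments):
    """A toy business rule shared by both deployment sites."""
    return balance * 0.01 + late_payments * 10

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER, balance REAL, late INTEGER)")
conn.execute("INSERT INTO accounts VALUES (1, 2500.0, 3)")

# Deployment 1: inside the engine, invoked from SQL.
conn.create_function("risk_score", 2, risk_score)
(in_db,) = conn.execute(
    "SELECT risk_score(balance, late) FROM accounts WHERE id = 1").fetchone()

# Deployment 2: in the external program, on the fetched row.
balance, late = conn.execute(
    "SELECT balance, late FROM accounts WHERE id = 1").fetchone()
external = risk_score(balance, late)

assert in_db == external   # identical logic, two deployment sites
print(in_db)
```

This mirrors the point made about Java in the text: one implementation of the logic, two possible deployment sites.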
Storage Management
Traditionally the main purpose of a DBMS was to centralize and organize data storage.
A DBMS program ran on a single, large machine. It would take blocks of that machine’s
disk space under its control and store data in them. Over time, RDBMS products came
to include ever more sophisticated data structures and ever more efficient techniques
for memory caching, data scanning, and storage of large data objects. In spite of these
improvements, only a fraction of an organization’s data can ever be brought together
into one physical location. Data is often distributed among many systems, a
consequence of autonomous information systems development using a variety of
technologies, of organizational mergers, or of the data simply not being suitable for
storage in any DBMS. To address this, the IDS product adds a new extensible storage
management facility
that allows a central service to poll all trucks and to have them “phone in” their current
location. With the ORDBMS, you can embed Java code that activates the paging and
response service to implement a virtual table, and then write queries over this new
table, as shown in the listing below: “Find repair trucks and drivers within ‘50 miles’ of
‘Alan Turing’ who are qualified to repair the ‘T-20’ air conditioning unit.”
This functionality should fundamentally change the way you think about a DBMS. In an
object-relational DBMS, SQL becomes the medium used to combine components to
perform operations on data. Most data will be stored on a local disk under the control of
the DBMS, but that is not necessarily the case. The Virtual Table Interface tutorial
describes these features in more detail.
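The virtual-table idea behind the truck query can be sketched as follows. All names and data are invented; Python’s sqlite3 module cannot register true virtual-table modules, so the sketch materializes the poll results into a temporary table before querying, which is only an emulation of the mechanism the text describes.

```python
import sqlite3

def poll_trucks():
    """Stand-in for the paging service: each truck 'phones in' its state
    as (id, qualification, miles from the caller)."""
    yield ("truck-7", "T-20", 12.0)
    yield ("truck-9", "T-20", 75.0)
    yield ("truck-3", "B-5", 8.0)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TEMP TABLE trucks (id TEXT, qualified TEXT, miles REAL)")
conn.executemany("INSERT INTO trucks VALUES (?, ?, ?)", poll_trucks())

# "Find repair trucks within 50 miles qualified to repair the T-20 unit."
rows = conn.execute(
    "SELECT id FROM trucks WHERE qualified = 'T-20' AND miles < 50").fetchall()
print(rows)
```

The significant point is that the rows behind the query come from an external service rather than from disk, yet ordinary SQL predicates still apply to them.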
Motivation
Using an RDBMS, on the one hand, requires overcoming the well-known impedance
mismatch, i.e., performing the non-trivial task of mapping complex object structures and
navigational data processing (at the OOPL layer) to the set-oriented, descriptive query
language (SQL92), which supports only a simple, flat data model. Despite this
considerable mapping overhead, mature RDBMS technology (index structures,
optimization, integrity control, etc.), on the other hand, helps to keep the overall
system performance acceptable. Several commercial systems mapping object-oriented
structures onto the relational data model are currently available. Such systems are often
referred to as Persistent Object Systems built on Relations (POS, for short).
The object-relational wave in database technology has decisively reduced the gap
between RDBMS and OOPL. Although object-relational DBMS (ORDBMS) are able to
(internally) manage object-oriented structures, the required seamless coupling of OOPL
and ORDBMS is not yet possible, because (as in SQL92) the results of SQL:1999
queries are sets of tuples rather than the desired sets of objects. In summary, the gap
between OOPL and ORDBMS can be traced back to a number of modeling and
operational aspects, as we will detail in the following sections. Furthermore, the
SQL:1999 standard and the commercially available ORDBMS differ considerably in their
object-oriented features. Thus, it is by no means clear how a given object-oriented
design can be mapped to a given ORDBMS most efficiently, or which features an
ORDBMS should offer in general to enable an efficient mapping of object-oriented
structures. Our long-term objective is to influence the further development of ORDBMS
towards better support of object-oriented software development (minimal mapping
overhead). Thus, we have proposed a new benchmark approach that allows assessing
a given ORDBMS by taking into account both its own performance and the required
mapping overhead.
Conceptual Considerations
There is a multiplicity of object data models, for example ODMG, UML, COM, C++, and
Java. All these models support the basic concepts of the OO paradigm; however, there
are certain differences. Independently of the modeling language used in OO software
development (e.g., UML), SQL:1999 must be coupled with a concrete OOPL. Given
their overall relevance and conceptual proximity, we concentrate on the object model of
C++ and ODMG and compare it with the SQL:1999 standard.
Modeling Aspects
Object Orientation in SQL:1999. While the relational data model (SQL2) did not
support semantic modeling concepts sufficiently, in SQL:1999 the fundamental
extension supporting object orientation is the structured user-defined data type (UDT).
UDTs, which can be considered object types, can be treated in the same way as
predefined (built-in) data types. Consequently, similarly to the type system of an OOPL,
the type system of SQL:1999 is extensible. UDTs may be complex structured and,
therefore, may contain not only predefined data types but also set-valued attributes
(collection types) and even other UDTs (aggregation) or references (associations).
Obviously, UDTs are comparable to the classes of the OO paradigm. However,
according to the SQL:1999 standard, a UDT must be associated with a table. The notion
of a typed table, also referred to as an object table, allows instances of a certain UDT to
be managed persistently within a table. Each tuple of such a table represents an
instance (object) of a particular UDT and is identified by a unique object identifier (OID),
which can be system-generated or user-defined. Besides instantiable UDTs, SQL:1999
also supports non-instantiable UDTs, which conform to the notion of abstract classes in
OOPL. In addition, UDTs may have methods (behavior), which are either system-
generated or implemented by users. They may participate in type hierarchies, in which
more specialized types (subtypes) inherit structure and behavior from more general
types (supertypes) but may specialize the corresponding definitions. Thus, SQL:1999
supports polymorphism and substitutability; however, multiple inheritance is not
supported. Due to the association of UDTs with tables, SQL:1999 does not support
encapsulation and, consequently, there is nothing like the degrees of encapsulation
known from OOPL (public, protected, private).
Operational Aspects
Besides the fundamental modeling aspects discussed so far, we also have to examine
operational aspects in order to assess the conceptual distance adequately. The
following aspects are most relevant to our considerations:
Object Behaviour: - Of course, the operational aspects also encompass the object
behaviour implemented in the database. Because of special implementation aspects,
these methods (UDFs) can almost exclusively be executed at the server side; or, if
these UDFs or special client-invokable counterparts are executed at the client side, it
cannot be guaranteed that these counterparts preserve the original semantics. For
example, there may be complex dependencies between UDFs and integrity constraints,
e.g., referential integrity constraints and triggers, which are implemented using SQL
and are automatically enforced by the DBMS. Thus, it is almost impossible to support
calling UDFs at the OOPL level in the same (‘natural’) way as object methods are
usually called. Therefore, we do not consider a mapping of object methods in this paper
and restrict our considerations to navigational and set-oriented access.
Mapping Rules
In the previous section, we outlined the conceptual distance between the OO paradigm
and SQL:1999. Considering an individual ORDBMS, its OO features determine the
overhead that must at least be spent in order to bridge this distance. Nevertheless, in
theory there is an entire spectrum of possibilities for designing the required mapping
layer. At this point, we want to mention that there are some further aspects of ORDBMS
from which OO applications may benefit but which cannot be covered in this paper,
e.g., facilities for integrating external data sources into database processing. The
design of the mapping layer depends, on the one hand, on how ‘naturally’ the
application can be coded and, on the other hand, on how far the OO features are to be
exploited. Regarding the first point (‘natural’ coding), we demand that the programmer
must not be burdened with having to take data management aspects into account.
Thus, programming must be independent of the database as well as of the mapping
layer design. Regarding the second point (degree of exploiting OO features), we want
to outline the two extremes of the mentioned spectrum, i.e., pure relational mapping
and full exploitation of the OO features offered by the considered ORDBMS.
Pure Relational Mapping: - As mentioned before, there are several commercial
POS mapping OO structures to relational tables. Objects are represented by table rows.
Since RDBMS do not support set-valued attributes, user-defined data types, or object
references, additional tables are required to store the corresponding data and to
connect them with the corresponding class tables via foreign keys. Thus, several tables
may be required to map a given class. In principle, there are several ways of
representing a class hierarchy in the relational model. After weighing the pros and cons,
we decided to use the horizontal partitioning approach, since it provides good
performance in most cases and is also used in most commercial POS.
Performance Evaluation
Our discussion shows that there is only a small difference between the OO and OR
paradigms with respect to modelling aspects, but a considerable distance with respect
to the operational aspects and the application semantics. In order to further evaluate
this distance as well as to quantify the overhead required for bridging this gap, we
propose a configurable benchmark approach [18, 26]. Recall that we do not consider
OODBMS but ORDBMS, because we increasingly face the situation that people use
OOPL for software development and (O)RDBMS for data management, so that there is
a need for a more detailed examination of the efficiency of possible coupling
mechanisms. Consequently, the OO7 benchmark, an important standard for
benchmarking OO systems, is not appropriate for our purposes. The performance of
RDBMS or ORDBMS has traditionally been evaluated in isolation by applying a
standard benchmark directly at the DBMS interface. Sample benchmarks are the
Wisconsin benchmark, the TPC benchmarks, and the Bucky benchmark [6]. These
benchmarks are very suitable for comparing different DBMS with each other. However,
none of these benchmarks helps to assess the contribution of a DBMS to OO software
development. Consequently, these approaches take into account neither the typical
application server architecture nor the fact that the DBMS capabilities determine the
overhead of the required application/mapping layer. Furthermore, the data types as well
as the operations of the applications we consider may differ significantly, so that a
standard benchmark cannot cover the entire spectrum. Therefore, we propose an open,
configurable benchmark system that allows examining the entire system (incl. mapping
layer) with respect to its typical applications. Such a system will also help us obtain
results beyond those reported in this paper (see the succeeding sections), e.g., more
detailed examinations of navigational support. In the following, we outline our first
prototype.
Benchmark System
New query templates can easily be added if the existing templates do not reflect the
application characteristics sufficiently. Based on these user selections/specifications,
the load generator creates a set of queries, which is passed to the query executor,
which in turn serves as a kind of driver for the measurements. Users can also specify
which kinds of measurement data are to be collected by the system, i.e., the amount of
time spent at the DB or the mapping layer for query transformation, or the time spent for
SQL query evaluation, data loading, and/or result set construction. The corresponding
values are collected by the data collector during execution of the query set and
afterwards stored in the DBS for further evaluation. As explained in more detail in , the
special challenges of this benchmark approach are, on the one hand, to properly take
into account the requirements of OO system development and, on the other hand, to
guarantee an optimal mapping with respect to the particular capabilities of an individual
ORDBMS.
2. What additional overhead has to be spent at the mapping layer in order to bridge the
gap between the OO and OR paradigms, and how does it behave for different query
types?
3. To what extent is the system performance influenced by the capabilities of the
(O)RDBMS API?
In order to be able to answer the first two questions, we have selected a set of typical
benchmarking queries according to a long-term study of a leading software company.
These queries represent a wide spectrum of typical operations in the target applications
of ORDBMS. We have compared a purely relational mapping with an object-relational
one (i.e., one exploiting its OO modelling power) using a currently available commercial
ORDBMS. In this way we ‘measured’ how OO software development can benefit from
the OO extensions offered by ORDBMS (e.g., structured UDTs, references, etc.). The
operations considered for that purpose are implemented as query templates and
grouped into the following categories:
Navigation operations: Navigation operations, such as GetObject(OID), are not
directly supported by most currently available ORDBMS. Considering such operations
helps us to assess the performance of ORDBMS in supporting navigational processing.
We hope that the corresponding results ‘help’ ORDBMS vendors to make ORDBMS as
efficient in this respect as OODBMS are.
Queries with predicates on UDTs: This group contains queries with simple predicates
(a single comparison operation) on attributes of structured, non-atomic data types.
Thus, it mainly serves to assess the efficiency of mapping UDTs to (O)RDBMS.
Queries with complex predicates: Queries of this group contain complex predicates,
challenging both query transformation and query optimization.
Queries on the class hierarchy: While all other queries exclusively deliver direct
instances of a single queried class, queries of this group deliver transitive instances as
well. The predicates conform to those of the second category. This group of queries
allows us to evaluate the efficiency of the ORDBMS in handling class hierarchies
(inheritance).
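The navigational operations in the first category are what the mapping layer must synthesize when the DBMS offers no direct support: a GetObject(OID) call is translated into a single-row SQL lookup, usually with an identity map so that each object is reconstructed only once. A minimal sketch, with an invented schema and SQLite standing in for the ORDBMS:

```python
import sqlite3

class Part:
    """An invented application class reconstructed from table rows."""
    def __init__(self, oid, name):
        self.oid, self.name = oid, name

class MappingLayer:
    def __init__(self, conn):
        self.conn = conn
        self.identity_map = {}     # OID -> already-reconstructed object

    def get_object(self, oid):
        """GetObject(OID): translate navigation into a one-row SQL query."""
        if oid in self.identity_map:
            return self.identity_map[oid]
        row = self.conn.execute(
            "SELECT oid, name FROM part WHERE oid = ?", (oid,)).fetchone()
        obj = Part(*row)           # 'disassembled' scalars -> object
        self.identity_map[oid] = obj
        return obj

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE part (oid INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO part VALUES (42, 'rotor')")
layer = MappingLayer(conn)
p1 = layer.get_object(42)
p2 = layer.get_object(42)
assert p1 is p2                    # second call served from the identity map
print(p1.name)
```

Every such call costs a query transformation plus an SQL evaluation, which is exactly the overhead the benchmark categories above are designed to expose.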
The comparison with the relational mapping was expected to demonstrate the
advantages of ORDBMS. The third question posed at the beginning of this section deals
with the capabilities of the DB interface, especially with respect to support for complex
structured objects and navigational access. In order to examine these aspects, we
performed measurements on two different (commercially successful) ORDBMS. One of
these systems offers the more traditional interface, whereas the second provides some
basic means of supporting complex structured objects. We performed our
measurements on a benchmarking database with 100 classes and 250,000 instances
(configuration ‘medium’). In order to use a representatively structured class hierarchy,
we studied typical application scenarios of a renowned vendor of standard business
software and parameterized our population algorithm accordingly. We measured the
database time (DB time) and the total system time (TS time). The DB time of SQL
queries is the time elapsed between delegating the queries to the DBMS and receiving
back the results (opening cursors, traversing iterators). It includes the time for
client/server communication, the time for evaluating the queries within the DBS, and the
time for loading the complete result sets. This has to be taken into account when
analyzing the measurement results. The TS time is defined as the total elapsed time
from issuing a query operation at the OOPL level until the complete result set has been
received. It contains the time spent within the mapping layer as well as the DB time. We
think that these three questions have to be answered before we can consider how OR
technology can be improved in order to support OOPL better and more efficiently. In the
following section, we report on our measurement results.
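The two measurements just defined can be sketched as a tiny harness (schema and data invented; this is not the benchmark system itself): DB time covers query evaluation and result-set loading, while TS time additionally includes the mapping layer’s object reconstruction.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (oid INTEGER, name TEXT, salary REAL)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                 [(i, f"e{i}", 1000.0 + i) for i in range(5000)])

# DB time: delegate the query and load the complete result set.
t0 = time.perf_counter()
rows = conn.execute("SELECT oid, name, salary FROM emp").fetchall()
db_time = time.perf_counter() - t0

class Emp:
    def __init__(self, oid, name, salary):
        self.oid, self.name, self.salary = oid, name, salary

# TS time: DB time plus the mapping layer's object reconstruction.
t1 = time.perf_counter()
objects = [Emp(*r) for r in rows]
ts_time = db_time + (time.perf_counter() - t1)

assert ts_time >= db_time      # TS time includes DB time by definition
print(f"DB {db_time:.4f}s  TS {ts_time:.4f}s  ({len(objects)} objects)")
```

The gap between the two numbers grows with result-set cardinality, which is the effect the measurements in the next section quantify.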
In the first test series, we compared a purely relational mapping with an OR mapping
using one of the leading currently available commercial ORDBMS. The hardware and
software configurations are left unspecified to avoid the usual legal and competitive
problems with publishing performance numbers for commercial products.
All performance measurements are averages of multiple trials, with more trials for
higher-variance measurements. For each DBMS tested, we put much effort into
optimization (e.g., indexes) and mapping layer design in order to achieve the best
possible performance.
The DB time of set-oriented queries shows only a slight ascent with increasing result
sets, while the additional mapping overhead increases rapidly. When retrieving 1250
objects, the time spent at the mapping layer even exceeds 86% of the total system time.
This observation can be explained as follows. In the early days of ORDBMS, these
systems, like RDBMS, were not very successful in supporting navigational access, but
excellent at processing set-oriented access (as they still are today). Unfortunately, OO
applications can hardly benefit from this advantage, because the ORDBMS API is
‘inherited’ from traditional RDBMS and, therefore, still only supports simple, flat data.
Lacking an extensible DB API that could generically support complex user-defined data
types, complex objects in an ORDBMS first have to be ‘disassembled’ into scalar values
and afterwards reconstructed (at the mapping layer) into objects of a certain class in the
particular OOPL. This kind of overhead becomes dramatic with increasing result set
cardinality and impairs the entire system efficiency significantly. Regarding these
measurement results, we can draw the following conclusions. In order to support
navigational access better, the ORDBMS API should directly support navigational
operations like GetObject(Ref), so that the costs of transforming navigational operations
into SQL queries and of evaluating these queries can be avoided. Furthermore, it
should also support the notion of complex objects directly and offer the possibility of
retrieving complex objects as units. According to our examinations, such improvements
can increase the entire system efficiency by up to 400%.
As already mentioned, the lack of direct support for complex objects and navigational
access at the API level severely impairs the overall system efficiency. Fortunately, a
leading ORDBMS vendor already offers an extended call-level interface which, as we
will see later in this section, directly supports navigational access as well as retrieval of
complex objects as units and, in addition, even retrieval of complex object graphs as
units. Navigation is enabled by the possibility of autonomously retrieving complex
structured objects (by OID) as instances of C structures. This simplifies the mapping to
an OOPL, such as C++, considerably and is therefore undoubtedly a first step in the
right direction, although this mechanism does not yet support the actually desired
seamless coupling (transparent transformation from a database object to an instance of
an OOPL class). The mentioned support for complex objects at the level of the DB API
allows a complex object’s data to be retrieved directly from the database into main
memory by specifying its OID or a predicate. Therefore, the expensive query processing
strategy described earlier can be avoided. Recall that the measurements described
earlier were performed on an ORDBMS that does not possess a DB API like the one
described in this section. To show the importance of, and the corresponding demand
for, suitable support for complex objects at the DB API level, we repeated those
measurements on the ORDBMS referred to in this section, which provides the
mentioned complex object support at its API. The figure illustrates the measurement
results. Obviously, the additional overhead spent at the mapping layer is now
independent of the cardinality of the
query result sets. Thus, the direct support of complex objects at the DB API results in a
clear performance gain (up to 400%). The direct support for navigational access at the
DB API level mentioned in this section, which allows objects to be accessed directly by
calling a function like GetObject(Ref), avoids expensive processing (query
transformation, data type conversion, and object reconstruction). This obviously
contributes to improving
performance significantly. Fig. 6a shows a comparison between a query strategy
(transforming a navigational operation into an SQL query) and a navigational strategy
(directly calling a GetObject(OID) function at the DB API). The advantages of the
navigational strategy are obvious. With direct support for navigational access, the entire
system efficiency increases by approximately 200%. OO applications often want a set
of objects interconnected by object references (an object graph) to be retrieved
completely within just a single database interaction. Fig. 6b shows a comparison of two
strategies for retrieving complex object graphs. In this measurement, we used the
ORDBMS that directly supports navigation as well as retrieval of complex object
graphs. It can be seen clearly that strategy I, which exploits the ability to retrieve object
graphs, exhibits a performance gain of about 100% already at a result set cardinality of
13 objects. The
design of a new DB API, which directly supports complex structured objects, is by no
means an easy job and requires generic design methods, because user-defined data
types can be arbitrarily structured, e.g., contain other complex data types, such as
UDTs, references, and set-valued attributes. Furthermore, a DB API always has to be
multilingual, supporting all common programming languages simultaneously, which
makes it very difficult to offer the best of both worlds (DBMS and OOPL) without any
compromises.
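The contrast between the two graph-retrieval strategies above can be sketched as follows (schema and data invented, SQLite as a stand-in): per-reference navigation issues one query per referenced object, while retrieving the graph as a unit needs a single database interaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (oid INTEGER PRIMARY KEY);
    CREATE TABLE items  (oid INTEGER PRIMARY KEY, order_ref INTEGER);
    INSERT INTO orders VALUES (1);
    INSERT INTO items  VALUES (10, 1), (11, 1), (12, 1);
""")

# Strategy II: navigate reference by reference, one query per item
# (1 + N database interactions in total).
item_oids = [o for (o,) in conn.execute(
    "SELECT oid FROM items WHERE order_ref = 1")]
navigated = [conn.execute(
    "SELECT oid FROM items WHERE oid = ?", (o,)).fetchone()[0]
    for o in item_oids]

# Strategy I: retrieve the whole object graph in a single interaction.
graph = [o for (o,) in conn.execute("""
    SELECT i.oid
    FROM orders AS r JOIN items AS i ON i.order_ref = r.oid
    WHERE r.oid = 1""")]

assert sorted(navigated) == sorted(graph) == [10, 11, 12]
print(graph)
```

Both strategies return the same objects; the difference the measurements expose is purely the number of round trips, which grows linearly with graph size under strategy II.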
Conclusions
References
1. Bernhard, R., Flehmig, M., Mahdoui, A., Ritter, N., Steiert, H.-P., Zhang, W.P.:
2. Bernstein, P.A., Harry, B., Sanders, P.J., Shutt, D., Zander, J.: “The Microsoft Repository”,
3. Bernstein, P.A., Pal, S., Shutt, D.: “Context-Based Prefetch for Implementing Objects
6. Carey, M.J., DeWitt, D.J., Naughton, J.F., Asgarian, M., Brown, P., Gehrke, J.E., pp. 135-146
7. Carey, M.J., DeWitt, D.J., Kant, C., Naughton, J.F.: “A Status Report on the OO7
9. Cattell, R.G.G., Barry, D., Bartels, D., et al.: “The Object Database Standard:
10. Gray, J.: “The Benchmark Handbook for Database and Transaction Processing
11. Gulutzan, P., Pelzer, T.: “SQL-99 Complete, Really”, R&D Publications, 1999
12. Keller, A., Jensen, R., Agrawal, S.: “Persistence Software: Bridging Object-Oriented pp. 523-528
13. Mahnke, W., Steiert, H.-P.: “The Application Potential of ORDBMS in the Design
15. Poet Object Server, POET Software, POET SQL Object Factory, http://poet.com/
http://www.roguewave.com/products/dbtools/