
History of Database

Database
It is often said that we live in an information society and that information is a very
valuable resource (or, as some people say, information is power). In this information
society, the term database has become a rather common term although its meaning
seems to have become somewhat vague as the importance of database systems has
grown. Some people use the term database of an organization to mean all the data in
the organization (whether computerized or not). Other people use the term to mean the
software that manages the data. We will use it to mean a collection of computerized
information that is available to many people for various uses.

“A database is a well organized collection of data that are related in a meaningful way
which can be accessed in different logical orders but are stored only once. The data in
the database is therefore integrated, structured, and shared.”

Or

“A Database can be considered an organized collection of information.”

For example, it is easy to find a book in a library because books are arranged in an
organized manner. The tools that help us manage data allow us to:

 Store data

 Add new information

 Change existing information

 Delete existing information

 View information

The data is managed by a Database Management System.

The phone book is a good example of a simple database. For example, to find the
phone number of Mr. Jonathan Smith, living at No. 5, 4th Avenue, Apt 15, New York, NY,
one may use the following process: Get the phone book for the city of New York. Flip to
the section where the last name is ‘Smith’. More than one Mr. Smith may be listed. In
that case, look for the Mr. Smith whose first name is Jonathan. Look for the Jonathan Smith
living at No. 5, 4th Avenue, Apt 15, New York, NY to get the corresponding phone
number. In the example above, the phone number can be found because the
information in the phone book is organized in a certain manner.
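The manual search process above can be sketched as structured data and a lookup over it. This is only an illustration; all names, addresses, and numbers below are invented.

```python
# A minimal sketch of the phone-book lookup described above. The
# directory is modeled as a list of records; every value is made up.
phone_book = [
    {"last": "Smith", "first": "Adam",
     "address": "12 Oak St, New York, NY", "phone": "555-0101"},
    {"last": "Smith", "first": "Jonathan",
     "address": "No. 5, 4th Avenue, Apt 15, New York, NY", "phone": "555-0142"},
]

def find_number(last, first, address):
    # Narrow by last name, then first name, then address,
    # mirroring the manual search steps in the text.
    for entry in phone_book:
        if (entry["last"], entry["first"], entry["address"]) == (last, first, address):
            return entry["phone"]
    return None

number = find_number("Smith", "Jonathan",
                     "No. 5, 4th Avenue, Apt 15, New York, NY")
```

The lookup only succeeds because the records share a common, organized structure, which is the point the example makes.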

In the early days of computing, computers were mainly used for solving numerical
problems. Even in those days, it was observed that some of the tasks were common in
many problems that were being solved. It was therefore considered desirable to build
special subroutines that perform frequently occurring computing tasks such as
computing Sin(x). This led to the development of mathematical subroutine libraries that
are now considered an integral part of any computer system.

By the late fifties, storage, maintenance and retrieval of non-numeric data had become
very important. Again, it was observed that some data processing tasks occurred
quite frequently, e.g. sorting. This led to the development of generalized routines for
some of the most common and frequently used tasks. A general routine by its nature
tends to be somewhat less efficient than a routine designed for a specific problem. In
the late fifties and early sixties, when hardware costs were high, the use of general
routines was not very popular, since it became a matter of trade-off between hardware
and software costs. In the last two decades, hardware costs have gone down
dramatically while the cost of building software has gone up. It is no longer a trade-off
between hardware and software costs, since hardware is relatively cheap and building
reliable software is expensive.

Database management systems evolved from generalized routines for file processing
as the users demanded more extensive and more flexible facilities for managing data.
Database technology has undergone major changes during the 1970's. An interesting
history of database systems is presented by Fry and Sibley (1976).

Types of Databases
There are many techniques and technologies for managing data. The major categories,
each with example products, are:

 DBMS (Database Management Systems): dBase, FoxPro

 HDBMS (Hierarchical DBMS): IMS

 NDBMS (Network DBMS): IDMS

 RDBMS (Relational DBMS): Oracle, SQL Server, Sybase, Informix, etc.

 ORDBMS (Object Relational DBMS): Oracle

DBMS: - Database Management Systems have been used for quite a while. Software
packages like dBase, Clipper, etc. had facilities for managing data. The user had to take
care of two aspects: specifying what data was to be retrieved, and how to go and get the
data.

HDBMS: - Managed the data in a hierarchical fashion. IBM’s product IMS is a
hierarchical database. It is used on mainframe computers and is still in use in some
installations. Over a period of time, the demands of the users prompted a search for
alternative approaches.

NDBMS: - Managed the data according to network models. IBM’s product IDMS is a
Network Database. It is also used in mainframe computers. Like the HDBMS (IMS), it is
still used in a few installations. Over a period of time disadvantages of the network
database technology prompted the search for an alternate solution. With more powerful
personal computer systems and the inherent limitations of the Hierarchical and Network
database models, the popularity of the Relational Database Model increased.

RDBMS: - Was invented by the team led by Dr. Edgar F. Codd at IBM in the early
1970’s. The Relational Model is based on the principles of relational algebra. This
model, also known as the Relational Database Management System, is very popular
and is in use by a majority of the Fortune 500 companies. This model will be used in
great detail in this course. Many vendors sell products that conform to this model,
for example: Oracle, SQL Server, Sybase, Informix, Ingres, Gupta SQL, DB2, and
Microsoft Access.

ORDBMS: - This model has been proposed and is presently being implemented by
the Oracle Corporation. This model addresses the shortcomings of the RDBMS model.
The Object Relational Database Management System is not purely object oriented.
However, it is implemented via the same relational engine that drives the Relational
Database Management System.

Data is stored in tables. A table is a combination of columns and rows.

Each column contains a certain type of data about a person. For example, the first
column contains the last name, the second column contains the first name and the third
column contains the address. Each row contains all types of data about one person.

The client computer sends requests to the server. The server processes the requests and
delivers results to the client.

Server: - This is usually a high-powered computer. Typically it could consist of several
processors (2, 3 or 4), a very large amount of RAM (2GB or 4GB), a very large amount of hard
disk storage space (possibly a few hundred GB), high-speed network data transfer rates, etc.
The server need not be from a specific vendor; it could be purchased from any vendor. The
server platform may be UNIX, Windows NT, AIX, Sun Solaris, Novell, etc.

Client: - These are usually personal computers. They have processing power of their own.
The client operating system may be Windows 95, Windows 98, Windows 2000, Windows NT
Workstation, Sun Solaris, Mac OS, etc.

Operation: - The operation of the client-server network is as follows:

• The server is powered on and is ready to receive connections / requests. The network
administrator configures the server.

• Each client is powered on and is required to provide user identification and authentication
information (such as a user ID and a password) to connect to the server. In a multiple-server
environment, the client computer may also be required to specify the location of the server to
which it must connect. This setup creates the infrastructure for other programs to be used in
this configuration. In most environments, databases, e-mail servers, and web servers run on a
server computer. Clients use the resources offered by these databases, websites, etc.
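The request/response exchange described above can be sketched with a plain TCP socket pair on one machine. Real database servers use much richer protocols with authentication; this only illustrates the division of labor between client and server, and the request text is invented.

```python
# A minimal client/server sketch: the server waits for a request,
# processes it (here, trivially tagging it), and returns a result.
import socket
import threading

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))      # let the OS pick any free port
srv.listen(1)
port = srv.getsockname()[1]

def serve_once():
    conn, _ = srv.accept()
    request = conn.recv(1024)            # the client's request
    conn.sendall(b"RESULT:" + request)   # the server's response
    conn.close()

t = threading.Thread(target=serve_once)
t.start()

# The client connects, sends a request, and receives the result.
cli = socket.create_connection(("127.0.0.1", port))
cli.sendall(b"lookup employee 42")
reply = cli.recv(1024)
cli.close()
t.join()
srv.close()
```

The server is started first and simply waits; clients initiate every exchange, which mirrors the power-on sequence described in the text.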

DBMS
(Database Management System)
As discussed earlier, a database is a well organized collection of data. To be able to carry out
operations like insertion, deletion and retrieval, the database needs to be managed by a
substantial package of software. This software is usually called a Database Management
System (DBMS). The primary purpose of a DBMS is to allow a user to store, update and
retrieve data in abstract terms and thus make it easy to maintain and retrieve information from a
database. A DBMS relieves the user from having to know about exact physical representations
of data and having to specify detailed algorithms for storing, updating and retrieving data.

A DBMS is usually a very large software package that carries out many different tasks, including
the provision of facilities to enable the user to access and modify information in the database.
The DBMS is an intermediate link between the physical database, the computer and the
operating system on one hand, and the users on the other. To provide the various facilities to different
types of users, a DBMS normally provides one or more specialized programming languages,
often called database languages. Different DBMSs provide different database languages,
although a language called SQL has taken on the role of a de facto standard. Database
languages come in different forms. One language is needed to describe the database to the DBMS,
as well as to provide facilities for changing the database and for defining and changing physical
data structures. Another language is needed for manipulating and retrieving data stored in the
DBMS. These languages are called Data Description Languages (DDL) and Data Manipulation
Languages (DML) respectively.
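The DDL/DML split can be sketched with Python's built-in sqlite3 module, which embeds a small SQL DBMS; the table and column names here are invented for illustration.

```python
# DDL describes structure to the DBMS; DML manipulates and retrieves
# the data itself. sqlite3 is used here only as a convenient SQL engine.
import sqlite3

db = sqlite3.connect(":memory:")

# DDL: describe the database to the DBMS.
db.execute("CREATE TABLE Employees (Name TEXT, Salary REAL)")

# DML: insert and retrieve data.
db.execute("INSERT INTO Employees VALUES ('Alice', 50000)")
db.execute("INSERT INTO Employees VALUES ('Bob', 60000)")
rows = db.execute(
    "SELECT Name FROM Employees WHERE Salary > 55000"
).fetchall()
```

Note that both statements are expressed in SQL; the DDL/DML distinction is about their role (defining structure versus manipulating data), not about using two separate tools.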

To summarize, a database system consists of

1. The database (data)

2. A DBMS (software)

3. A DDL and a DML (part of the DBMS)

4. Application programs

Some DBMS packages are marketed by computer manufacturers and will run only on that manufacturer's
machines (e.g. IBM's IMS), but increasingly, independent software houses are designing and selling DBMS
software that runs on several different types of machines (e.g. ORACLE).

Evolution of Database Management Systems


The history of Database Management Systems (DBMS) is an evolutionary history. Each
successive wave of innovation can be seen as a response either to some limiting aspect of the
technology preceding it or to demands from new application domains. In economic terms,
successful information technology lowers the costs incurred when implementing a solution to a
particular problem. Lowering development costs increases the range of information
management problems that can be dealt with in a cost-effective way, leading to new waves of
systems development.

As an example of changing development costs, consider that early mainframe information
systems became popular because they could make numerical calculations with unprecedented
speed and accuracy (by the standards of the time). This had a major impact on how businesses
managed financial assets and how engineering projects were undertaken. It took another
evolutionary step—the invention of higher level languages such as COBOL—to get to the point
that the economic return from a production management system justified the development cost.
It took still another innovation, the adoption of the relational DBMS, before customer management
and human resource systems could be automated. Today it is difficult to imagine how you could
use COBOL and ISAM to build the number and variety of e-commerce Web sites that have
sprung up over the last three years. And looking ahead, as we move into an era of ubiquitous,
mobile computers and the growing importance of digital media, it is difficult to see how RDBMS
technology can cope.

Early Information Systems


Pioneering information systems of the late 1960s and early 1970s were constrained by the
capacity of the computers on which they ran. Hardware was relatively expensive, and by today’s
standards, quite limited. Consequently, considerable human effort was devoted to optimizing
programming code. Diverting effort from functionality to efficiency constrained the ambition of
early developers, but the diversion of effort was necessary because it was the only way to
develop an information system that ran with adequate performance. Information systems of this
era were implemented as monolithic programs combining user-interface, data processing logic,
and data management operations into a single executable on a single machine. Such programs
intermingled low-level descriptions of data structure details (how records were laid out on disk,
how records were interrelated) with user-interface management code. User-interface
management code would dictate, for example, where an element from a disk record went on the
screen. Maintaining and evolving these systems, even to perform tasks as simple as adding
new data fields and indexes, required programmers to make changes involving most aspects of
the system. Experience has shown that development is the least expensive stage of an
information system’s life cycle. Programming and debugging take time and money, but by far
the largest percentage of the total cost of an information system is incurred after it goes into
production. As the business environment changes, there is considerable pressure to change
systems with it. Changing a pre-relational information system, either to fix bugs or to make
enhancements, is extremely difficult. Yet, there is always considerable pressure to do so. In
addition, hardware constantly gets faster and cheaper. A second problem with these early
systems is that as their complexity increased, it became increasingly important to detect errors
in the design stage. The most pervasive design problem was redundancy, which occurs when
storage structure designs record an item of information twice.

DIFFERENT DBMS ARCHITECTURES


There are three types of database architectures. They are as follows:

Hierarchical Model

The hierarchical model is the basis of the oldest database management systems, which grew
out of early attempts to organize data for the U.S. space program. Since these databases were
ad hoc solutions to immediate problems, they were created without the strong theoretical
foundations that later systems had. Their designers were familiar with file organizations and
data structures, and used these concepts to solve the immediate data representation problems
of users. The hierarchical model uses the tree as its basic data structure. Nodes of the trees in
the hierarchical model represent data records or record segments, which are portions of the
data records. Relationships are represented as links or pointers between nodes.

The Network Model

The network model uses a network, or plex, structure, which is a data structure consisting of nodes and
branches. Unlike a tree, a plex structure allows a node to have more than one parent. The
nodes of the network represent records of various types. Relationships between records are
represented as links, which become pointers in the implementation.

Relational Model

The relational model was proposed by Codd in 1970 and continues to be the subject of much
research. It is now widely used by both mainframe and microcomputer-based DBMSs because
of its simplicity from the user’s point of view and its power. The relational model takes the theory of
relations from mathematics and adapts it for use in database theory. In the relational model, both
entities and relationships are represented by relations, which are physically represented as
tables or two-dimensional arrays, with attributes as columns of those tables.

Before going into the details of the Relational and Object Relational Database Management
Systems, we should have some knowledge of relational design.

RDBMS
(Relational Database Management System)
In 1970 an IBM researcher named Ted Codd wrote a paper that described a new approach to
the management of “large shared data banks.” In his paper Codd identifies two objectives for
managing shared data. The first of these is data independence, which dictates that applications
using a database be maintained independent of the physical details by which the database
organizes itself. Second, he describes a series of rules to ensure that the shared data
is consistent by eliminating any redundancy in the database’s design. Codd’s paper deliberately
ignores any consideration of how his model might be implemented. He was attempting to define
an abstraction of the problem of information management: a general framework for thinking
about and working with information. The Relational Model Codd described had three parts: a
data model, a means of expressing operations in a high-level language, and a set of design principles that
views data as being stored in tables containing a variable number of rows (or records), each of
which has a fixed number of columns. Something like a telephone directory or a registry of
births, deaths, and marriages is a good analogy for a table. Each entry contains different
information but is identical in its form to all other entries of the same kind. The relational model
also describes a number of logical operations that could be performed over the data. These
operations included a means of getting a subset of rows (all names in the telephone directory
with the surname “Brown”), a subset of columns (just the name and number), or data from a
combination of tables (a person who is married to a person with a particular phone number). By
borrowing mathematical techniques from predicate logic, Codd was able to derive a series of
design principles that could be used to guarantee that the database’s structure was free of the
kinds of redundancy so problematic in other systems. Greatly expanded by later writers, these
ideas formed the basis of the theory of normal forms. Properly applied, the system of normal
form rules can ensure that the database’s logical design is free of redundancy and, therefore,
any possibility of anomalies in the data.

Elements of a Relational Database


There are four basic elements which are necessary for the Relational Database. Those are given below.

 Relations

 Attributes

 Domains

 The Relational Operators.

Codd described a data storage system that possessed three characteristics that were sorely needed at that
time:

1. Logical data independence: - This desirable characteristic means that changes
made to an attribute (column), for example an increase or decrease in size, have no
perceivable effect on other attributes of the same relation (table). Logical data
independence was attractive to data processing organizations because it could
substantially reduce the cost of software maintenance.

2. Referential and data integrity: - Unlike other database systems, a relational
database would relieve the application software of the burden of enforcing integrity
constraints. Codd described two characteristics that would be maintained by a relational
database: referential integrity and data integrity.

3. Ad hoc query: - This characteristic would enable the user to tell the database which
data to retrieve without indicating how to accomplish the task.
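The ad hoc query characteristic can be sketched with sqlite3 standing in for a full RDBMS; the table and data below are invented. The query states which data is wanted and says nothing about how to fetch it.

```python
# Declarative retrieval: no loops, no access paths, no indexes mentioned.
# The DBMS decides how to satisfy the request.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE Parts (Part TEXT, Warehouse TEXT, Quantity INTEGER)")
db.executemany("INSERT INTO Parts VALUES (?, ?, ?)", [
    ("bolt", "north", 120),
    ("nut",  "north", 300),
    ("bolt", "south", 40),
])

# "Which": the total quantity of bolts, wherever they are stored.
total = db.execute(
    "SELECT SUM(Quantity) FROM Parts WHERE Part = 'bolt'"
).fetchone()[0]
```

Contrast this with the pre-relational style described earlier, where the programmer also had to specify how to navigate the stored records.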

Some time passed before a commercial product actually implemented some of the relational database
features that Codd described. Today many vendors sell Relational Database Management
Systems; some of the more well-known vendors are Oracle, Sybase, IBM, Informix, Microsoft, and
Computer Associates. Of these vendors, Oracle has emerged as the leader. The Oracle RDBMS engine
has been ported to more platforms than any other database product. Because of Oracle's multiplatform
support, many application software vendors have chosen Oracle as their database platform. And now,
Oracle Corporation has ported the same RDBMS engine to the desktop environment with its release of
Personal Oracle.

Relational Database Implementation
Today's relational databases implement a number of extremely useful features that
Codd did not mention in his original article. However, as of this writing, no commercially
available database is a full implementation of Codd's rules for relational databases.

During the early 1970s, several parallel research projects set out to implement a working
RDBMS. This turned out to be very hard. It wasn’t until the late 1980s that RDBMS products
worked acceptably in the kinds of high-end, online transaction processing applications served
so well by earlier technologies. Despite the technical shortcomings, RDBMS technology
exploded in popularity because even the earliest products made it cheaper, quicker, and easier
to build information systems. For an increasing number of applications, economics favored
spending more on hardware and less on people. RDBMS technology made it possible to
develop information systems that, while desirable from a business management point of view,
had been deemed too expensive. To emphasize the difference between the relational and
pre-relational approaches, a four-hundred-line C program can be replaced by the SQL-92
expression in the listing below.

CREATE TABLE Employees (
    Name    VARCHAR(128),
    DOB     DATE,
    Salary  DECIMAL(10,2),
    Address VARCHAR(128)
);

The code in the listing implements considerably more functionality than a C program because RDBMSs
provide transactional guarantees for data changes. They automatically employ locking, logging,
and backup and recovery facilities to guarantee the integrity of the data they store. RDBMSs also
provide elaborate security features: different tables in the same database can be made
accessible only to different groups of users. All of this built-in functionality means that
developers can focus more of their effort on their system’s functionality and less on complex
technical details.
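The transactional guarantee mentioned above can be sketched with sqlite3: changes made inside a transaction either all take effect or none do. The account data and the simulated failure are invented for illustration.

```python
# A transfer between two accounts: if anything fails partway through,
# rolling back the transaction undoes the half-finished change.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE Accounts (Name TEXT, Balance INTEGER)")
db.execute("INSERT INTO Accounts VALUES ('A', 100)")
db.execute("INSERT INTO Accounts VALUES ('B', 0)")
db.commit()

try:
    db.execute("UPDATE Accounts SET Balance = Balance - 50 WHERE Name = 'A'")
    # ...something goes wrong before the matching credit is applied...
    raise RuntimeError("simulated failure mid-transfer")
except RuntimeError:
    db.rollback()   # the debit is undone; no money disappears

balance_a = db.execute(
    "SELECT Balance FROM Accounts WHERE Name = 'A'"
).fetchone()[0]
```

Writing the same guarantee by hand in a pre-relational system would require explicit logging and recovery code, which is exactly the burden the text says the DBMS lifts from the developer.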

Relational Database Theory


No discussion of relational database systems can be considered complete unless it covers the basic concepts of
relational database theory. In a nutshell:
 Every entity has a set of attributes that uniquely define an instance of that entity.

 No part of the primary key can be null.

 Every entity has a set of attributes that uniquely identify each row in that entity.

 A table cannot have duplicate rows.

 Tables are related to one another.

 The order of the rows in a table is arbitrary.

 The order of the columns in a table is arbitrary.

Architecture of RDBMS
There are three standard levels for a relational database management system:

 Conceptual level

 External level

 Internal level

External Level

The external level consists of many different external views or external models of the database. Each user
has a model of the real world represented in a form that is suitable for that user. A particular user interacts
with only certain entities in the real world and is interested in only some of their attributes and
relationships. Therefore, that user’s view contains only information about those aspects of the real world.

Conceptual Level

The conceptual level consists of base tables, which are physically stored tables. These tables are
created by the Database Administrator using a CREATE TABLE command. Once a table is created,
the DBA can create views for users. A view may be a subset of a single base table, or may be
created by combining base tables or performing other operations on them.

Internal Level

The internal level covers the physical implementation of the database. It includes the data
structures and file organization used to store data on physical storage devices. The internal
schema, written in DDL, is a complete description of the internal model. It includes such items
as how data is represented, how records are sequenced, what indexes exist, and what pointers exist.
An internal record is a single stored record. It is the unit that is passed up to the internal level.

Primary Key

Every entity has a set of attributes that uniquely define an instance of that entity. This set of attributes is
called the primary key. The primary key may be composed of a single attribute or of several attributes
(a composite key).

No Part of the Primary Key Is Null

A basic tenet of relational theory is that no part of the primary key can be null. If you think about that idea
for a moment, it seems intuitive: The primary key must uniquely identify each row in an entity; therefore, if
the primary key (or a part of it) is null, it wouldn't be able to identify anything.
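The tenet above is enforced by the DBMS itself, which can be sketched with sqlite3. The schema is invented; note that SQLite, for historical reasons, only rejects null primary key parts when the columns are also declared NOT NULL, which is done explicitly here.

```python
# The DBMS rejects a row whose primary key is partly null, because such
# a row could not identify anything.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE Parts ("
           "  Part      TEXT NOT NULL,"
           "  Warehouse TEXT NOT NULL,"
           "  Quantity  INTEGER,"
           "  PRIMARY KEY (Part, Warehouse))")

rejected = False
try:
    # Part of the composite key is null: the insert must fail.
    db.execute("INSERT INTO Parts VALUES (NULL, 'north', 10)")
except sqlite3.IntegrityError:
    rejected = True
```

The application never has to check for this itself; declaring the key is enough, which is one of the maintenance savings Codd promised.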

Data Integrity

According to relational theory, every entity has a set of attributes that uniquely identify each row in that
entity. Relational theory also states that no duplicate rows can exist in a table.

Referential Integrity

Tables are related to one another through foreign keys. A foreign key is one table column for which the
set of possible values is found in the primary key of a second table. Referential integrity is achieved when
the set of values in a foreign key column is restricted to the primary key that it references or to the null
value. Once the database designer declares primary and foreign keys, enforcing data and referential
integrity is the responsibility of the DBMS.
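The foreign-key mechanism just described can be sketched with sqlite3, where foreign key checking must be explicitly switched on; the schemas and data are invented.

```python
# Referential integrity: a foreign key value must exist in the referenced
# primary key (or be null), and the DBMS enforces this, not the application.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")   # SQLite-specific switch
db.execute("CREATE TABLE Departments (Dept TEXT PRIMARY KEY)")
db.execute("CREATE TABLE Employees ("
           "  Name TEXT,"
           "  Dept TEXT REFERENCES Departments(Dept))")
db.execute("INSERT INTO Departments VALUES ('Sales')")

# A matching department exists, so this row is accepted.
db.execute("INSERT INTO Employees VALUES ('Alice', 'Sales')")

rejected = False
try:
    # No such department: the DBMS refuses the row.
    db.execute("INSERT INTO Employees VALUES ('Bob', 'NoSuchDept')")
except sqlite3.IntegrityError:
    rejected = True
```

Once the keys are declared, every insert, update, and delete is checked automatically, which is exactly the division of responsibility the paragraph describes.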

Five Normal Forms in Relational Database Theory

INTRODUCTION

The normal forms defined in relational database theory represent guidelines for record design. The
guidelines corresponding to first through fifth normal forms are presented here, in terms that do not
require an understanding of relational theory. The design guidelines are meaningful even if one is not
using a relational database system. We present the guidelines without referring to the concepts of the
relational model in order to emphasize their generality, and also to make them easier to understand. Our
presentation conveys an intuitive sense of the intended constraints on record design, although in its
informality it may be imprecise in some technical details. A comprehensive treatment of the subject is
provided by Date.

The normalization rules are designed to prevent update anomalies and data inconsistencies. With respect
to performance tradeoffs, these guidelines are biased toward the assumption that all non-key fields will be
updated frequently. They tend to penalize retrieval, since data which may have been retrievable from one
record in an un-normalized design may have to be retrieved from several records in the normalized form.
There is no obligation to fully normalize all records when actual performance requirements are taken into
account.

FIRST NORMAL FORM

First normal form deals with the "shape" of a record type. Under first normal form, all occurrences of a
record type must contain the same number of fields. First normal form excludes variable repeating fields
and groups. This is not so much a design guideline as a matter of definition. Relational database theory
doesn't deal with records having a variable number of fields.

SECOND AND THIRD NORMAL FORMS

Second and third normal forms deal with the relationship between non-key and key fields. Under second
and third normal forms, a non-key field must provide a fact about the key, the whole key, and nothing
but the key. In addition, the record must satisfy first normal form. We deal now only with "single-valued"
facts. The fact could be a one-to-many relationship, such as the department of an employee, or a one-to-
one relationship, such as the spouse of an employee. Thus the phrase "Y is a fact about X" signifies a
one-to-one or one-to-many relationship between Y and X. In the general case, Y might consist of one or
more fields, and so might X. In the following example, QUANTITY is a fact about the combination of
PART and WAREHOUSE.

Second Normal Form

Second normal form is violated when a non-key field is a fact about a subset of a key. It is only relevant
when the key is composite, i.e., consists of several fields. Consider the following inventory record:

---------------------------------------------------
| PART | WAREHOUSE | QUANTITY | WAREHOUSE-ADDRESS |
====================-------------------------------

The key here consists of the PART and WAREHOUSE fields together, but WAREHOUSE-ADDRESS is a
fact about the WAREHOUSE alone. The basic problems with this design are:

 The warehouse address is repeated in every record that refers to a part stored in that warehouse.
 If the address of the warehouse changes, every record referring to a part stored in that
warehouse must be updated.
 Because of the redundancy, the data might become inconsistent, with different records showing
different addresses for the same warehouse.
 If at some point in time there are no parts stored in the warehouse, there may be no record in
which to keep the warehouse's address.
To satisfy second normal form, the record shown above should be decomposed into (replaced by) the two
records:

-------------------------------
| PART | WAREHOUSE | QUANTITY |
====================-----------

---------------------------------
| WAREHOUSE | WAREHOUSE-ADDRESS |
=============--------------------

When a data design is changed in this way, replacing unnormalized records with normalized records, the
process is referred to as normalization. The term "normalization" is sometimes used relative to a
particular normal form. Thus a set of records may be normalized with respect to second normal form but
not with respect to third. The normalized design enhances the integrity of the data, by minimizing
redundancy and inconsistency, but at some possible performance cost for certain retrieval applications.
Consider an application that wants the addresses of all warehouses stocking a certain part. In the
unnormalized form, the application searches one record type. With the normalized design, the application
has to search two record types, and connect the appropriate pairs.
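The decomposition above, and the join that reconnects the two record types, can be sketched with sqlite3; the part names, warehouses, and addresses are invented.

```python
# After normalization, the warehouse address is stored once, and the
# application joins the two tables when it needs parts and addresses together.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE Stock (Part TEXT, Warehouse TEXT, Quantity INTEGER,"
           "  PRIMARY KEY (Part, Warehouse))")
db.execute("CREATE TABLE Warehouses (Warehouse TEXT PRIMARY KEY,"
           "  Address TEXT)")
db.execute("INSERT INTO Warehouses VALUES ('north', '1 Dock Rd')")
db.executemany("INSERT INTO Stock VALUES (?, ?, ?)", [
    ("bolt", "north", 120),
    ("nut",  "north", 300),
])

# Addresses of all warehouses stocking a certain part: two record types,
# connected through the shared Warehouse key.
rows = db.execute(
    "SELECT s.Part, w.Address FROM Stock s "
    "JOIN Warehouses w ON s.Warehouse = w.Warehouse "
    "WHERE s.Part = 'bolt'"
).fetchall()
```

Updating the warehouse's address now touches exactly one row, which removes the inconsistency risk listed above, at the cost of the extra join on retrieval.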


Third Normal Form

Third normal form is violated when a non-key field is a fact about another non-key field, as in

------------------------------------
| EMPLOYEE | DEPARTMENT | LOCATION |
============------------------------
The EMPLOYEE field is the key. If each department is located in one place, then the LOCATION field is a
fact about the DEPARTMENT -- in addition to being a fact about the EMPLOYEE. The problems with this
design are the same as those caused by violations of second normal form:

 The department's location is repeated in the record of every employee assigned to that
department.
 If the location of the department changes, every such record must be updated.
 Because of the redundancy, the data might become inconsistent, with different records showing
different locations for the same department.
 If a department has no employees, there may be no record in which to keep the department's
location.

To satisfy third normal form, the record shown above should be decomposed into the two records:

------------------------- -------------------------
| EMPLOYEE | DEPARTMENT | | DEPARTMENT | LOCATION |
============------------- ==============-----------

To summarize, a record is in second and third normal forms if every field is either part of the key or
provides a (single-valued) fact about exactly the whole key and nothing else.

Functional Dependencies

In relational database theory, second and third normal forms are defined in terms of functional
dependencies, which correspond approximately to our single-valued facts. A field Y is "functionally
dependent" on a field (or fields) X if it is invalid to have two records with the same X-value but different Y-
values. That is, a given X-value must always occur with the same Y-value. When X is a key, then all fields
are by definition functionally dependent on X in a trivial way, since there can't be two records having the
same X value.

There is a slight technical difference between functional dependencies and single-valued facts as we
have presented them. Functional dependencies only exist when the things involved have unique and
singular identifiers (representations). For example, suppose a person's address is a single-valued fact,
i.e., a person has only one address. If we don't provide unique identifiers for people, then there will not be
a functional dependency in the data:
----------------------------------------------
| PERSON | ADDRESS |
-------------+--------------------------------
| John Smith | 123 Main St., New York |
| John Smith | 321 Center St., San Francisco |
----------------------------------------------

Although each person has a unique address, a given name can appear with several different addresses.
Hence we do not have a functional dependency corresponding to our single-valued fact.
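The definition can be checked mechanically: X functionally determines Y exactly when no X-value occurs with two different Y-values. A minimal sketch, with illustrative data:

```python
def holds(rows, x, y):
    """Return True if the functional dependency x -> y holds in rows."""
    seen = {}
    for row in rows:
        # setdefault records the first Y seen for this X; any later,
        # different Y for the same X violates the dependency.
        if seen.setdefault(row[x], row[y]) != row[y]:
            return False
    return True

rows = [
    {"person": "John Smith", "address": "123 Main St., New York"},
    {"person": "John Smith", "address": "321 Center St., San Francisco"},
]
print(holds(rows, "person", "address"))   # -> False: same name, two addresses
```

With unique person identifiers in place of names, the same check would return True, matching the single-valued fact.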

Similarly, the address has to be spelled identically in each occurrence in order to have a functional
dependency. In the following case the same person appears to be living at two different addresses, again
precluding a functional dependency.

---------------------------------------
| PERSON | ADDRESS |
-------------+-------------------------
| John Smith | 123 Main St., New York |
| John Smith | 123 Main Street, NYC |
---------------------------------------

We are not defending the use of non-unique or non-singular representations. Such practices often lead to
data maintenance problems of their own. We do wish to point out, however, that functional dependencies
and the various normal forms are really only defined for situations in which there are unique and singular
identifiers. Thus the design guidelines as we present them are a bit stronger than those implied by the
formal definitions of the normal forms.

For instance, we as designers know that in the following example there is a single-valued fact about a
non-key field, and hence the design is susceptible to all the update anomalies mentioned earlier.

--------------------------------------------------------------
| EMPLOYEE  | FATHER     | FATHER'S-ADDRESS              |
|===========+------------+-------------------------------|
| Art Smith | John Smith | 123 Main St., New York        |
| Bob Smith | John Smith | 123 Main Street, NYC          |
| Cal Smith | John Smith | 321 Center St., San Francisco |
--------------------------------------------------------------

However, in formal terms, there is no functional dependency here between FATHER'S-ADDRESS and
FATHER, and hence no violation of third normal form.

FOURTH AND FIFTH NORMAL FORMS

Fourth and fifth normal forms deal with multi-valued facts. The multi-valued fact may correspond to a
many-to-many relationship, as with employees and skills, or to a many-to-one relationship, as with the
children of an employee (assuming only one parent is an employee). By "many-to-many" we mean that an
employee may have several skills, and a skill may belong to several employees.

Note that we look at the many-to-one relationship between children and fathers as a single-valued fact
about a child but a multi-valued fact about a father.

In a sense, fourth and fifth normal forms are also about composite keys. These normal forms attempt to
minimize the number of fields involved in a composite key, as suggested by the examples to follow.

Fourth Normal Form

Under fourth normal form, a record type should not contain two or more independent multi-valued facts
about an entity. In addition, the record must satisfy third normal form.

The term "independent" will be discussed after considering an example.

Consider employees, skills, and languages, where an employee may have several skills and several
languages. We have here two many-to-many relationships, one between employees and skills, and one
between employees and languages. Under fourth normal form, these two relationships should not be
represented in a single record such as
-------------------------------
| EMPLOYEE | SKILL | LANGUAGE |
===============================

Instead, they should be represented in the two records

-------------------- -----------------------
| EMPLOYEE | SKILL | | EMPLOYEE | LANGUAGE |
==================== =======================

Note that other fields, not involving multi-valued facts, are permitted to occur in the record, as in the case
of the QUANTITY field in the earlier PART/WAREHOUSE example.

The main problem with violating fourth normal form is that it leads to uncertainties in the maintenance
policies. Several policies are possible for maintaining two independent multi-valued facts in one record:

(1) A disjoint format, in which a record contains either a skill or a language, but not both:

-------------------------------
| EMPLOYEE | SKILL | LANGUAGE |
|----------+-------+----------|
| Smith | cook | |
| Smith | type | |
| Smith | | French |
| Smith | | German |
| Smith | | Greek |
-------------------------------

This is not much different from maintaining two separate record types. (We note in passing that such a
format also leads to ambiguities regarding the meanings of blank fields. A blank SKILL could mean the
person has no skill, or the field is not applicable to this employee, or the data is unknown, or, as in this
case, the data may be found in another record.)

(2) A random mix, with three variations:

(a) Minimal number of records, with repetitions:


-------------------------------
| EMPLOYEE | SKILL | LANGUAGE |
|----------+-------+----------|
| Smith | cook | French |
| Smith | type | German |
| Smith | type | Greek |
-------------------------------

(b) Minimal number of records, with null values:

-------------------------------
| EMPLOYEE | SKILL | LANGUAGE |
|----------+-------+----------|
| Smith | cook | French |
| Smith | type | German |
| Smith | | Greek |
-------------------------------

(c) Unrestricted:

-------------------------------
| EMPLOYEE | SKILL | LANGUAGE |
|----------+-------+----------|
| Smith | cook | French |
| Smith | type | |
| Smith | | German |
| Smith | type | Greek |
-------------------------------

(3) A "cross-product" form, where for each employee, there must be a record for every possible pairing of
one of his skills with one of his languages:

-------------------------------
| EMPLOYEE | SKILL | LANGUAGE |
|----------+-------+----------|
| Smith | cook | French |
| Smith | cook | German |
| Smith | cook | Greek |
| Smith | type | French |
| Smith | type | German |
| Smith | type | Greek |
-------------------------------

Other problems caused by violating fourth normal form are similar in spirit to those mentioned earlier for
violations of second or third normal form. They take different forms depending on the chosen
maintenance policy:

 If there are repetitions, then updates have to be done in multiple records, and they could become
inconsistent.
 Insertion of a new skill may involve looking for a record with a blank skill, or inserting a new
record with a possibly blank language, or inserting multiple records pairing the new skill with
some or all of the languages.
 Deletion of a skill may involve blanking out the skill field in one or more records (perhaps with a
check that this doesn't leave two records with the same language and a blank skill), or deleting
one or more records, coupled with a check that the last mention of some language hasn't also
been deleted.

Fourth normal form minimizes such update problems.
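A small Python sketch makes the contrast concrete (the names are illustrative). With the two-record design, adding a skill touches one table; under the cross-product policy it would require one row per language:

```python
# The decomposed design: two independent record types as sets of tuples.
skills = {("Smith", "cook"), ("Smith", "type")}
languages = {("Smith", "French"), ("Smith", "German"), ("Smith", "Greek")}

# The cross-product policy would instead need one row per skill/language pair.
cross = {(e, s, l) for e, s in skills for e2, l in languages if e == e2}
print(len(skills) + len(languages), len(cross))   # -> 5 6

# Inserting a new skill: one row in the normalized form...
skills.add(("Smith", "weld"))
# ...versus len(languages) new rows (3 here) in the cross-product form.
```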

Independence

We mentioned independent multi-valued facts earlier, and we now illustrate what we mean in terms of the
example. The two many-to-many relationships, employee:skill and employee:language, are
"independent" in that there is no direct connection between skills and languages. There is only an indirect
connection because they belong to some common employee. That is, it does not matter which skill is
paired with which language in a record; the pairing does not convey any information. That's precisely why
all the maintenance policies mentioned earlier can be allowed.

In contrast, suppose that an employee could only exercise certain skills in certain languages. Perhaps
Smith can cook French cuisine only, but can type in French, German, and Greek. Then the pairings of
skills and languages becomes meaningful, and there is no longer an ambiguity of maintenance policies. In
the present case, only the following form is correct:

-------------------------------
| EMPLOYEE | SKILL | LANGUAGE |
|----------+-------+----------|
| Smith | cook | French |
| Smith | type | French |
| Smith | type | German |
| Smith | type | Greek |
-------------------------------

Thus the employee:skill and employee:language relationships are no longer independent. These records
do not violate fourth normal form. When there is interdependence among the relationships, then it is
acceptable to represent them in a single record.

Multivalued Dependencies

For readers interested in pursuing the technical background of fourth normal form a bit further, we
mention that fourth normal form is defined in terms of multivalued dependencies, which correspond to our
independent multi-valued facts. Multivalued dependencies, in turn, are defined essentially as relationships
which accept the "cross-product" maintenance policy mentioned above. That is, for our example, every
one of an employee's skills must appear paired with every one of his languages. It may or may not be
obvious to the reader that this is equivalent to our notion of independence: since every possible pairing
must be present, there is no "information" in the pairings. Such pairings convey information only if some
of them can be absent, that is, only if it is possible that some employee cannot perform some skill in some
language. If all pairings are always present, then the relationships are really independent.
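The "cross-product" criterion can be expressed directly: for each employee, every recorded skill must appear paired with every recorded language. A sketch in Python, using the Smith example:

```python
def is_cross_product(rows):
    """True if, per employee, every skill is paired with every language."""
    by_emp = {}
    for e, s, l in rows:
        d = by_emp.setdefault(e, (set(), set(), set()))
        d[0].add(s); d[1].add(l); d[2].add((s, l))
    return all(pairs == {(s, l) for s in ss for l in ls}
               for ss, ls, pairs in by_emp.values())

independent = [("Smith", "cook", "French"), ("Smith", "cook", "German"),
               ("Smith", "type", "French"), ("Smith", "type", "German")]
dependent   = [("Smith", "cook", "French"),   # cooks French cuisine only
               ("Smith", "type", "French"), ("Smith", "type", "German")]
print(is_cross_product(independent), is_cross_product(dependent))  # -> True False
```

A True result means the pairings carry no information, i.e. the multivalued dependency holds and the record type should be decomposed.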

We should also point out that multivalued dependencies and fourth normal form apply as well to
relationships involving more than two fields. For example, suppose we extend the earlier example to
include projects, in the following sense:

 An employee uses certain skills on certain projects.


 An employee uses certain languages on certain projects.
If there is no direct connection between the skills and languages that an employee uses on a project, then
we could treat this as two independent many-to-many relationships of the form EP:S and EP:L, where
"EP" represents a combination of an employee with a project. A record including employee, project, skill,
and language would violate fourth normal form. Two records, containing fields E,P,S and E,P,L,
respectively, would satisfy fourth normal form.

Fifth Normal Form

Fifth normal form deals with cases where information can be reconstructed from smaller pieces of
information that can be maintained with less redundancy. Second, third, and fourth normal forms also
serve this purpose, but fifth normal form generalizes to cases not covered by the others.

We will not attempt a comprehensive exposition of fifth normal form, but illustrate the central concept with
a commonly used example, namely one involving agents, companies, and products. If agents represent
companies, companies make products, and agents sell products, then we might want to keep a record of
which agent sells which product for which company. This information could be kept in one record type
with three fields:

-----------------------------
| AGENT | COMPANY | PRODUCT |
|-------+---------+---------|
| Smith | Ford | car |
| Smith | GM | truck |
-----------------------------

This form is necessary in the general case. For example, although agent Smith sells cars made by Ford
and trucks made by GM, he does not sell Ford trucks or GM cars. Thus we need the combination of three
fields to know which combinations are valid and which are not.

But suppose that a certain rule was in effect: if an agent sells a certain product, and he represents a
company making that product, then he sells that product for that company.

-----------------------------
| AGENT | COMPANY | PRODUCT |
|-------+---------+---------|
| Smith | Ford | car |
| Smith | Ford | truck |
| Smith | GM | car |
| Smith | GM | truck |
| Jones | Ford | car |
-----------------------------

In this case, it turns out that we can reconstruct all the true facts from a normalized form consisting of
three separate record types, each containing two fields:

-------------------  ---------------------  -------------------
| AGENT | COMPANY |  | COMPANY | PRODUCT |  | AGENT | PRODUCT |
|-------+---------|  |---------+---------|  |-------+---------|
| Smith | Ford    |  | Ford    | car     |  | Smith | car     |
| Smith | GM      |  | Ford    | truck   |  | Smith | truck   |
| Jones | Ford    |  | GM      | car     |  | Jones | car     |
-------------------  | GM      | truck   |  -------------------
                     ---------------------

These three record types are in fifth normal form, whereas the corresponding three-field record shown
previously is not.

Roughly speaking, we may say that a record type is in fifth normal form when its information content
cannot be reconstructed from several smaller record types, i.e., from record types each having fewer
fields than the original record. The case where all the smaller records have the same key is excluded. If a
record type can only be decomposed into smaller records which all have the same key, then the record
type is considered to be in fifth normal form without decomposition. A record type in fifth normal form is
also in fourth, third, second, and first normal forms.

Fifth normal form does not differ from fourth normal form unless there exists a symmetric constraint such
as the rule about agents, companies, and products. In the absence of such a constraint, a record type in
fourth normal form is always in fifth normal form.
One advantage of fifth normal form is that certain redundancies can be eliminated. In the normalized
form, the fact that Smith sells cars is recorded only once; in the un-normalized form it may be repeated
many times.

It should be observed that although the normalized form involves more record types, there may be fewer
total record occurrences. This is not apparent when there are only a few facts to record, as in the example
shown above. The advantage is realized as more facts are recorded, since the size of the normalized files
increases in an additive fashion, while the size of the un-normalized file increases in a multiplicative
fashion. For example, if we add a new agent who sells x products for y companies, where each of these
companies makes each of these products, we have to add x+y new records to the normalized form, but
xy new records to the un-normalized form.
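The additive-versus-multiplicative growth can be checked with a line of arithmetic:

```python
# A new agent selling x products for y companies (each company making each
# product): x+y new rows in the normalized form, x*y in the un-normalized form.
x, y = 10, 10
print(x + y, x * y)   # -> 20 100
```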

It should be noted that all three record types are required in the normalized form in order to reconstruct
the same information. From the first two record types shown above we learn that Jones represents Ford
and that Ford makes trucks. But we can't determine whether Jones sells Ford trucks until we look at the
third record type to determine whether Jones sells trucks at all.
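The reconstruction can be verified with a small sketch in Python, joining the three binary record types from the example:

```python
# The three binary record types of the normalized form.
ac = {("Smith", "Ford"), ("Smith", "GM"), ("Jones", "Ford")}   # AGENT/COMPANY
cp = {("Ford", "car"), ("Ford", "truck"),
      ("GM", "car"), ("GM", "truck")}                          # COMPANY/PRODUCT
ap = {("Smith", "car"), ("Smith", "truck"), ("Jones", "car")}  # AGENT/PRODUCT

# Three-way join: keep a company pairing only if the agent sells the product.
joined = {(a, c, p) for a, c in ac for c2, p in cp if c == c2
          if (a, p) in ap}
print(sorted(joined))
```

The result is exactly the five rows of the original AGENT/COMPANY/PRODUCT table; in particular, (Jones, Ford, truck) is correctly excluded because the AGENT/PRODUCT record type shows Jones does not sell trucks.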

The following example illustrates a case in which the rule about agents, companies, and products is
satisfied, and which clearly requires all three record types in the normalized form. Any two of the record
types taken alone will imply something untrue.

-----------------------------
| AGENT | COMPANY | PRODUCT |
|-------+---------+---------|
| Smith | Ford | car |
| Smith | Ford | truck |
| Smith | GM | car |
| Smith | GM | truck |
| Jones | Ford | car |
| Jones | Ford | truck |
| Brown | Ford | car |
| Brown | GM | car |
| Brown | Toyota | car |
| Brown | Toyota | bus |
-----------------------------
-------------------  ---------------------  -------------------
| AGENT | COMPANY |  | COMPANY | PRODUCT |  | AGENT | PRODUCT |
|-------+---------|  |---------+---------|  |-------+---------|
| Smith | Ford    |  | Ford    | car     |  | Smith | car     |   Fifth
| Smith | GM      |  | Ford    | truck   |  | Smith | truck   |   Normal
| Jones | Ford    |  | GM      | car     |  | Jones | car     |   Form
| Brown | Ford    |  | GM      | truck   |  | Jones | truck   |
| Brown | GM      |  | Toyota  | car     |  | Brown | car     |
| Brown | Toyota  |  | Toyota  | bus     |  | Brown | bus     |
-------------------  ---------------------  -------------------

Observe that:

 Jones sells cars and GM makes cars, but Jones does not represent GM.
 Brown represents Ford and Ford makes trucks, but Brown does not sell trucks.
 Brown represents Ford and Brown sells buses, but Ford does not make buses.

Fourth and fifth normal forms both deal with combinations of multivalued facts. One difference is that the
facts dealt with under fifth normal form are not independent, in the sense discussed earlier. Another
difference is that, although fourth normal form can deal with more than two multivalued facts, it only
recognizes them in pairwise groups. We can best explain this in terms of the normalization process
implied by fourth normal form. If a record violates fourth normal form, the associated normalization
process decomposes it into two records, each containing fewer fields than the original record. Any
resulting record that still violates fourth normal form is decomposed again, and so on until the resulting records
are all in fourth normal form. At each stage, the set of records after decomposition contains exactly the
same information as the set of records before decomposition.

In the present example, no pairwise decomposition is possible. There is no combination of two smaller
records which contains the same total information as the original record. All three of the smaller records
are needed. Hence an information-preserving pairwise decomposition is not possible, and the original
record is not in violation of fourth normal form. Fifth normal form is needed in order to deal with the
redundancies in this case.

UNAVOIDABLE REDUNDANCIES
Normalization certainly doesn't remove all redundancies. Certain redundancies seem to be unavoidable,
particularly when several multivalued facts are dependent rather than independent. In the example
shown, it seems unavoidable that we record the fact that "Smith can type" several times. Also, when the
rule about agents, companies, and products is not in effect, it seems unavoidable that we record the fact
that "Smith sells cars" several times.

INTER-RECORD REDUNDANCY

The normal forms discussed here deal only with redundancies occurring within a single record type. Fifth
normal form is considered to be the "ultimate" normal form with respect to such redundancies.

Other redundancies can occur across multiple record types. For the example concerning employees,
departments, and locations, the following records are in third normal form in spite of the obvious
redundancy:

------------------------- -------------------------
| EMPLOYEE | DEPARTMENT | | DEPARTMENT | LOCATION |
============------------- ==============-----------
-----------------------
| EMPLOYEE | LOCATION |
============-----------

In fact, two copies of the same record type would constitute the ultimate in this kind of undetected
redundancy.

Inter-record redundancy has been recognized for some time, and has recently been addressed in terms
of normal forms and normalization.

CONCLUSION

While we have tried to present the normal forms in a simple and understandable way, we are by no
means suggesting that the data design process is correspondingly simple. The design process involves
many complexities which are quite beyond the scope of this paper. In the first place, an initial set of data
elements and records has to be developed, as candidates for normalization. Then the factors affecting
normalization have to be assessed:
 Single-valued vs. multi-valued facts.
 Dependency on the entire key.
 Independent vs. dependent facts.
 The presence of mutual constraints.
 The presence of non-unique or non-singular representations.

And, finally, the desirability of normalization has to be assessed, in terms of its performance impact on
retrieval applications.

RDBMS & The Oracle Product Line


As the world's leading vendor of relational database software, Oracle Corporation supports its flagship
product, the Oracle RDBMS, on more than 90 platforms. The release of Personal Oracle for Windows
extends Oracle's reach to the most popular desktop operating system: Microsoft Windows.

The Oracle RDBMS is available in the following three configurations:

 The Oracle Universal Server can support many users on highly scalable
platforms such as Sun, HP, Pyramid, and Sequent. The various options that are
available with this configuration include the distributed option, by which several
Oracle databases on separate computers can function as a single logical
database. The Oracle Enterprise Server is available for a wide variety of
operating systems and hardware configurations. Oracle Web Server, an
integrated system for dynamically generating HTML output from the content of an
Oracle database, is also included with the Universal Server.

 The Oracle Workgroup Server is designed for workgroups and is available on
NetWare, Windows NT, SCO UNIX, and UnixWare. The Oracle Workgroup
Server is a cost-effective and low-maintenance solution for supporting small
groups of users. Oracle Web Server is also available as an option with the Oracle
Workgroup Server.

 Personal Oracle is a Windows-based version of the Oracle database engine that
offers the same functionality that exists in the Oracle Universal Server and the
Oracle Workgroup Server. Even though Personal Oracle cannot function as a
database server by supporting multiple users, it still provides an excellent
environment for experimentation and prototyping.

In what follows, Personal Oracle is discussed in general terms, as a representative relational
database management system.

Oracle with Structured Query Language (SQL)

You communicate with Personal Oracle through Oracle's version of the Structured Query Language
(SQL, usually pronounced sequel). SQL is a nonprocedural language; unlike C or COBOL in which you
must describe exactly how to access and manipulate data, SQL specifies what to do. Internally, Oracle
determines how to perform the request. SQL exists as an ANSI standard as well as an industry standard.
Oracle's implementation of SQL adheres to Level 2 of the ANSI X3.135-1989/ISO 9075-1989 standard
with full implementation of the Integrity Enhancement Feature. Oracle (as well as other database vendors)
provides many extensions to ANSI SQL.

In addition, Oracle's implementation of SQL adheres to the U.S. government standard as described in the
Federal Information Processing Standard Publication (FIPS PUB) 127, entitled Database Language SQL.

Oracle Data Types

The Oracle DBMS provides a number of different data types for the storage of the
different forms of data in a manner that is most suitable to the manipulations that are
likely to be performed. As part of the Data Modeling process it is important that the most
appropriate data types are identified. Where there is any doubt (e.g. storing a year as a
number or as a date), the data type chosen should reflect the way the data will be used
most often. It is possible to inter-convert between data types, but this will reduce the
efficiency of queries. The use of these data types in table design is covered in "Manipulating Oracle
Tables".

Of the available Oracle data types, the most widely used are these:

 VARCHAR2

 CHAR
 NUMBER

 DATE

 LONG

VARCHAR2

This data type should be used for any column which may contain characters. This
includes alphabetic letters, symbols such as _, -, !, ? and +, and numbers held as a
character representation (i.e. they look like numbers when displayed, but they cannot be
subjected to arithmetic manipulation). Similarly, a greater than (>) or
less than (<) operation will not give the arithmetic result: VARCHAR2 values
are compared character by character up to the first character that differs, and whichever
value has the "greater" character in that position is considered greater. Characters are
compared via the ASCII character set, with the largest ASCII value (i.e. 255)
considered greatest. VARCHAR2 columns may be up to 2000 characters wide.

CHAR

This data type is now effectively replaced by VARCHAR2 - CHAR should no longer be
used, although is still recognized and valid. The CHAR data type is fixed length up to a
maximum of 255 characters, whereas VARCHAR2 is of variable length up to 2000
characters. The variable length of VARCHAR2 gives it significant storage and
performance advantages over CHAR.

NUMBER

Any kind of number: positive, negative, integer, or real, with up to 38 digits of
precision. For comparison, a larger value is considered greater than a smaller value, with all
negative values smaller than all positive values.
DATE

A special data type with some of the characteristics of both character and number
data types. It is used for the storage of date and time information. The comparison
operators =, >, < and <> can be applied to dates. In a condition such as

WHERE HIREDATE = '01-DEC-95'

the date value, on the right, must appear in quotes. For comparison, a later date is
considered greater than an earlier date. Oracle dates must lie in the range 1st January,
4712 BC to 31st December, 4712 AD.

The standard Oracle date format is:

 a two-digit day, followed by

 a three-letter month abbreviation, and

 a two-digit year

All parts are separated by hyphens, e.g. '01-JAN-95'
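As an illustration (in Python, not Oracle), the same 'DD-MON-YY' layout can be parsed with strptime, and the resulting dates compare as described, with later dates greater:

```python
from datetime import datetime

# Parse the standard Oracle-style date format: day-month-year.
d = datetime.strptime("01-JAN-95", "%d-%b-%y")
print(d.year, d.month, d.day)   # -> 1995 1 1

# A later date compares greater than an earlier one.
print(d < datetime.strptime("01-DEC-95", "%d-%b-%y"))   # -> True
```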

LONG

Can hold strings of characters up to 2 gigabytes long. Only one column of LONG data
type can be defined per table. Binary data can only be inserted into these columns using
a programming-language interface (an example of binary data storage in a LONG data
type would be storing a picture in an Oracle table).

Importantly, there are significant restrictions on the use of character information stored in LONG data
types. While the INSERT and UPDATE statements can be used to insert and modify LONG data types,
and the SELECT statement to retrieve this data, no functions can be used to manipulate a retrieved
LONG value, and a LONG column can never be used in a WHERE clause.
When retrieving LONG data types the display length of the column is set to 80 characters, and anything
stored in the column beyond this will be truncated. The following command is required to extend the
display length of a LONG field, within SQL*Plus:

SQL> SET LONG xxxx

The value xxxx should be replaced with a number representing the longest value likely to
be stored in the LONG column.

These are the most commonly used Oracle data types. Once familiar with them, the next step in
working with an RDBMS is a clear understanding of creating tables in Oracle.

DBMS Vs RDBMS

The characteristic that differentiates a DBMS from an RDBMS is that the RDBMS provides a set-oriented
database language. For most RDBMSs, this set-oriented database language is SQL. Set oriented means
that SQL processes sets of data in groups. Two standards organizations, the American National
Standards Institute (ANSI) and the International Standards Organization (ISO), currently promote SQL
standards to industry. The ANSI-92 standard is the standard for the SQL used throughout this book.
Although these standard-making bodies prepare standards for database system designers to follow, all
database products differ from the ANSI standard to some degree. In addition, most systems provide some
proprietary extensions to SQL that extend the language into a true procedural language.
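"Set-oriented" can be made concrete with a minimal sqlite3 sketch (the schema is illustrative): one declarative statement processes a whole set of rows, where a record-at-a-time system would need an explicit loop.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE emp (name TEXT, dept TEXT, salary INTEGER)")
con.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                [("Jones", "D1", 100), ("Smith", "D1", 120), ("Brown", "D2", 90)])

# One SQL statement updates the whole set of D1 employees at once.
con.execute("UPDATE emp SET salary = salary * 2 WHERE dept = 'D1'")
total = con.execute("SELECT SUM(salary) FROM emp").fetchone()[0]
print(total)   # -> 530
```

The application states *what* set of rows to change; the DBMS decides *how* to find and modify them.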

Problems with RDBMS

Starting in the late 1980s, several deficiencies in relational DBMS products began receiving a lot of
attention. The first deficiency is that the dominant relational language, SQL-92, is limiting in several
important respects. For instance, SQL-92 supports a restricted set of built-in types that accommodate
only numbers and strings, but many database applications began to deal with complex objects
such as geographic points, text, and digital signal data. A related problem concerns how this data is used.
Conceptually simple questions involving complex data structures turn into lengthy SQL-92 queries.

The second deficiency is that the relational model suffers from certain structural
shortcomings. Relational tables are flat and do not provide good support for nested
structures, such as sets and arrays. Also, certain kinds of relationships, such as
subtyping, between database objects are hard to represent in the model. (Subtyping occurs
when we say that one kind of thing, such as a SalesPerson, is a subtype of another kind
of thing, such as an Employee.) SQL-92 supports only independent tables of rows and
columns. The third deficiency is that RDBMS technology did not take advantage of
object-oriented (OO) approaches to software engineering which have gained
widespread acceptance in industry. OO techniques reduce development costs and
improve information system quality by adopting an object-centric view of software
development. This involves integrating the data and behavior of a real-world entity into a
single software module or component. A complex data structure or algorithmically
sophisticated operation can be hidden behind a set of interfaces. This allows another
programmer to make use of the complex functionality without having to understand how
it is implemented. The relational model did a pretty good job handling most information
management problems. But for an emerging class of problems RDBMS technology
could be improved upon.

Object-Oriented DBMS
Object-Oriented Database Management Systems (OODBMS) are an extension of OO
programming language techniques into the field of persistent data management. For
many applications, the performance, flexibility, and development cost of OODBMS are
significantly better than RDBMS or ORDBMS. The chief advantage of OODBMS lies in
the way they can achieve a tighter integration between OO languages and the DBMS.
Indeed, the main standards body for OODBMS, the Object Database Management
Group (ODMG) defines an OODBMS as a system that integrates database capabilities
with object-oriented programming language capabilities. The idea behind this is that so
far as an application developer is concerned, it would be useful to ignore not only
questions of how an object is implemented behind its interface, but also how it is stored
and retrieved.

All developers have to do is implement their application using their favorite OO
programming language, such as C++, Smalltalk, or Java, and the OODBMS takes care
of data caching, concurrency control, and disk storage. In addition to this seamless
integration, OODBMS possess a number of interesting and useful features derived
mainly from the object model. In order to solve the finite type system problem that
constrains SQL-92, most OODBMS feature an extensible type system. Using this
technique, an OODBMS can take the complex objects that are part of the application
and store them directly. An OODBMS can be used to invoke methods on these objects,
either through a direct call to the object or through a query interface. And finally, many
of the structural deficiencies in SQL-92 are overcome by the use of OO ideas such as
inheritance and allowing sets and arrays. OODBMS products saw a surge of academic
and commercial interest in the early 1990s, and today the annual global market for
OODBMS products runs at about $50 million per year. In many application domains,
most notably computer-aided design or manufacturing (CAD/CAM), the expense of
building a monolithic system to manage both database and application is balanced by
the kinds of performance such systems deliver.

Problems with OODBMS

Regrettably, much of the considerable energy of the OODBMS community has been
expended relearning the lessons of twenty years ago. First, OODBMS vendors have
rediscovered the difficulties of tying database design too closely to application design.
Maintaining and evolving an OODBMS-based information system is an arduous
undertaking. Second, they relearned that declarative languages such as SQL-92 bring
such tremendous productivity gains that organizations will pay for the additional
computational resources they require. You can always buy hardware, but not time.
Third, they re-discovered the fact that a lack of a standard data model leads to design
errors and inconsistencies. In spite of these shortcomings OODBMS technology
provides effective solutions to a range of data management problems. Many ideas
pioneered by OODBMS have proven themselves to be very useful and are also found in
ORDBMS. Object-relational systems include features such as complex objects,
extensibility, encapsulation, inheritance, and better interfaces to OO languages.

ORDBMS
(Object Relational Database Management System)
ORDBMS Evolution from RDBMS
One of the most popular sorting algorithms is called insertion sort. Insertion sort is an
O(N^2) algorithm. Roughly speaking, the time it takes to sort an array of records increases
with the square of the number of records involved, so it should only be used for record
sets containing less than about 25 rows. But insertion sort is extremely efficient when
the input is almost in sorted order, such as when you are sorting the result of an index
scan. The listing below presents an implementation of the basic insertion sort algorithm
for an array of integers.

InsertSort( integer arTypeInput[] )
{
    integer vTypeTemp;
    integer nSizeArray, nOuter, nInner;

    nSizeArray = arTypeInput[].getSize();

    for ( nOuter = 2; nOuter <= nSizeArray; nOuter++ )
    {
        vTypeTemp = arTypeInput[nOuter];
        nInner = nOuter - 1;

        while (( nInner > 0 ) &&
               ( arTypeInput[nInner] > vTypeTemp ))
        {
            arTypeInput[nInner+1] = arTypeInput[nInner];
            nInner--;
        }
        arTypeInput[nInner+1] = vTypeTemp;
    }
}

Sorting algorithms such as this can be generalized to make them work with any data
type. A generalized version of this insertion sort algorithm appears in the listing below.
All this algorithm requires is logic to compare two type instances. If one value is greater,
the algorithm swaps the two values. In the generalized version of the algorithm,
a function pointer is passed into the sort as an argument. A function pointer is simply a
reference to a memory address where the Compare() function’s actual implementation
can be found. At the critical point, when the algorithm decides whether to swap two
values, it passes the data to this function and makes its decision based on the function’s
return result.

InsertSort( Type arTypeInput[],
            integer Compare( Type, Type ) )
{
    integer nSizeArray, nOuter, nInner;

    nSizeArray = arTypeInput[].getSize();

    for ( nOuter = 2; nOuter <= nSizeArray; nOuter++ )
    {
        nInner = nOuter;

        while (( nInner > 1 ) &&
               ( Compare(arTypeInput[nInner-1], arTypeInput[nInner]) > 0 ))
        {
            Swap(arTypeInput[nInner-1], arTypeInput[nInner]);
            nInner--;
        }
    }
}
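The compare-as-a-parameter idea in the listing above can be sketched in runnable form (Python is used here for brevity; the names are illustrative, not ORDBMS internals):

```python
# Generic insertion sort: the caller supplies compare(a, b), which
# returns > 0 when a should sort after b; this plays the role of the
# function pointer passed into the sort in the listing above.
def insert_sort(items, compare):
    for outer in range(1, len(items)):
        inner = outer
        # Swap the new element leftward until it reaches its slot.
        while inner > 0 and compare(items[inner - 1], items[inner]) > 0:
            items[inner - 1], items[inner] = items[inner], items[inner - 1]
            inner -= 1
    return items

# A type-specific routine, analogous to IntCompare in the text.
def int_compare(a, b):
    return a - b

print(insert_sort([5, 2, 9, 1], int_compare))  # [1, 2, 5, 9]
```

Sorting instances of any other type only requires supplying a different compare routine; the looping and swapping logic is unchanged.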

Note how the functionality of the Swap() operation is something that can be handled by
IDS without requiring a type specific routine. The ORDBMS knows how big the object
being swapped is, and whether the object is in memory or is written out to disk. To use
the generalized algorithm in the List, all that IDS needs is the Compare() logic. The
ORDBMS handles the looping, branching, and exception handling. Almost all sorting
algorithms involve looping and branching around a Compare(), as do B-Tree indexing
and aggregate algorithms such as MIN() and MAX(). Part of the process of extending
the ORDBMS framework with new data types involves creating functions such as
Compare() that IDS can use to manage instances of the type. All data management
operations implemented in the ORDBMS are generalized in this way. In an RDBMS, the
number of data types was small so that the Compare() routines for each could be hard-
coded within the engine.

Sort( void ** parRecords, IFX_TYPE * pRecKey ... )
{
    switch(pRecKey->typenum)
    {
        case IFX_INT_TYPE:
        case IFX_CHAR_TYPE:
            InsertSort(parRecords, IntCompare);
            break;

        case IFX_FLOAT_TYPE:
            InsertSort(parRecords, DoubleCompare);
            break;

        // etc. for each SQL-92 type

        default:
            ifx_internal_raiseerror("No Such Type");
            break;
    }
}

To turn an RDBMS into an ORDBMS, you need to modify the code shown in the listing above to
allow the engine to access Compare() routines other than the ones it has built-in. If the
data type passed as the second argument is one of the SQL-92 types, the sorting
function proceeds as it did before. But if the data type is not one of the SQL-92 types,
the ORDBMS assumes it is being asked to sort an extended type. Every user-defined
function embedded within the ORDBMS is recorded in its system catalogs, which are
tables that the DBMS uses to store information about databases. When asked to sort a
pile of records using a user-defined type, the ORDBMS looks in these system catalogs
to find a user-defined function called compare that takes two instances of the type and
returns an INTEGER. If such a function exists, the ORDBMS uses it in place of the built-in
logic. If the function is not found, IDS generates an exception. The listing below shows
the modified sorting facility.

Sort( void ** parRecords, IFX_TYPE * pRecKey ... )
{
    switch(pRecKey->typenum)
    {
        case IFX_INT_TYPE:
        case IFX_CHAR_TYPE:
            InsertSort(parRecords, IntCompare);
            break;

        case IFX_FLOAT_TYPE:
            InsertSort(parRecords, DoubleCompare);
            break;

        // etc. for each SQL-92 type

        default: // ah! Not SQL-92. Must be user-defined.
            if ( (pUDRCompare = udr_lookup(pRecKey->typenum,
                                           "Compare")) == NULL )
                ifx_internal_raiseerror("No Such Function");
            else
                InsertSort(parRecords, pUDRCompare);
            break;
    }
}
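The catalog-lookup dispatch described above can be mimicked in miniature (a Python sketch under assumed names; it is not the actual IDS implementation):

```python
# Built-in compare routines are hard-coded, as in an RDBMS.
BUILTIN_COMPARES = {
    "integer": lambda a, b: a - b,
    "float":   lambda a, b: (a > b) - (a < b),
}

# System catalog of user-defined routines:
# (type name, routine name) -> function.
USER_CATALOG = {}

def register_udr(type_name, routine_name, fn):
    USER_CATALOG[(type_name, routine_name)] = fn

def lookup_compare(type_name):
    """Return the built-in compare, or fall back to a user-defined
    'Compare' registered in the catalog (mirroring udr_lookup)."""
    if type_name in BUILTIN_COMPARES:
        return BUILTIN_COMPARES[type_name]
    fn = USER_CATALOG.get((type_name, "Compare"))
    if fn is None:
        raise LookupError("No Such Function")
    return fn

# A user-defined 'currency' type registers its own compare logic,
# here ordering (code, amount) pairs by amount.
register_udr("currency", "Compare",
             lambda a, b: (a[1] > b[1]) - (a[1] < b[1]))
print(lookup_compare("currency")(("USD", 10), ("USD", 3)))  # 1
```
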

Using the ORDBMS generalization of the sort utility has several implications:
 When you take a database application developed to use the previous generation
of RDBMS technology and upgrade to an ORDBMS, you should see no changes
at all. The size of the ORDBMS executable is slightly increased, and there are
some new directories and install options, but if all you do is to run the RDBMS
application on the ORDBMS, the new code and facilities are never invoked.

 This scheme implies extensive modifications to the RDBMS code. You not only
need to find every place in the engine that such modifications are necessary, but
also need to provide the infrastructure in the engine to allow the extensions to
execute within it.

 Finally, you should know how general such an extensibility mechanism is. As long
as you can specify the structure of your data type, and define an algorithm to
compare one instance of the type with another, you can use the engine’s facilities
to sort or index instances of the type and you can use OR-SQL to specify how
you want this done. In practice, renovating an RDBMS in this manner is an
incredibly complex and demanding undertaking.

ORDBMS, or Object Relational Database Management System, has already been
defined. Now we look at the features through which it provides different facilities to its
users.

Features of ORDBMS

ORDBMS synthesize the features of RDBMS with the best ideas of OODBMS.

 Although ORDBMS reuse the relational model as SQL interprets it, the OR data
model is opened up in novel ways. New data types and functions can be
implemented using general-purpose languages such as C and Java. In other
words, ORDBMS allow developers to embed new classes of data objects into the
relational data model abstraction.

 ORDBMS schemas have additional features not present in RDBMS schemas.
Several OO structural features, such as inheritance and polymorphism, are part of
the OR data model.
 ORDBMS adopt the RDBMS query-centric approach to data management. All
data access in an ORDBMS is handled with declarative SQL statements. There
is no procedural, or object-at-a-time, navigational interface. ORDBMS persist
with the idea of a data language that is fundamentally declarative, and therefore
mismatched with procedural OO host languages. This significantly affects the
internal design of the ORDBMS engine, and it has profound implications for the
interfaces developers use to access the database.

 From a systems architecture point of view, ORDBMS are implemented as a
central server program rather than as the distributed data architectures typical of
OODBMS products. However, ORDBMS extend the functionality of DBMS
products significantly, and an information system using an ORDBMS can be
deployed over multiple machines. To see what this looks like, consider again
the Employees example. Using an ORDBMS, you would represent this data as
shown below.

CREATE TABLE Employees (

Name PersonName NOT NULL,

DOB DATE NOT NULL,

Salary Currency NOT NULL,

Address StreetAddress NOT NULL,

PRIMARY KEY ( Name, DOB )

);

An object-relational table is structurally very similar to its relational counterpart and the
same data integrity and physical organization rules can be enforced over it. The
difference between object-relational and relational tables can be seen in the section
stipulating column types. In the object-relational table, readers familiar with RDBMS
should recognize the DATE type, but the other column types are completely new. From
an object-oriented point of view, these types correspond to class names, which are
software modules that encapsulate state and (as we shall see) behavior.

INSERT INTO SendPage

( Pager_Number, Pass_Code, Message )

SELECT E.Pager_Number,

E.Pass_Code,

Print(E.Name) ||

': Call 1-800-TEMPS-R-US for immediate INFORMIX DBA

job'

FROM Temporary_Employees E

WHERE Contains ( GeoCircle('(-122.514, 37.221)',

'60 miles'),

E.LivesAt )

AND DocContains ( E.Resume,

'INFORMIX and Database

Administrator')

AND NOT IsBooked ( Period(TODAY, TODAY + 7),

E.Booked );

In a SQL-92 DBMS, SendPage could be only a table. The effect of this query would
then be to insert some rows into the SendPage table. However, in an ORDBMS,
SendPage might actually be an active table, which is an interface to the
communications system used to send electronic messages. The effect of this query
would then be to communicate with the matching temporary workers!

Motivation

Relational model (70’s): Clean and simple.

 Great for administrative data.

 Not as good for other kinds of data (e. g., multimedia, networks, CAD).

Object-Oriented models (80’s): Complicated, but some influential ideas.

 Complex data types.

 Object identity/ references.

 ADTs (encapsulation, behavior goes with data).

 Inheritance.

Idea: Build DBMS based on OO model.

Object-Relational

Idea: Add OO features to the type system of
SQL, i.e., “plain old SQL”, but...

 Columns can be of new types (ADTs)

 User-defined methods on ADTs

 Columns can be of complex types

 Reference types and “deref”

 Inheritance and collection inheritance


 Old SQL schemas still work! (Backwards compatibility)

Complex Types

User can use type constructors to generate new types:

 setof(foo)

 arrayof(foo)

 listof(foo)

 row(n1 t1, ..., nk tk)

Can be nested:

 setof(arrayof(int))
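The nesting of type constructors can be imitated with ordinary collections (a Python analogy for illustration only, not ORDBMS syntax):

```python
# setof(int)              ~ a set of ints
# arrayof(int)            ~ a tuple of ints
# row(name text, age int) ~ a record with fixed, named fields
# setof(arrayof(int))     ~ a set of tuples: constructors compose
nested = {(1, 2, 3), (4, 5)}           # setof(arrayof(int))
row = {"name": "Turing", "age": 41}    # row(name text, age int)
rows = [row]                           # listof(row(...))

print((1, 2, 3) in nested)  # True
```
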

ADTs: User-Defined Atomic Types

Built-in SQL types (int, float, text, etc.) are limited.

 Even these types have simple methods associated with them (math, LIKE, etc.)

ORDBMS: User can define new atomic types (& methods) if a type cannot be naturally
defined in terms of the built-in types:

create type jpeg (internallength = variable,

input = jpeg_in, output = jpeg_out);

Need input & output methods for types, e.g., to convert from text to the internal type and
back.
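The input/output pair such a type needs can be sketched as two plain conversion functions (Python for illustration; a simple point type stands in for jpeg so the example stays self-contained):

```python
def point_in(text):
    """Input function: parse '(x, y)' text into the internal form."""
    x, y = text.strip("() ").split(",")
    return (float(x), float(y))

def point_out(point):
    """Output function: render the internal form back to text."""
    return "(%g, %g)" % point

# Round trip: text -> internal representation -> text.
p = point_in("(-122.514, 37.221)")
print(point_out(p))  # (-122.514, 37.221)
```
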

Reference Types & Deref

In most ORDBMS, every object has an OID, so you can “point” to objects -- reference
types!

 ref(theater_t)

Don’t confuse reference and complex types!

 mytheater row(tno integer, name text, address text, phone integer)

 theater ref(theater_t)

Both look the same on output, but are different!

 Deletion, update, “sharing”

 Similar to “by value” vs. “by reference” in PL
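The "by value" vs. "by reference" distinction above can be demonstrated with ordinary objects (an illustrative Python sketch, not DBMS storage semantics):

```python
import copy

theater = {"tno": 1, "name": "Roxy", "phone": 5551234}

# Complex type: each row embeds its own copy of the value.
row_a = {"mytheater": copy.deepcopy(theater)}
row_b = {"mytheater": copy.deepcopy(theater)}
row_a["mytheater"]["name"] = "Odeon"
print(row_b["mytheater"]["name"])  # Roxy (the copies are independent)

# Reference type: both rows point at the same shared object.
ref_a = {"theater": theater}
ref_b = {"theater": theater}
ref_a["theater"]["name"] = "Odeon"
print(ref_b["theater"]["name"])   # Odeon (the update is shared)
```

Both rows print their stored name, but only the reference version observes the update made through the other row, which is exactly why deletion, update, and sharing behave differently for the two kinds of types.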

Modifications to support ORDBMS

Parsing

 Type-checking for methods is pretty complex.

Query Rewriting

 Often useful to turn path expressions into joins!

 UnionsCollection hierarchy’

Optimization

 New algebra operators needed for complex types.

o Must know how to integrate them into optimization.

 WHERE clause expressions can be expensive!

ORDBMS Advantages

ORDBMS technology improves upon what came before it in four ways. The first
improvement is that it can enhance a system’s overall performance. The IDS product can
achieve higher levels of data throughput or better response times than is possible using
RDBMS technology and external logic because ORDBMS extensibility makes it possible
to move logic to where the data resides, instead of always moving data to where the
logic is running. This effect is particularly pronounced for data intensive applications
such as decision support systems and in situations in which the objects in your
application are large, such as digital signal or time series data. But in general, any
application can benefit. The second improvement relates to the way that integrating
functional aspects of the information system within the framework provided by the
ORDBMS improves the flexibility of the overall system. Multiple unrelated object
definitions can be combined within a single ORDBMS database. At runtime, they can be
mingled within a query expression created to answer some high-level question. Such
flexibility is very important because it reduces the costs associated with information
system development and ongoing maintenance. The third benefit of an ORDBMS
relates to the way information systems are built and managed. An ORDBMS system
catalogs become a metadata repository that records information about the modules of
programming logic integrated into the ORDBMS. Over time, as new functionality is
added to the application and as the programming staff changes, the system’s catalogs
can be used to determine the extent of the current system’s functionality and how it all
fits together. The fourth benefit is that the IDS product’s features make it possible to
incorporate into the database data sets that are stored outside it. This allows you to
build federated databases. From within a single server, you can access data distributed
over multiple places.

Flexibility and Functionality

Explaining flexibility is more difficult because the advantages of flexibility are harder to
quantify. But it is probably more valuable than performance over the long run. To
understand why, let’s continue with the financial company example.

As it turned out, the more profound result of the integration effort undertaken by our
financial firm was that the CalcValue() operation was liberated from its procedural
setting. Before, developers had to write and compile (and debug) a procedural C
program every time they wanted to use CalcValue() to answer a business question.
With the ORDBMS, they found that they could simply write a new OR-SQL query
instead. For example, the query listed below shows a join involving a
table that, while in the database, was beyond the scope of the original (portfolio)
problem. “What is the SUM() of the values of instruments in our portfolio grouped by region
and sector?”

SELECT IS.Country,

IS.Region,

SUM (CalcValue ( I.Issuer_Class, I.Principal, I.Issue_Date, I.Rate ))

FROM Instruments I, Issuers IS

WHERE I.Portfolio = :Portfolio_Argument

AND I.Issuer = IS.Name

GROUP BY IS.Country, IS.Region;

With the addition of a small Web front end, end users could use the database to find out
which was the most valuable instrument in their portfolio or which issuer’s instruments
performed most poorly, and so on. It was known that the system had the data
necessary to answer all these questions before. The problem was that the cost of
answering them using C and SQL-92 was simply too high.

Maintainability

After some investigation, the developers discovered that, over time, there had been
several versions of the CalcValue() function. Also, once the CalcValue() algorithm was
explained to end users, they had suggestions that might improve it. With the previous
system, such speculative changes were extremely difficult to implement. They required
a recode-recompile-relink-redistribute cycle. But with the ORDBMS, the alternative
algorithms could be integrated with ease. In fact, none of the components of the overall
information system had to be brought down. The developers simply wrote the
alternative function in C, compiled it into a shared library, and dynamically linked it into
the engine with another name. What all of this demonstrates is that the ORDBMS
permitted the financial services company to react to changing business requirements far
more rapidly than it could before. By embedding the most valuable modules of logic into
a declarative language, they reduced the amount of code they had to write and the
amount of time required to implement new system functionality. None of this was
interesting when viewed through the narrow perspective of system performance. But it
made a tremendous difference to the business’s operational efficiency.

Object-Relational DBMS Applications

Extensible databases provide a significant boost for developers building traditional
business data processing applications. By implementing a database that constitutes a
better model of the application’s problem domain the information system can be made
to provide more flexibility and functionality at lower development cost. Business
questions such as the one in the listing might be answered in systems using SQL-92
and C. Doing so, however, involves a more complex implementation and requires
considerably more effort. The more important effect of the technology is that it makes it
possible to build information systems to address data management problems usually
considered to be too difficult. One way to characterize applications in which an object-
relational DBMS is the best platform is to focus on the kind of data involved. For thirty
years, software engineers have used the term “data entry” to describe the process by
which information enters the system. Human users enter data using a keyboard. Today
many information systems employ electronics to capture information. Video cameras,
environmental sensors, and specialized monitoring equipment record data in rich media
systems, industrial routing applications, and medical imaging systems. Object-relational
DBMS technology excels at this kind of application. It would be a mistake to say that
ORDBMS are only good for digital content applications. As we shall see in this book OR
techniques provide considerable advantages over more low-level RDBMS approaches
even in traditional business data processing applications. But as other technology
changes move us towards applications in which data is recorded rather than entered,
ORDBMS will become increasingly necessary.

ORDBMS Concepts and Terminology

Application Domain & Description


Complex data analysis: You can integrate sophisticated statistical and special-
purpose analytic algorithms into the ORDBMS and use them in knowledge discovery or
data mining applications. For example, it is possible to answer questions such as
“Which attribute of my potential customers indicates most strongly that they will spend
money with me?”

Text and documents: Simple cases permit you to find all documents that include
some word or phrase. More complex uses would include creating a network that
reflects similarity between documents.

Digital asset management: The ORDBMS can manage digital media such as
video, audio, and still images. In this context, manage means more than store and
retrieve. It also means “convert format,” “detect scene changes in video and extract the
first frame from the new scene,” and even “What MP3 audio tracks do I have that include
this sound?”

Geographic data: For traditional applications, this might involve “Show me the
lat/long coordinates corresponding to this street address.” This might be extended to
answer requests such as “Show me all house and contents policy holders within a
quarter mile of a tidal water body.” For next-generation applications, with a GPS device
integrated with a cellular phone, it might even be able to answer the perpetual puzzler
“Car 54, where are you?”

Bio-medical: Modern medicine gathers lots of digital signal data such as CAT scans
and ultrasound imagery. In the simplest case, you can use these images to filter out “all
cell cultures with probable abnormality.” In more advanced uses, you can also
answer questions such as “Show me all the cardiograms which are ‘like’ this
cardiogram.”

Data Model

A data model is a way of thinking about data, and the object-relational data model
amounts to objects in a relational framework. An ORDBMS’s chief task is to provide a
flexible framework for organizing and manipulating software objects corresponding to
real-world phenomena.
The object-relational data model can be broken into three areas:

 Structural Features. This aspect of the data model deals with how a database’s
data can be structured or organized.

 Manipulation. Because a single data set often needs to support multiple user
views, and because data values need to be continually updated to reflect the
state of the world, the data model provides a means to manipulate data.

 Integrity and Security. A DBMS data model allows the developers to declare rules
that ensure the correctness of the data values in the database. In the first two
chapters of this book, we introduce and describe the features of an ORDBMS
that developers use to build information systems.

Enhanced Table Structures

An OR database consists of a group of tables made up of rows. All rows in a table are
structurally identical in that they all consist of a fixed number of values of specific data
types stored in columns that are named as part of the table’s definition. The most
important distinction between SQL-92 tables and object relational database tables is the
way that ORDBMS columns are not limited to a
standardized set of data types. Figure illustrates what an object-relational table looks
like. The first thing to note about this table is the way that its column headings consist of
both a name and a data type. Second, note how several columns have internal
structure. In a SQL-92 DBMS, such structure would be broken up into several separate
columns, and operations over a data value such as Employee’s Name would need to list
every component column. Third, this table contains several instances of unconventional
data types. LivesAt is a geographic point, which is a latitude/longitude pair that
describes a position on the globe. Resume contains documents, which is a kind of
Binary Large Object (BLOB). In addition to defining the structure of a table, you can
include integrity constraints in its definition. Tables should all have a key, which is a
subset of attributes whose data values can never be repeated in the table. Keys are not
absolutely required as part of the table’s definition, but they are a very good idea. A
table can have several keys, but only one of these is granted the title of primary key. In
our example table, the combination of the Name and DOB columns contains data
values that are unique within the table. On balance, it is far more likely that an end user
made a data entry mistake than two employees share names and dates of birth.

Object-Oriented Schema Features

Another difference between relational DBMS and ORDBMS is the way in which object-
relational database schema supports features co-opted from object-oriented
approaches to software engineering. We have already seen that an object-relational
table can contain exotic data types. In addition, object-relational tables can be
organized into new kinds of relationships, and a table’s columns can contain sets of
data objects. In an ORDBMS, tables can be typed; that is, developers can create a table
with a record structure that corresponds to the definition of a data type. The type system
includes a notion of inheritance in which data types can be organized into hierarchies.
This naturally supplies a mechanism whereby tables can be arranged into hierarchies
too. The figure illustrates how the Employees table might look as part of such a
hierarchy. In most object-oriented development environments, the concept of
inheritance is limited to the structure and behavior of object classes. However, in an
object-relational database, queries can address data values through the hierarchy.
When you write an OR-SQL statement that addresses a table, all the records in its
subtables become involved in the query too.

Extensibility: User-Defined Types and User-Defined Functions

The concept of extensibility is a principal innovation of ORDBMS technology. One of the
problems you encounter developing information systems using SQL-92 is that modeling
complex data structures and implementing complex functions can be difficult. One way
to deal with this problem is for the DBMS vendor to build more data types and functions
into their products. Because the number of interesting new data types is very large,
however, this is not a reasonable solution. A better approach is to build the DBMS
engine so that it can accept the addition of new, application specific functionality.
Developers can specialize or extend many of the features of an ORDBMS: the data
types, OR-SQL expressions, the aggregates, and so on. In fact, it is useful to think of
the core ORDBMS as being a kind of software backplane, which is a framework into
which specialized software modules are embedded.

Database Stored Procedures

Almost all RDBMS allow you to create database procedures that implement business
processes. This allows developers to move considerable portions of an information
system’s total functionality into the DBMS. Although centralizing CPU and memory
requirements on a single machine can limit scalability, in many situations it can improve
the system’s overall throughput and simplify its management. By implementing
application objects within the server, using Java, for example, it becomes possible,
though not always desirable, to push code implementing one of an application-level
object’s behaviors into the ORDBMS. The interface in the external program simply
passes the work back into the IDS engine. The figure represents the contrasting
approaches. An important point to remember is that with Java, the same logic can be
deployed either within the ORDBMS or within
an external program without changing the code in any way, or even recompiling it. The
novel idea is that the ORDBMS can be used to implement many of the operational
features of certain kinds of middleware. Routine extensibility, and particularly the way it
can provide the kind of functionality illustrated in Figure, is a practical application of
these ideas. But making such system scalable requires using other features of the
ORDBMS: the distributed database functionality, commercially available gateways, and
the open storage manager (introduced below). Combining these facilities provides the
kind of location transparency necessary for the development of distributed information
systems.

Storage Management

Traditionally the main purpose of a DBMS was to centralize and organize data storage.
A DBMS program ran on a single, large machine. It would take blocks of that machine’s
disk space under its control and store data in them. Over time, RDBMS products came
to include ever more sophisticated data structures and ever more efficient techniques
for memory caching, data scanning, and storage of large data objects. In spite of these
improvements, only a fraction of an organization’s data can ever be brought together
into one physical location. Data is often distributed among many systems, which is the
consequence of autonomous information systems development using a variety of
technologies, or through organizational mergers, or because the data is simply not
suitable for storage in any DBMS. To address this, the IDS product adds a new
extensible storage management facility. (Figure: Routine Extensibility and the ORDBMS
as the Object-Server Architecture.)

In the figure, we illustrate this Virtual Table Interface concept. ORDBMS possess storage
manager facilities similar to RDBMS. Disk space is taken under the control of the
ORDBMS, and data is written into it according to whatever administrative rules are
specified. All the indexing, query processing, and cache management techniques that
are part of an RDBMS are also used in an ORDBMS. Further, distributed database
techniques can be adapted to incorporate user-defined types and functions. However,
all of these mechanisms must be re-implemented to generalize them so that they can
work for user-defined types. For example, page management is generalized to cope
with variable length OPAQUE type objects. You can also integrate code into the engine
to implement an entirely new storage manager. Developers still use OR-SQL as the
primary interface to this data, but instead of relying on internal storage, the ORDBMS
can use the external file system to store data. Any data set that can be represented as a
table can be accessed using this technique. Developers can also use this technique to
get a snapshot of the current state of live information. It is possible to represent the
current state of the operating system’s processes or the current state of the file system
as a table. You can imagine, for example, an application intended to help manage a
fleet of trucks and employees servicing air conditioners or elevators. Each truck has a
device combining a Global Positioning System (GPS) with a cellular phone
that allows a central service to poll all trucks and to have them “phone in” their current
location. With the ORDBMS, you can embed Java code that activates the paging and
response service to implement a virtual table, and then write queries over this new
table, as shown in the listing below: “Find repair trucks and drivers within ‘50 miles’ of ‘Alan
Turing’ who are qualified to repair the ‘T-20’ air conditioning unit.”

SELECT T.Location, T.DriversName

FROM Trucks T, Customers C

WHERE Distance (T.Location, C.Location) < '50 miles'

AND C.Name = 'Alan Turing'

AND DocContains ( T.DriverQualifications,

'Repair for T-20');

This functionality should fundamentally change the way you think about a DBMS. In an
object-relational DBMS, SQL becomes the medium used to combine components to
perform operations on data. Most data will be stored on a local disk under the control of
the DBMS, but that is not necessarily the case. The Virtual Table Interface tutorial
describes these features in more detail.

The Real Benefits of Object-Relational DB-Technology for Object-
Oriented Software Development

Abstract: Object-oriented programming languages (OOPLs like C++, Java, etc.)
have established themselves in the development of complex software systems for more
than a decade. With the integration of object-oriented concepts, object-relational
database management systems (ORDBMS) aim at supporting new generation software
systems better and more efficiently. Facing the situation that nowadays more and more
software development teams use OOPLs ‘on top of’ (O)RDBMS, i.e. access (object-)
relational databases from applications developed in OOPLs, we report here on our
investigations into assessing the contribution of object-relational database technology to
object-oriented software development. First, a conceptual examination shows that there
is still a considerable gap between the object-relational paradigm (as represented by the
SQL:1999 standard) and the object-oriented paradigm. Second, empirical studies
(performed by using our new benchmark approach) point at mechanisms which are not
part of SQL:1999 but would allow reducing the mentioned gap. Thus, we encourage the
integration of such mechanisms, e.g., support for navigation and complex objects
(structured query results), into ORDBMS in order to be really beneficial for new
generation software systems.

Motivation

Object-oriented programming languages (OOPLs), such as C++, Java, Smalltalk, etc.,
have established themselves in the development of complex software systems for more
than a decade. Both the structure of these systems and the structure of the objects
managed by them have become very complex. The object-oriented concepts offered by
OOPLs are well suited to managing complex structured objects. However, there are
additional requirements, such as persistence and transaction-protected manipulation,
which can only be fulfilled efficiently by integrating a database management system
(DBMS). Consequently, database technology has become one of the core technologies of
modern software systems. At the time of the ‘breakthrough’ of object-oriented system
development, two kinds of DBMS were of practical relevance: relational DBMS (RDBMS)
and object-oriented DBMS (OODBMS). Using OODBMS has proved inefficient or inflexible
for various reasons.

Using an RDBMS, on the one hand, requires overcoming the well-known impedance
mismatch, i.e., performing the non-trivial task of mapping complex object structures
and navigational data processing (at the OOPL layer) to a set-oriented, descriptive
query language (SQL-92) that supports just a simple, flat data model. Despite this
considerable mapping overhead, mature RDBMS technology (index structures,
optimization, integrity control, etc.), on the other hand, helps to keep the overall
system performance acceptable. Several commercial systems mapping object-oriented
structures onto the relational data model are currently available. Such systems are
often referred to as Persistent Object Systems built on Relations (POS).

The object-relational wave in database technology has decisively reduced the gap
between RDBMS and OOPLs. Although object-relational DBMS (ORDBMS) are able to manage
object-oriented structures internally, the required seamless coupling of OOPL and
ORDBMS is not yet possible, because (as in SQL-92) the results of SQL:1999 queries
are (sets of) data tuples rather than the desired (sets of) objects. In summary, the
gap between OOPL and ORDBMS can be traced back to a number of modeling and
operational aspects, as we will detail in the following sections. Furthermore, the
SQL:1999 standard and the commercially available ORDBMS differ considerably in their
object-oriented features. Thus, it is by no means clear how a given object-oriented
design can be mapped to a given ORDBMS most efficiently, or which features ORDBMS
should offer in general in order to enable an efficient mapping of object-oriented
structures. Our long-term objective is to influence the further development of
ORDBMS towards better support of object-oriented software development (minimal
mapping overhead). Thus, we have proposed a new benchmark approach that allows
assessing a given ORDBMS by taking into account both its own performance and the
required mapping overhead.

Conceptual Considerations

There is a multiplicity of object data models, for example ODMG, UML, COM, C++, and
Java. All these models support the basic concepts of the OO paradigm; however, there
are certain differences. Independently of the modeling language used in OO software
development (e.g., UML), SQL:1999 must be coupled with a concrete OOPL. Given their
overall relevance and conceptual vicinity, we concentrate on the object models of C++
and ODMG and compare them with the SQL:1999 standard.
Modeling Aspects

Object Orientation in OOPL. The concept of an object represents the foundation of the
OO paradigm. An object is an encapsulation of data representing a semantic unit with
respect to its structure/values and its behavior. It conforms to a particular class.
In fact, a class implements an object type (classification) which is characterized by
a name as well as a set of attributes and methods. Each attribute conforms to a
certain data type and is either single-valued or set-valued (collection types).
Furthermore, a data type can be scalar (e.g., integer, Boolean, string, etc.) or
complex. In the latter case, the corresponding values can be references (association)
or objects of other classes (aggregation), so that complex structures can be modeled.
A class may implement methods (behavior) which can be invoked in order to change an
object’s state. Classes may be arranged within class hierarchies. A class inherits
structure and behavior from its superclasses (inheritance), but may refine these
definitions (specialization). Due to space restrictions we do not give a more
detailed description of the OO paradigm, but discuss how the OR data model conforms
to OO concepts.

Object Orientation in SQL:1999. While the relational data model (SQL2) did not
sufficiently support semantic modeling concepts, in SQL:1999 the fundamental
extension supporting object orientation is the structured user-defined data type
(UDT). UDTs, which can be considered object types, can be treated in the same way as
predefined (built-in) data types. Consequently, similarly to the type system of an
OOPL, the type system of SQL:1999 is extensible. UDTs may be complexly structured
and, therefore, may contain not only predefined data types but also set-valued
attributes (collection types) and even other UDTs (aggregation) or
references (associations). Obviously, UDTs are comparable to the classes of the OO
paradigm. However, according to the SQL:1999 standard, a UDT must be associated with
a table. The notion of a typed table, also referred to as an object table, allows
instances of a certain UDT to be managed persistently within a table. Each tuple of
such a table represents an instance (object) of a particular UDT and is identified by
a unique object identifier (OID), which can be system-generated or user-defined.
Besides instantiable UDTs, SQL:1999 also supports non-instantiable UDTs, which
conform to the notion of abstract classes in OOPLs. In addition, UDTs may have
methods (behavior) which are either system-generated or implemented by users. They
may participate in type hierarchies, in which more specialized types (subtypes)
inherit structure and behavior from more general types (supertypes), but may
specialize the corresponding definitions. Thus, SQL:1999 supports polymorphism and
substitutability; however, multiple inheritance is not supported. Due to the
association of UDTs with tables, SQL:1999 does not support encapsulation and,
consequently, there is nothing like the degrees of encapsulation known from OOPLs
(public, protected, private).
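The UDT and typed-table concepts just described can be sketched in SQL:1999-style DDL. This is a hedged illustration only: the type and table names are invented, and the exact syntax of commercial ORDBMS deviates from the standard in several places.

```sql
-- A structured UDT used as an embedded value (aggregation).
CREATE TYPE address_t AS (
  street VARCHAR(50),
  city   VARCHAR(30)
) NOT FINAL;

-- An instantiable root type of a hierarchy.
CREATE TYPE person_t AS (
  name VARCHAR(40),
  addr address_t            -- complex attribute: another UDT
) NOT FINAL;

-- A subtype inheriting structure and behavior (single inheritance only).
CREATE TYPE employee_t UNDER person_t AS (
  salary DECIMAL(9,2)
) NOT FINAL;

-- Typed (object) tables: each row is an instance identified by an OID.
CREATE TABLE persons OF person_t
  (REF IS oid SYSTEM GENERATED);

CREATE TABLE employees OF employee_t UNDER persons;
```

The UNDER clauses mirror the single-inheritance type and subtable hierarchies discussed above; there is no counterpart for multiple inheritance or for the public/protected/private encapsulation degrees of OOPLs.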

Operational Aspects

Besides the fundamental modeling aspects discussed so far, we also have to examine
operational aspects in order to figure out the conceptual distance adequately. The
following aspects are most relevant to our consideration:

Descriptive Queries vs. Navigational Processing: While OOPL processing is inherently
navigational, SQL supports a set-oriented, descriptive query language. Both
navigational and set-oriented query processing are important to modern software
systems. Therefore, ORDBMS should also directly facilitate navigational processing to
fulfill this requirement of OO applications. Direct support of navigational access by
the DBMS would mean that a database object referred to by its OID can be provided as
an instance of an OOPL class. However, to the best of our knowledge, none of the
currently available ORDBMS directly supports this notion of navigation. A naive
coupling of an OOPL with descriptive SQL requires issuing one or several SQL queries
to the database for each dereferencing operation, e.g., GetObject(Ref), and
retrieving the requested object from the database server. Such a processing scheme
will surely lead to bad runtime behaviour of the entire system, since the costs of
transforming a navigational operation into SQL queries, of evaluating these queries
in the DBS, and of the client/server communication can be very high. Obviously, the
lack of DB APIs directly supporting navigational access badly impairs system
efficiency. Thus, either direct support for navigation must be provided, or efficient
prefetching mechanisms exploiting set-oriented database access (and thereby reducing
the number of database round-trips) must be applied in order to couple OOPLs with
ORDBMS effectively. (In a later section we will see that one of the commercially
available ORDBMS provides some basic means for direct access to objects by OID;
measurement results show that this is at least a step in the right direction.)
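The naive query-per-dereference coupling described above can be sketched as follows. This is only an illustration with an invented parts table, using an in-process SQLite database; in a real client/server ORDBMS each GetObject call would additionally pay a network round-trip.

```python
import sqlite3

# Invented schema: parts referencing their parent part by OID.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parts (oid INTEGER PRIMARY KEY, name TEXT, parent_oid INTEGER)")
conn.executemany("INSERT INTO parts VALUES (?, ?, ?)",
                 [(1, "engine", None), (2, "piston", 1), (3, "ring", 2)])

class Part:
    def __init__(self, oid, name, parent_oid):
        self.oid, self.name, self.parent_oid = oid, name, parent_oid

def get_object(ref):
    """Naive mapping-layer dereference: one SQL query per OID lookup."""
    row = conn.execute("SELECT oid, name, parent_oid FROM parts WHERE oid = ?",
                       (ref,)).fetchone()
    return Part(*row) if row else None

# Traversing a path of length n issues n separate queries.
p = get_object(3)
path = [p.name]
while p.parent_oid is not None:
    p = get_object(p.parent_oid)   # each hop = another database round-trip
    path.append(p.name)
print(path)  # ['ring', 'piston', 'engine']
```

Each hop in the traversal becomes a full query, which is exactly the overhead that direct OID-based fetching or prefetching would remove.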
Structured Query Results: As already mentioned, OOPLs support complex structured
object types, especially through the possibility of nesting complex data types as
well as using collection types and references. We have also mentioned that these
facilities for modelling complex structured objects have been integrated into SQL by
the SQL:1999 standard. Unfortunately, because of the traditional basic concepts of
SQL, complex structures (actually supported both in OOPLs and ORDBMS) get lost at the
DBMS interface, since only (sets of) simply structured, flat data tuples can be
retrieved. Therefore, if we want to couple an OOPL with an (O)RDBMS, it is necessary
to separately retrieve simple fragments of complex objects by issuing several SQL
queries, and then rebuild the complex object structures at the programming-language
level (see Fig. 1). The problem gets even worse if not individual complex objects,
but complex structures (object graphs) containing numerous related objects
interconnected by object references, are to be selected as units. Obviously, the lack
of direct support for complex structured objects at the DB API reveals a bottleneck
between the two paradigms, and prevents new-generation software systems from
exploiting the potential power of ORDBMS most effectively.
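The fragment-wise retrieval and reassembly described above might look like this in outline. The two-table schema is invented for illustration; the point is that the set-valued attribute arrives as flat tuples from a second query and the object structure is rebuilt at the language level.

```python
import sqlite3

# Invented schema: a "complex object" whose set-valued attribute (phones)
# is flattened into a separate table at the database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (oid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE phones (customer_oid INTEGER, number TEXT);
INSERT INTO customers VALUES (1, 'Alan Turing');
INSERT INTO phones VALUES (1, '555-0001'), (1, '555-0002');
""")

class Customer:
    def __init__(self, oid, name, phones):
        self.oid, self.name, self.phones = oid, name, phones

def load_customer(oid):
    # Query 1: the flat "root" tuple of the complex object.
    name, = conn.execute("SELECT name FROM customers WHERE oid = ?", (oid,)).fetchone()
    # Query 2: the set-valued attribute, stored in a separate table.
    phones = [n for (n,) in conn.execute(
        "SELECT number FROM phones WHERE customer_oid = ? ORDER BY number", (oid,))]
    # Reassembly happens here, at the programming-language level.
    return Customer(oid, name, phones)

c = load_customer(1)
print(c.name, c.phones)  # Alan Turing ['555-0001', '555-0002']
```

An API able to return the whole structured object (or object graph) as a unit would make both the extra query and the reassembly step unnecessary.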

Object Behaviour: The operational aspects also encompass the object behaviour
implemented in the database. Because of special implementation aspects, these methods
(UDFs) can almost exclusively be executed at the server side; or, if these UDFs or
special client-invokable pendants are executed at the client side, it cannot be
guaranteed that these pendants preserve the original semantics. For example, there
may be complex dependencies between UDFs and integrity constraints, e.g., referential
integrity constraints and triggers, which are implemented using SQL and automatically
ensured by the DBMS. Thus, it is almost impossible to support calling UDFs at the
OOPL level in the same (‘natural’) way as object methods can usually be called.
Therefore, we do not consider a mapping of object methods in this paper and restrict
our considerations to navigational and set-oriented access.

Mapping Rules

In the previous section, we outlined the conceptual distance between the OO paradigm
and SQL:1999. Considering an individual ORDBMS, its OO features determine the
overhead which must at least be spent in order to bridge this distance. Nevertheless,
in theory there is an entire spectrum of possibilities for designing the required
mapping layer. At this point, we want to mention that there are further aspects of
ORDBMS which OO applications may benefit from, but which cannot be covered in this
paper, e.g., facilities for integrating external data sources into database
processing. The design of the mapping layer depends, on the one hand, on the
requirement of ‘natural’ coding at the OOPL level and, on the other hand, on how far
the OO features are to be exploited. Regarding the first point (‘natural’ coding), we
demand that the programmer must not be burdened with taking data management aspects
into account. Thus, programming must be independent of the database as well as of the
mapping-layer design. Regarding the second point (degree of exploiting OO features),
we want to outline the two extremes of the mentioned spectrum, i.e., pure relational
mapping and full exploitation of the OO features offered by the considered ORDBMS.

Pure Relational Mapping: As mentioned before, there are several commercial POS
mapping OO structures to relational tables. Objects are represented by table rows.
Since RDBMS do not support set-valued attributes, user-defined data types, or object
references, additional tables are required to store the corresponding data and to
connect them with the corresponding class tables via foreign keys. Thus, several
tables may be required to map a given class. In principle, there are several ways of
representing a class hierarchy in the relational model. After studying the pros and
cons, we decided to use the horizontal partitioning approach, since it provides good
performance in most cases and is also used in most commercial POS.
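The horizontal partitioning of a class hierarchy can be sketched as follows, using an invented two-class hierarchy: each concrete class gets its own table carrying all inherited attributes, so loading an instance touches exactly one table, while a query over the whole hierarchy must union the partitions.

```python
import sqlite3

# Invented hierarchy Person <- Employee, horizontally partitioned:
# each concrete class table repeats the inherited attributes.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person   (oid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE employee (oid INTEGER PRIMARY KEY, name TEXT, salary REAL);
INSERT INTO person   VALUES (1, 'Ada');
INSERT INTO employee VALUES (2, 'Grace', 90000.0);
""")

def all_persons():
    """A query over the whole hierarchy must UNION the partition tables."""
    return [n for (n,) in conn.execute(
        "SELECT name FROM person UNION ALL SELECT name FROM employee ORDER BY name")]

result = all_persons()
print(result)  # ['Ada', 'Grace']
```

The trade-off visible here is the usual one for this approach: single-class access is cheap (one table, no joins), while hierarchy-wide queries pay for one subquery per concrete class.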

Object-Relational Mapping: Exploiting the OO features of ORDBMS is commonly argued to
be more promising, but the real benefits in comparison to a pure relational mapping
have not yet been studied very well. This paper will give some performance
evaluations later on. Before outlining general mapping rules exploiting the OO
features of ORDBMS, we have to re-emphasize the following point. Our benchmark
approach, which will be outlined in the subsequent section, assesses a certain
ORDBMS by taking the required mapping overhead into account. In order to be fair, the
mapping layer used throughout the measurements must be designed in an optimal way
w.r.t. the capabilities of the ORDBMS considered. Therefore, the design of the
mapping layer may differ with the ORDBMS to be assessed. In the following, we just
outline general mapping rules, which are based on the SQL:1999 concepts, in order to
provide some basic understanding of how a mapping layer can be designed. A C++ class
maps to a UDT in SQL:1999. Non-instantiable UDTs correspond to abstract C++ classes.
A UDT is associated with exactly one table (typed table) to hold its instances. Each
tuple in this table represents a persistent instance (object) of a particular class
and is associated with a system-generated OID. Embedded objects (aggregation)
entirely belong to their top-level object and, therefore, do not own an OID. Extents
are mapped to the list constructor of the C++ STL. Keys are managed at the mapping
layer by applying the map constructor of the C++ STL. A C++ class hierarchy maps to a
hierarchy of structured UDTs. However, SQL:1999 only supports single inheritance, so
that multiple inheritance has to be simulated at the mapping layer. SQL references
are mapped to C++ pointers. Since SQL:1999 does not support bidirectional references,
a relationship type is broken down into two separate primary-key/foreign-key
connections, and the mapping layer maintains the referential integrity. Except for
mutator and observer methods, which are generated by the mapping layer w.r.t. the
constraints defined in the user database schema, object behaviour is not yet
considered in our performance investigation. Navigation is supported by offering the
function GetObject(Ref) which, in the case that the DB API does not directly support
OID-based object fetching, is implicitly transformed into an SQL query. Having
discussed the (modelling and operational) discrepancies between ORDBMS and OOPLs as
well as the mapping rules needed to bridge the gap, we proceed with our performance
evaluations.

Performance Evaluation

Our discussion shows that there is only a small difference between the OO and OR
paradigms w.r.t. modelling aspects, but a considerable distance w.r.t. the
operational aspects and the application semantics. In order to further evaluate this
distance as well as to quantify the overhead required for bridging this gap, we
propose a configurable benchmark approach [18, 26]. Recall that we do not consider
OODBMS, but ORDBMS, because we increasingly face the situation that people are using
OOPLs for software development and (O)RDBMS for data management purposes, so that
there is a need for a more detailed examination of the efficiency of possible
coupling mechanisms. Consequently, the OO benchmark, an important standard for
benchmarking OO systems, is not appropriate for our purposes. The performance of
RDBMS or ORDBMS has traditionally been evaluated in isolation by applying a standard
benchmark directly at the DBMS interface. Sample benchmarks are the Wisconsin
benchmark, the TPC benchmarks, and the Bucky benchmark [6]. These benchmarks are very
suitable for comparing different DBMS with each other. However, none of these
benchmarks helps to assess the contribution of a DBMS to OO software development,
because these approaches take neither the typical application-server architecture nor
the fact that the DBMS capabilities determine the overhead of the required
application/mapping layer into account. Furthermore, the data types as well as the
operations of the applications we consider may differ significantly, so that a
standard benchmark cannot cover the entire spectrum. Therefore, we propose an open,
configurable benchmark system that allows examining the entire system (incl. mapping
layer) w.r.t. its typical applications. Such a system will also help us to obtain
results transcending those reported in this paper (see the succeeding sections),
e.g., more detailed examinations of navigational support. In the following, we
outline our first prototype.

Benchmark System

An open, configurable benchmark system is not necessarily difficult to apply, as our
approach proves. Indeed, our current prototype offers predefined configurations,
which w.r.t. database size are small, medium, and large, in order to be sufficiently
scalable. Both the structures (data types and type hierarchies) and the complexity of
the data in the standard configurations were determined in cooperation with one of
the leading vendors of standard business software and thus represent a wide spectrum
of typical application domains. In addition, our benchmark system can simply be
configured according to the particular properties of a concrete application. Among
other possibilities, users can directly control the generation process of the
benchmark database, e.g., specify the database size, the complexity of the class
hierarchy, and the complexity of the individual objects, in order to ensure that the
special requirements of the application in mind are taken into account. After the
benchmark database has been generated, the user may select from a given set of query
templates and indicate how many times each template is to be instantiated.

New query templates can easily be added if the existing templates do not reflect the
application characteristics sufficiently. Based on these user
selections/specifications, the load generator creates a set of queries which is
passed to the query executor, which, in turn, serves as a kind of driver for the
measurements. Users can also specify which kinds of measurement data are to be
collected by the system, i.e., the amount of time spent at the DB or the mapping
layer for query transformation, or the time spent for SQL query evaluation, data
loading, and/or result-set construction. The corresponding values are collected by
the data collector during execution of the query set and afterwards stored in the
DBS for further evaluation. The special challenges of this benchmark approach are, on
the one hand, to properly take into account the requirements of OO system development
and, on the other hand, to guarantee an optimal mapping w.r.t. the particular
capabilities of an individual ORDBMS.
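The template instantiation step of the load generator might be sketched like this. The template format, parameter ranges, and city values are all invented for illustration; the real benchmark's templates are derived from the application study described above.

```python
import random

# Hypothetical query template with named placeholders.
TEMPLATE = "SELECT * FROM person WHERE age > {age} AND city = '{city}'"

def instantiate(template, n, rng):
    """Create n concrete benchmark queries from one template."""
    cities = ["Berlin", "Kaiserslautern", "New York"]
    return [template.format(age=rng.randint(18, 80), city=rng.choice(cities))
            for _ in range(n)]

# A seeded RNG makes the generated query set reproducible across runs.
rng = random.Random(42)
queries = instantiate(TEMPLATE, 3, rng)
print(len(queries))
```

The resulting query set would then be handed to the query executor, which drives the actual measurements.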

Benchmark and Measurements

In order to get a complete performance evaluation, we concentrate on answering the
following questions:

1. Which performance gains do ORDBMS offer in comparison to RDBMS regarding their
usage in OO software development?

2. Which additional overhead has to be spent at the mapping layer in order to bridge
the gap between the OO and OR paradigms, and how does it behave for different query
types?

3. To what extent is the system performance influenced by the capabilities of the
(O)RDBMS API?

In order to be able to answer the first two questions, we selected a set of typical
benchmarking queries according to a long-term study of a leading software company.
These queries represent a wide spectrum of typical operations in the target
applications of ORDBMS. We compared a purely relational mapping with an
object-relational one (exploiting its OO modelling power) using a currently available
commercial ORDBMS. In this way we ‘measured’ how OO software development can benefit
from the OO extensions offered by ORDBMS (e.g., structured UDTs, references, etc.).
The operations considered for that purpose are implemented as query templates and
grouped into the following categories:

Navigation operations: Navigation operations, such as GetObject(OID), are not
directly supported by almost any currently available ORDBMS. Considering such
operations helps us to assess the performance of ORDBMS in supporting navigational
processing. We hope that the corresponding results ‘help’ ORDBMS vendors to make
ORDBMS as efficient as OODBMS in this respect.

Queries with simple predicates on scalar attributes: Queries of this category have
simple predicates containing just a single comparison operation on a scalar
attribute. This group mainly serves to provide a performance baseline that can be
helpful when interpreting the results of more complex queries.

Queries with predicates on UDTs: This group contains queries with simple predicates
(a single comparison operation) on attributes of structured, non-atomic data types.
Thus, it mainly serves for assessing the efficiency of mapping UDTs to (O)RDBMS.

Queries with predicates on set-valued attributes: This group contains queries with
simple predicates (a single IN operation) on nested sets. ORDBMS directly support
set-valued data types. In the relational mapping, several tables (according to the
degree of nesting), connected by primary/foreign keys, are necessary.

Queries with path predicates: This group contains queries evaluating simple
predicates after path traversals. These queries allow us to evaluate the efficiency
of processing dereferencing operations (path traversals) in ORDBMS.

Queries with complex predicates: Queries of this group contain complex predicates
challenging both query transformation and query optimization.

Queries on the class hierarchy: While all other queries exclusively deliver direct
instances of a single queried class, queries of this group deliver transitive
instances as well. The predicates conform to those of the second category. This group
of queries allows us to evaluate the efficiency of the ORDBMS in handling class
hierarchies (inheritance).
The comparison with the relational mapping was expected to demonstrate the advantages
of ORDBMS. The third question posed at the beginning of this section deals with the
capabilities of the DB interface, especially w.r.t. support for complex structured
objects and navigational access. In order to examine these aspects, we performed
measurements on two different (commercially successful) ORDBMS. One of these systems
offers the more traditional interface, whereas the second one provides some basic
means of supporting complex structured objects. We performed our measurements on a
benchmarking database with 100 classes and 250,000 instances (configuration medium).
In order to use a representatively structured class hierarchy, we studied typical
application scenarios of a renowned vendor of standard business software and
parameterized our population algorithm accordingly. We measured the database time
(DB time) and the total system time (TS time). The DB time of SQL queries is the time
elapsed between delegating the queries to the DBMS and receiving back the results
(open cursors, traverse iterators). It includes the time for client/server
communication, the time for evaluating the queries within the DBS, and the time for
loading the complete result sets. This has to be taken into account when analyzing
the measurement results. The TS time is defined as the total elapsed time from
issuing a query operation at the OOPL level until the complete result set has been
received. It contains the time spent within the mapping layer as well as the DB time.
We think that these three questions have to be answered before we can think about how
OR technology can be improved in order to support OOPLs better and more efficiently.
In the following section, we report on our measurement results.
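The two measures defined above can be sketched with a simple timing harness. The schema and object class are invented; the point is only how DB time (query evaluation plus loading the complete result set) relates to TS time (DB time plus the mapping-layer work of reconstructing language-level objects).

```python
import sqlite3
import time

# Invented one-table schema standing in for the benchmark database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (oid INTEGER PRIMARY KEY, val TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(i, f"v{i}") for i in range(1000)])

class Obj:
    def __init__(self, oid, val):
        self.oid, self.val = oid, val

t0 = time.perf_counter()
rows = conn.execute("SELECT oid, val FROM t").fetchall()   # query + load result set
t1 = time.perf_counter()
objects = [Obj(*r) for r in rows]                          # mapping-layer reconstruction
t2 = time.perf_counter()

db_time = t1 - t0          # DB time: evaluation plus loading the complete result set
ts_time = t2 - t0          # TS time: DB time plus time spent in the mapping layer
print(len(objects), db_time <= ts_time)
```

By construction TS time can never be smaller than DB time; the interesting quantity in the measurements is how large the mapping-layer share (TS time minus DB time) grows with the result-set cardinality.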

Measurement Results and Observations

ORDBMS vs. RDBMS

In the first test series, we compared a purely relational mapping with an OR mapping
using one of the leading currently available commercial ORDBMS. The hardware and
software configurations are left unspecified to avoid the usual legal and competitive
problems with publishing performance numbers for commercial products.
All performance measurements are averages of multiple trials, with more trials for
higher-variance measurements. For each DBMS tested, we put much effort into
optimization (e.g., indexes) and mapping-layer design in order to achieve the best
performance possible. The investigation aims at quantifying the benefits of the OO
extensions offered by ORDBMS in more detail. Due to space restrictions, it is not
possible to analyze all results in detail. It can be observed that the OR mapping
outperforms the purely relational mapping in all query categories. Although the OR
mapping shows only tiny advantages in retrieving small result sets, it provides
performance gains of up to 40% in retrieving large result sets or processing queries
on class hierarchies. The reasons are twofold. First, the OO features provided by the
ORDBMS help to keep the complexity of the mapping layer low (better query evaluation
strategy, less overhead for synthesizing the result set) and to reduce client/server
communication (fewer queries). Second, the implementation of the ORDBMS we used
enables performance improvements even when using the OO extensions. In the purely
relational case, it is not possible to map a (complex) class to exactly one table:
mapping set-valued attributes, aggregations, and (m:n) relationships properly demands
several tables interconnected by primary/foreign keys.

Performance Characteristics of the Mapping Layer

In order to characterize the performance of the mapping layer adequately, we
investigated simple queries with different selectivities. The results are presented
in the figure. As already mentioned, the DB APIs of almost all currently available
ORDBMS do not support navigational access directly. Hence, a navigational operation,
such as GetObject(Ref), must be transformed by the mapping layer into a database
query (SQL:1999), such as "SELECT * FROM ... WHERE OID = Ref". Such a query strategy,
especially when dealing intensively with navigational operations, as usually required
by most OO applications, obviously leads to high processing overhead in the database
system as well as very high communication costs (over 70% of the entire system time).
The (probably not very astonishing) observation is that the traditional query
strategy is not adequate for supporting navigational access.

The DB time of set-oriented queries shows only a slight ascent with increasing result
sets, while the additional mapping overhead increases rapidly. When retrieving 1250
objects, the time spent at the mapping layer even exceeds 86% of the total system
time. This observation can be explained as follows. Already in the early days of
ORDBMS, these systems were, comparably to RDBMS, not very successful in supporting
navigational access, but excellent in processing set-oriented access (as they still
are today). Unfortunately, OO applications can hardly benefit from this advantage,
because the ORDBMS API is ‘inherited’ from traditional RDBMS and, therefore, still
supports only simple, flat data. Lacking an extensible DB API which could generically
support complex user-defined data types, complex objects in an ORDBMS first have to
be ‘disassembled’ into scalar values and afterwards reconstructed (at the mapping
layer) into objects of a certain class in the particular OOPL. This kind of overhead
becomes dramatic with increasing result-set cardinality and impairs the entire system
efficiency significantly. Regarding these measurement results, we can draw the
following conclusions. In order to support navigational access better, the ORDB API
should directly support navigational operations like GetObject(Ref), so that the
costs of transforming navigational operations into SQL queries and of evaluating
these queries can be avoided. Furthermore, it should also support the notion of
complex objects directly and offer the possibility of retrieving complex objects as
units. According to our examinations, such improvements can increase the entire
system efficiency by up to 400%.

Support for Complex Objects

As already mentioned before, the lack of direct support for complex objects and
navigational access, the API level extremely impairs the overall system efficiency.
Fortunately, a leading ORDBMS vendor already offers an extended call level interface,
which, as we can see later in this section, directly supports navigational access as well
as retrieval of complex objects as units, and, in addition, even retrieval of complex
object graphs as units. Navigation is enabled by the possibility of autonomously
retrieving complex structured objects (by OID) as instances of C structures. This
simplifies the mapping to OOPL, such as C++, considerably and, therefore, is
undoubtedly the first step into the right direction, although this mechanism does not yet
supports the actually wanted seamless coupling (transparent transformation from a
database object to an instance of an OOPL class). The mentioned support for complex
objects at the level of the DB API allows a complex object's data to be retrieved directly from the database into main memory by specifying its OID or a predicate. The expensive query processing strategy described earlier can therefore be avoided. Recall that the measurements described earlier were performed on an ORDBMS that does not possess a DB API like the one described in this section. To show the importance of, and the corresponding demand for, suitable support for complex objects at the DB API level, we repeated the earlier measurements on the ORDBMS referred to in this section, which provides the mentioned complex object support at its API. The corresponding figure illustrates the measurement results. Obviously, the additional overhead spent in the mapping layer is now independent of the cardinality of the query result sets. Thus, direct support of complex objects at the DB API results in a clear performance gain (up to 400%). The direct support for navigational access at the DB API level mentioned in this section, which allows objects to be accessed directly by calling a function like GetObject(Ref), avoids expensive processing steps (query transformation, data type conversion, and object reconstruction). This obviously contributes to a significant performance improvement. Fig. 6a shows a comparison between a query strategy (transforming a navigational operation into an SQL query) and a navigational strategy (directly calling a GetObject(OID) function at the DB API). The advantages of the navigational strategy are obvious: with direct support for navigational access, overall system efficiency increases by approximately 200%. OO applications often want a set of objects interconnected by object references (an object graph) to be retrieved completely within a single database interaction. Fig. 6b shows a comparison of two strategies for retrieving complex object graphs. In this measurement, we used the ORDBMS that directly supports navigation as well as retrieval of complex object graphs. It can be seen clearly that strategy I, which exploits the ability to retrieve object graphs, exhibits a performance gain of about 100% already at a result set cardinality of 13 objects. The design of a new DB API that directly supports complex structured objects is by no means an easy job and requires generic design methods, because user-defined data types can be arbitrarily structured, e.g., they may contain other complex data types such as UDTs, references, and set-valued attributes. Furthermore, a DB API always has to be multi-lingual, supporting all common programming languages simultaneously, which makes it very difficult to offer the best of both worlds (DBMS and OOPL) without compromises.
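The kind of DB API argued for above can be sketched in a few lines of Java. This is a minimal illustration, not the interface of any particular ORDBMS: the class and method names, the OID-keyed in-memory store, and the recursive graph collection are all assumptions made for the example. It shows the two operations discussed in the text, direct access by OID (the GetObject(OID) style of call) and retrieval of a whole object graph in a single interaction.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a DB API with direct complex-object support.
// A complex object is identified by an OID; references to other objects
// are stored as OIDs, so an object graph can be traversed navigationally.
public class ComplexObjectApiSketch {

    // A complex object: its OID, its attribute values, and the OIDs it references.
    record DbObject(long oid, Map<String, Object> attributes, List<Long> references) {}

    // Stand-in for the database: maps OIDs to stored objects.
    private final Map<Long, DbObject> store = new HashMap<>();

    public void put(DbObject obj) { store.put(obj.oid(), obj); }

    // Navigational access: fetch one object directly by OID
    // (the GetObject(OID) style of call discussed in the text),
    // avoiding query transformation and result-set decoding.
    public DbObject getObject(long oid) { return store.get(oid); }

    // Graph retrieval: fetch an object and everything reachable from it
    // in one call, so the application needs only a single DB interaction.
    public Map<Long, DbObject> getObjectGraph(long rootOid) {
        Map<Long, DbObject> graph = new HashMap<>();
        collect(rootOid, graph);
        return graph;
    }

    private void collect(long oid, Map<Long, DbObject> graph) {
        if (graph.containsKey(oid)) return;   // already visited (handles cycles)
        DbObject obj = store.get(oid);
        if (obj == null) return;
        graph.put(oid, obj);
        for (long ref : obj.references()) collect(ref, graph);
    }

    public static void main(String[] args) {
        ComplexObjectApiSketch api = new ComplexObjectApiSketch();
        api.put(new DbObject(1, Map.of("name", "root"), List.of(2L, 3L)));
        api.put(new DbObject(2, Map.of("name", "child-a"), List.of(3L)));
        api.put(new DbObject(3, Map.of("name", "child-b"), List.of()));

        // One navigational call instead of an SQL round trip.
        System.out.println(api.getObject(1).attributes().get("name")); // prints "root"

        // The whole graph arrives in a single interaction.
        System.out.println(api.getObjectGraph(1).size()); // prints 3
    }
}
```

In a real mapping layer, getObject would replace the pattern of generating an SQL query per object reference; the measurements above suggest that this is exactly where the per-object overhead disappears.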

Conclusions

We have emphasized the importance of assessing ORDBMS w.r.t. their capabilities for supporting OOPL. We first considered ORDBMS and OOPL qualitatively, regarding the modelling and operational aspects relevant for object-oriented software development. Although the object-relational (SQL:1999) and the object-oriented data models are essentially converging, the operational distance between the two paradigms is still considerable, so that an additional mapping layer is necessary to overcome this gap. Regrettably, such an additional layer considerably impairs the performance of the overall system. Additionally, we performed quantitative examinations (measurements) in order to assess the capabilities of ORDBMS for supporting OOPL. Indirectly, these measurements are intended to promote the optimal utilization of currently available ORDBMS in object-oriented system development and to guide the future development of ORDBMS in a way that improves their support for OOPL.
Regarding our performance examinations, we have motivated the necessity of an open, configurable benchmark approach, because not only the performance of the ORDBMS itself but also the additional overhead necessary for bridging the conceptual and operational distance between ORDBMS and OOPL has to be taken into account and, therefore, properly characterized. Our performance measurements clearly illustrate that object-relational database technology is becoming more and more mature, not only conceptually (data model, query processing) but also w.r.t. performance. The modelling facilities help keep the mapping layer 'thin' in contrast to RDBMS. This, on the one hand, reduces the implementation effort and, on the other hand, increases overall system efficiency. Despite the available object-oriented extensions, which constitute an unambiguous gain in comparison to RDBMS, the potential benefit of object-relational database technology is, in our opinion, not yet exhausted, since the traditional DB API is so far not capable of successfully supporting object-oriented principles. The DB API of almost all ORDBMS still cannot support navigational operations and complex object structures directly, so that new-generation software systems cannot take advantage of object-relational database technology in an optimal way. It can be called the 'bottleneck' between the object-relational and object-oriented paradigms. Our examinations have shown clearly that a new DB API directly supporting navigational operations and complex objects is necessary. Obviously, it is not easy to equip ORDBMS with such a new interface; for example, such an interface must be multi-lingual. Further research is required to answer the question of how user-defined data types can be effectively represented at the OOPL level. Altogether, the problem of a seamless and effective mapping of SQL:1999 to OOPL such as C++ and Java has to be worked on further. Our future work will mainly look for possible solutions. Furthermore, we plan intensive studies of ORDBMS capabilities for supporting navigational access, because ORDBMS are still behind OODBMS in this respect. Generally, after having characterized the performance aspects in more detail, our long-term objective is to develop applicable concepts that increase the performance of ORDBMS.


