
Computing and Informatics

Database management systems (DBMS)


1.1. What is a Database?
A brief definition might be:

A STORE OF INFORMATION,
HELD OVER A PERIOD OF TIME,
IN COMPUTER-READABLE FORM.

Let us examine the parts of this definition in more detail.


1.1.1. A store of information .....
Typical examples of information stored for some practical purpose are:

Information collected for the sake of making a statistical analysis, e.g. the national census, or a survey
of cracks in a stretch of motorway.
Textual material required for information retrieval, e.g. technical abstracts, statutory or other
regulations. Currently there is some interest in extending such databases in the direction of intelligent
knowledge-based systems (IKBS), where rules for interpretation by an expert user are included along
with the information itself.
Operational and administrative information required for running an organisation. In a commercial
concern this will take the form of stock records, personnel records, customer records, among others.

The three main examples given here are of STATIC, GROWING and DYNAMIC databases respectively.

Note that nothing was said in the definition about the quantity of information being held. Although many
of the benefits associated with using a database are due to economies of scale, a small database may
be very worthwhile (for instance to the secretary of the local sports club) if the information is to be
processed frequently and in a repetitive manner.

1.1.2. ..... held over a period of time.

This part of the definition goes without saying in most people's minds, but it is worth dwelling on for a
moment. Because of the investment involved in setting up a database, the expectation must be that it
will continue to be useful, over years rather than months. But the relationship with time varies from one
type of information to another.

Census information is collected on a particular date and stored as a snapshot of the state of affairs when
the survey was taken. Information from later observations will be kept quite separately, but appropriate
comparisons may be made provided that the framework remains consistent.
Bibliographic or other textual databases are accumulated over time - new material is added periodically
but probably very little will be removed. When designing such a database it will be important to
estimate and allow for the expected rate of growth, and perhaps to ensure that the more recent
information is given some priority.
An organisational database may not change very drastically in size, but it will be subject to frequent
updating (deletions, amendments, insertions) following relevant actions within the organisation itself.
Ensuring the accuracy, efficiency and security of this process is the main concern of many database
designers and administrators.

1.1.3. ..... in computer-readable form.

Information (often referred to in this context as data) has been processed by computer for over 30
years, using a variety of storage media. Some form of magnetic disc is likely to be used, since discs
currently provide the most cost-effective way of holding large quantities of data while allowing fast
access to any individual item. Other methods are obviously under development, notably optical storage
- CD-ROM - which as yet does not give enough scope for updating in most database applications.

Database handling techniques grew out of earlier and simpler file processing techniques. A file consists
of an ordered collection of records; a database consists of two or more related files which we may wish
to process together in various different ways. It will store not only the individual records containing the
numbers or words needed for some application, but auxiliary information which will allow those records
to be accessed more quickly, or which will link related records or data items together. A database
designer may be required to choose how much and what sort of auxiliary information to store, using his
knowledge of how the database will be used.

Computer storage and processing implies the use of software: in the current context a DATABASE
MANAGEMENT SYSTEM (DBMS). The function of the DBMS is to store and retrieve information as
required by applications programs or users sitting at terminals, using the facilities provided by the
computer operating system. It is one of a number of software layers making computer facilities available
to users with perhaps comparatively little technical expertise.

1.2. Summary of DBMS functions.

1.2.1. Data definition.

This includes describing:

FILES
RECORD STRUCTURES
FIELD NAMES, TYPES and SIZES
RELATIONSHIPS between records of different types
Extra information to make searching efficient, e.g. INDEXES.
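
In a relational DBMS most of this is expressed in a data definition language. A minimal sketch in SQL
follows, with table, column and index names invented purely for illustration (the same tables are reused
by the sketches in the following subsections):

-- Illustrative data definition: files/record structures become tables,
-- fields become typed columns, relationships become foreign keys.
CREATE TABLE department (
    dept_no   CHAR(4)     NOT NULL,
    dept_name VARCHAR(30) NOT NULL,
    PRIMARY KEY (dept_no)
);

CREATE TABLE employee (
    emp_no      INTEGER     NOT NULL,
    surname     VARCHAR(30) NOT NULL,
    salary      DECIMAL(9,2),
    date_joined DATE,
    date_left   DATE,
    dept_no     CHAR(4),                         -- relationship to another record type
    PRIMARY KEY (emp_no),
    FOREIGN KEY (dept_no) REFERENCES department (dept_no)
);

-- Extra information to make searching efficient: an index.
CREATE INDEX employee_surname_ix ON employee (surname);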

1.2.2. Data entry and validation.

Validation may include:

TYPE CHECKING
RANGE CHECKING
CONSISTENCY CHECKING

In an interactive data entry system, errors should be detected immediately - some can be prevented
altogether by keyboard monitoring - and recovery and re-entry permitted.
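
Much of this validation can be declared alongside the data definition rather than programmed; a rough
SQL sketch, reusing the illustrative employee table from 1.2.1:

-- Type checking follows from the declared column types (INTEGER, DECIMAL, DATE, ...).
ALTER TABLE employee ADD CONSTRAINT chk_salary_range
    CHECK (salary BETWEEN 0 AND 500000);                     -- range checking

ALTER TABLE employee ADD CONSTRAINT chk_dates_consistent
    CHECK (date_left IS NULL OR date_left >= date_joined);   -- consistency checking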

1.2.3. Updating.

Updating involves:

Record INSERTION
Record MODIFICATION
Record DELETION.

At the same time any background data such as indexes or pointers from one record to another must be
changed to maintain consistency. Updating may take place interactively, or by submission of a file of
transaction records; handling these may require a program of some kind to be written, either in a
conventional programming language (a host language, e.g. COBOL or C) or in a language supplied by
the DBMS for constructing command files.
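
For a rough idea of what these look like in SQL (again using the illustrative employee table; the values
are invented), the three kinds of update map onto three statements, and the DBMS adjusts any indexes
and checks foreign-key links as they run:

INSERT INTO employee (emp_no, surname, salary, dept_no, date_joined)
VALUES (1042, 'Okafor', 21500.00, 'D001', DATE '1995-04-01');   -- record insertion

UPDATE employee
SET    salary = salary * 1.05
WHERE  dept_no = 'D001';                                        -- record modification

DELETE FROM employee
WHERE  date_left < DATE '1990-01-01';                           -- record deletion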

1.2.4. Data retrieval on the basis of selection criteria.

For this purpose most systems provide a QUERY LANGUAGE with which the characteristics of the
required records may be specified. Query languages differ enormously in power and sophistication but a
standard which is becoming increasingly common is based on the so-called RELATIONAL operations.
These allow:

selection of records on the basis of particular field values
selection of particular fields from records to be displayed
linking together records from two different files on the basis of matching field values.

Arbitrary combinations of these operations on the files making up a database can answer a very large
number of queries without requiring users to resort to record-at-a-time processing.
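
A rough SQL rendering of the three operations, using the illustrative tables from 1.2.1: the SELECT list
picks out particular fields, the WHERE clause selects records by field value, and the join links records
from two tables by matching values.

SELECT e.surname, e.salary, d.dept_name          -- particular fields only
FROM   employee e
JOIN   department d ON d.dept_no = e.dept_no     -- link records by matching field values
WHERE  e.salary > 20000;                         -- select records by field value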

1.2.5. Report definition.

Most systems provide facilities for describing how summary reports from the database are to be created
and laid out on paper. These may include obtaining:

COUNTS
TOTALS
AVERAGES
MAXIMUM and MINIMUM values

over particular CONTROL FIELDS, together with specification of PAGE and LINE LAYOUT, HEADINGS,
PAGE-NUMBERING, and other narrative to make the report comprehensible.
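
The counts, totals, averages and extremes over a control field correspond to SQL grouping and
aggregate functions; page and line layout, headings and page numbering are the report writer's job
rather than the query's. An illustrative sketch, again over the employee table:

SELECT dept_no,                       -- the control field
       COUNT(*)    AS staff_count,
       SUM(salary) AS total_pay,
       AVG(salary) AS average_pay,
       MIN(salary) AS lowest_pay,
       MAX(salary) AS highest_pay
FROM   employee
GROUP BY dept_no
ORDER BY dept_no;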

1.2.6. Security.

This has several aspects:

Ensuring that only those authorised to do so can see and modify the data, generally by some extension
of the password principle.
Ensuring the consistency of the database where many users are accessing and up-dating it
simultaneously.
Ensuring the existence and INTEGRITY of the database after hardware or software failure. At the very
least this involves making provision for back-up and re-loading.
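
The first of these aspects is usually expressed in SQL with GRANT and REVOKE, often combined with
views that expose only part of the data; the role and view names below are invented, and back-up and
recovery are handled by separate utilities rather than by SQL itself.

CREATE ROLE payroll_clerk;

CREATE VIEW employee_public AS                       -- a restricted subset of the data
    SELECT emp_no, surname, dept_no FROM employee;

GRANT SELECT ON employee_public TO PUBLIC;           -- anyone may read the subset
GRANT SELECT, UPDATE ON employee TO payroll_clerk;   -- only authorised roles see salaries
REVOKE UPDATE ON employee FROM payroll_clerk;        -- privileges can be withdrawn again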

1.3. Why have a database (and a DBMS)?

An organisation uses a computer to store and process information because it hopes for speed, accuracy,
efficiency, economy etc. beyond what could be achieved using clerical methods. The objectives of using
a DBMS must in essence be the same although the justifications may be more indirect.

Early computer applications were based on existing clerical methods and stored information was
partitioned in much the same way as manual files. But the computer's processing speed gave a
potential for RELATING data from different sources to produce valuable management information,
provided that some standardisation could be imposed over departmental boundaries. The idea emerged
of the integrated database as a central resource. Data is captured as close as possible to its point of
origin and transmitted to the database, then extracted by anyone within the organisation who requires
it. However many provisos have become attached to this idea in practice, it still provides possibly the
strongest motivation for the introduction of a DBMS in large organisations. The idea is that any piece of
information is entered and stored just once, eliminating duplications of effort and the possibility of
inconsistency between different departmental records.

Other advantages relate to the task of running a conventional Data Processing (DP) department.
Organisational requirements change over time, and applications programs laboriously developed need
to be periodically adjusted. A DBMS gives some protection against change by taking care of basic
storage and retrieval functions in a standard way, leaving the applications developer to concentrate on
specific organisational requirements. Changes in one of these areas need not have repercussions
elsewhere. In general a DBMS is a substantial piece of software, the result of many man-years of effort.
Because its development costs are spread over a number of purchasers it can probably provide more
facilities than would be economic in a one-off product.

The points discussed above are probably most relevant to the larger organisation using a DBMS for its
administrative functions - the environment in which the idea of databases first originated. In other
contexts the convenience of a DBMS may be the primary consideration. The purchaser of a small
business computer needs all the software to run it in package form, written so that the minimum of
expertise is required to use it. The same applies to departments (e.g. Research & Development) with
special needs which cannot be satisfied by a large centralised system. When comparing database
management systems it is obvious that some are designed in the expectation that professional DP staff
will be available to run them, while others are aimed at the total novice.

There are of course costs associated with adopting a DBMS. Actual monetary costs vary widely from, for
instance, a large multi-user Oracle system to a small PC-based filing system. In the first case the charge
will cover support, some training, extensive documentation and the provision of periodical upgrades to
the software; in the second case the purchaser will be on his own with the manual. But there is also a
tendency for the cost of software to reflect the cost of the hardware on which it is run!

Probably the main cost associated with acquiring a DBMS is due to the work involved in designing and
implementing systems to use it. In order to provide a general and powerful set of facilities for its users
any DBMS imposes restraints on the way information can be described and accessed, and demands
familiarity with the DATA MODEL which it supports and the command language which it provides to
define and manipulate data. Data models still in use are HIERARCHICAL (tree-structured), NETWORK and
RELATIONAL (tabular). Of these the last is the current favourite, providing a good basis for high-level
query languages and giving scope for the exploitation of special-purpose hardware in efficient large-
scale data handling.

This course will concentrate on the RELATIONAL model.

1.4. Data Base Project Development.

The conventional SYSTEMS LIFE CYCLE consists of:

1.4.1. ANALYSIS
1.4.2. DESIGN
1.4.3. DEVELOPMENT
1.4.4. IMPLEMENTATION
1.4.5. MAINTENANCE

In practice these phases are not always sharply distinguished; for small projects it may not be necessary
to go formally through every one. The move from one phase to the next is essentially a move from the
general to the specific. At each stage, particularly where a DBMS is involved, we shall be concerned
both with information and with processes to be performed using that information.

1.4.1. Analysis

The outputs from this stage should be:

A CONCEPTUAL DATA MODEL describing the information which is used within the organisation but
not in computer-related terms. This level of data analysis will be considered in more detail later. One of
the problems with any systems design in a large organisation is that it must proceed in a piecemeal
manner - it is impossible to create a totally new GLOBAL system in one fell swoop, and each sub-system
must dovetail with others which may be at quite a different stage of development. The conceptual data
model provides a context within which more detailed design specifications can be produced, and should
help in maintaining consistency from one application area to another.

A CONCEPTUAL PROCESS MODEL describing the functions of the organisation in terms of
events (e.g. a purchase, a payment, a booking) and the processes which must be performed
within the organisation to handle them. This may lead to a more detailed functional
specification - describing the organisational requirements which must be satisfied, but not
how they are to be achieved.

1.4.2. Design

This stage should produce:

A LOGICAL DATA MODEL: a description of the data to be stored in the database, using the
conventions prescribed by the particular DBMS to be used. This is sometimes referred to as a SCHEMA
and some DBMSs also give facilities for defining SUB-SCHEMA or partitions of the overall schema.
Logical data models supported by present day DBMSs will be considered later.

A SYSTEM SPECIFICATION, describing in some detail what the proposed system should do. This will
now refer to COMPUTER PROCESSES, but probably in terms of INPUT and OUTPUT MESSAGES rather
than internal logic, describing, for instance, the effect of selecting an item from a menu, or any option
within a command driven system. Program modules are defined in terms of the screen displays and/or
reports which they generate. Note that the data referred to here has a temporary existence, in contrast
with what is stored in the database itself.

1.4.3. Development.

Specification of the database itself must now come down another level, to decisions about PHYSICAL
DATA STORAGE in particular files on particular devices. For this a knowledge of the computer operating
system, as well as the DBMS, is required. Conventional program development - coding, testing,
debugging etc. may also be done. If a totally packaged system has been purchased this may not be
necessary - it will simply be a matter of discovering how to use the command and query language
already supplied to store and retrieve data, generate reports and other outputs. Even here an element
of testing and debugging may be involved, since it is unlikely that the new user of a system will get it
exactly right the first time. It is certainly inadvisable for this sort of experimentation to take place using
a live database!

1.4.4. Implementation.

This puts the work of the previous three phases into everyday use. It involves such things as loading the
database with live rather than test data, staff training, probably the introduction of new working
practices. It is not unusual to have an old and a new system running side by side for a while so that
some back-up is available if the new system fails unexpectedly.

1.4.5. Maintenance.

Systems once implemented generally require further work done on them as time goes by, either to
correct original design faults or to accommodate changes in user requirements or operating constraints.
One of the objectives of using a DBMS is to reduce the impact of such changes - for example the data
can be physically re-arranged without affecting the logic of the programs which use it. Some DBMSs
provide utility programs to re-organise the data when either its physical or logical design must be
altered.

------------------------------------------------------------------------------------------------------------------------

The relational model.

The relational model consists of three components:

1. A STRUCTURAL component -- a set of TABLES (also called RELATIONS).
2. A MANIPULATIVE component consisting of a set of high-level operations which act upon
and produce whole tables.
3. A SET OF RULES for maintaining the INTEGRITY of the database.

The terminology associated with relational database theory originates from the branch of
mathematics called set theory although there are widely used synonyms for these precise,
mathematical terms.

Data structures are composed of two components which represent a model of the situation being
considered. These are (i) ENTITY TYPES - i.e. data group types, and (ii) the RELATIONSHIPS between
the entity types.
Entity types are represented by RELATIONS or BASE TABLES. These two terms are interchangeable -
a RELATION is the mathematical term for a TABLE.
A base table is loosely defined as an un-ordered collection of zero, one or more TUPLES (ROWS) each
of which consists of one or more un-ordered ATTRIBUTES (COLUMNS). All tuples are made up of
exactly the same set of attributes.

For the remainder of this discussion we shall use the more widely known terminology:
TABLE for RELATION
ROW for TUPLE
COLUMN for ATTRIBUTE
Each column is drawn from a DOMAIN, that is, a set of values from which the actual values are taken
(e.g. a set of car model names). More than one column in a table may draw its values from the same
domain.
A column entry in any row is SINGLE-VALUED, i.e. it contains exactly one item only (e.g. a surname).
Repeating groups, i.e. columns which contain sets of values rather than a single value, are not allowed.
Each row of a table is uniquely identified by a PRIMARY KEY composed of one or more columns. This
implies that a table may not contain duplicate rows.
Note that, in general, a column, or group of columns, that uniquely identifies a row in a table is called
a CANDIDATE KEY. There may be more than one candidate key for a particular table; one of these will
be chosen as the primary key.
The ENTITY INTEGRITY RULE of the model states that no component of the primary key may contain a
NULL value.
A column, or combination of columns, that matches the primary key of another table is called a
FOREIGN KEY.
The REFERENTIAL INTEGRITY RULE of the model states that, for every foreign key value in a table,
there must be a corresponding primary key value in the referenced table in the database (a short SQL
sketch of these key and integrity rules follows this list).
Only two kinds of table may be defined in a SQL schema: BASE TABLES and VIEWS. These are called
NAMED RELATIONS. Other tables, UNNAMED RELATIONS, may be derived from these by means of
relational operations such as JOINS and PROJECTIONS.
All tables are LOGICAL ENTITIES. Of these, only base tables exist physically, in the sense that there are
physically stored records, and possibly physical access paths such as indexes, in one or more stored
files that directly support the table in physical storage. Although standard techniques such as
HASHING, INDEXING, etc. will be used for implementation efficiency, the user of the database should
require no knowledge of previously defined access paths.
Views and the results of all operations on tables - unnamed relations - are tables that exist as
LOGICAL DEFINITIONS, in terms of a view definition, or a [SELECT .. FROM .. WHERE .. ORDER BY]
sequence.
The term UPDATE has two meanings:
as a SQL operation in its own right which causes one or more columns in a table to be altered; in this
context it will always be shown in upper-case letters - UPDATE.
as a generic term used to include the SQL operations INSERT, DELETE and UPDATE; in this context it
will always be shown in lower-case letters.
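
The key and integrity rules described above map directly onto SQL table definitions. A small sketch,
with tables and columns invented for the purpose (the set of car model names plays the part of a
shared domain):

CREATE TABLE car_model (
    model_name VARCHAR(30) NOT NULL,
    PRIMARY KEY (model_name)                 -- entity integrity: no NULL in the primary key
);

CREATE TABLE car (
    registration CHAR(8)     NOT NULL,
    colour       VARCHAR(15),
    model_name   VARCHAR(30),
    PRIMARY KEY (registration),
    FOREIGN KEY (model_name) REFERENCES car_model (model_name)
                                             -- referential integrity: must match a primary key
);

-- A named relation other than a base table: a VIEW.
CREATE VIEW red_cars AS
    SELECT registration, model_name
    FROM   car
    WHERE  colour = 'red';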

Data Base Management System - DBMS

What is a DBMS? What are the different types of DBMS? Compare 3 types of DBMS.
A database management system (DBMS) is computer software designed for the purpose of managing
databases. Typical examples of DBMSs include Oracle, DB2, Microsoft Access, Microsoft SQL Server,
PostgreSQL, MySQL, FileMaker and Sybase Adaptive Server Enterprise. DBMSs are typically used by
Database administrators in the creation of Database systems.

Description

A DBMS is a complex set of software programs that controls the organization, storage, management,
and retrieval of data in a database. A DBMS includes:

1. A modeling language to define the schema of each database hosted in the DBMS, according to the
DBMS data model.

The four most common types of organizations are the hierarchical, network, relational and object
models. Inverted lists and other methods are also used.

A given database management system may provide one or more of the four models. The optimal
structure depends on the natural organization of the application's data, and on the application's
requirements (which include transaction rate (speed), reliability, maintainability, scalability, and cost).
The dominant model in use today is the ad hoc one embedded in SQL, despite the objections of purists
who believe this model is a corruption of the relational model, since it violates several of its
fundamental principles for the sake of practicality and performance. Many DBMSs also support the Open
Database Connectivity API that supports a standard way for programmers to access the DBMS.

2. Data structures (fields, records, files and objects) optimized to deal with very large amounts of data
stored on a permanent data storage device (which implies relatively slow access compared to volatile
main memory).

3. A database query language and report writer to allow users to interactively interrogate the database,
analyze its data and update it according to the user's privileges on the data.

It also controls the security of the database.


Data security prevents unauthorized users from viewing or updating the database. Using passwords,
users are allowed access to the entire database or subsets of it called subschemas. For example, an
employee database can contain all the data about an individual employee, but one group of users may
be authorized to view only payroll data, while others are allowed access to only work history and
medical data.
If the DBMS provides a way to interactively enter and update the database, as well as interrogate it, this
capability allows for managing personal databases. However, it may not leave an audit trail of actions or
provide the kinds of controls necessary in a multi-user organization. These controls are only available
when a set of application programs are customized for each data entry and updating function.

4. A transaction mechanism, that ideally would guarantee the ACID properties, in order to ensure data
integrity, despite concurrent user accesses (concurrency control), and faults (fault tolerance).
It also maintains the integrity of the data in the database.

The DBMS can maintain the integrity of the database by not allowing more than one user to update the
same record at the same time. The DBMS can help prevent duplicate records via unique index
constraints; for example, no two customers with the same customer numbers (key fields) can be
entered into the database. See ACID properties for more information (Redundancy avoidance).

The DBMS accepts requests for data from the application program and instructs the operating system to
transfer the appropriate data.

When a DBMS is used, information systems can be changed much more easily as the organization's
information requirements change. New categories of data can be added to the database without
disruption to the existing system.

Organizations may use one kind of DBMS for daily transaction processing and then move the detail onto
another computer that uses another DBMS better suited for random inquiries and analysis. Overall
systems design decisions are performed by data administrators and systems analysts. Detailed
database design is performed by database administrators.

Database servers are specially designed computers that hold the actual databases and run only the
DBMS and related software. Database servers are usually multiprocessor computers, with RAID disk
arrays used for stable storage. Connected to one or more servers via a high-speed channel, hardware
database accelerators are also used in large volume transaction processing environments.

DBMS's are found at the heart of most database applications. Sometimes DBMSs are built around a
private multitasking kernel with built-in networking support although nowadays these functions are left
to the operating system.

Features and Abilities Of DBMS

One can characterize a DBMS as an "attribute management system" where attributes are small chunks
of information that describe something. For example, "color" is an attribute of a car. The value of the
attribute may be a color such as "red", "blue", "silver", etc. Lately databases have been modified to
accept large or unstructured (pre-digested or pre-categorized) information as well, such as images and
text documents. However, the main focus is still on descriptive attributes.
DBMSs roll together frequently-needed services or features of attribute management. This allows one to
get powerful functionality "out of the box" rather than program each from scratch or add and integrate
them incrementally. Such features include:

Query ability

Querying is the process of requesting attribute information from various perspectives and combinations
of factors. Example: "How many 2-door cars in Texas are green?"
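
Expressed in SQL against a hypothetical car table (the table and column names are invented for the
example), the query might read:

SELECT COUNT(*)
FROM   car
WHERE  doors = 2
AND    state = 'TX'
AND    color = 'green';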


Backup and replication

Copies of attributes need to be made regularly in case primary disks or other equipment fails. A periodic
copy of attributes may also be created for a distant organization that cannot readily access the original.
DBMSs usually provide utilities to facilitate the process of extracting and disseminating attribute sets.

When data is replicated between database servers, so that the information remains consistent
throughout the database system and users cannot tell or even know which server in the DBMS they are
using, the system is said to exhibit replication transparency.

Rule enforcement

Often one wants to apply rules to attributes so that the attributes are clean and reliable. For example,
we may have a rule that says each car can have only one engine associated with it (identified by Engine
Number). If somebody tries to associate a second engine with a given car, we want the DBMS to deny
such a request and display an error message. However, with changes in the model specification such
as, in this example, hybrid gas-electric cars, rules may need to change. Ideally such rules should be able
to be added and removed as needed without significant data layout redesign.
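
Such a rule can be stated declaratively, so it can be added or removed later without redesigning the
data layout; a sketch with invented names:

CREATE TABLE engine (
    engine_no CHAR(12) NOT NULL,
    car_reg   CHAR(8)  NOT NULL,
    PRIMARY KEY (engine_no),
    CONSTRAINT one_engine_per_car UNIQUE (car_reg)   -- at most one engine per car
);

-- If the rule later changes (e.g. hybrid gas-electric cars), the constraint is simply dropped:
ALTER TABLE engine DROP CONSTRAINT one_engine_per_car;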

Security

Often it is desirable to limit who can see or change which attributes or groups of attributes. This may be
managed directly by individual, or by the assignment of individuals and privileges to groups, or (in the
most elaborate models) through the assignment of individuals and groups to roles which are then
granted entitlements.

Computation

There are common computations requested on attributes such as counting, summing, averaging,
sorting, grouping, cross-referencing, etc. Rather than have each computer application implement these
from scratch, they can rely on the DBMS to supply such calculations.

Change and access logging

Often one wants to know who accessed what attributes, what was changed, and when it was changed.
Logging services allow this by keeping a record of access occurrences and changes.
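
Change logging is often implemented with triggers; the exact syntax differs noticeably from one product
to another, and recording who merely read the data generally relies on the DBMS's own audit facilities.
A sketch in roughly standard SQL, with names invented and reusing the earlier illustrative employee
table:

CREATE TABLE salary_log (
    emp_no     INTEGER,
    old_salary DECIMAL(9,2),
    new_salary DECIMAL(9,2),
    changed_at TIMESTAMP
);

CREATE TRIGGER log_salary_change
AFTER UPDATE OF salary ON employee
REFERENCING OLD ROW AS o NEW ROW AS n
FOR EACH ROW
    INSERT INTO salary_log (emp_no, old_salary, new_salary, changed_at)
    VALUES (n.emp_no, o.salary, n.salary, CURRENT_TIMESTAMP);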

Automated optimization
If there are frequently occurring usage patterns or requests, some DBMSs can adjust themselves to
improve the speed of those interactions. In some cases the DBMS will merely provide tools to monitor
performance, allowing a human expert to make the necessary adjustments after reviewing the statistics
collected.

Meta-data repository

Metadata (also spelled meta-data) is information about information. For example, a listing that
describes what attributes are allowed to be in data sets is called "meta-information".

History

Databases have been in use since the earliest days of electronic computing. Unlike modern systems
which can be applied to widely different databases and needs, the vast majority of older systems were
tightly linked to the custom databases in order to gain speed at the expense of flexibility. Originally
DBMSs were found only in large organizations with the computer hardware needed to support large
data sets.

----------------------------------------------------------------------------------------------------
What is a Database?
A database management system, or DBMS, gives the user access to their data and helps them
transform the data into information. Such database management systems include dBase, Paradox, IMS,
and Oracle. These systems allow users to create, update, and extract information from their databases.
Compared to a manual filing system, the biggest advantages to a computerized database system are
speed, accuracy, and accessibility.
A database is a structured collection of data. Data refers to the characteristics of people, things, and
events. Oracle stores each data item in its own field. For example, a person's first name, date of birth,
and their postal code are each stored in separate fields. The name of a field usually reflects its contents.
A postal code field might be named POSTAL-CODE or PSTL_CD. Each DBMS has its own rules for naming
the data fields.

-----------------------------------------------------------------------------------------------------------------------
relational database

A relational database is a collection of data items organized as a set of formally described
tables from which data can be accessed or reassembled in many different ways without having to
reorganize the database tables. The relational database was invented by E. F. Codd at IBM in 1970. The
standard user and application program interface to a relational database is the structured query
language (SQL). SQL statements are used both for interactive queries for information from a relational
database and for gathering data for reports. In addition to being relatively easy to create and access, a
relational database has the important advantage of being easy to extend. After the original database
creation, a new data category can be added without requiring that all existing applications be modified.
A relational database is a set of tables containing data fitted into predefined categories. Each table
(which is sometimes called a relation) contains one or more data categories in columns. Each row
contains a unique instance of data for the categories defined by the columns. For example, a typical
business order entry database would include a table that described a customer with columns for name,
address, phone number, and so forth. Another table would describe an order: product, customer, date,
sales price, and so forth. A user of the database could obtain a view of the database that fitted the
user's needs. For example, a branch office manager might like a view or report on all customers that
had bought products after a certain date. A financial services manager in the same company could,
from the same tables, obtain a report on accounts that needed to be paid.
When creating a relational database, you can define the domain of possible values in a data column and
further constraints that may apply to that data value. For example, a domain of possible customers
could allow up to ten customer names, while a particular table using that domain might be constrained
to accept only three of them. The definition of a relational database results in a table of
metadata or formal descriptions of the tables, columns, domains, and constraints.

--------------------------------------------------------------------------------------
Database Models: Hierarchical, Network, Relational, Object-Oriented ...
Hierarchical Model

The hierarchical data model organizes data in a tree structure. There is a hierarchy of parent and
child data segments. This structure implies that a record can have repeating information, generally in
the child data segments. Data is held in a series of records, each of which has a set of field values
attached to it; all the instances of a specific record are collected together as a record type. These
record types are the equivalent of tables in the relational model, with the individual records being the
equivalent of rows. To create links between these record types, the hierarchical model uses Parent Child
Relationships, which are 1:N mappings between record types. This is done using trees, a structure
"borrowed" from mathematics much as the relational model borrows set theory. For example, an organization might
store information about an employee, such as name, employee number, department, salary. The
organization might also store information about an employee's children, such as name and date of
birth. The employee and children data forms a hierarchy, where the employee data represents the
parent segment and the children data represents the child segment. If an employee has three
children, then there would be three child segments associated with one employee segment. In a
hierarchical database the parent-child relationship is one to many. This restricts a child segment to
having only one parent segment. Hierarchical DBMSs were popular from the late 1960s, with the
introduction of IBM's Information Management System (IMS) DBMS, through the 1970s.

Network Model

The popularity of the network data model coincided with the popularity of the hierarchical data
model. Some data were more naturally modeled with more than one parent per child. So, the
network model permitted the modeling of many-to-many relationships in data. In 1971, the
Conference on Data Systems Languages (CODASYL) formally defined the network model. The basic
data modeling construct in the network model is the set construct. A set consists of an owner record
type, a set name, and a member record type. A member record type can have that role in more than
one set, hence the multiparent concept is supported. An owner record type can also be a member or
owner in another set. The data model is a simple network, and link and intersection record types
(called junction records by IDMS) may exist, as well as sets between them. Thus, the complete
network of relationships is represented by several pairwise sets; in each set some (one) record type
is owner (at the tail of the network arrow) and one or more record types are members (at the head of
the relationship arrow). Usually, a set defines a 1:M relationship, although 1:1 is permitted. The
CODASYL network model is based on mathematical set theory.

Relational Model

(RDBMS - relational database management system) A database based on the relational model
developed by E.F. Codd. A relational database allows the definition of data structures, storage and
retrieval operations and integrity constraints. In such a database the data and relations between
them are organised in tables. A table is a collection of records and each record in a table contains the
same fields.

Properties of Relational Tables:


Values Are Atomic
Each Row is Unique
Column Values Are of the Same Kind
The Sequence of Columns is Insignificant
The Sequence of Rows is Insignificant
Each Column Has a Unique Name

Certain fields may be designated as keys, which means that searches for specific values of that field
will use indexing to speed them up. Where fields in two different tables take values from the same
set, a join operation can be performed to select related records in the two tables by matching values
in those fields. Often, but not always, the fields will have the same name in both tables. For example,
an "orders" table might contain (customer-ID, product-code) pairs and a "products" table might
contain (product-code, price) pairs so to calculate a given customer's bill you would sum the prices of
all products ordered by that customer by joining on the product-code fields of the two tables. This
can be extended to joining multiple tables on multiple fields. Because these relationships are only
specified at retrieval time, relational databases are classed as dynamic database management
systems. The RELATIONAL database model is based on the Relational Algebra.
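
The bill calculation described above might be written as a join in SQL roughly as follows (the customer
identifier is invented, column names are adapted to SQL spelling, and quantities are ignored for
simplicity):

SELECT o.customer_id,
       SUM(p.price) AS bill
FROM   orders   o
JOIN   products p ON p.product_code = o.product_code   -- join on the shared field
WHERE  o.customer_id = 'C042'
GROUP BY o.customer_id;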

Object/Relational Model

Object/relational database management systems (ORDBMSs) add new object storage capabilities to
the relational systems at the core of modern information systems. These new facilities integrate
management of traditional fielded data, complex objects such as time-series and geospatial data and
diverse binary media such as audio, video, images, and applets. By encapsulating methods with data
structures, an ORDBMS server can execute complex analytical and data manipulation operations to
search and transform multimedia and other complex objects.

As an evolutionary technology, the object/relational (OR) approach has inherited the robust
transaction- and performance-management features of its relational ancestor and the flexibility of its
object-oriented cousin. Database designers can work with familiar tabular structures and data
definition languages (DDLs) while assimilating new object-management possibilities. Query and
procedural languages and call interfaces in ORDBMSs are familiar: SQL3, vendor procedural
languages, and ODBC, JDBC, and proprietary call interfaces are all extensions of RDBMS languages
and interfaces. And the leading vendors are, of course, quite well known: IBM, Informix, and Oracle.

Object-Oriented Model

Object DBMSs add database functionality to object programming languages. They bring much more
than persistent storage of programming language objects. Object DBMSs extend the semantics of the
C++, Smalltalk and Java object programming languages to provide full-featured database
programming capability, while retaining native language compatibility. A major benefit of this
approach is the unification of the application and database development into a seamless data model
and language environment. As a result, applications require less code, use more natural data
modeling, and code bases are easier to maintain. Object developers can write complete database
applications with a modest amount of additional effort.

According to Rao (1994), "The object-oriented database (OODB) paradigm is the combination of
object-oriented programming language (OOPL) systems and persistent systems. The power of the
OODB comes from the seamless treatment of both persistent data, as found in databases, and
transient data, as found in executing programs."

In contrast to a relational DBMS where a complex data structure must be flattened out to fit into
tables or joined together from those tables to form the in-memory structure, object DBMSs have no
performance overhead to store or retrieve a web or hierarchy of interrelated objects. This one-to-one
mapping of object programming language objects to database objects has two benefits over other
storage approaches: it provides higher performance management of objects, and it enables better
management of the complex interrelationships between objects. This makes object DBMSs better
suited to support applications such as financial portfolio risk analysis systems, telecommunications
service applications, world wide web document structures, design and manufacturing systems, and
hospital patient record systems, which have complex relationships between data.

Semistructured Model
In the semistructured data model, the information that is normally associated with a schema is contained
within the data, which is sometimes called "self-describing". In such a database there is no clear
separation between the data and the schema, and the degree to which it is structured depends on
the application. In some forms of semistructured data there is no separate schema, in others it exists
but only places loose constraints on the data. Semi-structured data is naturally modelled in terms of
graphs which contain labels which give semantics to its underlying structure. Such databases
subsume the modelling power of recent extensions of flat relational databases, to nested databases
which allow the nesting (or encapsulation) of entities, and to object databases which, in addition,
allow cyclic references between objects.

Semistructured data has recently emerged as an important topic of study for a variety of reasons.
First, there are data sources such as the Web, which we would like to treat as databases but which
cannot be constrained by a schema. Second, it may be desirable to have an extremely flexible
format for data exchange between disparate databases. Third, even when dealing with structured
data, it may be helpful to view it as semistructured for the purposes of browsing.

Associative Model

The associative model divides the real-world things about which data is to be recorded into two sorts:
Entities are things that have discrete, independent existence. An entity’s existence does not depend
on any other thing. Associations are things whose existence depends on one or more other things,
such that if any of those things ceases to exist, then the thing itself ceases to exist or becomes
meaningless.
An associative database comprises two data structures:
1. A set of items, each of which has a unique identifier, a name and a type.
2. A set of links, each of which has a unique identifier, together with the unique identifiers of three
other things, that represent the source, verb and target of a fact that is recorded about the
source in the database. Each of the three things identified by the source, verb and target may be
either a link or an item.
For more information see: The Associative Model of Data

Entity-Attribute-Value (EAV) data model

The best way to understand the rationale of EAV design is to understand row modeling (of which EAV
is a generalized form). Consider a supermarket database that must manage thousands of products
and brands, many of which have a transitory existence. Here, it is intuitively obvious that product
names should not be hard-coded as names of columns in tables. Instead, one stores product
descriptions in a Products table: purchases/sales of individual items are recorded in other tables as
separate rows with a product ID referencing this table. Conceptually an EAV design involves a single
table with three columns, an entity (such as an olfactory receptor ID), an attribute (such as species,
which is actually a pointer into the metadata table) and a value for the attribute (e.g., rat). In EAV
design, one row stores a single fact. In a conventional table that has one column per attribute, by
contrast, one row stores a set of facts. EAV design is appropriate when the number of parameters
that potentially apply to an entity is vastly more than those that actually apply to an individual entity.
For more information see: The EAV/CR Model of Data
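
A minimal EAV sketch in SQL (names invented): one row of the fact table records one fact, and the
attribute column points into a metadata table of permitted attributes.

CREATE TABLE attribute_meta (                 -- metadata: the attributes that may be used
    attribute_id   INTEGER PRIMARY KEY,
    attribute_name VARCHAR(40) NOT NULL
);

CREATE TABLE fact (                           -- one row stores a single fact
    entity_id    INTEGER NOT NULL,
    attribute_id INTEGER NOT NULL REFERENCES attribute_meta,
    value        VARCHAR(100),
    PRIMARY KEY (entity_id, attribute_id)
);

INSERT INTO attribute_meta VALUES (1, 'species');
INSERT INTO fact VALUES (301, 1, 'rat');      -- entity 301 has species = rat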

Context Model

The context data model combines features of all the above models. It can be considered as a
collection of object-oriented, network and semistructured models or as some kind of object database.
In other words, this is a flexible model: you can use any type of database structure depending on the
task. Such a data model has been implemented in the DBMS ConteXt.
The fundamental unit of information storage in ConteXt is a CLASS. A Class contains METHODS and
describes an OBJECT. The Object contains FIELDS and a PROPERTY. A field may be composite, in which
case it contains SubFields, and so on. A property is a set of fields that belongs to a particular Object
(similar to an AVL database). In other words, fields are a permanent part of an Object, while a Property
is its variable part.
The header of a Class contains the definition of the internal structure of the Object, including the
description of each field: its type, length, attributes and name. The Context data model has a
set of predefined types as well as user-defined types. The predefined types include not only
character strings, texts and digits but also pointers (references) and aggregate types (structures).

A context model comprises three main data types: REGULAR, VIRTUAL and REFERENCE. A regular
(local) field can be ATOMIC or COMPOSITE. The atomic field has no inner structure. In contrast, a
composite field may have a complex structure, and its type is described in the header of Class. The
composite fields are divided into STATIC and DYNAMIC. The type of a static composite field is stored
in the header and is permanent. Description of the type of a dynamic composite field is stored within
the Object and can vary from Object to Object.

Like a NETWORK database, a context database has, apart from the fields containing the information
directly, fields storing a place where that information can be found, i.e. a POINTER (link, reference)
which can point to an Object in this or another Class. Because the main addressable unit of a context
database is an Object, the pointer refers to an Object rather than to a field of that Object. Pointers are
divided into STATIC and DYNAMIC. All pointers that belong to a particular static pointer type point to
the same Class (albeit, possibly, to different Objects); in this case, the Class name is an integral part
of that pointer type. A dynamic pointer type describes pointers that may refer to different
Classes. The Class which may be linked through a pointer can reside on the same or any other
computer on the local area network. There is no hierarchy between Classes and a pointer can link
to any Class, including its own.

In contrast to pure object-oriented databases, context databases are not so tightly coupled to the
programming language and do not support methods directly. Instead, method invocation is partially
supported through the concept of VIRTUAL fields.

A VIRTUAL field is like a regular field: it can be read or written to. However, this field is not
physically stored in the database, and it does not have a type described in the schema. A read
operation on a virtual field is intercepted by the DBMS, which invokes a method associated with the
field, and the result produced by that method is returned. If no method is defined for the virtual field,
the field will be blank. A METHOD is a subroutine written in C++ by an application programmer.
Similarly, a write operation on a virtual field invokes an appropriate method, which can change the
value of the field. The current value of a virtual field is maintained by a run-time process; it is not
preserved between sessions. In object-oriented terms, virtual fields represent just two public
methods: reading and writing. Experience shows, however, that this is often enough in practical
applications. From the DBMS point of view, virtual fields provide a transparent interface to such
methods via an application written by an application programmer.

A context database that has no composite or pointer fields and no Properties is essentially
RELATIONAL. With static composite and pointer fields, a context database becomes OBJECT-ORIENTED.
If the context database has only Properties, it is an ENTITY-ATTRIBUTE-VALUE database.
With dynamic composite fields, a context database becomes what is now known as a
SEMISTRUCTURED database. If the database has all the available types, it is a ConteXt
database.
Database - database models

A database model is a theory or specification describing how a database is structured and used. Several
such models have been suggested.
Common models include:
Hierarchical model
Network model
Relational model
Entity-relationship
Object-Relational model
Object model

Other models include:


Associative
Concept-oriented
Entity-Attribute-Value
Multi-dimensional model
Semi-structured
Star schema
XML database
--------------------------------------------------------------------------------------------------------------

Database
A computer database is a structured collection of records or data that is stored in a computer system so
that a computer program or person using a query language can consult it to answer queries. The
records retrieved in answer to queries are information that can be used to make decisions. The
computer program used to manage and query a database is known as a database management system
(DBMS). The properties and design of database systems are included in the study of information
science.
A typical query could be to answer questions such as, "How many hamburgers with two or more beef
patties were sold in the month of March in New Jersey?". To answer such a question, the database would
have to store information about hamburgers sold, including number of patties, sales date, and the
region. The term "database" originated within the computing discipline. Although its meaning has been
broadened by popular use, even to include non-electronic databases, this article is about computer
databases. Database-like collections of information existed well before the Industrial Revolution in the
form of ledgers, sales receipts and other business-related collections of data.
The central concept of a database is that of a collection of records, or pieces of information. Typically,
for a given database, there is a structural description of the type of facts held in that database: this
description is known as a schema. The schema describes the objects that are represented in the
database, and the relationships among them. There are a number of different ways of organizing a
schema, that is, of modelling the database structure: these are known as database models (or data
models). The model in most common use today is the relational model, which in layman's terms
represents all information in the form of multiple related tables each consisting of rows and columns
(the true definition uses mathematical terminology). This model represents relationships by the use of
values common to more than one table. Other models such as the hierarchical model and the network
model use a more explicit representation of relationships.
The term database refers to the collection of related records, and the software should be referred to as
the database management system or DBMS. When the context is unambiguous, however, many
database administrators and programmers use the term database to cover both meanings.

Many professionals consider a collection of data to constitute a database only if it has certain properties:
for example, if the data is managed to ensure its integrity and quality, if it allows shared access by a
community of users, if it has a schema, or if it supports a query language. However, there is no
definition of these properties that is universally agreed upon.
Database management systems are usually categorized according to the data model that they
support: relational, object-relational, network, and so on. The data model will tend to determine the
query languages that are available to access the database. A great deal of the internal engineering of a
DBMS, however, is independent of the data model, and is concerned with managing factors such as
performance, concurrency, integrity, and recovery from hardware failures. In these areas there are large
differences between products.
Database models
Various techniques are used to model data structure.
Most database systems are built around one particular data model, although it is increasingly common
for products to offer support for more than one model. For any one logical model various physical
implementations may be possible, and most products will offer the user some level of control in tuning
the physical implementation, since the choices that are made have a significant effect on performance.
An example is the relational model: all serious implementations of the relational model allow the
creation of indexes which provide fast access to rows in a table if the values of certain columns are
known.
Flat model
The flat (or table) model consists of a single, two-dimensional array of data elements, where all
members of a given column are assumed to be similar values, and all members of a row are assumed to
be related to one another.
Hierarchical model
In a hierarchical model, data is organized into a tree-like structure, implying a single upward link in
each record to describe the nesting, and a sort field to keep the records in a particular order in each
same-level list.
Network model
The network model tends to store records with links to other records. Associations are tracked via
"pointers". These pointers can be node numbers or disk addresses. Most network databases tend to also
include some form of hierarchical model.
Relational model
Three key terms are used extensively in relational database models: relations, attributes, and domains.
A relation is a table with columns and rows. The named columns of the relation are called attributes, and
the domain is the set of values the attributes are allowed to take.
The basic data structure of the relational model is the table, where information about a particular entity
(say, an employee) is represented in columns and rows (also called tuples). Thus, the "relation" in
"relational database" refers to the various tables in the database; a relation is a set of tuples. The
columns enumerate the various attributes of the entity (the employee's name, address or phone
number, for example), and a row is an actual instance of the entity (a specific employee) that is
represented by the relation. As a result, each tuple of the employee table represents various attributes
of a single employee.

All relations (and, thus, tables) in a relational database have to adhere to some basic rules to qualify as
relations. First, the ordering of columns is immaterial in a table. Second, there can't be identical tuples
or rows in a table. And third, each tuple will contain a single value for each of its attributes.
A relational database contains multiple tables, each similar to the one in the "flat" database model. One
of the strengths of the relational model is that, in principle, any value occurring in two different records
(belonging to the same table or to different tables), implies a relationship among those two records. Yet,
in order to enforce explicit integrity constraints, relationships between records in tables can also be
defined explicitly, by identifying or non-identifying parent-child relationships characterized by assigning
cardinality (1:1, (0)1:M, M:M). Tables can also have a designated single attribute or a set of attributes
that can act as a "key", which can be used to uniquely identify each tuple in the table.
A key that can be used to uniquely identify a row in a table is called a primary key. Keys are commonly
used to join or combine data from two or more tables. For example, an Employee table may contain a
column named Location which contains a value that matches the key of a Location table. Keys are also
critical in the creation of indices, which facilitate fast retrieval of data from large tables. Any column can
be a key, or multiple columns can be grouped together into a compound key. It is not necessary to
define all the keys in advance; a column can be used as a key even if it was not originally intended to be
one.
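For instance (with hypothetical column and constraint names), the Location column mentioned above
can be declared a foreign key and indexed after the event, even though it was not planned as a key
originally; a compound key is declared the same way over several columns.

ALTER TABLE Employee
    ADD CONSTRAINT employee_location_fk
    FOREIGN KEY (Location) REFERENCES Location (location_id);

CREATE INDEX employee_location_ix ON Employee (Location);     -- speeds up joins on the key

-- A compound key over several columns (hypothetical order_line table):
ALTER TABLE order_line ADD PRIMARY KEY (order_no, line_no);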
Relational operations
Users (or programs) request data from a relational database by sending it a query that is written in a
special language, usually a dialect of SQL. Although SQL was originally intended for end-users, it is much
more common for SQL queries to be embedded into software that provides an easier user interface.
Many web sites, such as Wikipedia, perform SQL queries when generating pages.
In response to a query, the database returns a result set, which is just a list of rows containing the
answers. The simplest query is just to return all the rows from a table, but more often, the rows are
filtered in some way to return just the answer wanted. Often, data from multiple tables are combined
into one, by doing a join. There are a number of relational operations in addition to join.
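A minimal illustration, assuming the hypothetical Employee and Location tables sketched earlier: the first query returns every row of a table, the second filters the rows, and the third combines data from two tables with a join:

    -- Return all the rows from a table.
    SELECT * FROM Employee;

    -- Filter the rows to return just the answer wanted.
    SELECT * FROM Employee WHERE Name = 'Smith, J.';

    -- Combine data from two tables by doing a join on the matching column.
    SELECT Employee.Name, Location.City
    FROM   Employee, Location
    WHERE  Employee.LocationId = Location.LocationId;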
Normal forms
Relations are classified according to the types of anomalies to which they are vulnerable. A database
that is only in first normal form is vulnerable to all types of anomalies, while a database that is in the
domain/key normal form has no modification anomalies. Normal forms are hierarchical in nature: the
lowest level is first normal form, and a database cannot meet the requirements of a higher-level normal
form without first meeting all the requirements of the lower ones.
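As a small hypothetical illustration of why the lower normal forms still permit anomalies: a table that repeats a department's location on every staff row is in first normal form, yet moving the department forces many rows to be changed (an update anomaly); decomposing the table removes the redundancy:

    -- In first normal form, but redundant: the department location is
    -- repeated on every row for staff in that department.
    CREATE TABLE StaffDept (
        StaffNo       INTEGER PRIMARY KEY,
        DeptNo        INTEGER,
        DeptLocation  VARCHAR(40)
    );

    -- Decomposition into two tables removes the redundancy and the anomaly.
    CREATE TABLE Staff (
        StaffNo  INTEGER PRIMARY KEY,
        DeptNo   INTEGER
    );
    CREATE TABLE Department (
        DeptNo        INTEGER PRIMARY KEY,
        DeptLocation  VARCHAR(40)
    );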
Object database models
In recent years, the object-oriented paradigm has been applied to database technology, creating a new
kind of database known as object databases. These databases attempt to bring the database world
and the application programming world closer together, in particular by ensuring that the database uses
the same type system as the application program. This aims to avoid the overhead (sometimes referred
to as the impedance mismatch) of converting information between its representation in the database
(for example as rows in tables) and its representation in the application program (typically as objects).
At the same time, object databases attempt to introduce the key ideas of object programming, such as
encapsulation and polymorphism, into the world of databases.
A variety of ways have been tried for storing objects in a database. Some products have
approached the problem from the application programming end, by making the objects manipulated by
the program persistent. This also typically requires the addition of some kind of query language, since
conventional programming languages do not have the ability to find objects based on their information
content. Others have attacked the problem from the database end, by defining an object-oriented data
model for the database, and defining a database programming language that allows full programming
capabilities as well as traditional query facilities.
Post-relational database models
Several products have been identified as post-relational because the data model incorporates relations
but is not constrained by the Information Principle, requiring that all information is represented by data
values in relations. Products using a post-relational data model typically employ a model that actually
pre-dates the relational model. These might be identified as a directed graph with trees on the nodes.

Examples of models that could be classified as post-relational are PICK (also known as MultiValue) and MUMPS.
Fuzzy databases
It is possible to develop fuzzy relational databases. Basically, a fuzzy database is a database that uses
fuzzy logic, for example through fuzzy attributes, which may be defined as attributes of an item, row or
object in the database that allow fuzzy information (imprecise or uncertain data) to be stored. There are
many ways of adding this flexibility. The simplest technique is to add a fuzzy membership degree to each
record, i.e. an attribute in the range [0,1]. However, other kinds of database allow fuzzy values to be
stored in fuzzy attributes using fuzzy sets (including fuzzy spatial datatypes), possibility distributions, or
fuzzy degrees associated with some attributes and carrying different meanings (membership degree,
importance degree, fulfilment degree and so on). Sometimes the expression "fuzzy databases" is also
used for classical databases with fuzzy queries or with other fuzzy aspects, such as constraints.
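A sketch of the simplest technique mentioned above, using ordinary SQL with hypothetical names: each record carries a membership degree in the range [0,1], and queries can filter on a chosen threshold:

    -- Each row carries a fuzzy membership degree between 0 and 1.
    CREATE TABLE TallPeople (
        PersonId  INTEGER PRIMARY KEY,
        Name      VARCHAR(40),
        Degree    DECIMAL(3,2)        -- membership degree, e.g. 0.85
    );

    -- Retrieve only the rows whose membership degree exceeds a threshold.
    SELECT Name, Degree
    FROM   TallPeople
    WHERE  Degree >= 0.75;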
The first fuzzy relational database, FRDB, appeared in Maria Zemankova's dissertation. Later, other
models arose, such as the Buckles-Petry model, the Prade-Testemale model, the Umano-Fukami model
and the GEFRED model by J.M. Medina, M.A. Vila et al. In the context of fuzzy databases, several fuzzy
querying languages have been defined, notably SQLf by P. Bosc et al. and FSQL by J. Galindo et al.
These languages define structures for including fuzzy aspects in SQL statements, such as fuzzy
conditions, fuzzy comparators, fuzzy constants, fuzzy constraints, fuzzy thresholds, linguistic labels
and so on.

----------------------------------------------------------------------------------------------------------------------------------------

We shall be illustrating the theory and principles of relational database systems with reference to the
Oracle proprietary Relational Database Management System (RDBMS) and Structured Query Language
(SQL), the most widely used database language for manipulating both the structure of the database and
the data held within it.
The relational model
The relational model consists of three components:

A STRUCTURAL component -- a set of TABLES (also called RELATIONS).
A MANIPULATIVE component -- a set of high-level operations which act upon and produce whole tables.
A SET OF RULES for maintaining the INTEGRITY of the database.
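As a brief hypothetical sketch of the three components in SQL terms: a base table provides the structure, a query is a high-level operation that acts upon a whole table and produces a whole table as its result, and key declarations express integrity rules:

    -- Structural component: a base table (relation).
    CREATE TABLE Car (
        RegNo  CHAR(7) PRIMARY KEY,   -- rules component: the key enforces entity integrity
        Model  VARCHAR(30)
    );

    -- Manipulative component: an operation on a whole table whose result
    -- is itself a whole table.
    SELECT Model, COUNT(*) AS NumberOfCars
    FROM   Car
    GROUP BY Model;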
The terminology associated with relational database theory originates from the branch of mathematics
called set theory, although there are widely used synonyms for these precise, mathematical terms.

Data structures are composed of two components which represent a model of the situation being
considered. These are (i) ENTITY TYPES - i.e. data group types, and (ii) the RELATIONSHIPS between the
entity types.
Entity types are represented by RELATIONS or BASE TABLES. These two terms are interchangeable - a
RELATION is the mathematical term for a TABLE.
A base table is loosely defined as an un-ordered collection of zero, one or more TUPLES (ROWS) each of
which consists of one or more un-ordered ATTRIBUTES (COLUMNS). All tuples are made up of exactly the
same set of attributes. For the remainder of this discussion we shall use the more widely known
terminology:
TABLE for RELATION
ROW for TUPLE
COLUMN for ATTRIBUTE
Each column is drawn from a DOMAIN, that is, a set of values from which the actual values are taken
(e.g. a set of car model names). More than one column in a table may draw its values from the same
domain.
A column entry in any row is SINGLE-VALUED, i.e. it contains exactly one item only (e.g. a surname).
Repeating groups, i.e. columns which contain sets of values rather than a single value, are not allowed.
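For example (hypothetical names): a single column holding a set of telephone numbers, such as '555-0101, 555-0102', would be a repeating group; the accepted design stores each number as a single value in its own row of a separate table:

    -- Each row holds exactly one phone number for one person.
    CREATE TABLE PersonPhone (
        PersonId  INTEGER,
        PhoneNo   VARCHAR(20),
        PRIMARY KEY (PersonId, PhoneNo)
    );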
Each row of a table is uniquely identified by a PRIMARY KEY composed of one or more columns. This
implies that a table may not contain duplicate rows.
Note that, in general, a column, or group of columns, that uniquely identifies a row in a table is called a
CANDIDATE KEY. There may be more than one candidate key for a particular table; one of these will be
chosen as the primary key.
The ENTITY INTEGRITY RULE of the model states that no component of the primary key may contain a
NULL value.
A column, or combination of columns, that matches the primary key of another table is called a
FOREIGN KEY.
The REFERENTIAL INTEGRITY RULE of the model states that every non-null foreign key value in a table
must have a matching primary key value in the referenced table.
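These rules can be stated directly when tables are defined. In the hypothetical sketch below, EMPNO is chosen as the primary key (NINO is another candidate key, declared UNIQUE), the NOT NULL on the primary key column reflects the entity integrity rule, and the DEPTNO column of EMP is a foreign key whose values must match a primary key value in DEPT, reflecting the referential integrity rule:

    CREATE TABLE DEPT (
        DEPTNO  INTEGER NOT NULL PRIMARY KEY,
        DNAME   VARCHAR(30)
    );

    CREATE TABLE EMP (
        EMPNO   INTEGER NOT NULL PRIMARY KEY,     -- entity integrity: no NULL in the primary key
        NINO    CHAR(9) UNIQUE,                   -- a second candidate key, not chosen as primary
        ENAME   VARCHAR(30),
        DEPTNO  INTEGER REFERENCES DEPT(DEPTNO)   -- referential integrity: foreign key
    );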
Only two kinds of table may be defined in a SQL schema: BASE TABLES and VIEWS. These are called
NAMED RELATIONS. Other tables, UNNAMED RELATIONS, may be derived from these by means of
relational operations such as JOINS and PROJECTIONS.
All tables are LOGICAL ENTITIES. Of these, only base tables physically exist, in the sense that physically
stored records (and possibly physical access paths such as indexes) in one or more stored files directly
support the table in physical storage. Although standard techniques such as HASHING, INDEXING, etc.
will be used for implementation efficiency, the user of the database should require no knowledge of
previously defined access paths.
Views and the results of all operations on tables - unnamed relations - are tables that exist as LOGICAL
DEFINITIONS, in terms of a view definition, or a [SELECT .. FROM .. WHERE .. ORDER BY] sequence.
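A brief sketch, reusing the hypothetical EMP table from above: the view exists only as a logical definition in terms of a SELECT, yet it may then be queried as though it were a table (the choice of DEPTNO = 10 is arbitrary):

    -- A named relation that exists only as a logical definition.
    CREATE VIEW DEPT10_EMP AS
        SELECT EMPNO, ENAME
        FROM   EMP
        WHERE  DEPTNO = 10;

    -- The view may now be used like any other table.
    SELECT * FROM DEPT10_EMP ORDER BY ENAME;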
The term UPDATE has two meanings:
as a SQL operation in its own right which causes one or more columns in a table to be altered; in this
context it will always be shown in upper-case letters - UPDATE.
as a generic term used to include the SQL operations INSERT, DELETE and UPDATE; in this context it will
always be shown in lower-case letters - update.
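For example, again using the hypothetical EMP table: the first statement is the UPDATE operation in its own right, while all three statements are updates in the generic sense:

    -- UPDATE as a SQL operation: alters one or more columns of existing rows.
    UPDATE EMP SET DEPTNO = 20 WHERE EMPNO = 7369;

    -- INSERT and DELETE are also "updates" in the generic sense.
    INSERT INTO EMP (EMPNO, NINO, ENAME, DEPTNO)
        VALUES (7950, 'AB123456C', 'CLARK', 10);
    DELETE FROM EMP WHERE EMPNO = 7950;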
