
Distributed Database

Introduction

A major motivation behind the development of database systems is the desire to integrate the operational data of an organization and to provide controlled access to the data. Although integration and controlled access may imply centralization, this is not the intention. In fact, the development of computer networks promotes a decentralized mode of work. This decentralized approach mirrors the organizational structure of many companies, which are logically distributed into divisions, departments, projects, and so on, and physically distributed into offices, plants, and factories, where each unit maintains its own operational data.
A distributed database system that reflects this organizational structure, makes the data in all units accessible, and stores data close to the location where it is most frequently used should improve both the ability to share data and the efficiency of data access.

Distributed DBMS
A distributed DBMS is the software system that permits the management of the distributed database and makes the distribution transparent to users.
A Distributed Database Management System (DDBMS) consists of a single logical database that is split into a number of fragments. Each fragment is stored on one or more computers under the control of a separate DBMS, with the computers connected by a communications network.

Each site is capable of independently processing user requests that require access to local data and is also capable of processing data stored on other computers in the network.
Users access the distributed database via applications. Applications are classified as those that do not require data from other sites (local applications) and those that do require data from other sites (global applications). We require a DDBMS to have at least one global application.

Banking Example

Using distributed database technology, a bank may implement its database system on a number of separate computer systems rather than on a single, centralized mainframe. The computer systems may be located at each local branch office: for example, Amritsar, Patiala, and Jalandhar. A network linking the computers enables the branches to communicate with each other, and the DDBMS enables them to access data stored at another branch office. Thus, a client living in Amritsar can also check his or her account while staying in Patiala or Jalandhar.
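The distinction between local and global applications in the banking example can be pictured as a routing decision: a request is local when all the data it needs is stored at the site where it is issued, and global otherwise. The following is a minimal sketch under stated assumptions: the catalog ACCOUNT_SITE, the account numbers, and the functions sites_to_contact and classify_request are hypothetical and merely stand in for the catalog and query processing a real DDBMS would provide.

```python
# Hypothetical global catalog: the branch site that stores each account.
ACCOUNT_SITE = {
    "A-101": "Amritsar",
    "P-202": "Patiala",
    "J-303": "Jalandhar",
}


def sites_to_contact(origin_site: str, account_ids: list) -> list:
    """Remote sites that must be contacted to read the given accounts."""
    return sorted({ACCOUNT_SITE[acc] for acc in account_ids} - {origin_site})


def classify_request(origin_site: str, account_ids: list) -> str:
    """'local' if all the data is stored at the origin site, otherwise 'global'."""
    return "global" if sites_to_contact(origin_site, account_ids) else "local"


# An Amritsar client checking account A-101:
print(classify_request("Amritsar", ["A-101"]))  # local (at the home branch)
print(classify_request("Patiala", ["A-101"]))   # global (while visiting Patiala)
```

The same request becomes global as soon as it is issued away from the branch that stores the account, which is exactly the case the DDBMS has to support.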

Data Allocation

There are four alternative strategies regarding the placement of data:

Centralized
Fragmented
Complete replication
Selective replication

We now compare these strategies against the objectives of locality of reference, reliability and availability, performance, storage costs, and communication costs.

Centralized
This strategy consists of a single database and DBMS stored at one site with users distributed across the network (we referred to this previously as distributed processing). Locality of reference is at its lowest as all sites, except the central site, have to use the network for all data accesses. This also means that communication costs are high. Reliability and availability are low, as a failure of the central site results in the loss of the entire database system.

Fragmented (or partitioned)

This strategy partitions the database into disjoint fragments, with each fragment assigned to one site.
If data items are located at the site where they are used most frequently, locality of reference is high. As there is no replication, storage costs are low. Similarly, reliability and availability are low, although they are higher than in the centralized case, as the failure of a site results in the loss of only that site's data. Performance should be good and communication costs low if the distribution is designed properly.
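The comparison between the centralized and fragmented strategies can be made concrete by counting how many accesses are answered locally and which data survives a site failure. This is a minimal sketch: the access log, the sites S1 to S3, the fragments F1 to F3, and both allocations are hypothetical, and the sketch ignores replication, update traffic, and query costs.

```python
# Hypothetical access log: (site issuing the request, fragment it accesses).
ACCESS_LOG = [
    ("S1", "F1"), ("S1", "F1"), ("S1", "F2"),
    ("S2", "F2"), ("S2", "F2"),
    ("S3", "F3"), ("S3", "F3"), ("S3", "F1"),
]

# Centralized: all data at one central site (S1 here).
CENTRALIZED = {"F1": "S1", "F2": "S1", "F3": "S1"}
# Fragmented: each fragment at the site that uses it most often.
FRAGMENTED = {"F1": "S1", "F2": "S2", "F3": "S3"}


def local_fraction(allocation: dict, log: list) -> float:
    """Share of accesses answered at the requesting site, without the network."""
    local = sum(1 for site, frag in log if allocation[frag] == site)
    return local / len(log)


def available_after_failure(allocation: dict, failed_site: str) -> list:
    """Fragments still reachable when one site fails (no replication)."""
    return sorted(f for f, site in allocation.items() if site != failed_site)


print(local_fraction(CENTRALIZED, ACCESS_LOG))     # 0.375
print(local_fraction(FRAGMENTED, ACCESS_LOG))      # 0.75
print(available_after_failure(CENTRALIZED, "S1"))  # []
print(available_after_failure(FRAGMENTED, "S1"))   # ['F2', 'F3']
```

With this log, placing each fragment where it is used most doubles the share of local accesses, and a single site failure loses only that site's fragment instead of the whole database.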

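Advantages of fragmentation
Usage: applications usually work with subsets of relations rather than with entire relations.
Efficiency: data is stored close to where it is most frequently used, and data not needed by local applications is not stored there.
Parallelism: a transaction can be divided into subqueries that operate on different fragments concurrently.
Security: data not required by local applications is not stored at a site, so it is not exposed to unauthorized users there.

Disadvantages of fragmentation
Performance: global applications that need data from fragments at several sites may be slower.
Integrity: integrity control is more difficult when data and functional dependencies are fragmented across sites.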

Data Fragmentation

If relation r is fragmented, r is divided into a number of fragments $r_1, r_2, \ldots, r_n$. These fragments contain sufficient information to allow reconstruction of the original relation r. As we shall see, this reconstruction can take place through the application of either the union operation or a special type of join operation on the various fragments.
There are three different schemes for fragmenting a relation:

Horizontal fragmentation
Vertical fragmentation
Mixed fragmentation

We shall illustrate these approaches by fragmenting the relation EMP, with the following example schema:
EMP (EMPNO, ENAME, JOB, MGR, HIREDATE, SAL, COMM, DEPTNO)

Horizontal Fragmentation
In horizontal fragmentation, a relation (table) is divided horizontally: some of the tuples of the relation are placed at one computer and the rest are placed at other computers. A horizontal fragment is a subset of the tuples of that relation. To reconstruct relation r from its horizontal fragments, a UNION operation is performed on the fragments. A set of horizontal fragments that together contain all the tuples of r is called a complete horizontal fragmentation of r.

For example, suppose that the relation r is the EMP relation above. This relation can be divided into n different fragments, each of which consists of the tuples of employees belonging to a particular department. Since EMP has three departments, 10, 20, and 30, this results in three fragments:
$EMP1 = \sigma_{DEPTNO=10}(EMP)$
$EMP2 = \sigma_{DEPTNO=20}(EMP)$
$EMP3 = \sigma_{DEPTNO=30}(EMP)$
These three fragments are shown below. Fragment $r_1$ (EMP1) is stored at the site of department 10, fragment $r_2$ (EMP2) at the site of department 20, and fragment $r_3$ (EMP3) at the site of department 30.


We reconstruct relation r by taking the union of all its fragments; that is, $r = r_1 \cup r_2 \cup \cdots \cup r_n$. For the example, $EMP = EMP1 \cup EMP2 \cup EMP3$.
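Horizontal fragmentation by selection and reconstruction by union can be sketched with relations modelled as Python sets of tuples (set semantics, as in relational algebra). This is a minimal sketch: the Emp type and the helper select are illustrative names, and the four sample rows are illustrative data rather than part of the original example.

```python
from typing import NamedTuple, Optional


class Emp(NamedTuple):
    """One tuple of EMP(EMPNO, ENAME, JOB, MGR, HIREDATE, SAL, COMM, DEPTNO)."""
    empno: int
    ename: str
    job: str
    mgr: Optional[int]
    hiredate: str
    sal: int
    comm: Optional[int]
    deptno: int


# Illustrative sample rows.
EMP = {
    Emp(7369, "SMITH", "CLERK", 7902, "1980-12-17", 800, None, 20),
    Emp(7499, "ALLEN", "SALESMAN", 7698, "1981-02-20", 1600, 300, 30),
    Emp(7782, "CLARK", "MANAGER", 7839, "1981-06-09", 2450, None, 10),
    Emp(7839, "KING", "PRESIDENT", None, "1981-11-17", 5000, None, 10),
}


def select(relation, predicate):
    """Relational selection (sigma): keep the tuples that satisfy predicate."""
    return {t for t in relation if predicate(t)}


# EMP1 = sigma_{DEPTNO=10}(EMP), EMP2 = sigma_{DEPTNO=20}(EMP), ...
EMP1 = select(EMP, lambda t: t.deptno == 10)
EMP2 = select(EMP, lambda t: t.deptno == 20)
EMP3 = select(EMP, lambda t: t.deptno == 30)

# Reconstruction: EMP = EMP1 ∪ EMP2 ∪ EMP3
reconstructed = EMP1 | EMP2 | EMP3
print(reconstructed == EMP)  # True
```

The union gives back EMP only because the three predicates together cover every DEPTNO value present; a tuple with DEPTNO 40 would fall into no fragment, so the fragmentation would no longer be complete.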

Vertical Fragmentation
In vertical fragmentation, some of the columns (attributes) are stored at one computer and the rest are stored at other computers, because each site may not need all the attributes of a relation. A vertical fragment keeps only certain attributes of the relation. The fragmentation should be done so that relation r can be reconstructed from the fragments by taking their natural join:

$r = r_1 \bowtie r_2 \bowtie \cdots \bowtie r_n$

For this join to be lossless, each fragment should include a key of r (for EMP, the key EMPNO).
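Vertical fragmentation by projection and reconstruction by natural join can be sketched as follows, with a relation modelled as a pair of attribute names and a set of value tuples. This is a minimal sketch: the helpers project and natural_join, the sample rows, and the particular split into EMP_V1 and EMP_V2 are hypothetical; the only requirement taken from the text is that both fragments keep the key EMPNO.

```python
# A relation is modelled as (attribute names, set of value tuples).
EMP_ATTRS = ("EMPNO", "ENAME", "JOB", "MGR", "HIREDATE", "SAL", "COMM", "DEPTNO")
EMP = (EMP_ATTRS, {
    (7369, "SMITH", "CLERK", 7902, "1980-12-17", 800, None, 20),
    (7499, "ALLEN", "SALESMAN", 7698, "1981-02-20", 1600, 300, 30),
    (7782, "CLARK", "MANAGER", 7839, "1981-06-09", 2450, None, 10),
})


def project(relation, attrs):
    """Relational projection (pi): keep only the named attributes."""
    names, rows = relation
    idx = [names.index(a) for a in attrs]
    return (tuple(attrs), {tuple(row[i] for i in idx) for row in rows})


def natural_join(left, right):
    """Join two relations on all the attributes they have in common."""
    lnames, lrows = left
    rnames, rrows = right
    common = [a for a in lnames if a in rnames]
    extra = [a for a in rnames if a not in lnames]
    li = [lnames.index(a) for a in common]
    ri = [rnames.index(a) for a in common]
    ei = [rnames.index(a) for a in extra]
    rows = {
        lrow + tuple(rrow[k] for k in ei)
        for lrow in lrows
        for rrow in rrows
        if all(lrow[i] == rrow[j] for i, j in zip(li, ri))
    }
    return (lnames + tuple(extra), rows)


# Hypothetical split: both vertical fragments keep the key EMPNO.
EMP_V1 = project(EMP, ("EMPNO", "ENAME", "JOB", "MGR", "DEPTNO"))
EMP_V2 = project(EMP, ("EMPNO", "HIREDATE", "SAL", "COMM"))

# r = r1 ⋈ r2; re-project to restore the original attribute order.
rebuilt = project(natural_join(EMP_V1, EMP_V2), EMP_ATTRS)
print(rebuilt == EMP)  # True
```

Because EMPNO appears in both fragments, each tuple of EMP_V1 matches exactly one tuple of EMP_V2. If the fragments shared no attribute, natural_join would degenerate into a Cartesian product and produce spurious tuples.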

Mixed Fragmentation

Mixed fragmentation, also known as Hybrid fragmentation, intermixes the horizontal and vertical fragmentation.
The relation r is divided into a number of fragment relations $r_1, r_2, \ldots, r_n$. Each fragment is obtained by applying either the horizontal or the vertical fragmentation scheme to relation r, or to a fragment of r that was obtained previously. For example, combining horizontal and vertical fragmentation of the EMP relation results in a mixed fragmentation. The relation is first divided into the vertical fragments EMP1 and EMP2. We can then further divide fragment EMP1, using the horizontal fragmentation scheme, into the following three fragments:

$EMP1a = \sigma_{DEPTNO=10}(EMP1)$
$EMP2a = \sigma_{DEPTNO=20}(EMP1)$
$EMP3a = \sigma_{DEPTNO=30}(EMP1)$
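The two-step mixed fragmentation of EMP can be sketched with rows as Python dictionaries. This is a minimal sketch under assumptions: the text does not say which attributes go into the vertical fragments EMP1 and EMP2, so the split below is hypothetical (both keep the key EMPNO, and EMP1 must keep DEPTNO for the horizontal selection to be possible), and the helpers project, select, and join_on_key and the sample rows are illustrative.

```python
# Illustrative sample rows of EMP.
EMP = [
    {"EMPNO": 7369, "ENAME": "SMITH", "JOB": "CLERK", "MGR": 7902,
     "HIREDATE": "1980-12-17", "SAL": 800, "COMM": None, "DEPTNO": 20},
    {"EMPNO": 7499, "ENAME": "ALLEN", "JOB": "SALESMAN", "MGR": 7698,
     "HIREDATE": "1981-02-20", "SAL": 1600, "COMM": 300, "DEPTNO": 30},
    {"EMPNO": 7782, "ENAME": "CLARK", "JOB": "MANAGER", "MGR": 7839,
     "HIREDATE": "1981-06-09", "SAL": 2450, "COMM": None, "DEPTNO": 10},
]


def project(rows, attrs):
    """Vertical fragmentation: keep only the listed attributes."""
    return [{a: row[a] for a in attrs} for row in rows]


def select(rows, pred):
    """Horizontal fragmentation: keep only the rows satisfying pred."""
    return [row for row in rows if pred(row)]


def join_on_key(left, right, key="EMPNO"):
    """Natural join of two vertical fragments on their shared key."""
    by_key = {row[key]: row for row in right}
    return [{**row, **by_key[row[key]]} for row in left if row[key] in by_key]


# Step 1: vertical fragments (hypothetical attribute split).
EMP1 = project(EMP, ["EMPNO", "ENAME", "JOB", "DEPTNO"])
EMP2 = project(EMP, ["EMPNO", "MGR", "HIREDATE", "SAL", "COMM"])

# Step 2: horizontal fragments of EMP1.
EMP1a = select(EMP1, lambda r: r["DEPTNO"] == 10)
EMP2a = select(EMP1, lambda r: r["DEPTNO"] == 20)
EMP3a = select(EMP1, lambda r: r["DEPTNO"] == 30)

# Reconstruction: EMP = (EMP1a ∪ EMP2a ∪ EMP3a) ⋈ EMP2.
# List concatenation acts as union here because the fragments are disjoint.
rebuilt = join_on_key(EMP1a + EMP2a + EMP3a, EMP2)
```

Reconstruction simply reverses the steps: first the union undoes the horizontal split of EMP1, then the join on EMPNO undoes the vertical split.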

Data Replication and Fragmentation

The techniques described for data replication and data fragmentation can be applied successively to the same relation. That is, a fragment can be replicated, replicas of fragments can be fragmented further, and so on. For example, consider a distributed system consisting of sites S1, S2, ..., S11. We can fragment EMP into EMP1a, EMP2a, EMP3a, and EMP2, and, for example, store a copy of EMP1a at sites S1, S3, and S7; a copy of EMP2a at sites S4 and S11; and a copy of EMP2 at sites S2, S8, and S9.
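A replica placement like the one above can be recorded as a map from fragment to sites, which makes it easy to ask which fragments a site holds and which fragments become unreachable when some sites fail. This is a minimal sketch: PLACEMENT includes only the fragments whose sites the example states, and the function names are hypothetical.

```python
# Replica placement from the example: fragment -> sites holding a copy.
PLACEMENT = {
    "EMP1a": {"S1", "S3", "S7"},
    "EMP2a": {"S4", "S11"},
    "EMP2": {"S2", "S8", "S9"},
}


def fragments_at(site: str) -> list:
    """Fragments for which this site holds a replica."""
    return sorted(f for f, sites in PLACEMENT.items() if site in sites)


def unavailable_fragments(failed_sites) -> list:
    """Fragments whose every replica is on a failed site."""
    failed = set(failed_sites)
    return sorted(f for f, sites in PLACEMENT.items() if sites <= failed)


print(fragments_at("S4"))                    # ['EMP2a']
print(unavailable_fragments(["S1", "S3"]))   # []
print(unavailable_fragments(["S4", "S11"]))  # ['EMP2a']
```

Losing two sites leaves EMP1a available through its copy at S7, whereas EMP2a, with only two replicas, is lost when both S4 and S11 fail.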

Complete replication

This strategy consists of maintaining a complete copy of the database at each site. Therefore, locality of reference, reliability and availability, and performance are maximized. However, storage costs and the communication costs of updates are the highest of all the strategies. To overcome some of these problems, snapshots are sometimes used. A snapshot is a copy of the data at a given time. The copies are updated periodically, for example hourly or weekly, so they may not always be up to date. Snapshots are also sometimes used to implement views in a distributed database, to reduce the time it takes to perform a database operation on a view.
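The staleness trade-off of snapshots can be sketched as a local copy that is refreshed from the master data only after a fixed interval. This is a minimal sketch under assumptions: the Snapshot class, its fetch callable, and the injectable clock are hypothetical, and it refreshes lazily when read after the interval has elapsed, whereas a real system might refresh on a timer instead.

```python
import time


class Snapshot:
    """Local read-only copy of remote data, refreshed every refresh_interval seconds."""

    def __init__(self, fetch, refresh_interval, clock=time.monotonic):
        self._fetch = fetch              # callable returning the current master data
        self._interval = refresh_interval
        self._clock = clock
        self._data = fetch()             # copy of the data at creation time
        self._taken_at = clock()

    def read(self):
        # Refresh only when the copy is at least refresh_interval seconds old;
        # otherwise return the (possibly stale) local copy.
        if self._clock() - self._taken_at >= self._interval:
            self._data = self._fetch()
            self._taken_at = self._clock()
        return self._data


# Usage with a fake clock so that an "hour" passes instantly.
master = {"balance": 100}
fake_now = [0.0]
snap = Snapshot(lambda: dict(master), refresh_interval=3600, clock=lambda: fake_now[0])
master["balance"] = 150        # update at the master site
print(snap.read()["balance"])  # 100: the snapshot is stale
fake_now[0] = 3600.0           # one hour later
print(snap.read()["balance"])  # 150: refreshed
```

Reads between refreshes are purely local and cheap, at the price of possibly returning data that is up to one refresh interval out of date.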
Selective replication
This strategy is a combination of fragmentation, replication, and centralization. Some data items are fragmented to achieve high locality of reference, and others, which are used at many sites and are not frequently updated, are replicated; the remaining data items are centralized. The objective of this strategy is to have all the advantages of the other approaches but none of the disadvantages. This is the most commonly used strategy because of its flexibility.
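The per-item decision behind selective replication can be sketched as a simple rule over usage statistics. This is a minimal sketch under assumptions: the Usage fields and the thresholds many_sites, rare_updates, and dominant are hypothetical values chosen for illustration; in practice they would come from analyzing the actual transaction workload.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    sites_using: int            # number of sites that access the data item
    update_ratio: float         # fraction of accesses that are updates
    dominant_site_share: float  # share of accesses coming from the busiest site


def placement(u: Usage, many_sites=3, rare_updates=0.1, dominant=0.8) -> str:
    """Choose a strategy for one data item (hypothetical thresholds)."""
    if u.dominant_site_share >= dominant:
        return "fragment"      # mostly used at one site: store it there
    if u.sites_using >= many_sites and u.update_ratio <= rare_updates:
        return "replicate"     # read at many sites, rarely updated
    return "centralize"        # otherwise keep a single central copy


print(placement(Usage(1, 0.5, 1.0)))   # fragment
print(placement(Usage(5, 0.02, 0.3)))  # replicate
print(placement(Usage(5, 0.4, 0.3)))   # centralize
```

The third case shows why centralization remains part of the mix: data that is used at many sites but updated often would make replicas expensive to keep consistent.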

Distributed Relational Database Design


In this section we examine the factors that have to be considered for the design of a distributed relational database. More specifically, we examine:

Fragmentation

A relation may be divided into a number of subrelations, called fragments, which are then distributed.

There are two main types of fragmentation: 1) Horizontal fragmentation 2) Vertical fragmentation

Allocation

Each fragment is stored at the site with optimal distribution.

Replication
The DDBMS may maintain a copy of a fragment at several different sites.

The definition and allocation of fragments must be based on how the database is to be used, which involves analyzing transactions. The design should be based on both quantitative and qualitative information: quantitative information is used in allocation, and qualitative information is used in fragmentation.


The quantitative information may include:

The frequency with which a transaction is run.
The site from which a transaction is run.
The performance criteria for transactions.

The qualitative information may include information about the transactions that are executed. Fragments are defined and allocated to achieve the following objectives:

Locality of reference
Improved reliability and availability
Acceptable performance
Balanced storage capacities and costs
Minimal communication costs
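Quantitative information such as the frequency of each transaction and the site it runs from can drive allocation directly. One simple heuristic, used here for illustration and not as an optimal allocation method, places each fragment at the site that accesses it most often per day, which aims at locality of reference and low communication cost. The transaction profile, sites, and function names below are hypothetical, and the sketch ignores update costs, replication, and storage balance.

```python
from collections import defaultdict

# Hypothetical profile: (transaction, site it runs from, runs per day, fragments read).
TRANSACTIONS = [
    ("T1", "S1", 500, ["EMP1"]),
    ("T2", "S2", 200, ["EMP2"]),
    ("T3", "S3", 50, ["EMP1", "EMP3"]),
    ("T4", "S2", 300, ["EMP3"]),
]


def allocate(transactions) -> dict:
    """Place each fragment at the site that accesses it most often per day."""
    load = defaultdict(lambda: defaultdict(int))  # fragment -> site -> accesses/day
    for _name, site, freq, fragments in transactions:
        for frag in fragments:
            load[frag][site] += freq
    return {frag: max(sites, key=sites.get) for frag, sites in load.items()}


def remote_accesses(transactions, allocation) -> int:
    """Accesses per day that must cross the network under this allocation."""
    return sum(
        freq
        for _name, site, freq, fragments in transactions
        for frag in fragments
        if allocation[frag] != site
    )


alloc = allocate(TRANSACTIONS)
print(alloc)                                 # {'EMP1': 'S1', 'EMP2': 'S2', 'EMP3': 'S2'}
print(remote_accesses(TRANSACTIONS, alloc))  # 100
```

Only T3, the least frequent transaction, pays for remote access; a full design would weigh this against the other objectives listed above, such as reliability and balanced storage.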
