
Distributed Databases – Basic Concepts

• Concepts.
• Advantages and disadvantages of distributed databases.
• Functions and architecture for a DDBMS.
• Distributed database design.
• Levels of transparency.
• Comparison criteria for Distributed DBMSs.

Connolly & Begg. Chapter 22. Third edition

COMP 302 Valentina Tamma

Why distributed databases?

Some initial motivations:

• The development of computer networks promotes decentralization.
• In a company, the database organization might reflect the organizational structure, which is distributed into units. Each unit maintains its own database.
• Sharing of data can be achieved by developing a distributed database system which:
    • makes data accessible by all units
    • stores data close to where it is most frequently used.

Concepts

Distributed Database
A logically interrelated collection of shared data (and a description of this data), physically distributed over a computer network.

Distributed DBMS (DDBMS)
Software system that permits the management of the distributed database and makes the distribution transparent to users.

An example of DDBMS

[Figure: an example of a DDBMS]

DDBMS - characteristics

• Collection of logically-related shared data.
• Data split into fragments.
• Fragments may be replicated.
• Fragments/replicas allocated to sites.
• Sites linked by a communications network.
• Data at each site is under control of a DBMS.
• DBMSs handle local applications autonomously.
• Each DBMS participates in at least one global application.


These are not DDBMSs

Distributed Processing
A centralized database that can be accessed over a computer network.

Parallel DBMS
A DBMS running across multiple processors and disks, designed to execute operations in parallel, whenever possible, to improve performance.

• Based on premise that single processor systems can no longer meet requirements for cost-effective scalability, reliability, and performance.
• Parallel DBMSs link multiple, smaller machines to achieve same throughput as single, larger machine, with greater scalability and reliability.

Parallel DBMS

Main architectures for parallel DBMSs are:

• Shared memory,
• Shared disk,
• Shared nothing.

[Figure: parallel DBMS architectures: (a) shared memory, (b) shared disk, (c) shared nothing]


Advantages of DDBMSs

• Reflects Organizational Structure
• Improved Sharing and Local Autonomy
• Improved Availability
    A failure does not make the entire system inoperable
• Improved Reliability
    Data may be replicated
• Improved Performance
    Data are local to the site of “greatest demand”
• Economics
    Many small computers cost less than a big one!
• Modular Growth
    easy to add new modules

Disadvantages of DDBMSs

• Complexity
• Cost
    Especially in system management
• Security
    network must be made secure
• Integrity Control More Difficult
• Lack of Standards
• Lack of Experience
• Database Design More Complex
    due to fragmentation, allocation of fragments to a specific site, …

Types of DDBMS

Homogeneous DDBMS
• All sites use same DBMS product (e.g. Oracle)
• Fairly easy to design and manage.

Heterogeneous DDBMS
• Sites may run different DBMS products (e.g. Oracle and Ingres)
• Possibly different underlying data models (e.g. relational DB and OO database)
• Occurs when sites have implemented their own databases and integration is considered later.
• We won’t consider heterogeneous DDBMSs here.

Multidatabase System (MDBS)

• DDBMS in which each site maintains complete autonomy.
• DBMS that resides transparently on top of existing database and file systems and presents a single database to its users.
• Allows users to access and share data without requiring physical database integration.
• Unfederated MDBS (no local users) and federated MDBS.


Overview of Networking

• Network - Interconnected collection of autonomous computers, capable of exchanging information.
• Local Area Network (LAN) intended for connecting computers at same site.
• Wide Area Network (WAN) used when computers or LANs need to be connected over long distances.
• WANs are relatively slow and less reliable than LANs. A DDBMS using a LAN provides much faster response time than one using a WAN.



Functions of a DDBMS

• Expect DDBMS to have at least the functionality of a DBMS (see Connolly & Begg. Chapter 2. Third edition)
• Also to have following functionality:
    • Extended communication services.
    • Extended Data Dictionary.
    • Distributed query processing.
    • Extended concurrency control.
    • Extended recovery services.
    • Extended security control.

Reference Architecture for DDBMS

• Due to diversity, no accepted architecture equivalent to ANSI/SPARC 3-level architecture for DBMSs.
• A possible reference architecture consists of:
    • Set of global external schemas.
    • Global conceptual schema (GCS).
    • Fragmentation schema and allocation schema.
    • Set of schemas for each local DBMS conforming to 3-level ANSI/SPARC.
• Some levels may be missing, depending on levels of transparency supported.


Reference Architecture for DDBMS

• Global Conceptual Schema is the logical description of the DB as if it were not distributed. It contains definitions of entities, relationships, constraints, security, and integrity information.
• Fragmentation and Allocation Schemas describe how data are logically partitioned, and where they are located, taking replication into account.
• Local Schemas are the logical descriptions of the local DBs.



Components of a DDBMS

Distributed Databases

Issues in Distributed Database Design


Issues in Distributed Database Design

Three key issues we have to consider:

• Data Allocation: where are data placed? Data should be stored at site with "optimal" distribution.
• Fragmentation: relation may be divided into a number of sub-relations (called fragments), which are stored in different sites.
• Replication: copy of fragment may be maintained at several sites.

• Definition and allocation of fragments carried out strategically to achieve:
    • Locality of Reference
    • Improved Reliability and Availability
    • Improved Performance
    • Balanced Storage Capacities and Costs
    • Minimal Communication Costs.
• Involves analysing most important transactions, based on quantitative/qualitative information.

Fragmentation

• Quantitative information may include:
    • frequency with which a transaction is run;
    • site from which a transaction is run;
    • performance criteria for transactions.
• Qualitative information may include transactions that are executed such as:
    • type of access (read or write);
    • predicates of read operations.

Data Allocation

• Four strategies regarding placement of data:
    • Centralized
    • Partitioned (or Fragmented)
    • Complete Replication
    • Selective Replication


Data Allocation

• Centralized: Consists of single database stored at one site with users distributed across the network.
  (This is not a DDB but distributed processing!!)
• Partitioned: Database partitioned into disjoint fragments, each fragment assigned to one site.
• Complete Replication: Consists of maintaining complete copy of database at each site.
• Selective Replication: Combination of partitioning, replication, and centralization.

Fragmentation

A relation R is divided into fragments r1, r2, …, rn, which contain enough information to allow reconstruction of R.

Example:
We have a relation Sells(pub, address, price, type)
Type is “bitter” or “lager”.
We can split Sells into two different fragments:
• SellsBitter = σ type = “bitter” (Sells)
• SellsLager = σ type = “lager” (Sells)

Comparison of Strategies for Data Distribution

Why Fragment?

• Usage
    • Applications work with views rather than entire relations.
• Efficiency
    • Data is stored close to where it is most frequently used.
    • Data that is not needed by local applications is not stored.
• Parallelism
    • With fragments as unit of distribution, transaction can be divided into several sub-queries that operate on fragments.
• Security
    • Data not required by local applications is not stored and so not available to unauthorized users.


Why Fragment?

Two main disadvantages:
• Performance
• Integrity.

Types of Fragmentation

• Four types of fragmentation:
    • Horizontal,
    • Vertical,
    • Mixed,
    • Derived.
• Other possibility is no fragmentation:
    • If relation is small and not updated frequently, may be better not to fragment relation.



© Pearson Education Limited 1995, 2005
Horizontal and Vertical Fragmentation

Two types of fragmentation:
• Horizontal
• Vertical

Mixed Fragmentation

[Figure: mixed fragmentation]


Horizontal Fragmentation

• Each fragment consists of a subset of the tuples of a relation R.
• Defined using Selection operation of relational algebra:
    • σp(R)
• Example:
    • Relation: Sells(pub, address, price, type)
    • Fragments:
        » SellsBitter = σ type = “bitter” (Sells)
        » SellsLager = σ type = “lager” (Sells)

• This strategy is determined by looking at predicates used by transactions.
• Involves finding set of minimal (complete and relevant) predicates.
• Set of predicates is complete, if and only if, any two tuples in same fragment are referenced with same probability by any application.
• Predicate is relevant if there is at least one application that accesses fragments differently.
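The Sells example can be sketched in Python as a selection over a list of rows. This is only an illustration: the sample rows and the helper name `select` are assumptions, not part of the slides.

```python
# Hypothetical sample rows for Sells(pub, address, price, type).
SELLS = [
    {"pub": "The Ship", "address": "1 Dock Rd", "price": 2.50, "type": "bitter"},
    {"pub": "The Crown", "address": "9 High St", "price": 2.80, "type": "lager"},
    {"pub": "The Anchor", "address": "4 Quay Ln", "price": 2.40, "type": "bitter"},
]


def select(relation, predicate):
    """Relational selection: keep the tuples that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


# One horizontal fragment per value of the fragmentation predicate.
sells_bitter = select(SELLS, lambda row: row["type"] == "bitter")
sells_lager = select(SELLS, lambda row: row["type"] == "lager")
```

Each fragment keeps every attribute of Sells but only a subset of its tuples, which is what makes the fragmentation horizontal.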

Vertical Fragmentation

• Each fragment consists of a subset of attributes of a relation R.
• Defined using projection operation of relational algebra:
    • Π a1,…,an (R)
• Determined by establishing affinity of one attribute to another.
• Example:
    • Relation: Bars(name, address, licence, employees, owner)
    • Fragments:
        » Π name,address,licence (Bars)
        » Π name,address,employees,owner (Bars)

Mixed Fragmentation

• We can also mix horizontal and vertical fragmentation.
• We obtain a fragment that consists of a horizontal fragment that is vertically fragmented, or a vertical fragment that is horizontally fragmented.
• Defined using Selection and Projection operations of relational algebra:
    σp(Π a1,…,an (R))
    Π a1,…,an (σp(R))
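A minimal sketch of the two vertical fragments of Bars, plus one mixed fragment obtained by applying a selection to a vertical fragment. The sample rows, the helper names `project` and `select`, and the predicate on `owner` are all assumptions made for illustration.

```python
# Hypothetical sample rows for Bars(name, address, licence, employees, owner).
BARS = [
    {"name": "The Ship", "address": "1 Dock Rd", "licence": "L-01",
     "employees": 5, "owner": "Ann"},
    {"name": "The Crown", "address": "9 High St", "licence": "L-02",
     "employees": 8, "owner": "Bob"},
]


def project(relation, attributes):
    """Relational projection: keep only `attributes`, dropping duplicate tuples."""
    seen, result = set(), []
    for row in relation:
        values = tuple(row[a] for a in attributes)
        if values not in seen:
            seen.add(values)
            result.append(dict(zip(attributes, values)))
    return result


def select(relation, predicate):
    """Relational selection: keep the tuples that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


# Vertical fragments: both keep the key attribute `name`.
r1 = project(BARS, ["name", "address", "licence"])
r2 = project(BARS, ["name", "address", "employees", "owner"])

# Mixed fragment: a vertical fragment that is then horizontally fragmented.
r2_ann = select(r2, lambda row: row["owner"] == "Ann")
```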


Example - Mixed Fragmentation

S1 = Π staffNo, position, sex, DOB, salary (Staff)
S2 = Π staffNo, fName, lName, branchNo (Staff)

S21 = σ branchNo=‘B003’ (S2)
S22 = σ branchNo=‘B005’ (S2)
S23 = σ branchNo=‘B007’ (S2)

Derived Horizontal Fragmentation

• A horizontal fragment that is based on horizontal fragmentation of a parent relation.
• Ensures that fragments that are frequently joined together are at same site.
• Defined using Semijoin operation of relational algebra:
    Ri = R ⋉F Si, 1 ≤ i ≤ w



Example - Derived Horizontal Fragmentation

S3 = σ branchNo=‘B003’ (Staff)
S4 = σ branchNo=‘B005’ (Staff)
S5 = σ branchNo=‘B007’ (Staff)

Could use derived fragmentation for Property:

Pi = PropertyForRent ⋉branchNo Si, 3 ≤ i ≤ 5

Derived Horizontal Fragmentation

• If relation contains more than one foreign key, need to select one as parent.
• Choice can be based on fragmentation used most frequently or fragmentation with better join characteristics.
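The derived fragments Pi can be sketched with a small semijoin helper. The rows below are made up for illustration and carry only the attributes needed here; the helper name `semijoin` is likewise an assumption.

```python
# Hypothetical rows; only the attributes needed for the example are shown.
STAFF = [
    {"staffNo": "SG37", "branchNo": "B003"},
    {"staffNo": "SL21", "branchNo": "B005"},
    {"staffNo": "SA9", "branchNo": "B007"},
]
PROPERTY_FOR_RENT = [
    {"propertyNo": "PG4", "branchNo": "B003"},
    {"propertyNo": "PL94", "branchNo": "B005"},
    {"propertyNo": "PA14", "branchNo": "B007"},
    {"propertyNo": "PG36", "branchNo": "B003"},
]


def semijoin(r, s, attribute):
    """R semijoin S: tuples of R that match at least one tuple of S on `attribute`."""
    s_values = {row[attribute] for row in s}
    return [row for row in r if row[attribute] in s_values]


# Primary horizontal fragments S3, S4, S5 of Staff, one per branch.
staff_fragments = {
    i: [row for row in STAFF if row["branchNo"] == branch]
    for i, branch in ((3, "B003"), (4, "B005"), (5, "B007"))
}

# Derived fragments P3, P4, P5: properties follow the staff of the same branch.
property_fragments = {
    i: semijoin(PROPERTY_FOR_RENT, s_i, "branchNo")
    for i, s_i in staff_fragments.items()
}
```

Because each Pi is defined from Si, a join of Staff and PropertyForRent on branchNo only ever needs fragments that share an index, so they can be placed at the same site.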


How can we define fragments correctly?

In defining fragments we have to be very careful.

Three correctness rules:
• Completeness
• Reconstruction
• Disjointness.

Correctness of Fragmentation

Completeness: If relation R is decomposed into fragments r1, r2, …, rn, each data item that can be found in R must appear in at least one fragment.
This ensures no loss of data during fragmentation.

Correctness of Fragmentation

Reconstruction: we must be able to reconstruct the entire R from fragments.

For horizontal fragmentation, reconstruction is the union operation:
R = r1 ∪ r2 ∪ … ∪ rn

For vertical fragmentation, reconstruction is the natural join operation:
R = r1 ⋈ r2 ⋈ … ⋈ rn
To ensure reconstruction we have to include primary key attributes in all fragments.

Disjointness: if data item x appears in fragment ri, then it should not appear in any other fragment.

Exception: vertical fragmentation, where primary key attributes must be repeated to allow reconstruction.

For horizontal fragmentation, data item is a tuple.
For vertical fragmentation, data item is an attribute.
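As a rough illustration of why the primary key must be kept in every vertical fragment, the sketch below rebuilds a relation with a natural join. The sample rows and the helpers `project` and `natural_join` are assumptions; `name` is taken to be the key.

```python
# Hypothetical rows for Bars(name, address, licence, employees, owner); `name` is the key.
BARS = [
    {"name": "The Ship", "address": "1 Dock Rd", "licence": "L-01",
     "employees": 5, "owner": "Ann"},
    {"name": "The Crown", "address": "9 High St", "licence": "L-02",
     "employees": 8, "owner": "Bob"},
]


def project(relation, attributes):
    """Projection; duplicates cannot arise here because the key is always kept."""
    return [{a: row[a] for a in attributes} for row in relation]


def natural_join(r, s):
    """Join tuples of r and s that agree on every attribute they have in common."""
    if not r or not s:
        return []
    common = [a for a in r[0] if a in s[0]]
    return [
        {**row_r, **row_s}
        for row_r in r
        for row_s in s
        if all(row_r[a] == row_s[a] for a in common)
    ]


r1 = project(BARS, ["name", "licence"])
r2 = project(BARS, ["name", "address", "employees", "owner"])
rebuilt = natural_join(r1, r2)  # equal to BARS because both fragments carry `name`
```

If `name` were dropped from one fragment, the two fragments would share no attribute and the join would degenerate into a Cartesian product, so the original tuples could not be recovered.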


Correctness of Horizontal Fragmentation

Relation: Sells(pub, address, price, type)    type = {Bitter, Lager}

Fragments:
• SellsBitter = σ type = “bitter” (Sells)
• SellsLager = σ type = “lager” (Sells)

Correctness rules
• Completeness: Each tuple in the relation appears either in SellsBitter, or in SellsLager
• Reconstruction: The Sells relation can be reconstructed from the fragments
    Sells = SellsBitter ∪ SellsLager
• Disjointness: The two fragments are disjoint, there can be no beer that is both “Lager” and “Bitter”

Correctness of Vertical Fragmentation

Relation: Bars(name, address, licence, employees, owner)

Fragments:
• r1 = Π name,address,licence (Bars)
• r2 = Π name,address,employees,owner (Bars)

Correctness rules
• Completeness: Each attribute in the Bars relation appears either in r1 or in r2
• Reconstruction: The Bars relation can be reconstructed from the fragments
    Bars = r1 ⋈ r2
• Disjointness: The two fragments are disjoint, except for the primary key, name, which is necessary for reconstruction
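The three rules for the horizontal case can be checked mechanically. The following is a sketch under the assumption that a relation is a set of tuples; the function name `check_horizontal` and the sample data are illustrative only.

```python
from itertools import combinations


def check_horizontal(relation, fragments):
    """Check the three correctness rules for a horizontal fragmentation.

    `relation` is a set of tuples and `fragments` is a list of sets of tuples.
    """
    complete = all(any(t in frag for frag in fragments) for t in relation)
    reconstructs = set().union(*fragments) == relation
    disjoint = all(a.isdisjoint(b) for a, b in combinations(fragments, 2))
    return {"complete": complete, "reconstructs": reconstructs, "disjoint": disjoint}


# Hypothetical tuples of Sells(pub, address, price, type).
SELLS = {
    ("The Ship", "1 Dock Rd", 2.50, "bitter"),
    ("The Crown", "9 High St", 2.80, "lager"),
    ("The Anchor", "4 Quay Ln", 2.40, "bitter"),
}
sells_bitter = {t for t in SELLS if t[3] == "bitter"}
sells_lager = {t for t in SELLS if t[3] == "lager"}

report = check_horizontal(SELLS, [sells_bitter, sells_lager])
```

Dropping one of the two fragments from the list makes the completeness and reconstruction checks fail, which is the data loss the completeness rule is meant to rule out.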

Distributed Databases

Transparency in Distributed databases

Transparencies in a DDBMS

• Distribution Transparency
• Transaction Transparency
• Performance Transparency
• DBMS Transparency


Distribution Transparency

The user has to perceive the DDB as a single, logical entity

• Fragmentation Transparency: the user does not need to know that data is fragmented
• Location Transparency: the user does not need to know the location of data items
• Replication Transparency: the user is unaware of replication of data.
• Naming transparency: items in a database must have a unique name, but users don’t need to worry about it.

Naming Transparency

• Each item in a DDB must have a unique name.
• DDBMS must ensure that no two sites create a database object with same name.

Solution 1: create central name server.
Disadvantages:
• loss of some local autonomy;
• central site may become a bottleneck;
• low availability; if the central site fails, remaining sites cannot create any new objects.

Naming Transparency

Solution 2: prefix object with identifier of site that created it.

Example: Beer created at site S1 might be named S1.Beer.

Disadvantage: loss of distribution transparency.

Solution 3: use aliases for each database object.

Example: S1.Beer might be known as local_Beer by user at site S1.

The DDBMS has task of mapping an alias to appropriate database object.
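Solutions 2 and 3 can be combined in a small catalog: objects get a site-prefixed global name, and each site may define aliases that are resolved back to that name. The class `AliasCatalog` and its methods are hypothetical and only sketch the idea.

```python
class AliasCatalog:
    """Toy catalog: site-prefixed global names plus per-site aliases (illustrative)."""

    def __init__(self):
        self._aliases = {}  # (site, alias) -> global name

    @staticmethod
    def global_name(site, local_name):
        """Solution 2: prefix the object with the identifier of the creating site."""
        return f"{site}.{local_name}"

    def define_alias(self, site, alias, global_name):
        """Solution 3: let users at `site` refer to the object by `alias`."""
        self._aliases[(site, alias)] = global_name

    def resolve(self, site, name):
        """Map an alias to its global name; unknown names are returned unchanged."""
        return self._aliases.get((site, name), name)


catalog = AliasCatalog()
beer = catalog.global_name("S1", "Beer")        # "S1.Beer"
catalog.define_alias("S1", "local_Beer", beer)
resolved = catalog.resolve("S1", "local_Beer")  # "S1.Beer"
```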


Transaction Transparency

• Ensures that all distributed transactions maintain distributed database’s integrity and consistency.
• Distributed transaction accesses data stored at more than one location.
• Each transaction is divided into number of sub-transactions, one for each site that has to be accessed.
• DDBMS must ensure the indivisibility of both the global transaction and each sub-transaction.
• Must ensure both concurrency transparency and failure transparency.

Example - Distributed Transaction

Relation: Sells(pub, beer, price, type)

Fragments:
• SellsBitter = σ type = “bitter” (Sells)
• SellsLager = σ type = “lager” (Sells)

The two fragments are at two different sites.
Transaction T prints out the names of all pubs in the relation Sells. This transaction is split into two sub-transactions, one for each fragment.
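A minimal sketch of transaction T over the two Sells fragments: one sub-transaction runs per site and the global transaction merges the results. The site names, sample rows, and function names are assumptions, and the sites are simulated with an in-memory dictionary rather than a network.

```python
# Hypothetical placement: each fragment of Sells lives at a different site.
SITES = {
    "site_bitter": [
        {"pub": "The Ship", "beer": "Old Tom", "price": 2.50, "type": "bitter"},
        {"pub": "The Anchor", "beer": "Best", "price": 2.40, "type": "bitter"},
    ],
    "site_lager": [
        {"pub": "The Crown", "beer": "Pils", "price": 2.80, "type": "lager"},
        {"pub": "The Ship", "beer": "Helles", "price": 2.90, "type": "lager"},
    ],
}


def sub_transaction(fragment):
    """Runs at one site: collect the pub names found in the local fragment."""
    return {row["pub"] for row in fragment}


def transaction_t(sites):
    """Global transaction: one sub-transaction per site, results merged by union."""
    names = set()
    for fragment in sites.values():
        names |= sub_transaction(fragment)
    return sorted(names)


pub_names = transaction_t(SITES)
```

The user only calls `transaction_t`; the split into per-site work is hidden, which is the point of transaction transparency.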

Example - Distributed Transaction

T prints out names of all staff, using schema defined above as S1, S2, S21, S22, and S23.
Define three subtransactions TS3, TS5, and TS7 to represent agents at sites 3, 5, and 7.

Concurrency Transparency

• All transactions must execute independently and be logically consistent with results obtained if transactions executed one at a time, in some arbitrary serial order.
• Same fundamental principles as for centralised DBMS.
• DDBMS must ensure both global and local transactions do not interfere with each other.
• Similarly, DDBMS must ensure consistency of all sub-transactions of global transaction.
• Techniques for concurrency control are usually different from the ones for a centralised DBMS.


Concurrency Transparency

• Replication makes concurrency more complex.
• If a copy of a replicated data item is updated, update must be propagated to all copies.
• Could propagate changes as part of original transaction, making it an atomic operation.
• However, if one site holding copy is not reachable, then transaction is delayed until site is reachable.
• Could limit update propagation to only those sites currently available. Remaining sites updated when they become available again.
• Could allow updates to copies to happen asynchronously, sometime after the original update. Delay in regaining consistency may range from a few seconds to several hours.
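One of the options above, propagating an update only to the sites that are currently available and catching the others up later, can be sketched as follows. The class `ReplicatedItem` and its methods are illustrative assumptions; a real DDBMS would also need logging and conflict handling, which are left out.

```python
class ReplicatedItem:
    """One logical data item with a copy at each site (all names are illustrative)."""

    def __init__(self, sites, value=None):
        self.copies = {site: value for site in sites}
        self.available = set(sites)
        self.pending = {}  # site -> latest value it has not yet received

    def fail(self, site):
        """Mark a site as unreachable."""
        self.available.discard(site)

    def update(self, value):
        """Apply the update at reachable sites and queue it for the others."""
        for site in self.copies:
            if site in self.available:
                self.copies[site] = value
            else:
                self.pending[site] = value

    def recover(self, site):
        """Bring a site back and apply the update it missed, if any."""
        self.available.add(site)
        if site in self.pending:
            self.copies[site] = self.pending.pop(site)


item = ReplicatedItem(["S1", "S2", "S3"], value=10)
item.fail("S3")
item.update(20)
stale = item.copies["S3"]  # S3 still holds the old value while it is down
item.recover("S3")
```

Between `update` and `recover` the copies disagree, which is the temporary inconsistency the slide refers to.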



Failure Transparency

• DDBMS must ensure atomicity and durability of global transaction.
• Means ensuring that sub-transactions of global transaction either all commit or all abort.
• Thus, DDBMS must synchronize global transaction to ensure that all sub-transactions have completed successfully before recording a final COMMIT for global transaction.
• Must do this in presence of site and network failures.

Performance Transparency

DDBMS must perform as if it were a centralized DBMS:
• DDBMS should not suffer any performance degradation due to distributed architecture.
• DDBMS should determine most cost-effective strategy to execute a request.
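The all-commit-or-all-abort requirement for failure transparency can be illustrated with a voting scheme in the style of two-phase commit. The slides do not name a protocol at this point, so the choice of two-phase commit is an assumption, as are the class and function names; timeouts, logging, and recovery are deliberately omitted.

```python
class Site:
    """A participant holding one sub-transaction (hypothetical, in-memory only)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        """Phase 1: vote on whether the local sub-transaction can commit."""
        if self.can_commit:
            self.state = "prepared"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def run_global_transaction(sites):
    """Record a global COMMIT only if every sub-transaction voted yes."""
    votes = [site.prepare() for site in sites]
    if all(votes):
        for site in sites:
            site.commit()
        return "COMMIT"
    for site in sites:
        site.abort()
    return "ABORT"


ok_sites = [Site("S3"), Site("S5"), Site("S7")]
ok_result = run_global_transaction(ok_sites)

bad_sites = [Site("S3"), Site("S5", can_commit=False), Site("S7")]
bad_result = run_global_transaction(bad_sites)
```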


Performance Transparency

• Distributed Query Processor (DQP) maps data request into ordered sequence of operations on local databases.
• It must consider fragmentation, replication, and allocation schemas.
• DQP has to decide:
    • which fragment to access;
    • which copy of a fragment to use;
    • which location to use.
• DQP produces execution strategy optimised with respect to some cost function.
• Typically, costs associated with a distributed request include:
    • I/O cost;
    • CPU cost;
    • communication cost.

Performance Transparency - Example

Property(Pno, City)         10 000 records in London
Renter(Rno, Max_Price)     100 000 records in Glasgow
Viewing(Pno, Rno)        1 000 000 records in London

SELECT p.pno
FROM property p INNER JOIN
    (renter r INNER JOIN viewing v ON r.rno = v.rno)
    ON p.pno = v.pno
WHERE p.city = 'Aberdeen' AND r.max_price > 200000;

Assume:
• Each tuple in each relation is 100 characters long.
• 10 renters with maximum price greater than £200,000.
• 100 000 viewings for properties in Aberdeen.
• Computation time negligible compared to communication time.
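Since computation time is taken to be negligible, execution strategies for this query can be compared by communication time alone. The sketch below does that for two of the possible strategies. The transmission rate and the per-message access delay are assumptions chosen for illustration, they are not given in the text above; only the tuple size and the cardinalities come from the example.

```python
TUPLE_SIZE = 100        # characters per tuple (from the example)
RATE = 10_000           # ASSUMPTION: characters transmitted per second
ACCESS_DELAY = 1.0      # ASSUMPTION: fixed delay in seconds per message


def transfer_time(n_tuples, n_messages=1):
    """Communication time in seconds to ship n_tuples using n_messages messages."""
    return n_messages * ACCESS_DELAY + n_tuples * TUPLE_SIZE / RATE


# Strategy A: ship the whole Renter relation from Glasgow to London.
move_renter = transfer_time(100_000)

# Strategy B: select the high-price renters at Glasgow first, ship only those 10.
move_selected_renters = transfer_time(10)
```

Under these assumed figures strategy A takes about 1001 seconds while strategy B takes just over one second, which shows why the DQP should apply selections before moving data between sites.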


Performance Transparency - Example

Date’s 12 Rules for a DDBMS

0. Fundamental Principle
   To the user, a distributed system should look exactly like a non-distributed system.

1. Local Autonomy
2. No Reliance on a Central Site
3. Continuous Operation
4. Location Independence
5. Fragmentation Independence
6. Replication Independence

Date’s 12 Rules for a DDBMS

0. Fundamental Principle
   To the user, a distributed system should look exactly like a non-distributed system.

7. Distributed Query Processing
8. Distributed Transaction Processing
9. Hardware Independence
10. Operating System Independence
11. Network Independence
12. Database Independence

Note: last four rules are ideal!

Distributed Transaction Management

• DDBMS must ensure:
    • synchronization of sub-transactions with other local transactions executing concurrently at a site;
    • synchronization of sub-transactions with global transactions running simultaneously at same or different sites.
• Global transaction manager (transaction coordinator) at each site, to coordinate global and local transactions initiated at that site.


Distributed Concurrency Control

• Techniques for Distributed Concurrency Control must ensure distributed serializability.
• Locking protocols (extensions of 2PL protocol)
    Distributed Deadlock management.
• Timestamping methods (extend the definition of timestamp so that it includes a site identifier)
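Extending a timestamp with a site identifier can be sketched as a pair that is compared first by local counter and then by site. The class names are hypothetical, and the `observe` step that advances a lagging counter is an added assumption in the spirit of logical clocks, not something stated above.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class GlobalTimestamp:
    """Compared field by field: counter first, then site_id as a tie-breaker."""
    counter: int
    site_id: int


class SiteClock:
    """Per-site generator of globally unique timestamps (illustrative only)."""

    def __init__(self, site_id):
        self.site_id = site_id
        self.counter = 0

    def next_timestamp(self):
        self.counter += 1
        return GlobalTimestamp(self.counter, self.site_id)

    def observe(self, timestamp):
        """ASSUMPTION: catch up with a larger counter seen in a remote message."""
        self.counter = max(self.counter, timestamp.counter)


clock_1, clock_2 = SiteClock(1), SiteClock(2)
t1 = clock_1.next_timestamp()   # counter 1 at site 1
t2 = clock_2.next_timestamp()   # counter 1 at site 2: same counter, different site
```

Two sites can produce the same counter value independently, yet the resulting timestamps remain distinct and totally ordered because the site identifier breaks the tie.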
