Distributed Databases
• Concepts.
• Advantages and disadvantages of distributed databases.
• Functions and architecture for a DDBMS.
• Distributed database design.
• Levels of transparency.
• Comparison criteria for Distributed DBMSs.
[Figure: (c) shared nothing]
• Expect DDBMS to have at least the functionality of a DBMS (see Connolly & Begg, Chapter 2, third edition).
• Also to have following functionality:
• Extended communication services.
• Extended Data Dictionary.
• Distributed query processing.
• Extended concurrency control.
• Extended recovery services.
• Extended security control.

• Due to diversity, no accepted architecture equivalent to ANSI/SPARC 3-level architecture for DBMSs.
• A possible reference architecture consists of:
• Set of global external schemas.
• Global conceptual schema (GCS).
• Fragmentation schema and allocation schema.
• Set of schemas for each local DBMS conforming to 3-level ANSI/SPARC.
• Some levels may be missing, depending on levels of transparency supported.
Three key issues we have to consider:
• Data Allocation: where are data placed? Data should be stored at site with "optimal" distribution.
• Fragmentation: relation may be divided into a number of sub-relations (called fragments), which are stored in different sites.
• Replication: copy of fragment may be maintained at several sites.

• Definition and allocation of fragments carried out strategically to achieve:
• Locality of Reference
• Improved Reliability and Availability
• Improved Performance
• Balanced Storage Capacities and Costs
• Minimal Communication Costs.
• Involves analysing most important transactions, based on quantitative/qualitative information.
• Centralized: Consists of single database stored at one site with users distributed across the network. (This is not a DDB but distributed processing!)
• Partitioned: Database partitioned into disjoint fragments, each fragment assigned to one site.
• Complete Replication: Consists of maintaining complete copy of database at each site.
• Selective Replication: Combination of partitioning, replication, and centralization.

A relation R is divided into fragments r1, r2, …, rn, which contain enough information to allow reconstruction of R.
Example:
We have a relation Sells(pub, address, price, type). Type is “bitter” or “lager”.
We can split Sells into two different fragments:
• SellsBitter = σ type = “bitter”(Sells)
• SellsLager = σ type = “lager”(Sells)
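A minimal sketch of the Sells example in Python, with a relation modelled as a set of tuples. Only the schema Sells(pub, address, price, type) and the two selection predicates come from the example; the sample rows and the helper name `select` are made up for illustration.

```python
from typing import Callable, NamedTuple

class Sells(NamedTuple):
    pub: str
    address: str
    price: float
    type: str  # "bitter" or "lager"

def select(relation: set, predicate: Callable) -> set:
    """Relational selection (sigma): keep the tuples that satisfy the predicate."""
    return {t for t in relation if predicate(t)}

# Hypothetical sample data, not taken from the slides.
sells = {
    Sells("Red Lion", "1 High St", 3.20, "bitter"),
    Sells("Red Lion", "1 High St", 3.50, "lager"),
    Sells("Crown", "9 Mill Rd", 3.00, "bitter"),
}

sells_bitter = select(sells, lambda t: t.type == "bitter")
sells_lager = select(sells, lambda t: t.type == "lager")

# The two fragments share no tuple, and their union gives back Sells.
assert sells_bitter.isdisjoint(sells_lager)
assert sells_bitter | sells_lager == sells
```

Because every tuple has exactly one value for type, each tuple lands in exactly one fragment, which is why the union is enough to rebuild the relation.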
• Usage
• Applications work with views rather than entire relations.
• Efficiency
• Data is stored close to where it is most frequently used.
• Data that is not needed by local applications is not stored.
• Parallelism
• With fragments as unit of distribution, transaction can be divided
into several sub-queries that operate on fragments.
• Security
• Data not required by local applications is not stored and so not
available to unauthorized users.
• Horizontal
• Vertical
S3 = σ branchNo=‘B003’(Staff)
S4 = σ branchNo=‘B005’(Staff)
S5 = σ branchNo=‘B007’(Staff)
Could use derived fragmentation for Property:

• If relation contains more than one foreign key, need to select one as parent.
• Choice can be based on fragmentation used most frequently or fragmentation with better join characteristics.
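Derived horizontal fragmentation is commonly defined as a semijoin of the child relation with each fragment of the parent. The sketch below assumes, purely for illustration, that Property tuples reference Staff through a staffNo attribute; the attribute layout, the sample rows, and the helper names are hypothetical.

```python
# Parent relation Staff as (staffNo, branchNo); child Property as (propertyNo, staffNo).
# All rows and the staffNo link are hypothetical, for illustration only.
staff = {("SG37", "B003"), ("SL21", "B005"), ("SA9", "B007"), ("SG14", "B003")}
prop = {("PG4", "SG37"), ("PL94", "SL21"), ("PA14", "SA9"), ("PG36", "SG14")}

def staff_fragment(branch_no):
    """Primary horizontal fragment: sigma branchNo = branch_no (Staff)."""
    return {s for s in staff if s[1] == branch_no}

def semijoin(child, parent_fragment):
    """Keep child tuples whose staffNo matches some tuple in the parent fragment."""
    staff_nos = {s[0] for s in parent_fragment}
    return {c for c in child if c[1] in staff_nos}

s3, s4, s5 = (staff_fragment(b) for b in ("B003", "B005", "B007"))
p3, p4, p5 = (semijoin(prop, s) for s in (s3, s4, s5))

# Each Property tuple follows its Staff parent, so the derived fragments
# still cover the whole child relation.
assert p3 | p4 | p5 == prop
```

With this layout a property and the member of staff who manages it end up in fragments with the same index, so a join between them can be evaluated fragment by fragment.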
Reconstruction: we must be able to reconstruct the entire R from fragments.
For horizontal fragmentation, reconstruction is the union operation:
R = r1 ∪ r2 ∪ … ∪ rn
For vertical fragmentation, reconstruction is the natural join operation:
R = r1 ⋈ r2 ⋈ … ⋈ rn
To ensure reconstruction we have to include primary key attributes in all fragments.

Disjointness: if data item x appears in fragment ri, then it should not appear in any other fragment.
Exception: vertical fragmentation, where primary key attributes must be repeated to allow reconstruction.
For horizontal fragmentation, data item is a tuple.
For vertical fragmentation, data item is an attribute.
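To make the reconstruction rule for vertical fragmentation concrete, the sketch below splits a hypothetical Staff relation into two vertical fragments that both keep the primary key staffNo, then rebuilds it with a natural join. The attribute names, the rows, and the helpers `project` and `natural_join` are illustrative assumptions.

```python
# Hypothetical Staff rows keyed by staffNo; not taken from the slides.
staff = [
    {"staffNo": "SG37", "name": "Ann", "branchNo": "B003", "salary": 12000},
    {"staffNo": "SL21", "name": "John", "branchNo": "B005", "salary": 30000},
]

def project(rows, attrs):
    """Relational projection: keep only the named attributes of each row."""
    return [{a: r[a] for a in attrs} for r in rows]

def natural_join(left, right):
    """Join rows that agree on every attribute the two inputs share."""
    out = []
    for l in left:
        for r in right:
            shared = l.keys() & r.keys()
            if all(l[a] == r[a] for a in shared):
                out.append({**l, **r})
    return out

# Both fragments repeat the primary key: the one overlap that is allowed.
s1 = project(staff, ["staffNo", "name"])
s2 = project(staff, ["staffNo", "branchNo", "salary"])

rebuilt = natural_join(s1, s2)

# Without the shared key the join degenerates into a cross product.
no_key = natural_join(project(staff, ["name"]),
                      project(staff, ["branchNo", "salary"]))
```

Here `rebuilt` has one row per member of staff, while `no_key` pairs every name with every branch and salary, which is why the primary key has to appear in every vertical fragment.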
Transparency in Distributed databases
• Distribution Transparency
• Transaction Transparency
• Performance Transparency
• DBMS Transparency
The user has to perceive the DDB as a single, logical entity.
• Fragmentation Transparency: the user does not need to know that data is fragmented.
• Location Transparency: the user does not need to know the location of data items.
• Replication Transparency: the user is unaware of replication of data.
• Naming Transparency: items in a database must have a unique name, but users don’t need to worry about it.

• Each item in a DDB must have a unique name.
• DDBMS must ensure that no two sites create a database object with same name.
Solution 1: create central name server.
Disadvantages:
• loss of some local autonomy;
• central site may become a bottleneck;
• low availability; if the central site fails, remaining sites cannot create any new objects.
Solution 2: prefix object with identifier of site that created it.
Example: Beer created at site S1 might be named S1.Beer.
Disadvantage: loss of distribution transparency.

Solution 3: use aliases for each database object.
Example: S1.Beer might be known as local_Beer by user at site S1.
The DDBMS has task of mapping an alias to appropriate database object.
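The site-prefix and alias approaches to naming can be sketched together: a global name carries the identifier of the creating site, and a per-site alias table maps a local name back to the global one. The class and method names below are hypothetical and do not correspond to any real DDBMS API.

```python
class SiteCatalog:
    """Hypothetical per-site catalog: site-prefixed global names plus local aliases."""

    def __init__(self, site_id):
        self.site_id = site_id
        self.aliases = {}  # alias -> global name

    def create(self, object_name):
        """Prefix the object with the identifier of the site that created it."""
        return f"{self.site_id}.{object_name}"

    def add_alias(self, alias, global_name):
        """Register a local alias for a global name."""
        self.aliases[alias] = global_name

    def resolve(self, name):
        """Map an alias to its global name; other names pass through unchanged."""
        return self.aliases.get(name, name)

s1 = SiteCatalog("S1")
beer = s1.create("Beer")              # global name with the site prefix
s1.add_alias("local_Beer", beer)
resolved = s1.resolve("local_Beer")   # the user at S1 never types the prefix
```

No central name server is involved: two sites can both create an object called Beer because the prefixes differ, and the alias table hides the prefix from local users.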
• Replication makes concurrency more complex.
• If a copy of a replicated data item is updated, update must be propagated to all copies.
• Could propagate changes as part of original transaction, making it an atomic operation.
• However, if one site holding copy is not reachable, then transaction is delayed until site is reachable.

• Could limit update propagation to only those sites currently available. Remaining sites updated when they become available again.
• Could allow updates to copies to happen asynchronously, sometime after the original update. Delay in regaining consistency may range from a few seconds to several hours.
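The option of updating only the currently available sites, and catching the others up later, can be sketched as follows. This is a simplified in-memory illustration; the `Replica` class and the `propagate` and `recover` helpers are hypothetical names, and real systems also have to order and log the queued updates durably.

```python
class Replica:
    """Hypothetical copy of a data item held at one site."""

    def __init__(self, site):
        self.site = site
        self.available = True
        self.value = None
        self.pending = []  # updates missed while the site was unreachable

def propagate(replicas, new_value):
    """Apply the update at reachable sites; queue it for the others."""
    for r in replicas:
        if r.available:
            r.value = new_value
        else:
            r.pending.append(new_value)

def recover(replica):
    """Bring a site back and replay the updates it missed, in order."""
    replica.available = True
    for v in replica.pending:
        replica.value = v
    replica.pending.clear()

copies = [Replica("S1"), Replica("S2"), Replica("S3")]
copies[2].available = False
propagate(copies, 10)
propagate(copies, 11)
stale = copies[2].value   # S3 has seen neither update yet
recover(copies[2])
```

Between the second update and the call to `recover`, the copy at S3 disagrees with the others; that window is the delay in regaining consistency.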
• DDBMS must ensure atomicity and durability of global transaction.
• Means ensuring that sub-transactions of global transaction either all commit or all abort.
• Thus, DDBMS must synchronize global transaction to ensure that all sub-transactions have completed successfully before recording a final COMMIT for global transaction.
• Must do this in presence of site and network failures.

DDBMS must perform as if it were a centralized DBMS:
• DDBMS should not suffer any performance degradation due to distributed architecture.
• DDBMS should determine most cost-effective strategy to execute a request.
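The requirement that sub-transactions of a global transaction either all commit or all abort is commonly met with a two-phase commit protocol, which the slides do not name at this point. Under that assumption, the sketch below shows the core idea only: a coordinator collects a vote from every sub-transaction and records COMMIT only if all of them vote yes. Timeouts, logging, and failure recovery are left out, and all names are hypothetical.

```python
class SubTransaction:
    """Hypothetical participant running one sub-transaction at a site."""

    def __init__(self, site, can_commit=True):
        self.site = site
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        """Phase 1: vote on whether this sub-transaction can commit."""
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"

def run_global_transaction(participants):
    """Phase 2: commit every sub-transaction, or abort every one of them."""
    votes = [p.prepare() for p in participants]
    decision = "COMMIT" if all(votes) else "ABORT"
    for p in participants:
        if decision == "COMMIT":
            p.commit()
        else:
            p.abort()
    return decision

good_group = [SubTransaction("S1"), SubTransaction("S2")]
ok = run_global_transaction(good_group)

failed_group = [SubTransaction("S1"), SubTransaction("S2", can_commit=False)]
bad = run_global_transaction(failed_group)
```

In the second run the sub-transaction at S1 could have committed on its own, but it is aborted anyway because its sibling at S2 voted no.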
• Distributed Query Processor (DQP) maps data request into ordered sequence of operations on local databases.
• It must consider fragmentation, replication, and allocation schemas.
• DQP has to decide:
• which fragment to access;
• which copy of a fragment to use;
• which location to use.

• DQP produces execution strategy optimised with respect to some cost function.
• Typically, costs associated with a distributed request include:
• I/O cost;
• CPU cost;
• communication cost.
Property(Pno, City) 10 000 records in London
Renter(Rno, Max_Price) 100 000 records in Glasgow
Viewing(Pno, Rno) 1 000 000 records in London

SELECT p.pno
FROM property p INNER JOIN
(renter r INNER JOIN viewing v ON r.rno = v.rno)
ON p.pno = v.pno
WHERE p.city = 'Aberdeen' AND r.max_price > 200000;

Assume:
• Each tuple in each relation is 100 characters long.
• 10 renters with maximum price greater than £200,000.
• 100 000 viewings for properties in Aberdeen.
• Computation time negligible compared to communication time.
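The stated tuple size and cardinalities are enough to compare how much data different strategies would ship between Glasgow and London. The sketch below does that arithmetic for two strategies picked here for illustration. The strategies, the transfer rate, and the per-message delay are hypothetical values, not figures from the slides.

```python
TUPLE_CHARS = 100        # from the slide: each tuple is 100 characters long
RENTER_ROWS = 100_000    # Renter is stored in Glasgow
RICH_RENTERS = 10        # renters with max_price > 200000

# Hypothetical network parameters, chosen only to make the sketch concrete.
CHARS_PER_SECOND = 10_000
DELAY_PER_MESSAGE = 1.0  # seconds

def transfer_time(rows, messages=1):
    """Communication time = per-message delay + characters sent / transfer rate."""
    return messages * DELAY_PER_MESSAGE + rows * TUPLE_CHARS / CHARS_PER_SECOND

# Strategy A: ship the whole Renter relation to London and run the query there.
ship_all = transfer_time(RENTER_ROWS)

# Strategy B: select the high-price renters in Glasgow first, ship only those.
ship_selected = transfer_time(RICH_RENTERS)

print(f"ship all renters:      {ship_all:.1f} s")
print(f"ship selected renters: {ship_selected:.1f} s")
```

Since computation time is taken to be negligible, the comparison is decided entirely by how many tuples cross the network, so applying the selection before shipping is far cheaper under these assumed parameters.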
1. Local Autonomy
2. No Reliance on a Central Site
3. Continuous Operation
4. Location Independence
5. Fragmentation Independence
6. Replication Independence