
Advanced DBMS Concepts

Raj Kishore
-----------------------
D2Hawkeye Services Pvt. Ltd.
ISO 9001:2000 Certified
Distributed Databases

• Data stored at several locations
• Managed by a DBMS that can run
autonomously
• Ideally, location of data is
unknown to client
• Clients can write Transactions
regardless of where the affected
data are located
Types of Distributed Database

• Homogeneous: Every site runs
the same type of DBMS (e.g., all sites
run Oracle)
• Heterogeneous: Different sites
run different DBMSs (e.g., Oracle,
Microsoft SQL Server, DB2)
Distributed Databases
Distributed DBMS Architectures

• Client-Server:
o Client sends query to each database server
in the distributed system
o Client caches and accumulates responses
• Collaborating Server:
o Client sends query to “nearest” Server
o Server sends query to other Servers, as
required
o Server sends response to Client
Storing the Distributed Data

• In fragments at each site
o Split the data up
o Each site stores one or more fragments
• In complete replicas at each site
o Each site stores a replica of the complete
data
• A mixture of fragments and replicas
o Each site stores some replicas and/or
fragments of the data
Advantages of Distributed DBMS

• Fragmentation (Sub-set Data)
o Exploit data access locality
o Put data near consumer
o Less network traffic
o Better response time
o Better availability
o Spread Load
• Replicated Data (Complete)
o Improves availability
o Disconnected (mobile) operation
o Reads are cheaper
Fragmentation (Sub-Setting)

• Horizontal – “Row-wise”
o a subset of the rows of the table makes up one
fragment
• Vertical – “Column-wise”
o a subset of the columns of the table makes up one
fragment
• Selected Tables residing in
selected locations
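
The two kinds of fragmentation can be sketched in a few lines of Python. This is only an illustration: the `customers` table, its column names, and the helper functions are hypothetical and do not come from any particular DBMS. It also assumes each vertical fragment keeps the key column so that the original rows can be rebuilt.

```python
# Hypothetical sample table; names and values are illustrative only.
customers = [
    {"id": 1, "name": "Asha", "region": "East", "balance": 120.0},
    {"id": 2, "name": "Bikram", "region": "West", "balance": 75.5},
    {"id": 3, "name": "Chandra", "region": "East", "balance": 310.0},
]


def horizontal_fragment(rows, predicate):
    """Row-wise split: keep whole rows that satisfy the predicate."""
    return [dict(row) for row in rows if predicate(row)]


def vertical_fragment(rows, columns, key="id"):
    """Column-wise split: keep the chosen columns plus the key column."""
    keep = [key] + [c for c in columns if c != key]
    return [{c: row[c] for c in keep} for row in rows]


def rejoin(left, right, key="id"):
    """Rebuild full rows from two vertical fragments by matching on the key."""
    by_key = {row[key]: row for row in right}
    return [{**row, **by_key[row[key]]} for row in left]


# Horizontal fragments: one per region, e.g. stored at the site in that region.
east = horizontal_fragment(customers, lambda r: r["region"] == "East")
west = horizontal_fragment(customers, lambda r: r["region"] == "West")

# Vertical fragments: descriptive columns at one site, balances at another.
names = vertical_fragment(customers, ["name", "region"])
balances = vertical_fragment(customers, ["balance"])

reconstructed = rejoin(names, balances)
```

The union of `east` and `west` gives back every row, and `rejoin(names, balances)` gives back every column, which is the property a fragmentation scheme needs in order to lose no data.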
Replication

• Make synchronized or
unsynchronized copies of data at
different servers
o Synchronized: data are always current,
updates are constantly shipped between
replicas
o Unsynchronized: data queued up for later
synchronization, good for read-only data
• Increases availability of data
• Makes query execution faster
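
The difference between synchronized and unsynchronized copies can be shown with a toy in-memory model. The `Replica` and `ReplicatedStore` classes below are hypothetical names invented for this sketch, and the model ignores failures, ordering, and conflicts entirely.

```python
class Replica:
    """One copy of the data at one server (an in-memory stand-in)."""

    def __init__(self):
        self.data = {}


class ReplicatedStore:
    """Toy model: replica 0 takes every write, the others are copies."""

    def __init__(self, replica_count, synchronized):
        self.replicas = [Replica() for _ in range(replica_count)]
        self.synchronized = synchronized
        self.pending = []  # updates queued for later synchronization

    def write(self, key, value):
        self.replicas[0].data[key] = value
        if self.synchronized:
            # Synchronized: ship the update to every copy right away.
            for replica in self.replicas[1:]:
                replica.data[key] = value
        else:
            # Unsynchronized: queue the update for a later batch.
            self.pending.append((key, value))

    def synchronize(self):
        """Apply all queued updates to the other copies, in order."""
        for key, value in self.pending:
            for replica in self.replicas[1:]:
                replica.data[key] = value
        self.pending.clear()

    def read(self, replica_index, key):
        return self.replicas[replica_index].data.get(key)


lazy = ReplicatedStore(replica_count=3, synchronized=False)
lazy.write("price", 10)
stale = lazy.read(2, "price")   # the copy has not seen the update yet
lazy.synchronize()
fresh = lazy.read(2, "price")   # now the copy is current
```

Reads are served by whichever copy is closest, which is why replication makes reads cheaper. The cost appears on writes in the synchronized case and as stale reads in the unsynchronized case.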
Replication Catalogue

• Which objects are being replicated
• Where objects are being replicated to
• How updates are propagated
• Catalogue is a set of tables that can
be backed up, and recovered (as any
other table)
• These tables are themselves
replicated to each replication site
o No single point of failure in the Distributed
Database
Distributed Transaction

• All data that have been changed must
be propagated before the Transaction
commits (Distributed Replicated)
• Before Transaction can commit, it
obtains locks on all modified copies
• Sends lock requests to remote sites,
holds lock
• If links or remote sites fail,
Transaction cannot commit until
links/sites restored
• Commit protocol is complex, and
involves many back-and-forth messages
Distributed Locking

• How to manage Locks across many
sites?
o Centrally: one site does all locking
▪ Vulnerable to single site failure
o Primary Copy: all locking for an object
done at the primary copy site for the object
▪ Reading requires access to locking site as
well as site which stores object
o Fully Distributed: locking for a copy done
at site where the copy is stored
▪ Locks at all sites while writing an object
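
The three schemes differ mainly in which sites a Transaction must contact to obtain a lock. The function below is a rough sketch of that routing decision. The site names and the function itself are hypothetical, and it assumes that under the fully distributed scheme a read locks only the one copy it reads.

```python
def sites_to_lock(scheme, operation, copy_sites,
                  central_site="S0", primary_site=None):
    """Return the set of sites that must grant a lock for one object.

    copy_sites: sites holding a copy of the object (hypothetical names).
    """
    if scheme == "central":
        # One site does all locking, whatever the object or operation.
        return {central_site}
    if scheme == "primary_copy":
        # All locking for the object happens at its primary copy site.
        return {primary_site if primary_site is not None else copy_sites[0]}
    if scheme == "fully_distributed":
        if operation == "write":
            # A write must lock every copy of the object.
            return set(copy_sites)
        # Assumption: a read locks just the copy it reads.
        return {copy_sites[0]}
    raise ValueError(f"unknown scheme: {scheme}")


copies = ["S1", "S2", "S3"]
central = sites_to_lock("central", "write", copies)
primary = sites_to_lock("primary_copy", "read", copies, primary_site="S2")
dist_write = sites_to_lock("fully_distributed", "write", copies)
dist_read = sites_to_lock("fully_distributed", "read", copies)
```

In this sketch the central scheme always yields one site (the single point of failure), while the fully distributed scheme trades cheap reads for writes that must reach every copy.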
Two-Phase Commit

• The site which originates the Transaction is the coordinator;
the other sites involved in the Transaction are
subordinates
• When the Transaction needs to Commit:
o Coordinator sends “prepare” message to
subordinates
o Each subordinate force-writes an abort or prepare
Log record, and sends a “no” or “yes” message to
Coordinator
o If Coordinator gets unanimous “yes” messages,
force-writes a commit Log record, and sends
“commit” message to subordinates
o Subordinates force-write abort/commit Log record
accordingly, then send an “ack” message to
Coordinator
o Coordinator writes an end Log record after
receiving all acks
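
The message flow above can be traced with a small single-process simulation. This is a sketch only: the `Coordinator` and `Subordinate` classes are hypothetical, logs are plain Python lists instead of force-written disk records, and there are no timeouts or failures. The abort path (any “no” vote makes the Coordinator decide abort) is an assumption, since the steps above spell out only the unanimous “yes” case.

```python
class Subordinate:
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.log = []  # stands in for force-written Log records

    def prepare(self):
        """Phase 1: log prepare or abort, then vote."""
        if self.can_commit:
            self.log.append("prepare")
            return "yes"
        self.log.append("abort")
        return "no"

    def decide(self, decision):
        """Phase 2: log the Coordinator's decision and acknowledge."""
        # A subordinate that voted "no" has already logged its abort.
        if self.log[-1] != "abort":
            self.log.append(decision)
        return "ack"


class Coordinator:
    def __init__(self, subordinates):
        self.subordinates = subordinates
        self.log = []

    def commit_transaction(self):
        # Phase 1: send "prepare" and collect the votes.
        votes = [s.prepare() for s in self.subordinates]
        # Commit only on unanimous "yes" (abort otherwise: an assumption).
        decision = "commit" if all(v == "yes" for v in votes) else "abort"
        self.log.append(decision)
        # Phase 2: send the decision and wait for every ack.
        acks = [s.decide(decision) for s in self.subordinates]
        if all(a == "ack" for a in acks):
            self.log.append("end")
        return decision


subs = [Subordinate("A"), Subordinate("B")]
coordinator = Coordinator(subs)
outcome = coordinator.commit_transaction()
```

Running the same code with `Subordinate("B", can_commit=False)` shows the other path: B logs abort and votes “no”, and A, which had logged prepare, logs abort after the Coordinator's decision arrives.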
Parallel Processing

• Parallel processing divides a large task
into many smaller tasks and executes the
smaller tasks concurrently on several
nodes. As a result, the larger task
completes more quickly
• A node is a separate processor, often on a
separate machine. Multiple processors,
however, can reside on a single machine
Sequential Processing of a Single Task
Executing Component Tasks in Parallel
Problems of Parallel Processing

• Effective implementation of parallel
processing involves two challenges:
o Structuring tasks so some tasks
execute at the same time "in parallel"
o Preserving task sequencing for tasks
that must execute serially
Characteristics of a Parallel Processing
System

• A parallel processing system has the
following characteristics:
o Each processor in a system can perform
tasks concurrently
o Tasks may need to be synchronized
o Nodes usually share resources, such as
data, disks, and other devices
Parallel Processing for SMPs and
MPPs

• Parallel processing architectures
support:
o Clustered and massively parallel
processing (MPP) hardware where each
node has its own memory
o Single memory systems, also known as
"symmetric multiprocessing" (SMP)
hardware, where multiple processors
use one memory resource
The Goals of Parallel Processing

• Speedup is the extent to which more
hardware can perform the same
task in less time than the original
system
• Scaleup is the factor that expresses
how much more work can be done
in the same time period by a larger
system
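
Both goals are commonly expressed as simple ratios. The sketch below assumes the usual definitions (speedup as original elapsed time divided by elapsed time on the larger system, scaleup as work done by the larger system divided by work done by the original in the same period). The function names and sample numbers are illustrative only.

```python
def speedup(time_original, time_parallel):
    """Same task, more hardware: how many times faster it finishes."""
    return time_original / time_parallel


def scaleup(volume_original, volume_parallel):
    """Same elapsed time, more hardware: how many times more work is done."""
    return volume_parallel / volume_original


# A task that took 100 s on the original system takes 50 s on the larger one.
s = speedup(time_original=100, time_parallel=50)

# The original system handled 1000 transactions per period; the larger
# system handles 2000 in the same period.
u = scaleup(volume_original=1000, volume_parallel=2000)
```

With twice the hardware, a value of 2 for either ratio is the ideal (linear) case. Synchronization overhead usually keeps real systems below it.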
Speedup and Scaleup with Different
Workloads

Workload         Speedup    Scaleup
------------------------------------------
OLTP             No         Yes
DSS              Yes        Yes
Parallel Query   Yes        Yes
Batch (Mixed)    Possible   Yes
Benefits of Parallel Databases

• Parallel database technology can
benefit certain kinds of applications
by enabling:
o Higher Performance: With more CPUs
available to an application, higher
speedup and scaleup can be attained
o High Availability: Nodes are isolated
from each other, so a failure at one
node does not bring the entire system
down
Multi-Instance Database System
Distributed Database System
Parallel Execution

• With parallel execution features, DBMS
can divide the work of processing SQL
statements among multiple query server
processes
• Provides the framework for parallel
execution to work between nodes
• The data server must parallelize
individual queries into units of work that
can be processed simultaneously in
multiprocessing systems
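
The idea of splitting one query into units of work can be sketched with Python's standard `concurrent.futures` module. This is an analogy, not how any particular DBMS implements parallel execution: the `sales` rows and helper names are hypothetical, and worker threads stand in for query server processes (in CPython, threads illustrate the structure more than they deliver a real CPU speedup).

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, parts):
    """Split rows into at most `parts` contiguous chunks."""
    if not rows:
        return []
    size = -(-len(rows) // parts)  # ceiling division
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def partial_sum(chunk):
    """Unit of work for one query server: aggregate its own chunk."""
    return sum(row["amount"] for row in chunk)


def parallel_total(rows, workers=4):
    """Roughly SELECT SUM(amount): scatter chunks, then combine partials."""
    chunks = partition(rows, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum, chunks))
    return sum(partials)  # the coordinator combines the partial results


# Hypothetical table with amounts 1..100.
sales = [{"amount": n} for n in range(1, 101)]
total = parallel_total(sales)
```

The final `sum(partials)` step is the part that must run serially after the parallel units finish, which is the task-sequencing concern raised earlier in the deck.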
Example of Parallel Execution
Processing
Multi-dimensional models

• Data items are each related by
several attributes, or dimensions.
o A quantity, e.g. detergent sale, has
dimensions including
▪ time (when sold),
▪ cost,
▪ location (where sold and in which type of
store)
Usage of Multi-Dimensional Data

• Question: How many 3 kg packs of laundry
powder did we sell in the Eastern Region in
the last three months?
o Involves several dimensions of information and
can be answered straightforwardly if a
multi-dimensional view of the data is available
o Complicated, because individual dimensions
often have an inherent hierarchy of variables
within them
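
The question above can be answered by filtering and summing over each dimension. The sketch below uses a hypothetical list of sales facts; every store name, month, and unit count is made up for illustration. The region-to-store roll-up shows one simple case of a hierarchy within a dimension.

```python
from collections import defaultdict

# Hypothetical sales facts: one entry per (product, location, time) cell.
sales = [
    {"pack_kg": 3, "region": "Eastern", "store": "E1", "month": "2024-01", "units": 40},
    {"pack_kg": 3, "region": "Eastern", "store": "E2", "month": "2024-02", "units": 25},
    {"pack_kg": 3, "region": "Eastern", "store": "E1", "month": "2024-03", "units": 35},
    {"pack_kg": 1, "region": "Eastern", "store": "E1", "month": "2024-02", "units": 60},
    {"pack_kg": 3, "region": "Western", "store": "W1", "month": "2024-03", "units": 50},
    {"pack_kg": 3, "region": "Eastern", "store": "E2", "month": "2023-11", "units": 10},
]


def select(facts, **filters):
    """Keep facts whose value on each named dimension is in the allowed set."""
    return [f for f in facts
            if all(f[dim] in allowed for dim, allowed in filters.items())]


def total_units(facts):
    return sum(f["units"] for f in facts)


def roll_up(facts, dimension):
    """Aggregate units by one level of a dimension (e.g. store or region)."""
    totals = defaultdict(int)
    for f in facts:
        totals[f[dimension]] += f["units"]
    return dict(totals)


last_three_months = {"2024-01", "2024-02", "2024-03"}
matching = select(sales, pack_kg={3}, region={"Eastern"}, month=last_three_months)

answer = total_units(matching)        # 40 + 25 + 35
by_store = roll_up(matching, "store")  # drill down from region to store
```

Here `answer` is 100 units, and `by_store` breaks that total down one level in the location hierarchy.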
Online Analytical-Processing (OLAP)

• Fast Analysis of Shared Multidimensional
Information (FASMI)
o FAST: most responses reach users within about 5
seconds, with the simplest analyses taking no more
than 1 second and very few taking more than 20 seconds
o ANALYSIS: copes with any business logic and statistical
analysis, while staying easy enough for the target user
o SHARED: implements all the security requirements for
confidentiality
o MULTIDIMENSIONAL: multidimensional conceptual
view of the data, including full support for
hierarchies
o INFORMATION: all of the data and derived information
needed
OLAP Vs OLTP (Online Transaction
Processing)
