
1. What is data model?

List different categories of database models


Data model is a collection of conceptual tools for describing data, data
relationships, data semantics and consistency constraints.
Types of Data Models
Entity relationship model
Relational model
Object oriented data model
Object relational data model
Network data model
Hierarchical data model
Semistructured Data Models
2. What is DML? List the types of DML?
DML stands for Data Manipulation Language, which includes a query language and
commands to insert tuples into, delete tuples from, and modify tuples
in the database.
Procedural DMLs require a user to specify what data are needed and how to
get those data.
Declarative DMLs (also referred to as nonprocedural DMLs) require a user to
specify what data are needed without specifying how to get those data.
3. How do you test for empty relations?
The exists construct returns the value true if the argument subquery is
nonempty.
exists r ⇔ r ≠ ∅
not exists r ⇔ r = ∅
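A minimal runnable sketch using Python's sqlite3 module (the depositor and borrower tables and their contents are invented for illustration), showing that exists is true exactly when the subquery returns a non-empty relation:

```python
import sqlite3

# Hypothetical depositor/borrower tables, used only to illustrate EXISTS.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE depositor (customer_name TEXT)")
cur.execute("CREATE TABLE borrower (customer_name TEXT)")
cur.executemany("INSERT INTO depositor VALUES (?)", [("Ana",), ("Bob",)])
cur.executemany("INSERT INTO borrower VALUES (?)", [("Bob",)])

# EXISTS returns true when the subquery result is non-empty:
# here, depositors who also have at least one loan.
cur.execute("""
    SELECT d.customer_name
    FROM depositor d
    WHERE EXISTS (SELECT 1 FROM borrower b
                  WHERE b.customer_name = d.customer_name)
""")
print(cur.fetchall())   # [('Bob',)]
```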
4. Define natural join and thetajoin operations
The natural join operation on two relations matches tuples whose values are
the same on all attribute names that are common to both relations.
The theta join operation is a variant of the natural-join operation that allows
us to combine a selection and a Cartesian product into a single operation.
Consider relations r(R) and s(S), and let θ be a predicate on attributes in the
schema R ∪ S. The theta join operation r ⋈θ s is defined as follows:
r ⋈θ s = σθ(r × s)
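As a small illustration (relations r and s and the predicate r.a < s.b are invented), the theta join can be evaluated in SQL as a selection applied to the Cartesian product; a sqlite3 sketch:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE r (a INTEGER)")
cur.execute("CREATE TABLE s (b INTEGER)")
cur.executemany("INSERT INTO r VALUES (?)", [(1,), (2,), (3,)])
cur.executemany("INSERT INTO s VALUES (?)", [(2,), (3,)])

# Theta join with theta being r.a < s.b: a selection over the Cartesian product.
cur.execute("SELECT * FROM r, s WHERE r.a < s.b")
print(cur.fetchall())   # [(1, 2), (1, 3), (2, 3)]  (order may vary)
```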

5. State the purpose of indexing.


Indexing mechanisms are used to speed up access to desired data.
An index is a data structure that is added to a file to provide faster access to the data.
It reduces the number of blocks that the DBMS has to check.
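A hedged sqlite3 sketch (the emp table and index name are invented, and the exact EXPLAIN QUERY PLAN wording varies by SQLite version) showing that after an index is created the planner searches the index instead of scanning the whole table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE emp (empno INTEGER, ename TEXT)")
cur.executemany("INSERT INTO emp VALUES (?, ?)",
                [(i, f"name{i}") for i in range(1000)])

# Without an index the plan is a full table scan.
print(cur.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM emp WHERE empno = 500").fetchall())

# With an index on the search key, the plan uses the index instead.
cur.execute("CREATE INDEX emp_empno_idx ON emp (empno)")
print(cur.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM emp WHERE empno = 500").fetchall())
```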

6. What is assertion? Give its syntax


An assertion is any condition that the database must always satisfy. Domain
constraints and referential-integrity constraints are special forms of assertions.
When an assertion is created, the system tests it for validity. If the assertion is
valid, then any future modification to the database is allowed only if it does not
cause that assertion to be violated.
This testing may result in significant overhead if the assertions are complex;
because of this, assertions should be used with great care.
Some system developers omit support for general assertions, or provide
specialized forms of assertions that are easier to test.
Syntax
CREATE ASSERTION <Constraint name>
CHECK (search condition)
[ <constraint attributes> ]

7. What is lossless join decomposition?


Let r(R) be a relation schema, and let F be a set of functional dependencies on
r(R). Let R1 and R2 form a decomposition of R. We say that the decomposition
is a lossless decomposition if there is no loss of information by replacing r(R)
with the two relation schemas r1(R1) and r2(R2).
A decomposition of R into R1 and R2 is a lossless-join decomposition if at least
one of the following functional dependencies is in F+:
R1 ∩ R2 → R1
R1 ∩ R2 → R2

The above functional dependencies are a sufficient condition for lossless join
decomposition; the dependencies are a necessary condition only if all
constraints are functional dependencies
In SQL terms, the decomposition is lossless if
select * from (select R1 from r) natural join (select R2 from r)
always gives back exactly the relation r. This is stated more succinctly in the relational algebra as:
ΠR1(r) ⋈ ΠR2(r) = r
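A hedged Python sketch of this test (the schema R = {A, B, C} and the FDs are invented): compute the attribute closure of R1 ∩ R2 under F and check whether it contains R1 or R2.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under the functional dependencies fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_lossless(r1, r2, fds):
    """Decomposition {R1, R2} is lossless if (R1 ∩ R2)+ covers R1 or R2."""
    common = set(r1) & set(r2)
    c = closure(common, fds)
    return set(r1) <= c or set(r2) <= c

# Example: R = (A, B, C), F = {A -> B}.
fds = [("A", "B")]
print(is_lossless({"A", "B"}, {"A", "C"}, fds))   # True  (A+ = {A, B} covers R1)
print(is_lossless({"A", "B"}, {"B", "C"}, fds))   # False (B+ = {B})
```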

8. What is shadow-copy technique?


In the shadow-copy scheme, a transaction that wants to update the database
first creates a complete copy of the database. All updates are done on the new
database copy, leaving the original copy, the shadow copy, untouched. If at any
point the transaction has to be aborted, the system merely deletes the new
copy. The old copy of the database has not been affected. The current copy of
the database is identified by a pointer, called db-pointer, which is stored on
disk.
9. What actions need to be taken to recover from a deadlock?
When a deadlock is detected, some transaction will have to be rolled back (made a
victim) to break the deadlock. Select as victim the transaction that will incur
minimum cost.
Rollback: determine how far to roll back the transaction.
Total rollback: abort the transaction and then restart it.
It is more effective to roll back the transaction only as far as necessary to break
the deadlock.
Starvation happens if same transaction is always chosen as victim. Include the
number of rollbacks in the cost factor to avoid starvation.
10. Define check point? Where it is used?
A checkpoint is a mechanism by which all the previous log records are flushed from
the system and stored permanently on disk. A checkpoint declares a point
before which the DBMS was in a consistent state and all the transactions were
committed.

In a typical recovery example around a checkpoint:
T1 can be ignored (its updates were already output to disk at the checkpoint).
T2 and T3 are redone.
T4 is undone.
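A small Python sketch of this rule (the log format and transaction names are invented): transactions that committed before the checkpoint are ignored, transactions that committed after it are redone, and transactions without a commit record are undone.

```python
# Simplified redo/undo decision from a log containing a checkpoint record.
log = [
    ("start", "T1"), ("commit", "T1"),
    ("start", "T2"),
    ("checkpoint", None),          # T2 is still active at the checkpoint
    ("commit", "T2"),
    ("start", "T3"), ("commit", "T3"),
    ("start", "T4"),               # T4 never commits
]

ckpt = max(i for i, (action, _) in enumerate(log) if action == "checkpoint")

# Transactions active at the checkpoint, plus anything that starts later.
active = ({t for a, t in log[:ckpt] if a == "start"} -
          {t for a, t in log[:ckpt] if a == "commit"})
redo = set()
for action, txn in log[ckpt:]:
    if action == "start":
        active.add(txn)
    elif action == "commit":
        active.discard(txn)
        redo.add(txn)

print("redo:", sorted(redo))    # ['T2', 'T3']
print("undo:", sorted(active))  # ['T4']
```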

PART-B

11.a)Explain the different types of architectures of database applications


Database applications are usually partitioned into two or three parts. In a
two-tier architecture, the application resides at the client machine, where it
invokes database system functionality at the server machine through query
language statements. Application program interface standards like ODBC and
JDBC are used for interaction between the client and the server.
In contrast, in a three-tier architecture, the client machine acts as merely a
front end and does not contain any direct database calls. Instead, the client end
communicates with an application server, usually through a forms interface.
The application server in turn communicates with a database system to access
data. The business logic of the application, which says what actions to carry
out under what conditions, is embedded in the application server, instead of
being distributed across multiple clients. Three-tier applications are more
appropriate for large applications, and for applications that run on the
World Wide Web.

11.b) Write the function of data base administrator


The functions of a DBA include:
Schema definition. The DBA creates the original database schema by
executing a set of data definition statements in the DDL.
Storage structure and access-method definition.
Schema and physical-organization modification. The DBA carries out
changes to the schema and physical organization to reflect the changing needs
of the organization, or to alter the physical organization to improve performance
Granting of authorization for data access. By granting different types of
authorization, the database administrator can regulate which parts of the
database various users can access. The authorization information is kept in a
special system structure that the database system consults whenever someone
attempts to access the data in the system.
Routine maintenance:
Periodically backing up the database, either onto tapes or onto remote
servers, to prevent loss of data in case of disasters such as flooding.
Ensuring that enough free disk space is available for normal operations,
and upgrading disk space as required.

Monitoring jobs running on the database and ensuring that performance is
not degraded by very expensive tasks submitted by some users.
12 a. Explain Extended Relational-Algebra-Operations
Extended Relational-Algebra-Operations
Generalized Projection
Outer Join
Aggregate Functions
Generalized Projection
Extends the projection operation by allowing arithmetic functions to be used in
the projection list.
ΠF1, F2, ..., Fn(E)
E is any relational-algebra expression.
Each of F1, F2, ..., Fn is an arithmetic expression involving constants and
attributes in the schema of E.
Example: Πempno, ename, depno, sal*12(emp)
Aggregation function takes a collection of values and returns a single value as a
result.
avg: average value
min: minimum value
max: maximum value
sum: sum of values
count: number of values
Aggregate operation in relational algebra:
G1, G2, ..., Gn 𝒢 F1(A1), F2(A2), ..., Fn(An) (E)
E is any relational-algebra expression.
G1, G2, ..., Gn is a list of attributes on which to group (can be empty).
Each Fi is an aggregate function.
Each Ai is an attribute name.
All tuples in a group have the same values for G1, G2, . . . , Gn.
Tuples in different groups have different values for G1, G2, . . . , Gn.
Find the average salary of all employees:
𝒢avg(salary)(emp)
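A plain-Python sketch of the grouping-and-aggregation semantics (the emp tuples are invented): tuples that agree on the grouping attribute form one group, and the aggregate function is applied to each group.

```python
from collections import defaultdict

# depno 𝒢 avg(sal) (emp), computed over an in-memory list of tuples.
emp = [  # (empno, ename, depno, sal)
    (1, "A", 10, 3000), (2, "B", 10, 5000), (3, "C", 20, 4000),
]

groups = defaultdict(list)
for empno, ename, depno, sal in emp:
    groups[depno].append(sal)            # group on depno

result = {depno: sum(sals) / len(sals) for depno, sals in groups.items()}
print(result)    # {10: 4000.0, 20: 4000.0}
```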
Outer Join
An extension of the join operation that avoids loss of information.

Computes the join and then adds tuples from one relation that do not match
tuples in the other relation to the result of the join.
Uses null values:
null signifies that the value is unknown or does not exist
All comparisons involving null are (roughly speaking) false by definition.
Consider the following schemas:
Suppliers (sid: number, sname: string)
Orders (oid: number, sid: number, orderdate: date)
select s.sid, s.sname, o.orderdate
from suppliers s
join / left join / right join / full join
orders o
on s.sid = o.sid;
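A hedged sqlite3 sketch of the left outer join on the schemas above (the sample rows are invented; right and full outer joins need a recent SQLite version): the supplier with no matching order is preserved, with null padding the missing order attribute.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE suppliers (sid INTEGER, sname TEXT)")
cur.execute("CREATE TABLE orders (oid INTEGER, sid INTEGER, orderdate TEXT)")
cur.executemany("INSERT INTO suppliers VALUES (?, ?)", [(1, "S1"), (2, "S2")])
cur.executemany("INSERT INTO orders VALUES (?, ?, ?)", [(100, 1, "2024-01-05")])

# Supplier 2 has no order but still appears, with orderdate = NULL (None).
cur.execute("""
    SELECT s.sid, s.sname, o.orderdate
    FROM suppliers s LEFT JOIN orders o ON s.sid = o.sid
""")
print(cur.fetchall())   # [(1, 'S1', '2024-01-05'), (2, 'S2', None)]
```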

12 b. Discuss with examples various set operations in SQL


The set operations union, intersect, and except operate on relations and
correspond to the relational algebra operations ∪, ∩, and −.
Each of the above operations automatically eliminates duplicates; to retain all
duplicates use the corresponding multiset versions union all, intersect all and
except all.
Suppose a tuple occurs m times in r and n times in s, then, it occurs:
m + n times in r union all s
min(m, n) times in r intersect all s
max(0, m − n) times in r except all s
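These multiset rules behave exactly like the arithmetic of Python's collections.Counter; a short sketch with invented values:

```python
from collections import Counter

r = Counter({"x": 3, "y": 1})   # tuple "x" occurs m = 3 times in r
s = Counter({"x": 2, "z": 1})   # and n = 2 times in s

print(r + s)   # union all:      "x" -> m + n = 5
print(r & s)   # intersect all:  "x" -> min(m, n) = 2
print(r - s)   # except all:     "x" -> max(0, m - n) = 1
```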
Find all customers who have a loan, an account, or both:
(select customer-name from depositor)
union
(select customer-name from borrower)
Find all customers who have both a loan and an account.
(select customer-name from depositor)
intersect
(select customer-name from borrower)

Find all customers who have an account but no loan.


(select customer-name from depositor)
except
(select customer-name from borrower)
13.a)

13.b) Define functional dependency and write a procedure to compute the


closure of a set of functional dependencies (FDs)

Definition of FD
An FD is a relationship between an attribute "Y" and a determinant (1 or more
other attributes) "X" such that for a given value of a determinant the value of
the attribute is uniquely defined.
X is a determinant
X determines Y
Y is functionally dependent on X
To compute the closure F+ of a set of functional dependencies F:
F+ = F
repeat
    for each functional dependency f in F+
        apply the reflexivity and augmentation rules on f, and add the resulting
        functional dependencies to F+
    for each pair of functional dependencies f1 and f2 in F+
        if f1 and f2 can be combined using transitivity,
        then add the resulting functional dependency to F+
until F+ does not change any further
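A hedged Python sketch of this procedure on a tiny invented schema; reflexivity and augmentation enumerate attribute subsets, so this literal fixpoint is only practical for very small schemas.

```python
from itertools import chain, combinations

def subsets(attrs):
    """All subsets of a set of attributes (including the empty set)."""
    attrs = list(attrs)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(attrs, k) for k in range(len(attrs) + 1))]

def fd_closure(schema, fds):
    """Compute F+ as a set of (lhs, rhs) pairs by the fixpoint procedure above."""
    fplus = {(frozenset(l), frozenset(r)) for l, r in fds}
    while True:
        new = set(fplus)
        # Reflexivity: alpha -> beta for every beta that is a subset of alpha.
        for alpha in subsets(schema):
            for beta in subsets(alpha):
                new.add((alpha, beta))
        # Augmentation: if alpha -> beta, then (alpha ∪ gamma) -> (beta ∪ gamma).
        for alpha, beta in fplus:
            for gamma in subsets(schema):
                new.add((alpha | gamma, beta | gamma))
        # Transitivity: alpha -> beta and beta -> gamma imply alpha -> gamma.
        for a1, b1 in fplus:
            for a2, b2 in fplus:
                if b1 == a2:
                    new.add((a1, b2))
        if new == fplus:
            return fplus
        fplus = new

schema = {"A", "B", "C"}
F = [("A", "B"), ("B", "C")]                        # invented example FDs
Fplus = fd_closure(schema, F)
print((frozenset("A"), frozenset("C")) in Fplus)    # True, by transitivity
```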
14.a) What is hashing? Explain handling of buckets overflows.
Hashing is an effective technique to calculate the direct location of a data record
on disk without using an index structure.
It uses a function, called a hash function, that generates an address when called
with the search key as a parameter. The hash function computes the location of
the desired data on the disk.
Hash Organization

Bucket: Hash file stores data in bucket format. Bucket is considered a unit of
storage. Bucket typically stores one complete disk block, which in turn can
store one or more records.
Hash Function: A hash function h is a mapping function that maps the set of all
search keys K to the addresses where actual records are placed. It is a function
from search keys to bucket addresses.

Operation:
Insertion: When a record is to be inserted using static hashing, the hash
function h computes the bucket address for its search key K, where the record
will be stored.
Bucket address = h(K)
Search: When a record needs to be retrieved, the same hash function is used to
compute the address of the bucket where the data is stored.
Delete: This is simply a search followed by a deletion operation.
BUCKET OVERFLOW:
The condition of bucket-overflow is known as collision. This is a fatal state for
any static hash function. In this case overflow chaining can be used.
Bucket overflow can occur because of:
Insufficient buckets.
Skew in the distribution of records. Skew can occur for two reasons: multiple
records may have the same search-key value, or the chosen hash function may
produce a non-uniform distribution of key values.
Although the probability of bucket overflow can be reduced, it cannot be
eliminated; it is handled by using overflow buckets
Overflow Chaining: When buckets are full, a new bucket is allocated for the
same hash result and is linked after the previous one. This mechanism is
called Closed Hashing.

Linear Probing: When the hash function generates an address at which data is
already stored, the next free bucket is allocated to it. This mechanism is called
Open Hashing.
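A hedged Python sketch of static hashing with overflow chaining (the bucket size, bucket count, and hash function are arbitrary choices for the example):

```python
# Static hash file with overflow chaining: each primary bucket holds at most
# BUCKET_SIZE records; further records for the same bucket address go into a
# chained overflow bucket.
BUCKET_SIZE = 2
NUM_BUCKETS = 4

class Bucket:
    def __init__(self):
        self.records = []
        self.overflow = None          # link to the next overflow bucket

buckets = [Bucket() for _ in range(NUM_BUCKETS)]

def h(key):
    return hash(key) % NUM_BUCKETS    # bucket address = h(K)

def insert(key, record):
    b = buckets[h(key)]
    while len(b.records) >= BUCKET_SIZE:   # bucket overflow: follow the chain
        if b.overflow is None:
            b.overflow = Bucket()
        b = b.overflow
    b.records.append((key, record))

def search(key):
    b = buckets[h(key)]
    while b is not None:              # scan the primary bucket, then the chain
        for k, rec in b.records:
            if k == key:
                return rec
        b = b.overflow
    return None

for i in range(10):
    insert(i, f"record-{i}")
print(search(7))    # 'record-7'
```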

b) Discuss the concepts of conflict serializability with an example.


Instructions li and lj of transactions Ti and Tj respectively, conflict if and only
if there exists some item Q accessed by both li and lj, and at least one of these
instructions wrote Q.
1. li = read(Q), lj = read(Q). li and lj don't conflict.
2. li = read(Q), lj = write(Q). They conflict.
3. li = write(Q), lj = read(Q). They conflict.
4. li = write(Q), lj = write(Q). They conflict.

Intuitively, a conflict between li and lj forces a (logical) temporal order between
them. If li and lj are consecutive in a schedule and they do not conflict, their
results would remain the same even if they had been interchanged in the
schedule.
If a schedule S can be transformed into a schedule S′ by a series of swaps of
non-conflicting instructions, we say that S and S′ are conflict equivalent.
We say that a schedule S is conflict serializable if it is conflict equivalent to a
serial schedule.
Schedule 3 can be transformed into Schedule 6, a serial schedule where T2
follows T1, by a series of swaps of non-conflicting instructions. Therefore
Schedule 3 is conflict serializable.

Example of a schedule that is not conflict serializable: a schedule in which T3
reads Q, then T4 writes Q, and then T3 writes Q.
We are unable to swap instructions in such a schedule to obtain either the
serial schedule < T3, T4 > or the serial schedule < T4, T3 >.
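A hedged Python sketch of the test implied above: build the precedence graph (an edge Ti -> Tj for every conflicting pair in which Ti's instruction comes first) and declare the schedule conflict serializable exactly when the graph is acyclic. The schedules below are invented, with the last one mirroring the T3/T4 example.

```python
# A schedule is a list of (transaction, action, item); action is 'r' or 'w'.

def precedence_edges(schedule):
    edges = set()
    for i, (ti, ai, qi) in enumerate(schedule):
        for tj, aj, qj in schedule[i + 1:]:
            # Conflict: different transactions, same item, at least one write.
            if ti != tj and qi == qj and "w" in (ai, aj):
                edges.add((ti, tj))
    return edges

def has_cycle(edges):
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
    visited, on_stack = set(), set()

    def dfs(u):
        visited.add(u)
        on_stack.add(u)
        for v in graph.get(u, ()):
            if v in on_stack or (v not in visited and dfs(v)):
                return True
        on_stack.discard(u)
        return False

    return any(dfs(u) for u in graph if u not in visited)

def conflict_serializable(schedule):
    return not has_cycle(precedence_edges(schedule))

print(conflict_serializable([("T1", "r", "Q"), ("T2", "w", "Q")]))          # True
print(conflict_serializable([("T3", "r", "Q"), ("T4", "w", "Q"),
                             ("T3", "w", "Q")]))                            # False
```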

15. a)What is a time stamp? Write about time stamp based protocol.
The timestamp of a transaction denotes the time when it was first activated.
Each transaction is issued a timestamp when it enters the system. If an old
transaction Ti has time-stamp TS(Ti), a new transaction Tj is assigned
time-stamp TS(Tj) such that TS(Ti) < TS(Tj).
The protocol manages concurrent execution such that the time-stamps
determine the serializability order.
In order to assure such behavior, the protocol maintains for each data Q two
timestamp values:
W-timestamp(Q) is the largest time-stamp of any transaction that executed
write(Q) successfully.
R-timestamp(Q) is the largest time-stamp of any transaction that executed
read(Q) successfully
The timestamp ordering protocol ensures that any conflicting read and write
operations are executed in timestamp order.
Suppose a transaction Ti issues a read(Q):
If TS(Ti) < W-timestamp(Q), then Ti needs to read a value of Q that was already
overwritten. Hence, the read operation is rejected, and Ti is rolled back.
If TS(Ti) ≥ W-timestamp(Q), then the read operation is executed, and
R-timestamp(Q) is set to max(R-timestamp(Q), TS(Ti)).

Suppose that transaction Ti issues write(Q).


If TS(Ti) < R-timestamp(Q), then the value of Q that Ti is producing was needed
previously, and the system assumed that that value would never be produced.
Hence, the write operation is rejected, and Ti is rolled back.
If TS(Ti) < W-timestamp(Q), then Ti is attempting to write an obsolete value of Q.
Hence, this write operation is rejected, and Ti is rolled back.
Otherwise, the write operation is executed, and W-timestamp(Q) is set to
TS(Ti).

The timestamp protocol ensures freedom from deadlock, as no transaction ever
waits.
But the schedule may not be cascade-free, and may not even be recoverable.
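A hedged Python sketch of the read and write tests above for a single data item Q (the bookkeeping is simplified; a rejected operation is just reported rather than restarting the transaction):

```python
class Item:
    def __init__(self):
        self.r_ts = 0    # R-timestamp(Q): largest TS of a successful read(Q)
        self.w_ts = 0    # W-timestamp(Q): largest TS of a successful write(Q)

def read(item, ts):
    if ts < item.w_ts:                    # Q already overwritten by a younger txn
        return "rollback"
    item.r_ts = max(item.r_ts, ts)
    return "read ok"

def write(item, ts):
    if ts < item.r_ts or ts < item.w_ts:  # value needed earlier, or obsolete write
        return "rollback"
    item.w_ts = ts
    return "write ok"

q = Item()
print(write(q, ts=5))   # write ok   (W-timestamp(Q) = 5)
print(read(q, ts=3))    # rollback   (TS(Ti) = 3 < W-timestamp(Q) = 5)
print(read(q, ts=7))    # read ok    (R-timestamp(Q) = 7)
print(write(q, ts=6))   # rollback   (TS(Ti) = 6 < R-timestamp(Q) = 7)
```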
15.b)Explain Graph based protocol
Graph-based protocols are an alternative to two-phase locking
Impose a partial ordering → on the set D = {d1, d2, ..., dh} of all data items.
If di → dj, then any transaction accessing both di and dj must access di before
accessing dj.
Implies that the set D may now be viewed as a directed acyclic graph, called a
database graph.
The tree-protocol is a simple kind of graph protocol

Only exclusive locks are allowed.
The first lock by Ti may be on any data item. Subsequently, a data item Q can be
locked by Ti only if the parent of Q is currently locked by Ti.
Data items may be unlocked at any time.
A data item that has been locked and unlocked by Ti cannot subsequently be
re-locked by Ti.

The tree protocol ensures conflict serializability as well as freedom from
deadlock.
Unlocking may occur earlier in the tree-locking protocol than in the two-phase
locking protocol:
shorter waiting times, and increase in concurrency
since the protocol is deadlock-free, no rollbacks are required
Drawbacks
Protocol does not guarantee recoverability or cascade freedom.
Need to introduce commit dependencies to ensure recoverability
Transactions may have to lock data items that they do not access.
increased locking overhead, and additional waiting time
potential decrease in concurrency
Schedules not possible under two-phase locking are possible under tree
protocol, and vice versa.
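A hedged Python sketch of the tree-protocol rules for a single transaction (the database graph and lock requests are invented): the first lock may be on any item, later locks require the parent to be currently held, and an item that was unlocked cannot be re-locked.

```python
# Tree protocol with exclusive locks only.
tree_parent = {"B": "A", "C": "A", "D": "B", "E": "B"}   # invented database graph

class TreeLocker:
    def __init__(self):
        self.held = set()        # items currently locked by the transaction
        self.released = set()    # items locked earlier and already unlocked

    def lock(self, item):
        if item in self.released:
            raise ValueError(f"{item}: cannot re-lock a previously unlocked item")
        if self.held and tree_parent.get(item) not in self.held:
            raise ValueError(f"{item}: parent must currently be locked")
        self.held.add(item)

    def unlock(self, item):
        self.held.discard(item)
        self.released.add(item)

t = TreeLocker()
t.lock("B")          # first lock: any data item is allowed
t.lock("D")          # allowed: parent B is currently held
t.unlock("B")        # data items may be unlocked at any time
try:
    t.lock("E")      # rejected: parent B is no longer held
except ValueError as e:
    print(e)
```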

17. Write short notes on:
a) Armstrong's axioms
b) Bitmap indices
c) Remote backup systems

a) Armstrong's Axioms
Axioms, or rules of inference, provide a simpler technique for reasoning about
functional dependencies.
We can use the following three rules to find logically implied functional
dependencies. By applying these rules repeatedly, we can find all of F+, given F.
Reflexivity rule: if α is a set of attributes and β ⊆ α, then α → β holds.
Augmentation rule: if α → β holds and γ is a set of attributes, then γα → γβ holds.
Transitivity rule: if α → β holds and β → γ holds, then α → γ holds.
Armstrong's axioms are sound, because they do not generate any incorrect
functional dependencies.
Armstrong's axioms are complete, because, for a given set F of functional
dependencies, they allow us to generate all of F+.
Additional rules can be derived from Armstrong's axioms, such as the union rule,
the decomposition rule, and the pseudotransitivity rule.

b) Bitmap indices
Bitmap indices are a special type of index designed for efficient querying on
multiple keys.

Records in a relation are assumed to be numbered sequentially from, say, 0.
Given a number n, it must be easy to retrieve record n; this is particularly easy
if records are of fixed size.
Applicable on attributes that take on a relatively small number of distinct
values, e.g. gender, country, state, or income-level (income broken up into a
small number of levels such as 0-9999, 10000-19999, 20000-50000, 50000-infinity).
A bitmap is simply an array of bits

In its simplest form, a bitmap index on an attribute has a bitmap for each value
of the attribute. Each bitmap has as many bits as there are records.
In the bitmap for value v, the bit for a record is 1 if the record has the value v for
the attribute, and is 0 otherwise.
Bitmap indices are useful for queries on multiple attributes; they are not
particularly useful for single-attribute queries.
Queries are answered using bitmap operations:
Intersection (and), Union (or), Complementation (not).
Each binary operation takes two bitmaps of the same size and applies the
operation on corresponding bits to get the result bitmap.
E.g. 100110 AND 110011 = 100010
100110 OR 110011 = 110111
NOT 100110 = 011001
Males with income level L1: 10010 AND 10100 = 10000.
The required tuples can then be retrieved.
Counting the number of matching tuples is even faster.
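A hedged Python sketch of answering a two-attribute query with a bitmap AND (the records are invented); Python integers serve as arbitrarily long bit arrays.

```python
records = [                       # invented sample records, numbered 0..3
    {"gender": "m", "income": "L1"},
    {"gender": "f", "income": "L1"},
    {"gender": "m", "income": "L2"},
    {"gender": "m", "income": "L1"},
]

def bitmap(attr, value):
    """Bit i is 1 iff record i has the given value for the attribute."""
    bits = 0
    for i, rec in enumerate(records):
        if rec[attr] == value:
            bits |= 1 << i
    return bits

# Males with income level L1: intersect the two bitmaps with a single AND.
result = bitmap("gender", "m") & bitmap("income", "L1")
matches = [i for i in range(len(records)) if (result >> i) & 1]
print(matches)                    # [0, 3]
print(bin(result).count("1"))     # fast count of matching tuples: 2
```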
c) Remote Backup Systems
Remote backup systems provide high availability by allowing transaction
processing to continue even if the primary site is destroyed.

Detection of failure: Backup site must detect when primary site has failed
To distinguish primary site failure from link failure, maintain several
communication links between the primary and the remote backup, and use
heart-beat messages.
Transfer of control:
To take over control, the backup site first performs recovery using its copy of the
database and all the log records it has received from the primary.
Thus, completed transactions are redone and incomplete transactions are
rolled back.
When the backup site takes over processing, it becomes the new primary.

To transfer control back to the old primary when it recovers, the old primary must
receive redo logs from the old backup and apply all updates locally.
Time to recover: To reduce delay in takeover, the backup site periodically
processes the redo log records (in effect, performing recovery from the previous
database state), performs a checkpoint, and can then delete earlier parts of the log.
Hot-spare configuration permits very fast takeover:
The backup continually processes redo log records as they arrive, applying the
updates locally.
When failure of the primary is detected the backup rolls back incomplete
transactions, and is ready to process new transactions.
Alternative to remote backup: a distributed database with replicated data.
Remote backup is faster and cheaper, but less tolerant to failure.
