

Section (circle one): 1 / 2

UNI:
Name:

COMS4111, Introduction to Databases
Spring 2014, Midterm Exam
Professor Alex Biliris
DURATION: 1 hour and 30 minutes

Nothing should be on your desk except this exam.

Problem:     1      2       3       4       5       Total
Max points:  5 pts  10 pts  10 pts  10 pts  15 pts  50 pts

Problem 1 (5 Points, 1 pt each)
In one very brief sentence, answer the questions or explain the meaning of the terms in the context of
relational databases.

a) Weak entity set:
Its entities exist only when they are associated with another entity (the owning entity); they can
be identified only by also considering the primary key of the owning entity.

b) Foreign key:
A set of fields in a tuple that refers (logically points) to another tuple; its value is the value of the
primary key of the referenced tuple.
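
As an illustration of the definition above, here is a minimal sketch using Python's standard `sqlite3` module. The tables `Dept` and `Emp` are hypothetical and not part of the exam; note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

# Hypothetical tables: Emp.did refers to the primary key of Dept.
conn.execute("CREATE TABLE Dept (did INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE Emp ("
    " eid INTEGER PRIMARY KEY,"
    " did INTEGER REFERENCES Dept(did))"
)
conn.execute("INSERT INTO Dept VALUES (1, 'Sales')")
conn.execute("INSERT INTO Emp VALUES (10, 1)")  # points to an existing Dept tuple


def insert_dangling():
    """Try to insert an Emp tuple whose did matches no Dept tuple."""
    try:
        conn.execute("INSERT INTO Emp VALUES (11, 99)")
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"


result = insert_dangling()
print(result)
```

The second insert is refused because no `Dept` tuple has primary key 99.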

c) View:
A view is a SQL statement that defines a table - the table is not stored, rather it is computed
dynamically at the time it is being used.
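
The point that a view is computed when it is used can be seen in a small sketch with Python's `sqlite3` module. The view name `RedParts` and the sample rows are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Parts (pid INTEGER PRIMARY KEY, name TEXT, color TEXT)")
conn.execute("INSERT INTO Parts VALUES (1, 'bolt', 'red')")

# The view stores only its defining query, not its rows.
conn.execute(
    "CREATE VIEW RedParts AS SELECT pid, name FROM Parts WHERE color = 'red'"
)

before = conn.execute("SELECT name FROM RedParts ORDER BY pid").fetchall()
conn.execute("INSERT INTO Parts VALUES (2, 'nut', 'red')")
# The same view now reflects the new base-table row without being redefined.
after = conn.execute("SELECT name FROM RedParts ORDER BY pid").fetchall()

print(before, after)
```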

d) Fill in the blanks: If the value of attribute A, an integer, is NULL, the result of A >10 is _____,
the result of A=9 OR 3<5 is _____, the result of A=9 AND FALSE is _____.
NULL, TRUE, FALSE.
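
These three answers can be checked against SQLite's three-valued logic. This is a sketch only: SQLite reports TRUE and FALSE as 1 and 0, Python shows NULL as `None`, and `1 = 0` stands in for the literal FALSE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# a is NULL; evaluate the three expressions from the question.
row = conn.execute(
    "SELECT a > 10, a = 9 OR 3 < 5, a = 9 AND 1 = 0 "
    "FROM (SELECT NULL AS a)"
).fetchone()

print(row)  # (None, 1, 0), i.e. NULL, TRUE, FALSE
```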

e) For two relations R(a, b) and S(a, c), what is the difference in the result of R natural join S vs.
R left outer join S? Both joins are on R.a =S.a.
The natural join will combine only those tuples of R and S that match on column a, while the
left outer join will include all tuples of R even if there is no matching tuple in S.
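
A minimal sketch of the difference, using Python's `sqlite3` module with made-up rows: the tuple of R with a = 2 has no match in S, so it appears only in the left outer join, padded with NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R (a INTEGER, b TEXT)")
conn.execute("CREATE TABLE S (a INTEGER, c TEXT)")
conn.executemany("INSERT INTO R VALUES (?, ?)", [(1, "b1"), (2, "b2")])
conn.executemany("INSERT INTO S VALUES (?, ?)", [(1, "c1")])

# Natural join: only tuples that match on the common column a.
natural = conn.execute(
    "SELECT a, b, c FROM R NATURAL JOIN S ORDER BY a"
).fetchall()

# Left outer join: every tuple of R, with NULL for c when S has no match.
left_outer = conn.execute(
    "SELECT R.a, b, c FROM R LEFT OUTER JOIN S ON R.a = S.a ORDER BY R.a"
).fetchall()

print(natural)     # [(1, 'b1', 'c1')]
print(left_outer)  # [(1, 'b1', 'c1'), (2, 'b2', None)]
```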

Problem 2 (10 Points)
You are going to design an E-R diagram for a home insurance company. Here is the information that you
gathered:
Each customer is identified by ssn, has a name, and owns one or more homes.
Each home is identified by its address and is owned by one customer.
Each insurance policy is identified by a unique policy ID, it is valid for a specific period of time (start
and end date), and it covers one or more homes that belong to the same customer. A customer may
choose to cover some or all of his/her homes under the same policy.
Each policy establishes a series of payments that the customer needs to pay. Each payment record
describes the due date, the amount due, the actual date the payment was received and the amount
received. For a given policy, only one payment can be due on a specific date.

1. Draw an ER diagram that captures this situation. Make sure that your E/R diagrams are clear and
arrowheads, etc.). You will not receive the benefit of the doubt in the presence of ambiguity. (9 pts.)
2. Explicitly mention constraints that cannot be captured by the ER diagram. (1 pt.)

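[Figure: legend with four line styles connecting relationship R to entity set E; the labels survive in the source text as 0, 1, 1, and =1.]
The different types of connecting lines between entity set E and relationship R indicate the
number of times an entity in E may participate in R with another entity.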

Problem 3 (10 Points)
Map the ER diagrams given below to a relational database by specifying its SQL schema; see previous
problem for the interpretation of lines connecting entity sets and relationships. Note: if you prefer, you
may use the compact form of schema definition that omits the data types of the attributes. If you use the
actual SQL syntax, you may assume all attributes are of some type INT.

Make sure you capture as many integrity constraints as possible and explicitly mention those that cannot
be captured by the SQL schema (if any).

Solution 1:
A(A1,A2,PK(A1))
C(C1,C2,PK(C1))
B_R(A1, C1, B1, B2, R1, PK(B1), FK(A1)->A, FK(C1)->C)
//A1 and C1 cannot be declared NOT NULL here: that would force every entity in B to participate in
the relationship exactly once, whereas the ER diagram says each entity in B participates in the
relationship at most once.
CREATE ASSERTION A_IN_R
CHECK (NOT EXISTS (
SELECT A1 FROM A
WHERE A1 NOT IN (
SELECT A1 FROM B_R     -- select A1, not *: NOT IN compares A1 against a set of A1 values
WHERE A1 IS NOT NULL)  -- skip NULLs; a NULL in the set would keep NOT IN from ever being TRUE
))
All constraints are captured.

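[Figure: ER diagram with entity sets A (attributes A1, A2), B (attributes B1, B2), and C (attributes C1, C2), connected by relationship R (attribute R1).]
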
Solution 2:
A(A1,A2,PK(A1))
B(B1,B2,PK(B1))
C(C1,C2,PK(C1))
R(A1 NOT NULL, B1, C1 NOT NULL, R1, PK(B1), FK(A1)->A, FK(B1)->B, FK(C1)->C)
//A1 and C1 must be NOT NULL so that every record of relationship R contains an entity of A, an
entity of B, and an entity of C. B1 does not need an explicit NOT NULL because a primary key is
NOT NULL by default.
CREATE ASSERTION A_IN_R
CHECK (NOT EXISTS (
SELECT A1 FROM A
WHERE A1 NOT IN (
SELECT A1 FROM R)
))
All constraints are captured.
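
As a rough sketch of how Solution 2 behaves in practice, the schema can be loaded into SQLite through Python's `sqlite3` module. SQLite does not implement CREATE ASSERTION, so the helper `a_without_r` (a hypothetical name) simply runs the assertion's inner query by hand; the sample rows are made up. B1 carries an explicit NOT NULL only because SQLite, unlike standard SQL, allows NULL in a non-INTEGER primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # foreign keys are off by default in SQLite
conn.executescript("""
CREATE TABLE A (A1 INT PRIMARY KEY, A2 INT);
CREATE TABLE B (B1 INT PRIMARY KEY, B2 INT);
CREATE TABLE C (C1 INT PRIMARY KEY, C2 INT);
CREATE TABLE R (
    A1 INT NOT NULL REFERENCES A(A1),
    B1 INT NOT NULL PRIMARY KEY REFERENCES B(B1),  -- NOT NULL: SQLite-specific quirk
    C1 INT NOT NULL REFERENCES C(C1),
    R1 INT
);
""")


def a_without_r(conn):
    """Return the A1 values that would violate the A_IN_R assertion."""
    rows = conn.execute(
        "SELECT A1 FROM A WHERE A1 NOT IN (SELECT A1 FROM R) ORDER BY A1"
    ).fetchall()
    return [r[0] for r in rows]


conn.executemany("INSERT INTO A VALUES (?, ?)", [(1, 0), (2, 0)])
conn.executemany("INSERT INTO B VALUES (?, ?)", [(10, 0), (11, 0)])
conn.execute("INSERT INTO C VALUES (100, 0)")
conn.execute("INSERT INTO R VALUES (1, 10, 100, 5)")

violations_before = a_without_r(conn)  # entity 2 of A is not yet in R
conn.execute("INSERT INTO R VALUES (2, 11, 100, 6)")
violations_after = a_without_r(conn)   # every entity of A now participates

print(violations_before, violations_after)  # [2] []
```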

Problem 4 (10 points, 5 pts each question)
Consider the following relational schema (keys are in bold and underlined):
Suppliers(sid, name, city) - the supplier's id, name, and the city the supplier is located in.
Parts(pid, name, color) - the id, name and color of parts.
Catalog(sid, pid, cost) - the price supplier sid charges for part pid.

Write the following queries in relational algebra. You may use S, P, and C as shorthand for Suppliers,
Parts, and Catalog, respectively.

(a) Find the names of suppliers that do not supply any red or green parts.

$$\pi_{name}\Big( \big( \pi_{sid}(S) - \pi_{sid}( C \bowtie \sigma_{color = red \,\lor\, color = green}(P) ) \big) \bowtie S \Big)$$
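
The role of the set difference in (a) can be traced with plain Python sets. This is only a sketch: the sample tuples are invented, and each set comprehension stands in for one selection, projection, or join step. Filtering Catalog for parts that are neither red nor green would be wrong, because a supplier that sells one blue part and one red part would still qualify.

```python
# Invented sample instances of Suppliers, Parts, and Catalog.
S = {(1, "Acme", "NYC"), (2, "Bolt Co", "LA"), (3, "Cogs", "SF")}
P = {(10, "nut", "red"), (11, "bolt", "green"), (12, "gear", "blue")}
C = {(1, 10, 5.0), (1, 12, 6.0), (2, 12, 7.0)}  # supplier 3 supplies nothing

# Selection on P, then the pids of red or green parts.
red_green_pids = {pid for (pid, _, color) in P if color in ("red", "green")}

# Join with C and project on sid: suppliers of at least one red or green part.
bad_sids = {sid for (sid, pid, _) in C if pid in red_green_pids}

# Set difference: all suppliers minus the disqualified ones.
good_sids = {sid for (sid, _, _) in S} - bad_sids

# Join back with S and project on name.
names = {name for (sid, name, _) in S if sid in good_sids}

print(sorted(names))  # ['Bolt Co', 'Cogs']
```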

(b) Find the names of parts supplied by all suppliers.

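$$\pi_{name}\Big( \big( \pi_{sid,pid}(C) \,/\, \pi_{sid}(S) \big) \bowtie P \Big)$$

or

$$\pi_{name}\Big( \big( \pi_{pid}(C) - \pi_{pid}\big( ( \pi_{pid}(C) \times \pi_{sid}(S) ) - \pi_{pid,sid}(C) \big) \big) \bowtie P \Big)$$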
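
The two answers to (b) compute the same thing, which the following Python sketch illustrates on invented data. The function names `divide` and `divide_by_difference` are hypothetical; the first applies the definition of division directly, the second follows the cross-product and difference formulation.

```python
# Invented sample data: projections of Suppliers and Catalog, plus Parts.
sids = {1, 2}                          # pi_sid(S)
pairs = {(1, 10), (2, 10), (1, 11)}    # pi_sid,pid(C) as (sid, pid)
parts = {(10, "nut"), (11, "bolt")}    # (pid, name)


def divide(pairs, sids):
    """pids that appear in pairs together with every sid."""
    pids = {pid for (_, pid) in pairs}
    return {pid for pid in pids if all((sid, pid) in pairs for sid in sids)}


def divide_by_difference(pairs, sids):
    """Same result using only projection, cross product, and difference."""
    pids = {pid for (_, pid) in pairs}
    all_pairs = {(sid, pid) for pid in pids for sid in sids}  # cross product
    missing = all_pairs - pairs             # (sid, pid) combinations not in C
    disqualified = {pid for (_, pid) in missing}
    return pids - disqualified


result = divide(pairs, sids)
names = {name for (pid, name) in parts if pid in result}

print(result, divide_by_difference(pairs, sids), names)
```

Part 11 is dropped because supplier 2 does not supply it, so only part 10 remains.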


Problem 5 (15 Points, 7.5 pts. each)
Assume the following SQL schema representing the history of transactions performed by customers in a
supermarket (keys are in bold and underlined):
Product(pid, name, price, mfr) - the product id, name, price and manufacturer of the product
Customer(cid, name, age) - customer cid and his/her name and age
Transaction(cid, pid, datetime) - customer cid purchased product pid on some date & time

Write one SQL statement for each of the following queries.

(a, 7.5 pts.) For each customer who has spent at least double the average amount spent by active
customers (customers that have made at least one purchase), print his/her name, the amount spent by this
customer, as well as the price of the most expensive product this customer bought.

select A.name, A.spent, C.max_spent
from
(select c.cid, c.name, sum(p.price) as spent
from product p, customer c, transaction t
where p.pid=t.pid AND c.cid=t.cid
group by c.cid, c.name) A,

(select AVG(X.spent) as avg_spent
from (select sum(p.price) as spent
from product p, transaction t
where p.pid=t.pid
group by t.cid) X) B,

(select t.cid, max(p.price) as max_spent
from product p, transaction t
where p.pid=t.pid
group by t.cid) C

where A.cid=C.cid AND A.spent >= B.avg_spent*2
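
To see a query of this shape run, here is a sketch using Python's `sqlite3` module with invented rows. Two things are assumptions for this sketch: the transaction table is named `Txn`, because TRANSACTION is a keyword in SQLite, and the derived tables use the aliases `spent_per_cust` and `M`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product (pid INTEGER PRIMARY KEY, name TEXT, price REAL, mfr TEXT);
CREATE TABLE Customer (cid INTEGER PRIMARY KEY, name TEXT, age INTEGER);
CREATE TABLE Txn (cid INTEGER, pid INTEGER, datetime TEXT,
                  PRIMARY KEY (cid, pid, datetime));
INSERT INTO Product VALUES (1, 'apple', 1.0, 'm1'), (2, 'tv', 100.0, 'm2'),
                           (3, 'pen', 2.0, 'm3');
INSERT INTO Customer VALUES (1, 'Ann', 30), (2, 'Bob', 40), (3, 'Cy', 50),
                            (4, 'Dee', 20);  -- Dee never buys anything
INSERT INTO Txn VALUES (1, 1, 't1'), (1, 2, 't2'), (1, 3, 't3'),
                       (2, 1, 't4'), (3, 3, 't5');
""")

QUERY = """
SELECT A.name, A.spent, M.max_spent
FROM (SELECT c.cid, c.name, SUM(p.price) AS spent
      FROM Product p, Customer c, Txn t
      WHERE p.pid = t.pid AND c.cid = t.cid
      GROUP BY c.cid, c.name) A,
     (SELECT AVG(spent_per_cust.spent) AS avg_spent
      FROM (SELECT SUM(p.price) AS spent
            FROM Product p, Txn t
            WHERE p.pid = t.pid
            GROUP BY t.cid) spent_per_cust) B,
     (SELECT t.cid, MAX(p.price) AS max_spent
      FROM Product p, Txn t
      WHERE p.pid = t.pid
      GROUP BY t.cid) M
WHERE A.cid = M.cid AND A.spent >= B.avg_spent * 2
"""

rows = conn.execute(QUERY).fetchall()
print(rows)  # [('Ann', 103.0, 100.0)]
```

Active customers spent 103, 1, and 2, so the average is about 35.3 and only Ann reaches double that amount. Dee does not lower the average because she has no transactions.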

(b, 7.5 pts.) For each product that has been sold at least once, print the product name, the total quantity of
sales (the number of times the product has been sold), and the number of customers that bought the
product as well as their average age.

We will do it in steps to show the logic.
Let's first write down this portion of the query: number of sales and number of customers per product.

SELECT p.pid, p.name, count(*) AS nsales, COUNT(DISTINCT t.cid) AS ncustomers
FROM Transaction t, Product p
WHERE p.pid =t.pid
GROUP BY p.pid, p.name

We now have to figure out the last part of the query: for each product, find the average age of the customers
that bought this product. It is tempting to take the above SQL, add a join with Customer c, and add
AVG(DISTINCT c.age) or AVG(c.age) to the SELECT clause. However, both are wrong. The first AVG
computes the average over distinct values of c.age, but we want to include every customer's age even if
two customers are of the same age. The second AVG counts a customer's age once per transaction, that is,
as many times as this customer bought the same product. The solution is to write a second SQL statement
that returns, per product, the age of each customer who bought it exactly once (even if some ages repeat):

SELECT t.pid, c.age
FROM Transaction t, Customer c
WHERE t.cid =c.cid
GROUP BY t.pid, c.age, c.cid

And now, we need to combine these two queries:

SELECT tmp1.name, tmp1.nsales, tmp1.ncustomers, AVG(tmp2.age)
FROM
(SELECT p.pid, p.name, count(*) AS nsales, COUNT(DISTINCT t.cid) AS ncustomers
FROM Transaction t, Product p
WHERE p.pid =t.pid
GROUP BY p.pid, p.name) tmp1,

(SELECT t.pid, c.age
FROM Transaction t, Customer c
WHERE t.cid =c.cid
GROUP BY t.pid, c.age, c.cid) tmp2

WHERE tmp2.pid =tmp1.pid
GROUP BY tmp1.pid, tmp1.name, tmp1.nsales, tmp1.ncustomers
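
The two AVG pitfalls can be reproduced on a small invented dataset with Python's `sqlite3` module. In this sketch the transaction table is named `Txn` (an assumption, since TRANSACTION is a keyword in SQLite). Ann and Bob are both 20 and buy milk once each; Cy is 50 and buys it three times.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product (pid INTEGER PRIMARY KEY, name TEXT, price REAL, mfr TEXT);
CREATE TABLE Customer (cid INTEGER PRIMARY KEY, name TEXT, age INTEGER);
CREATE TABLE Txn (cid INTEGER, pid INTEGER, datetime TEXT,
                  PRIMARY KEY (cid, pid, datetime));
INSERT INTO Product VALUES (1, 'milk', 2.0, 'm1');
INSERT INTO Customer VALUES (1, 'Ann', 20), (2, 'Bob', 20), (3, 'Cy', 50);
INSERT INTO Txn VALUES (3, 1, 't1'), (3, 1, 't2'), (3, 1, 't3'),
                       (1, 1, 't4'), (2, 1, 't5');
""")

# Tempting but wrong: both AVG variants computed over the three-way join.
naive = conn.execute("""
SELECT p.name, COUNT(*), COUNT(DISTINCT t.cid),
       AVG(DISTINCT c.age), AVG(c.age)
FROM Txn t, Product p, Customer c
WHERE p.pid = t.pid AND c.cid = t.cid
GROUP BY p.pid, p.name
""").fetchone()

# Two-subquery approach: one age per (product, customer) pair.
correct = conn.execute("""
SELECT tmp1.name, tmp1.nsales, tmp1.ncustomers, AVG(tmp2.age)
FROM (SELECT p.pid, p.name, COUNT(*) AS nsales,
             COUNT(DISTINCT t.cid) AS ncustomers
      FROM Txn t, Product p
      WHERE p.pid = t.pid
      GROUP BY p.pid, p.name) tmp1,
     (SELECT t.pid, c.age
      FROM Txn t, Customer c
      WHERE t.cid = c.cid
      GROUP BY t.pid, c.age, c.cid) tmp2
WHERE tmp2.pid = tmp1.pid
GROUP BY tmp1.pid, tmp1.name, tmp1.nsales, tmp1.ncustomers
""").fetchone()

print(naive)    # ('milk', 5, 3, 35.0, 38.0)
print(correct)  # ('milk', 5, 3, 30.0)
```

AVG(DISTINCT c.age) averages 20 and 50 to get 35, AVG(c.age) counts Cy's age three times to get 38, and the two-subquery form gives (20 + 20 + 50) / 3 = 30.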