1. Dalvi, Suciu. Efficient query evaluation on probabilistic databases, VLDB 2004.
2. Das Sarma et al. Working models for uncertain data, ICDE 2006.
"A tuple is an answer to the query" is a probabilistic event.
This can be extended to all data models; we discuss only probabilistic relational data.
$\Pr : \mathit{INST} \to [0,1]$
Query Semantics
Given a query Q and a probabilistic database Ip, what is the meaning of Q(Ip)?
Query Semantics
Semantics 1: Possible Answers. A probability distribution on sets of tuples:
$\forall A.\ \Pr(Q = A) = \sum_{I \in \mathit{INST},\ Q(I) = A} \Pr(I)$
Semantics 2: Possible Tuples. A probability function on tuples:
$\forall t.\ \Pr(t \in Q) = \sum_{I \in \mathit{INST},\ t \in Q(I)} \Pr(I)$
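The two semantics can be made concrete by listing the possible worlds explicitly. The following is a minimal sketch, not taken from the slides: the three worlds, their probabilities, and the example query (the set of cities in an instance) are all hypothetical, chosen only to show how each sum over INST is formed.

```python
from collections import defaultdict

# Hypothetical probabilistic database: each possible world (a set of tuples)
# is paired with its probability Pr(I); the probabilities sum to 1.
worlds = [
    (frozenset({("John", "Seattle"), ("Sue", "Seattle")}), 0.5),
    (frozenset({("John", "Boston")}), 0.3),
    (frozenset(), 0.2),
]

def query(instance):
    """Hypothetical query Q: the set of cities appearing in the instance."""
    return frozenset(city for _name, city in instance)

def possible_answers(q, worlds):
    """Semantics 1: Pr(Q = A) is the sum of Pr(I) over worlds with Q(I) = A."""
    dist = defaultdict(float)
    for instance, pr in worlds:
        dist[q(instance)] += pr
    return dict(dist)

def possible_tuples(q, worlds):
    """Semantics 2: Pr(t in Q) is the sum of Pr(I) over worlds with t in Q(I)."""
    marginals = defaultdict(float)
    for instance, pr in worlds:
        for t in q(instance):
            marginals[t] += pr
    return dict(marginals)

answers = possible_answers(query, worlds)
tuples = possible_tuples(query, worlds)
```

Semantics 1 returns a distribution over whole answer sets, while Semantics 2 returns one marginal probability per tuple and so discards correlations between answer tuples.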
Representation Formalisms
Problem: Need a good representation formalism.
Will be interpreted as possible worlds.
Several formalisms exist, but no winner.
Evaluation of Formalisms
Completeness? What possible worlds can it represent? What probability distributions on worlds?
Closure? Is it closed under evaluation of query operators?
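[Figure: a representation J whose tuples (for John and Sue, with addresses such as Seattle and Denver) are annotated with events e1, e2, e3, and the probabilistic database Ip it denotes, with one possible world for each truth assignment 000, 001, ..., 111 to e1 e2 e3. The only surviving probability expression is p1p2(1-p3) + p1p2p3.]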
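Prefix code:
$E_1 = e_1$
$E_2 = \neg e_1 \wedge e_2$
$E_3 = \neg e_1 \wedge \neg e_2 \wedge e_3$
$E_4 = \neg e_1 \wedge \neg e_2 \wedge \neg e_3 \wedge e_4$

J =
Name  Address  E
John  Seattle  $E_1 \vee E_2$
John  Boston   $E_1 \vee E_4$
Sue   Seattle  $E_1 \vee E_2 \vee E_3$

= Ip

[Figure: the four possible worlds of Ip, with probabilities p1, p2, p3, p4. The world with probability p1 contains (John, Seattle), (John, Boston), (Sue, Seattle); the world with probability p2 contains (John, Seattle), (Sue, Seattle); a third world contains only (Sue, Seattle).]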
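The prefix code makes the events E1, ..., E4 mutually exclusive, so each one can stand for exactly one possible world. A minimal sketch of how the probabilities of the underlying events could be chosen is shown here, assuming e1, ..., e4 are independent (an assumption, as the slide does not state it) and using hypothetical world probabilities. Under that assumption Pr(Ek) = Pr(ek) multiplied by (1 - Pr(ej)) for every j < k, which can be inverted step by step.

```python
def prefix_code_event_probs(world_probs):
    """Choose Pr(e_k) so that E_k = (not e_1) and ... and (not e_{k-1}) and e_k
    has probability world_probs[k], assuming the e_k are independent."""
    event_probs = []
    remaining = 1.0  # probability mass not yet assigned to earlier worlds
    for p in world_probs:
        event_probs.append(p / remaining if remaining > 0 else 0.0)
        remaining -= p
    return event_probs

def prefix_event_probs(event_probs):
    """Recompute Pr(E_k) from the Pr(e_k), as a consistency check."""
    result = []
    none_so_far = 1.0  # probability that e_1 .. e_{k-1} are all false
    for pe in event_probs:
        result.append(none_so_far * pe)
        none_so_far *= 1.0 - pe
    return result

world_probs = [0.4, 0.3, 0.2, 0.1]          # hypothetical p1..p4, summing to 1
event_probs = prefix_code_event_probs(world_probs)
recovered = prefix_event_probs(event_probs)  # should match world_probs
```

When the world probabilities sum to 1, the last event gets probability 1 (up to rounding), so exactly one Ek always holds.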
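[Figure: closure under query operators, which combine the events attached to tuples. A selected tuple v keeps its event E; joining v1 (event E1) with v2 (event E2) gives v1 v2 with event E1 ∧ E2; projecting two copies of v with events E1 and E2 gives v with event E1 ∨ E2. The connectives are inferred, since the operator symbols are missing from the source.]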
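[Figure: independent-tuple representation J, with tuples for John and Sue carrying probabilities p1, p2, p3, and the probabilistic database Ip it denotes: eight possible worlds I1 = ∅, I2, ..., I8, with probabilities such as p1(1-p2)p3, (1-p1)p2p3 and p1p2p3, summing to 1.]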
Very limited: cannot capture correlations across tuples.
Not closed: query operators can introduce complex correlations!
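The possible worlds denoted by an independent-tuple representation can be enumerated mechanically. This is a minimal sketch with hypothetical tuples and probabilities: each tuple is present or absent independently, so a world's probability is a product of p or (1 - p) factors, and the probabilities of all 2^n worlds sum to 1.

```python
from itertools import product

# Hypothetical independent tuples, each with its own probability of being present.
tuples = [
    (("John", "Seattle"), 0.8),
    (("Sue", "Boston"), 0.2),
    (("Fred", "Boston"), 0.9),
]

def possible_worlds(tuples):
    """Return every subset of the tuples together with its probability."""
    worlds = []
    for bits in product([False, True], repeat=len(tuples)):
        world = frozenset(t for (t, _p), present in zip(tuples, bits) if present)
        pr = 1.0
        for (_t, p), present in zip(tuples, bits):
            pr *= p if present else 1.0 - p
        worlds.append((world, pr))
    return worlds

worlds = possible_worlds(tuples)
total = sum(pr for _world, pr in worlds)  # sums to 1 up to rounding
```

Because every world probability factorizes this way, a distribution in which two tuples always appear together, for example, has no such representation.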
SELECT DISTINCT x.city
FROM Person x, Purchase y
WHERE x.Name = y.Customer AND y.Product = 'Gadget'

Answer tuples: Seattle, Boston

Person
Name  City     Profession
John  Seattle  statistician
Sue   Boston   musician
Fred  Boston   physicist

[Purchase table (customer, product, category). Surviving cells: customers Sue and Fred; product Gadget; categories instrument and microphone.]

Step 1: evaluate ~ predicates

SELECT DISTINCT x.city
FROM Person x, Purchase y
WHERE x.Name = y.Cust AND y.Product = 'Gadget'
  AND x.profession ~ 'scientist' AND y.category ~ 'music'
Personp
Name  City     Profession    pr
John  Seattle  statistician  p1=0.8
Sue   Boston   musician      p2=0.2
Fred  Boston   physicist     p3=0.9

[Purchasep table (customer, product, category, probability), with rows for John, Sue and Fred. Surviving cells: products Gadget and Camera; categories instrument, musicware and microphone; probabilities q2=0.6, q3=0.6, q4=0.9, q6=0.6, and q7=0.7 for the row (Fred, Gadget, microphone).]

SELECT DISTINCT x.city
FROM Personp x, Purchasep y
WHERE x.Name = y.Cust AND y.Product = 'Gadget'
  AND x.profession ~ 'scientist' AND y.category ~ 'music'

Tuple    Probability
Seattle  p1(1-(1-q2)(1-q3))
Boston   1-(1-p2(1-(1-q5)(1-q6)))(1-p3q7)
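The two answer probabilities, p1(1-(1-q2)(1-q3)) for Seattle and 1-(1-p2(1-(1-q5)(1-q6)))(1-p3q7) for Boston, can be checked numerically. This is a minimal sketch using the values legible in the tables. The value of q5 is not legible in the source, so the 0.5 used here is purely a placeholder assumption, and `independent_or` is a helper name introduced for illustration.

```python
def independent_or(*probs):
    """Probability that at least one of several independent events holds."""
    none = 1.0
    for p in probs:
        none *= 1.0 - p
    return 1.0 - none

# Values that survive in the source tables.
p1, p2, p3 = 0.8, 0.2, 0.9
q2, q3, q6, q7 = 0.6, 0.6, 0.6, 0.7
# ASSUMPTION: q5 is not legible in the source; 0.5 is a placeholder.
q5 = 0.5

# Seattle: John qualifies (p1) and at least one of his two purchases does.
seattle = p1 * independent_or(q2, q3)

# Boston: Sue with one of her purchases, or Fred with his single purchase.
boston = independent_or(p2 * independent_or(q5, q6), p3 * q7)
```

With these numbers Seattle evaluates to about 0.672; the Boston value depends on the assumed q5.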
Randomly make each variable true with the following probabilities:
Pr(X1) = p1, Pr(X2) = p2, ..., Pr(X6) = p6
[Figure: truth table over three Boolean variables giving the value of the expression for each assignment. The probabilities listed are p1(1-p2)p3, (1-p1)p2p3, p1p2(1-p3) and p1p2p3.]
[Valiant:1979]
The decision problem for 2CNF is in PTIME.
The counting problem for 2CNF is #P-complete.
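The probability of a Boolean expression under independent variables can be computed by brute force, summing the weight of every satisfying assignment. This is a minimal sketch. The expression `at_least_two` is hypothetical, not taken from the slides; it was chosen because its satisfying assignments have weights p1(1-p2)p3, (1-p1)p2p3, p1p2(1-p3) and p1p2p3. The loop visits all 2^n assignments, which is consistent with the counting problem being hard in general.

```python
from itertools import product

def probability(expr, probs):
    """Pr(expr) when variable i is true independently with probability probs[i]."""
    total = 0.0
    for assignment in product([False, True], repeat=len(probs)):
        if expr(assignment):
            weight = 1.0
            for value, p in zip(assignment, probs):
                weight *= p if value else 1.0 - p
            total += weight
    return total

def at_least_two(x):
    """Hypothetical expression: X1 X2 or X1 X3 or X2 X3."""
    return (x[0] and x[1]) or (x[0] and x[2]) or (x[1] and x[2])

pr_general = probability(at_least_two, [0.8, 0.2, 0.9])

# With every probability set to 1/2, Pr(expr) * 2^n counts satisfying assignments.
model_count = round(probability(at_least_two, [0.5, 0.5, 0.5]) * 2 ** 3)
```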
Query Complexity
Data complexity of a query Q: compute Q(Ip), for probabilistic database Ip.
Simplest scenario only:
- Possible tuples semantics for Q
- Independent tuples for Ip
[Fuhr&Roellke:1997,Dalvi&Suciu:2004]
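[Figure: extensional operators, which compute each output tuple's probability directly from the input probabilities. Inputs are tuples v with probability p, or v1 with p1 and v2 with p2; the surviving output probabilities are p1 p2 and p1(1-p2).]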
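Extensional operators attach a probability to each output tuple and compute it from the input probabilities alone. This is a minimal sketch, assuming the usual independence-based rules: product for join, 1 minus the product of (1 - p_i) for duplicate-eliminating projection, and p1(1-p2) for difference. The function names and the list-of-pairs relation encoding are illustrative choices, not taken from the slides.

```python
from collections import defaultdict

def ext_join(r, s, on):
    """Join: output probability is p1 * p2 (treats the inputs as independent)."""
    return [(t1 + t2, p1 * p2) for t1, p1 in r for t2, p2 in s if on(t1, t2)]

def ext_project(r, key):
    """Projection with duplicate elimination: 1 - prod(1 - p_i) per group."""
    none = defaultdict(lambda: 1.0)  # probability that no tuple in the group is present
    for t, p in r:
        none[key(t)] *= 1.0 - p
    return [(k, 1.0 - miss) for k, miss in none.items()]

def ext_difference(r, s):
    """Difference: p1 * (1 - p2) when the same tuple also occurs in s."""
    s_probs = dict(s)
    return [(t, p * (1.0 - s_probs.get(t, 0.0))) for t, p in r]

# Relations are lists of (tuple, probability) pairs; the data is hypothetical.
r = [(("v",), 0.5)]
s = [(("v",), 0.4)]
joined = ext_join(r, s, on=lambda a, b: a == b)
projected = ext_project([(("v", 1), 0.5), (("v", 2), 0.4)], key=lambda t: t[:1])
diff = ext_difference(r, s)
```

These rules are only correct when the tuples being combined really are independent, which is why the order of operators in a plan matters.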
[Dalvi&Suciu:2004]
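SELECT DISTINCT x.City
FROM Personp x, Purchasep y
WHERE x.Name = y.Cust AND y.Product = 'Gadget'

[Figure: a correct extensional plan. The Purchasep tuples for Jon, with probabilities q1, q2, q3, are first projected onto the customer, giving Jon with probability 1-(1-q1)(1-q2)(1-q3). Joining with the Personp tuple (Jon, Sea) of probability p1 gives the answer Sea with probability p1(1-(1-q1)(1-q2)(1-q3)).]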
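The correct plan can be compared with a brute-force computation over possible worlds. This is a minimal sketch with hypothetical values for p1 and q1, q2, q3. The alternative plan (join first, then project as if the joined rows were independent) is an assumption about what an incorrect plan looks like; it is included only for contrast.

```python
from itertools import product

p1 = 0.8              # hypothetical Pr of the Person tuple (Jon, Sea)
qs = [0.6, 0.6, 0.9]  # hypothetical Pr of Jon's three Purchase tuples

def independent_or(probs):
    """Probability that at least one of several independent events holds."""
    none = 1.0
    for p in probs:
        none *= 1.0 - p
    return 1.0 - none

# Correct plan: project Purchase onto the customer first, then join with Person.
project_then_join = p1 * independent_or(qs)

# Contrast plan (assumed): join first, then project. The joined rows all share
# the same Person tuple, so treating them as independent is wrong.
join_then_project = independent_or([p1 * q for q in qs])

def exact():
    """Ground truth: sum the probability of every world where Sea is an answer."""
    total = 0.0
    probs = [p1] + qs
    for bits in product([False, True], repeat=len(probs)):
        weight = 1.0
        for present, p in zip(bits, probs):
            weight *= p if present else 1.0 - p
        person, purchases = bits[0], bits[1:]
        if person and any(purchases):
            total += weight
    return total

truth = exact()
```

Here `project_then_join` agrees with `truth`, while `join_then_project` overestimates because it counts the shared Person tuple once per joined row.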
[Dalvi&Suciu:2004]
Query Complexity
Sometimes there is no correct (safe) extensional plan.
Theorem. The following are equivalent:
- Q has PTIME data complexity
- Q admits an extensional plan (and one finds it in PTIME)
- Q does not have Qbad as a subquery
Sound and complete safe SPJ evaluation algorithm: if a safe plan exists, the algorithm finds it!