Pid 23

Reverse Top-k Queries
Akrivi Vlachou*, Christos Doulkeridis*, Yannis Kotidis#, Kjetil Nørvåg*

*Norwegian University of Science and Technology (NTNU), Trondheim, Norway
#Athens University of Economics and Business (AUEB), Greece

Outline
Motivation & Preliminaries
Monochromatic Reverse Top-k Queries
Bichromatic Reverse Top-k Queries
  Threshold-based Algorithm
  Materialized Views
Experimental Evaluation
Conclusions & Future Work

Rank-aware Query Processing
Huge amount of available data.
Users prefer to retrieve a limited set of k ranked data objects that best match their preferences (top-k queries).

Top-k Query
Given a scoring function f(), retrieve the k objects that best match the user preferences.
Linear scoring function: $f_w(p) = \sum_{i=1}^{d} w[i] \cdot p[i]$
Weight w[i]: relative importance of attribute i
Definition TOPk(w): Given a weighting vector w and a positive integer k, find the k data points p with the minimum f_w(p) scores.
Query line of w at point p: defines the score of p.
Query space of w defined by point p: the number of enclosed points determines the rank of p.

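The TOPk(w) definition above can be made concrete with a small sketch. It assumes points and weighting vectors are plain Python tuples of equal length; the helper names `score` and `top_k` and the data are hypothetical, chosen only for illustration.

```python
# Minimal sketch of TOPk(w) with a linear scoring function (lower score = better).
# Assumption: points and weighting vectors are tuples of the same dimensionality d.

def score(w, p):
    """f_w(p) = sum over i of w[i] * p[i]."""
    return sum(wi * pi for wi, pi in zip(w, p))

def top_k(w, S, k):
    """Return the k points of S with the minimum f_w scores (ties keep input order)."""
    return sorted(S, key=lambda p: score(w, p))[:k]

# Hypothetical 2-d data, e.g. (price, age) of cars; not the data of the slides.
S = [(1, 4), (3, 1), (4, 4)]
print(top_k((0.5, 0.5), S, 2))
print(top_k((0.9, 0.1), S, 1))
```

Sorting the whole dataset per query keeps the sketch short; it is only meant to mirror the definition, not to be an efficient top-k algorithm.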
Reversing the Top-k Query

From the perspective of manufacturers, it is important that a product is returned in the highest-ranked positions for as many user preferences as possible, in order to:
estimate the impact of a product compared to the competitors' products
advertise a product to potential customers
[Figure: a sales representative asks "Which customers would be interested?" about a group of customers]

Reversing the Top-k Query

Reverse top-k query: given a potential product q and a positive integer k, which are the weighting vectors w for which q is in the top-k query result set?
Two different versions:
Monochromatic: the sales representative has no knowledge of user preferences
Bichromatic: a dataset with user preferences is given
[Figure: a sales representative and a group of customers]

Car Database Example
A database containing information about different cars

Different users have different preferences
Bob prefers a cheap car and does not care much about its age: the best choice (top-1) for Bob is car p1, with score 2.5.
Tom prefers a newer car rather than a cheap car: the best choice for Tom, and also for Max, is car p2.
Car Database Example
Query point q=p2, k=1:

Bichromatic reverse top-k: {(0.2,0.8), (0.5,0.5)}, i.e., advertise the product to Tom and Max
Monochromatic reverse top-k: the line segment w[price] ∈ [1/7, 5/6], i.e., estimate the impact of p2 as 69% (the segment covers 5/6 − 1/7 = 29/42 ≈ 0.69 of the weight space)
Query point q=p3, k=1: empty result set for the bichromatic query
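The 69% impact follows from the length of the segment, assuming (as the one-parameter description w[price] suggests) that the two weights sum to 1, so that every weighting vector is identified by w[price] in [0, 1]. A quick check with exact fractions:

```python
from fractions import Fraction

# Assumption: w[price] + w[age] = 1, so the weight space is w[price] in [0, 1].
low, high = Fraction(1, 7), Fraction(5, 6)   # mRTOP1(p2) from the car example
space_length = Fraction(1)                   # length of the whole interval [0, 1]
impact = (high - low) / space_length         # fraction of weighting vectors covered
print(impact, f"{float(impact):.0%}")        # 29/42 69%
```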
Monochromatic Reverse Top-k Query
mRTOPk(q): Given a point q, a positive number k and a dataset S, the result set of the monochromatic reverse top-k query is the locus of weighting vectors wi for which there exists p in TOPk(wi) such that $f_{w_i}(q) \leq f_{w_i}(p)$.
The solution space W can be split into a finite set of nonadjacent partitions such that query point q has the same rank for all the weighting vectors within a partition.
For the monochromatic case, we focus on the 2-d space.
[Figure: the solution space of weighting vectors and the partition mRTOP1(q)]

Geometric Interpretation d=2, k=1
If q belongs to the convex hull, then there exists exactly one partition in mRTOP1(q).
The weighting vectors that are perpendicular to pq and qr, where p and r are the neighbors of q on the convex hull, define the line segment.
For weighting vectors with smaller or larger slopes than w1, the relative order of p and q changes.
Monochromatic reverse top-k, k>1: the solution space may contain more than one partition.

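For intuition about the 2-d monochromatic case, the following brute-force sketch computes mRTOPk(q) exactly over normalized weights w = (λ, 1 − λ). It is not the authors' geometric method: it collects every λ at which q swaps order with some data point and tests the rank of q once per interval. Assumptions: q is not in S, |S| ≥ k, and isolated tie points on interval borders are ignored; all names and data are hypothetical.

```python
from fractions import Fraction

def score(lam, p):
    """f_w(p) for the 2-d weighting vector w = (lam, 1 - lam); lower is better."""
    return lam * p[0] + (1 - lam) * p[1]

def breakpoints(q, S):
    """Sorted lam values in [0, 1] where the score order of q and some p in S can flip."""
    cuts = {Fraction(0), Fraction(1)}
    for p in S:
        # f(q) - f(p) = lam * (d0 - d1) + d1 is linear in lam: at most one sign change
        d0, d1 = Fraction(q[0] - p[0]), Fraction(q[1] - p[1])
        if d0 != d1:
            lam = -d1 / (d0 - d1)
            if 0 < lam < 1:
                cuts.add(lam)
    return sorted(cuts)

def mrtop_2d(q, S, k):
    """Intervals of lam where q is in the top-k (fewer than k points score strictly lower)."""
    cuts = breakpoints(q, S)
    result = []
    for lo, hi in zip(cuts, cuts[1:]):
        mid = (lo + hi) / 2  # the rank of q is constant inside each open interval
        better = sum(1 for p in S if score(mid, p) < score(mid, q))
        if better < k:
            if result and result[-1][1] == lo:
                result[-1] = (result[-1][0], hi)  # merge touching intervals
            else:
                result.append((lo, hi))
    return result

S = [(1, 4), (4, 1)]  # hypothetical data points
print(mrtop_2d((2, 2), S, 1))
```

With q = (2, 2) and k = 1 it returns the single interval [1/3, 2/3]. Its endpoints are the weighting vectors perpendicular to qr (λ = 1/3) and pq (λ = 2/3), with p = (1, 4) and r = (4, 1), matching the geometric interpretation for k = 1.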
Bichromatic Reverse Top-k Query

bRTOPk(q): Given a point q, a positive number k and two datasets S and W, where S represents data points and W is a dataset containing different weighting vectors, a weighting vector wi belongs to the result set if and only if there exists p in TOPk(wi) such that $f_{w_i}(q) \leq f_{w_i}(p)$.
Naïve approach:
for each weighting vector, process the top-k query
test whether query point q is in the top-k list

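The naïve approach can be sketched directly. Points and weighting vectors are assumed to be tuples of equal length, with lower scores being better; the function names and data are hypothetical. Each weighting vector costs one full top-k evaluation, which is the cost RTA tries to reduce.

```python
# Naive bichromatic reverse top-k: one top-k evaluation per weighting vector.
# Assumptions: points and weights are tuples of equal length; lower scores are better.

def score(w, p):
    """Linear scoring function f_w(p) = sum over i of w[i] * p[i]."""
    return sum(wi * pi for wi, pi in zip(w, p))

def in_top_k(q, w, S, k):
    """True iff some p in TOPk(w) satisfies f_w(q) <= f_w(p)."""
    top = sorted(S, key=lambda p: score(w, p))[:k]
    return any(score(w, q) <= score(w, p) for p in top)

def brtop_naive(q, S, W, k):
    """bRTOPk(q): every weighting vector in W whose top-k result admits q."""
    return [w for w in W if in_top_k(q, w, S, k)]

S = [(1, 4), (4, 1)]                      # hypothetical data points
W = [(0.5, 0.5), (0.2, 0.8), (0.9, 0.1)]  # hypothetical user preferences
print(brtop_naive((2, 2), S, W, 1))
```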
Threshold-based Algorithm (RTA)

Goal: reduce the number of top-k evaluations by discarding weighting vectors
Threshold-based Algorithm (RTA):
sort the weighting vectors based on pairwise similarity (top-k queries defined by similar vectors have similar result sets)
evaluate the first top-k query and calculate a threshold
for each weighting vector:
  possibly prune it based on the threshold
  refine the threshold

Example of RTA Algorithm (k=2), W=[w1, w2, w3]
Evaluate the top-2 query for w1
Set the threshold based on w2: f_w2(q) > threshold, so discard w2
Refine the threshold for w3
Buffer: p1, p2
[Figure: data points p1 to p10 and the weighting vectors w1, w2, w3]

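The pruning idea behind RTA can be sketched as follows. This is a simplified reading of the slides, not the authors' implementation: it assumes W has already been sorted by similarity, that the buffer simply keeps the most recently evaluated top-k result, and that |S| ≥ k. Pruning is safe because, if f_w(q) exceeds the k-th best buffered score, k points already score strictly better than q under w. All names and data are hypothetical.

```python
import heapq

def score(w, p):
    """Linear scoring function f_w(p) = sum over i of w[i] * p[i]; lower is better."""
    return sum(wi * pi for wi, pi in zip(w, p))

def top_k(w, S, k):
    """TOPk(w) via heapq.nsmallest (equivalent to sorted(...)[:k])."""
    return heapq.nsmallest(k, S, key=lambda p: score(w, p))

def rta(q, S, W, k):
    """Return (bRTOPk(q), number of top-k evaluations); W assumed pre-sorted by similarity."""
    result, buffer, evaluations = [], [], 0
    for w in W:
        fq = score(w, q)
        if len(buffer) == k:
            # Threshold: the k-th best score of the buffered points under the current w.
            threshold = max(score(w, p) for p in buffer)
            if fq > threshold:
                continue  # k buffered points beat q, so w cannot be in the result
        top = top_k(w, S, k)
        evaluations += 1
        if any(fq <= score(w, p) for p in top):
            result.append(w)
        buffer = top  # refine: the newest top-k result defines the next threshold
    return result, evaluations

S = [(1, 4), (4, 1)]                         # hypothetical data points
W = [(0.2, 0.8), (0.25, 0.75), (0.5, 0.5)]   # hypothetical, already ordered by similarity
print(rta((2, 2), S, W, 1))
```

On this input the second vector is discarded without a top-k evaluation, so only 2 of the 3 vectors are evaluated, while the result is the same as a naïve scan.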
Materialized Views
Threshold-based Algorithm (RTA):
reduces the number of top-k evaluations by discarding some weighting vectors that are not in the reverse top-k result set
still processes at least as many top-k evaluations as the cardinality of the result set
Materialized Views:
find weighting vectors that definitely belong to the result without a top-k evaluation

Materialized Views
Grid-based space partitioning
cell Ci: lower-left corner CiL, upper-right corner CiU
We store, for each cell Ci, the results of the reverse top-k queries for the corners CiL and CiU.
[Figure: grid partitioning of the data space, with stored weighting vectors w1, w2, w3]

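A possible way to locate the cell Ci that encloses a point, assuming a uniform grid with the same number of cells per dimension over [lo, hi]^d. The slides do not specify the grid layout, so both the uniform grid and the helper name `enclosing_cell` are hypothetical.

```python
def enclosing_cell(q, cells_per_dim, lo=0.0, hi=1.0):
    """Return (CiL, CiU), the corners of the uniform-grid cell that encloses q."""
    width = (hi - lo) / cells_per_dim
    lower, upper = [], []
    for x in q:
        # Points on the upper border of the space are assigned to the last cell.
        idx = min(int((x - lo) // width), cells_per_dim - 1)
        lower.append(lo + idx * width)
        upper.append(lo + (idx + 1) * width)
    return tuple(lower), tuple(upper)

print(enclosing_cell((0.3, 0.95), 4))
```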
Materialized Views
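Given a point q enclosed in cell Ci:
all weighting vectors in RTOPk(CiU) belong to the result set of q
only weighting vectors in RTOPk(CiL) - RTOPk(CiU) have to be examined
This holds because scores are minimized and weights are non-negative: CiL dominates q and q dominates CiU, so f_w(CiL) ≤ f_w(q) ≤ f_w(CiU) for every w, and therefore RTOPk(CiU) ⊆ RTOPk(q) ⊆ RTOPk(CiL).
Materialized views can be generalized for arbitrary k<K values.
[Figure: cell corners annotated with the weighting vectors w1, w2, w3, w4]
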
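A sketch of query answering with the materialized views, assuming each stored view is a frozenset of weighting vectors keyed by its corner point. The dictionary layout, helper names and data are hypothetical; only the candidates in RTOPk(CiL) - RTOPk(CiU) are checked with a top-k evaluation.

```python
# Answering bRTOPk(q) from the views stored for the corners of the cell enclosing q.
# Assumptions: views[c] is the precomputed reverse top-k set (a frozenset of weight
# tuples) of corner c; points and weights are tuples; lower scores are better.

def score(w, p):
    """Linear scoring function f_w(p) = sum over i of w[i] * p[i]."""
    return sum(wi * pi for wi, pi in zip(w, p))

def in_top_k(q, w, S, k):
    """True iff some p in TOPk(w) satisfies f_w(q) <= f_w(p)."""
    top = sorted(S, key=lambda p: score(w, p))[:k]
    return any(score(w, q) <= score(w, p) for p in top)

def rtop_with_views(q, lower, upper, views, S, k):
    """Return (result set, number of top-k evaluations) for q inside the cell [lower, upper]."""
    definite = views[upper]                 # RTOPk(CiU) is contained in the result
    candidates = views[lower] - definite    # only these need to be examined
    verified = {w for w in candidates if in_top_k(q, w, S, k)}
    return definite | verified, len(candidates)

S = [(1, 4), (4, 1)]                        # hypothetical data
W = [(0.5, 0.5), (0.2, 0.8), (0.9, 0.1)]    # hypothetical weighting vectors
lower, upper = (1, 1), (2.5, 2.5)           # hypothetical cell enclosing q = (2, 2)
# Corner views computed naively here; in practice they would be stored in advance.
views = {c: frozenset(w for w in W if in_top_k(c, w, S, 1)) for c in (lower, upper)}
print(rtop_with_views((2, 2), lower, upper, views, S, 1))
```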
Experimental Setup
Comparison between Naïve and RTA (varying dimensionality, cardinality, data distribution, and real data)
Queries: uniform and k-skyband points
Metrics: time, I/Os, number of top-k evaluations

RTA vs. Naïve
uniform distribution of S and uniform weights W
|S|=10K, |W|=10K, top-k=10, skyband query points
RTA outperforms naive by 1 to 2 orders of magnitude
as dimensionality increases, |RTOPk(q)| decreases, leading to fewer top-k evaluations

Scalability of RTA Algorithm

various distributions of S (uniform UN, anti-correlated AC, correlated CO) and uniform weights W
|S|=10K or |W|=10K, d=5, top-k=10, skyband query points
naive requires |W| top-k query evaluations
|W|=5K, correlated dataset: RTA needs only 544 out of 5000 top-k evaluations (saving 89.12% of the cost); the average size of the result set is 459

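The reported savings can be checked directly from the numbers on this slide; the variable names are only illustrative.

```python
evaluations_naive = 5000   # naive: one top-k evaluation per weighting vector (|W| = 5K)
evaluations_rta = 544      # RTA evaluations reported for the correlated dataset
avg_result_size = 459      # reported average size of the result set

savings = 1 - evaluations_rta / evaluations_naive
print(f"{savings:.2%}")    # 89.12%
# RTA needs at least one top-k evaluation per weighting vector in the result.
assert avg_result_size <= evaluations_rta <= evaluations_naive
```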
Performance of RTA on Real Data

NBA consists of 17265 tuples, d=5 (number of points scored, rebounds, assists, steals and blocks)
HOUSE consists of 127930 tuples, d=6 (income spent on gas, electricity, water, heating, insurance, and property tax)
uniform and clustered weights W (|W|=10K)
clustered weights lead to fewer top-k evaluations

Conclusions and Future Work

We introduced reverse top-k queries:
geometric interpretation of the solution space
efficient algorithm for the bichromatic reverse top-k query
materialized reverse top-k views
Future Work
interpretation of the solution space for higher dimensions (monochromatic reverse top-k)
improve the performance of the bichromatic reverse top-k computation

Thank you!
Related work:
Akrivi Vlachou, Christos Doulkeridis, Yannis Kotidis, Kjetil Nørvåg: "Reverse Top-k Queries"
Akrivi Vlachou, Christos Doulkeridis, Kjetil Nørvåg, Yannis Kotidis: "Identifying the Most Influential Data Objects with Reverse Top-k Queries"
More information: http://www.idi.ntnu.no/~vlachou/
