
Rank Aggregation 1

Nguyen Tien Truong

16.07.2014 | Department of Computer Science | Knowledge Engineering Group | Prof. Fürnkranz


Overview

1. Basic idea

2. Arrow's Criteria Revisited

3. Rank-Aggregation Methods

4. A Refinement Step after Rank Aggregation

5. Rating Aggregation

6. Producing Rating Vectors from Rating Aggregation Matrices

7. Summary of Aggregation Methods



1. Basic idea

The dictum "the whole is greater than the sum of its parts"

Merge several ranked lists in order to build a single new, superior list.

[Figure: ranked lists l1, l2, l3, ..., lk are merged into one aggregated list using mathematical techniques]


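We have already seen a simple form of aggregation in the OD method, which combines offensive and defensive ratings into one rating: r = o/d.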



2. Arrow's Criteria Revisited

Arrow's theorem:
No voting system can ever simultaneously satisfy the four criteria of an
unrestricted domain, independence of irrelevant alternatives, the Pareto
principle, and non-dictatorship.




Some differences between the problem of ranking candidates in a political system
and the problem of ranking teams or individuals in a sporting event:

Political voting: candidates 1, ..., m (rows) are ranked by voters 1, ..., n (columns).
m is small (few candidates), n is big (many voters).

Sports ranking: teams 1, ..., m (rows) are ranked by methods 1, ..., n (columns).
m is big (many teams), n is small (few methods).




1. Arrow's criterion: an unrestricted domain

The voting system should produce a ranking for every possible set of voter preferences; this requirement may not be satisfied.

2. Arrow's criterion: the independence of irrelevant alternatives

The relative ranking within subsets of candidates should be maintained when expanding
back to the whole set.

Sensitivity and robustness

A robust ranking method would satisfy Arrow's second criterion.




3. Arrow's criterion: the Pareto principle

If all voters choose candidate A over B, then a proper voting system
should maintain this order.

4. Arrow's criterion: non-dictatorship

No voter should have more weight than another.

This criterion is not useful in the sports ranking context.



3. Rank-Aggregation Methods

The idea of rank aggregation is not new.

Rank aggregation tends to act as a "smoother".

A rank-aggregated list is only as good (or as bad) as the lists from which it is
built.

Old methods: Borda count, average rank.

New methods: simulated game data, and graph-based methods {PageRank, HITS, SALSA}.



Borda Count

The mathematical idea goes back to Ramon Llull (1232-1315), whose method is known as the Llull count, and to
Jean-Charles de Borda, who proposed it in 1770.

For each ranked list, each candidate receives a score equal to the number of
candidates he or she outranks.

Summing these scores over all lists gives the Borda count; sorting the candidates by Borda count gives the Borda rank.




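Example: the three input lists

        OD (r=o/d)   Massey   Colley
1st     VT           Miami    Miami
2nd     Miami        VT       VT
3rd     UVA          UVA      UNC
4th     UNC          UNC      UVA
5th     Duke         Duke     Duke

Borda scores (number of teams outranked in each list):

Team    OD (r=o/d)   Massey   Colley   Borda count   Borda rank
Duke    0            0        0        0             5th
Miami   3            4        4        11            1st
UNC     1            1        2        4             4th
UVA     2            2        1        5             3rd
VT      4            3        3        10            2nd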
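The Borda computation in the example can be sketched in a few lines of Python. This is a minimal sketch, assuming full lists without ties; `borda_count` and the list names `OD`, `MASSEY`, `COLLEY` are illustrative, and the team orders are read off the example.

```python
def borda_count(ranked_lists):
    """Borda count: over all lists, add up how many candidates each one outranks."""
    scores = {}
    for ranking in ranked_lists:
        n = len(ranking)
        for position, team in enumerate(ranking):
            # a team at 0-based position p outranks the n - 1 - p teams below it
            scores[team] = scores.get(team, 0) + (n - 1 - position)
    return scores


# Input lists from best to worst (hypothetical names for the example lists)
OD = ["VT", "Miami", "UVA", "UNC", "Duke"]
MASSEY = ["Miami", "VT", "UVA", "UNC", "Duke"]
COLLEY = ["Miami", "VT", "UNC", "UVA", "Duke"]

scores = borda_count([OD, MASSEY, COLLEY])
borda_rank = sorted(scores, key=scores.get, reverse=True)
print(scores)
print(borda_rank)  # ['Miami', 'VT', 'UVA', 'UNC', 'Duke']
```

For UNC the per-list scores are 1 + 1 + 2 = 4, which still places UNC 4th behind UVA (5).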




Handling ties
Example:
the ranked list from best to worst is {Miami, VT, UNC/UVA, Duke} (the slash / indicates a tie)
[Figure: Borda scores for Duke, Miami, UNC, UVA, VT]

The Borda count can produce an aggregated list that contains ties.

Drawback: it is easily manipulated.
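The figure with the tie example's scores did not survive extraction, so the following is only a sketch under an assumption: each team scores the number of teams strictly below it, so tied teams share the same score. The input format (a list of tied groups, best first) and the name `borda_with_ties` are hypothetical.

```python
def borda_with_ties(tiers):
    """tiers: groups of teams from best to worst; teams in one group are tied.
    Assumed rule: a team's score is the number of teams strictly below it."""
    scores = {}
    below = sum(len(group) for group in tiers)
    for group in tiers:
        below -= len(group)  # teams in all later groups
        for team in group:
            scores[team] = below
    return scores


tiers = [["Miami"], ["VT"], ["UNC", "UVA"], ["Duke"]]
print(borda_with_ties(tiers))
```

Under this rule UNC and UVA both score 1. Another common convention averages the scores the tied positions would receive (1.5 each here); either choice keeps the tie in the output.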



Average Rank

The integers representing rank in several ordered lists are averaged to create
a rank aggregated list.

Example:

Team    OD (r=o/d)   Massey   Colley   Average rating   Average rank
Duke    5th          5th      5th      5.0              5th
Miami   2nd          1st      1st      1.3              1st
UNC     4th          4th      3rd      3.7              4th
UVA     3rd          3rd      4th      3.3              3rd
VT      1st          2nd      2nd      1.7              2nd
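Average rank is easy to sketch: average each team's 1-based position over the lists, then sort ascending. This minimal sketch assumes full lists (every team appears in every list); `average_rank` and the list names are illustrative.

```python
def average_rank(ranked_lists):
    """Average 1-based position of every team; all lists must be full."""
    k = len(ranked_lists)
    return {team: sum(lst.index(team) + 1 for lst in ranked_lists) / k
            for team in ranked_lists[0]}


OD = ["VT", "Miami", "UVA", "UNC", "Duke"]
MASSEY = ["Miami", "VT", "UVA", "UNC", "Duke"]
COLLEY = ["Miami", "VT", "UNC", "UVA", "Duke"]

avg = average_rank([OD, MASSEY, COLLEY])
aggregated = sorted(avg, key=avg.get)  # smaller average = better
print({team: round(value, 2) for team, value in avg.items()})
print(aggregated)
```

The exact averages are 4/3, 5/3, 10/3, 11/3 and 5, which give the same order as the table.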




Drawback: ties occur frequently in the average rating score.

Example:
Let l1 and l2 be two full ranked lists.
Team i is ranked 1st in l1, but 3rd in l2.
Team j is ranked 2nd in both lists.
=> Average rank produces a tie for second place (both teams average 2).

Strategies for breaking ties

1. Use past data on pairwise matchups: if i and j played each other, the
winner should be ranked ahead of the losing team. Tie-breaking is more difficult
when more than two lists are averaged, because more than two teams can then tie.
Example: i defeated j, j defeated k, k defeated i => circular tie.
2. Use a tie-breaking list: the superior input list, chosen with a method for
determining which ranked list is better (Chapter 16).
Average rank can be applied only if all lists are full.

Average rank produces an aggregated rating vector rather than ranks; this average
rating vector must then be converted into a ranking vector.
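Strategy 2 can be sketched by sorting on a composite key: the average rank first, then the position in a designated tie-breaking list. This is a minimal sketch; the function name and the example lists are hypothetical, built so that i is 1st in l1 and 3rd in l2 while j is 2nd in both.

```python
def average_rank_with_tiebreak(ranked_lists, tiebreak_list):
    """Sort by average rank; equal averages fall back to the tie-breaking list."""
    k = len(ranked_lists)
    avg = {t: sum(lst.index(t) + 1 for lst in ranked_lists) / k for t in tiebreak_list}
    return sorted(avg, key=lambda t: (avg[t], tiebreak_list.index(t)))


# Hypothetical lists: i and j both average 2.0
l1 = ["i", "j", "m", "k"]
l2 = ["k", "j", "i", "m"]

print(average_rank_with_tiebreak([l1, l2], tiebreak_list=l1))  # ['i', 'j', 'k', 'm']
print(average_rank_with_tiebreak([l1, l2], tiebreak_list=l2))  # ['j', 'i', 'k', 'm']
```

The outcome of the tie depends entirely on which list is judged superior, which is why a principled way of comparing lists is needed.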



Simulated Game Data

A new rank-aggregation method that is born from a simple interpretation of a
ranked list.

The interpretation: if team A appears above team B in a ranked list, then in a
matchup between these two teams A ought to beat B.

[Figure: two ranked lists (1st to last); when A is just above B, A is stronger than B; when A is far above B, A is much stronger than B]

Ranked lists therefore give implicit information about future game outcomes.

Use the ranked lists to generate so-called simulated game data.




Each ranked list of length n provides data for $\binom{n}{2} = n(n-1)/2$ simulated games.

The margin of victory is the difference in the ranked positions of the
two teams.

Example: 5 teams. Starting with the OD ranking vector,
VT beat Miami by 1 point, UVA by 2 points,
UNC by 3 points, and Duke by 4 points.

        OD      Massey   Colley
1st     VT      Miami    Miami
2nd     Miami   VT       VT
3rd     UVA     UVA      UNC
4th     UNC     UNC      UVA
5th     Duke    Duke     Duke




Accumulate a large amount of simulated game data.

Unlike most other aggregation methods, the lists need not be full.
Set the losing team's score to 18 (the average losing score for the 2005 season).

Each team has one row per input list (OD, Massey, Colley); the row team's score is listed first.

                 Duke    Miami   UNC     UVA     VT
Duke   (OD)      -       18-21   18-19   18-20   18-22
       (Massey)  -       18-22   18-19   18-20   18-21
       (Colley)  -       18-22   18-20   18-19   18-21
Miami  (OD)      21-18   -       20-18   19-18   18-19
       (Massey)  22-18   -       21-18   20-18   19-18
       (Colley)  22-18   -       20-18   21-18   19-18
UNC    (OD)      19-18   18-20   -       18-19   18-21
       (Massey)  19-18   18-21   -       18-19   18-20
       (Colley)  20-18   18-20   -       19-18   18-19
UVA    (OD)      20-18   18-19   19-18   -       18-20
       (Massey)  20-18   18-20   19-18   -       18-19
       (Colley)  19-18   18-21   18-19   -       18-20
VT     (OD)      22-18   19-18   21-18   20-18   -
       (Massey)  21-18   18-19   20-18   19-18   -
       (Colley)  21-18   18-19   19-18   20-18   -
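Generating the simulated games takes only a double loop per list. This is a minimal sketch, assuming the rules stated above (higher-ranked team wins, margin equals the difference in positions, loser always scores 18); `simulated_games` and the tuple layout are hypothetical choices.

```python
LOSER_SCORE = 18  # losing team's score in every simulated game


def simulated_games(ranked_lists):
    """Turn ranked lists into games: the higher-ranked team wins each pairing
    by a margin equal to the difference in positions. Lists may be partial."""
    games = []  # tuples (winner, loser, winner_score, loser_score)
    for ranking in ranked_lists:
        for hi, winner in enumerate(ranking):
            for lo in range(hi + 1, len(ranking)):
                games.append((winner, ranking[lo], LOSER_SCORE + (lo - hi), LOSER_SCORE))
    return games


OD = ["VT", "Miami", "UVA", "UNC", "Duke"]
MASSEY = ["Miami", "VT", "UVA", "UNC", "Duke"]
COLLEY = ["Miami", "VT", "UNC", "UVA", "Duke"]

games = simulated_games([OD, MASSEY, COLLEY])
print(len(games))  # 30 = 3 lists x 10 pairs
print(games[:4])   # VT beats Miami 19-18, UVA 20-18, UNC 21-18, Duke 22-18
```

Because a partial list simply contributes fewer pairs, nothing in the loop requires full lists.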




Any ranking method can be applied to the simulated game data; used this way it is
called the combiner method.

[Figure: Simulated data method for rank aggregation. Input lists from the HITS, Massey, and Colley methods produce simulated game data; a combiner method turns it into the aggregated list]




Combiner method:   OD (r=o/d)   Massey   Colley
Duke               5th          5th      5th
Miami              1st          1st      1st
UNC                3rd          4th      4th
UVA                4th          3rd      3rd
VT                 2nd          2nd      2nd

Simulated data method of rank aggregation using OD, Massey, and Colley as the combiner method
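The Massey and Colley columns of this table can be reproduced with a small sketch. It assumes the textbook forms of both methods (Massey: M r = p with the last row replaced by ones; Colley: C r = b with b_i = 1 + (w_i - l_i)/2) and solves the systems exactly with fractions; the helper names are hypothetical and the OD combiner is not shown.

```python
from fractions import Fraction

LOSER_SCORE = 18  # losing score used for every simulated game


def simulated_games(ranked_lists):
    """One game per pair per list; the margin is the difference in positions."""
    games = []
    for ranking in ranked_lists:
        for hi, winner in enumerate(ranking):
            for lo in range(hi + 1, len(ranking)):
                games.append((winner, ranking[lo], LOSER_SCORE + lo - hi, LOSER_SCORE))
    return games


def solve(a, b):
    """Solve a x = b exactly by Gauss-Jordan elimination (a square, nonsingular)."""
    n = len(b)
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def massey(teams, games):
    """Massey: M r = p, last row replaced by ones so the ratings sum to zero."""
    idx = {t: k for k, t in enumerate(teams)}
    n = len(teams)
    M = [[0] * n for _ in range(n)]
    p = [0] * n
    for winner, loser, w_score, l_score in games:
        i, j = idx[winner], idx[loser]
        M[i][i] += 1
        M[j][j] += 1
        M[i][j] -= 1
        M[j][i] -= 1
        p[i] += w_score - l_score  # cumulative point differential
        p[j] -= w_score - l_score
    M[-1] = [1] * n
    p[-1] = 0
    return dict(zip(teams, solve(M, p)))


def colley(teams, games):
    """Colley: C = 2I + diag(games played) - pairwise game counts."""
    idx = {t: k for k, t in enumerate(teams)}
    n = len(teams)
    C = [[2 if i == j else 0 for j in range(n)] for i in range(n)]
    b = [Fraction(1)] * n
    for winner, loser, _, _ in games:
        i, j = idx[winner], idx[loser]
        C[i][i] += 1
        C[j][j] += 1
        C[i][j] -= 1
        C[j][i] -= 1
        b[i] += Fraction(1, 2)  # builds b_i = 1 + (wins_i - losses_i) / 2
        b[j] -= Fraction(1, 2)
    return dict(zip(teams, solve(C, b)))


def ranking(ratings):
    return sorted(ratings, key=ratings.get, reverse=True)


TEAMS = ["Duke", "Miami", "UNC", "UVA", "VT"]
OD = ["VT", "Miami", "UVA", "UNC", "Duke"]
MASSEY = ["Miami", "VT", "UVA", "UNC", "Duke"]
COLLEY = ["Miami", "VT", "UNC", "UVA", "Duke"]

games = simulated_games([OD, MASSEY, COLLEY])
print(ranking(massey(TEAMS, games)))  # ['Miami', 'VT', 'UVA', 'UNC', 'Duke']
print(ranking(colley(TEAMS, games)))  # ['Miami', 'VT', 'UVA', 'UNC', 'Duke']
```

Both combiners place UVA 3rd and UNC 4th, matching the Massey and Colley columns.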




Four properties of the simulated data approach to rank aggregation

1. The rank-aggregated list is only as good (or as bad) as the input lists.
2. The combiner method acts as a "smoother" in that it minimizes the effect of outliers,
which are lists containing anomalies that seem inconsistent with the rankings in other
lists.
Example:
We removed Duke from the data set.
- With Massey or Colley as the combiner method, UNC and UVA swap places in the rankings.
- With OD as the combiner method, the ranking stays consistent,
so the OD rank-aggregation method is robust.

Combiner method:   OD (r=o/d)   Massey   Colley
Miami              1st          1st      1st
UNC                3rd          3rd      3rd
UVA                4th          4th      4th
VT                 2nd          2nd      2nd




3. The input lists satisfy the Pareto principle
=> Rank-aggregated lists satisfy Arrow's third criterion.

4. The combiner method is also one of the input methods
=> It acts as a partial dictator in the language of Arrow's fourth criterion.



PageRank, HITS with graph theory

This method relies on graph theory. Build a weighted graph on the teams, where either
W_ij = number of ranked lists having i below j, or
W_ij = sum of rank differences from the lists having i below j.

[Figure: weighted graph of Duke, Miami, UNC, UVA, VT built from the OD, Massey, and Colley lists (1st to 5th); e.g., the edge from Duke to VT has weight 4 + 3 + 3 = 10]

Apply the PageRank method to this graph.

The two new rank-aggregation methods are much more sophisticated than
the two old methods.
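A sketch of the graph method under stated assumptions: edges use the rank-difference weighting (edge i to j when i is ranked below j), and PageRank is run by power iteration with damping factor 0.85 and uniform teleportation. The slide does not give these parameters, so alpha, the iteration count and all function names are hypothetical.

```python
def rank_difference_graph(ranked_lists):
    """W[(i, j)]: sum of rank differences over the lists that rank i below j."""
    W = {}
    for ranking in ranked_lists:
        pos = {team: k for k, team in enumerate(ranking)}
        for i in ranking:
            for j in ranking:
                if pos[i] > pos[j]:  # i is below j: edge i -> j ("i votes for j")
                    W[(i, j)] = W.get((i, j), 0) + pos[i] - pos[j]
    return W


def pagerank(teams, W, alpha=0.85, iterations=200):
    """Power iteration on row-normalized weights with uniform teleportation."""
    n = len(teams)
    out = {t: 0 for t in teams}
    for (i, _), w in W.items():
        out[i] += w
    pi = {t: 1.0 / n for t in teams}
    for _ in range(iterations):
        # probability mass sitting on nodes without out-links is spread uniformly
        dangling = sum(pi[t] for t in teams if out[t] == 0)
        new = {t: (1 - alpha) / n + alpha * dangling / n for t in teams}
        for (i, j), w in W.items():
            new[j] += alpha * pi[i] * w / out[i]
        pi = new
    return pi


TEAMS = ["Duke", "Miami", "UNC", "UVA", "VT"]
OD = ["VT", "Miami", "UVA", "UNC", "Duke"]
MASSEY = ["Miami", "VT", "UVA", "UNC", "Duke"]
COLLEY = ["Miami", "VT", "UNC", "UVA", "Duke"]

W = rank_difference_graph([OD, MASSEY, COLLEY])
pi = pagerank(TEAMS, W)
print(W[("Duke", "VT")])  # 10
print(sorted(TEAMS, key=pi.get, reverse=True))
```

In this sketch Miami finishes narrowly ahead of VT because the three lower-ranked teams send slightly more weight to Miami; with the count weighting or a different damping factor the numbers, and possibly close calls, can change.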



4. A Refinement Step after Rank Aggregation

After several lists l1, l2, l3, ..., lk have been aggregated into one list μ, a
refinement step called local Kemenization can be applied to further
improve the list.

Kendall's tau (τ) measures how far apart two ranked lists are.

A Kemeny optimal list μ minimizes the sum of the Kendall tau distances between the input lists l_i and μ,
$\sum_{i=1}^{k} \tau(l_i, \mu)$.

If no swap of two adjacent items in μ reduces this sum, μ is locally Kemeny
optimal.
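The Kemeny objective can be sketched with the Kendall tau distance taken as the number of discordant pairs. This is a minimal sketch, assuming all lists contain the same items; `kendall_tau_distance` and `kemeny_score` are hypothetical names, and the lists are the OD, Massey, Colley and simulated-data lists of the 5-team example.

```python
from itertools import combinations


def kendall_tau_distance(a, b):
    """Number of item pairs that lists a and b place in opposite order."""
    pos_b = {item: k for k, item in enumerate(b)}
    # combinations keeps a's order, so (x, y) means x is above y in a
    return sum(1 for x, y in combinations(a, 2) if pos_b[x] > pos_b[y])


def kemeny_score(mu, input_lists):
    """Sum of Kendall tau distances between mu and every input list."""
    return sum(kendall_tau_distance(lst, mu) for lst in input_lists)


OD = ["VT", "Miami", "UVA", "UNC", "Duke"]
MASSEY = ["Miami", "VT", "UVA", "UNC", "Duke"]
COLLEY = ["Miami", "VT", "UNC", "UVA", "Duke"]
inputs = [OD, MASSEY, COLLEY]

sim_data = ["Miami", "VT", "UNC", "UVA", "Duke"]
swapped = ["Miami", "VT", "UVA", "UNC", "Duke"]
print(kemeny_score(sim_data, inputs))  # 3
print(kemeny_score(swapped, inputs))   # 2
```

Swapping the adjacent pair UNC/UVA lowers the sum from 3 to 2, so the simulated-data list is not locally Kemeny optimal.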



For an aggregated list of length n, local Kemenization requires n - 1 checks.

Example:
        OD (l1)   Massey (l2)   Colley (l3)   Sim. data (μ)
1st     VT        Miami         Miami         Miami
2nd     Miami     VT            VT            VT
3rd     UVA       UVA           UNC           UNC
4th     UNC       UNC           UVA           UVA
5th     Duke      Duke          Duke          Duke

Local Kemenization on the 5-team example

Check #1
Question: Does the second-place item (VT) beat the first-place item (Miami) in the majority of the input lists?
Answer: No, VT beats Miami only once in the three input lists. Action: none.

Check #2
Question: Does the third-place item (UNC) beat the second-place item (VT) in the majority of the input lists?
Answer: No, UNC never beats VT in the three input lists. Action: none.

Check #3
Question: Does the fourth-place item (UVA) beat the third-place item (UNC) in the majority of the input lists?
Answer: Yes, UVA beats UNC in two out of the three input lists. Action: swap UNC and UVA in μ.

Check #4
Question: Does the fifth-place item (Duke) beat the fourth-place item (UVA) in the majority of the input lists?
Answer: No, Duke never beats UVA in the three input lists. Action: none.




The locally Kemenized list is:
1st Miami
2nd VT
3rd UVA
4th UNC
5th Duke
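The four checks can be sketched as a single pass over adjacent pairs. This is a minimal sketch of the one-pass procedure described above; `local_kemenize` is a hypothetical name. After a swap the code compares the next item with whatever now sits above it (Duke is compared with UNC in check #4, not UVA), which does not change the result here.

```python
def local_kemenize(mu, input_lists):
    """One pass of n - 1 adjacent checks: swap a pair when the lower item is
    ranked above the upper item in a strict majority of the input lists."""
    mu = list(mu)
    majority = len(input_lists) / 2
    for k in range(1, len(mu)):
        upper, lower = mu[k - 1], mu[k]
        wins = sum(1 for lst in input_lists if lst.index(lower) < lst.index(upper))
        if wins > majority:
            mu[k - 1], mu[k] = lower, upper
    return mu


OD = ["VT", "Miami", "UVA", "UNC", "Duke"]
MASSEY = ["Miami", "VT", "UVA", "UNC", "Duke"]
COLLEY = ["Miami", "VT", "UNC", "UVA", "Duke"]
sim_data = ["Miami", "VT", "UNC", "UVA", "Duke"]

print(local_kemenize(sim_data, [OD, MASSEY, COLLEY]))
# ['Miami', 'VT', 'UVA', 'UNC', 'Duke']
```

In general one pass can create a new out-of-order adjacent pair just above a swap. Repeating the pass until nothing changes is a natural variant: every majority swap strictly lowers the sum of Kendall tau distances, so it terminates with a locally Kemeny optimal list.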



5. Rating Aggregation

Can rating lists be aggregated as well? The answer is yes.

How do we aggregate rating vectors when each contains numerical values on
such differing scales?
The answer: put the varying ratings on the same scale (using distances
and percentages).
Example:
          Massey         Colley       mHITS
BEST      18.2   Miami   .79  Miami   .041  VT
          18.0   VT      .65  VT      .027  Miami
          -3.4   UVA     .50  UNC     .012  UVA
          -8.0   UNC     .36  UVA     .006  UNC
WORST    -24.8   Duke    .21  Duke    .003  Duke




A rating vector of length n yields $\binom{n}{2} = n(n-1)/2$ comparisons between the n total
teams.

Treat these rating differences as distances to produce the rating distance matrix.

[Matrices: the Massey rating vector r_Massey for Duke, Miami, UNC, UVA, VT and its 5 × 5 rating distance matrix R_Massey, rows and columns in the same order]




[Matrices: rating vectors r_Colley and r_OD for Duke, Miami, UNC, UVA, VT and their 5 × 5 rating distance matrices R_Colley and R_OD, rows and columns in the same order]




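Use the age-old trick of normalization:

1. Let S be the sum of all distances in the matrix.
2. Divide each element of the raw matrix R by S.

[Matrix: normalized R_Massey, rows and columns Duke, Miami, UNC, UVA, VT, with S = 240; e.g., the Miami-Duke entry is 43/240 ≈ 0.1792, where 43 = 18.2 - (-24.8)]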




[Matrices: normalized R_Colley and R_OD, rows and columns Duke, Miami, UNC, UVA, VT]

The entry .1792 means that the distance between Miami and Duke is 17.92% of the total
distance predicted between all pairwise matchups in the Massey model.

Because the normalized entries are percentages, they can be compared across different methods.
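Building and normalizing a rating distance matrix can be sketched as follows. The orientation is an assumption: the entry for (i, j) holds r_i - r_j when team i is rated above team j and 0 otherwise, so each pair contributes one distance. The Colley ratings come from the example above; the function names are hypothetical.

```python
def distance_matrix(ratings):
    """Assumed orientation: R[(i, j)] = r_i - r_j if i is rated above j, else 0."""
    return {(i, j): max(ratings[i] - ratings[j], 0.0) for i in ratings for j in ratings}


def normalize(R):
    """Divide every entry by S, the sum of all distances in the matrix."""
    S = sum(R.values())
    return {pair: value / S for pair, value in R.items()}


colley = {"Duke": 0.21, "Miami": 0.79, "UNC": 0.50, "UVA": 0.36, "VT": 0.65}
R = distance_matrix(colley)
N = normalize(R)
print(round(sum(R.values()), 2))       # 2.9  (S for the Colley ratings)
print(round(N[("Miami", "Duke")], 4))  # 0.2  (Miami-Duke is 20% of the total)
```

After normalization every matrix sums to 1, which is what makes the Massey, Colley and mHITS matrices comparable.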




[Matrix: R_ave, the average of the normalized matrices, rows and columns Duke, Miami, UNC, UVA, VT]



6. Producing Rating Vectors from Rating Aggregation Matrices

Collapse the information in the two-dimensional matrix into a one-dimensional
rating vector.

Method 1:
The row sums of R_ave measure a team's offensive output.
The column sums of R_ave measure a team's defensive output.
The offensive rating vector is $o = R_{ave} e$, with $e = (1, 1, \dots, 1)^T$.
The defensive rating vector is $d^T = e^T R_{ave}$.
The rating vector is $r = o/d$ (componentwise division).

Method 2:
Apply the Markov method: normalize the rows of $R_{ave}^T$ so that each row sums to 1, and use the stationary vector of the resulting stochastic matrix as the rating vector.
Method 3:
Use the dominant eigenvector of R_ave (the Perron vector of R_ave).
The dominant eigenvector of an irreducible nonnegative matrix is positive, so it is a valid rating vector.
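Method 1 can be sketched end to end from the three rating vectors of the example (Massey, Colley, mHITS): build each distance matrix, normalize it, average the three, then take row sums, column sums and their ratio. The matrix orientation (entry (i, j) holds r_i - r_j when i is rated above j) is an assumption, and all names are hypothetical. With the rounded ratings as printed, the Massey distances sum to 224 rather than the slide's S = 240, so the output only approximates the results table.

```python
def distance_matrix(ratings):
    """Assumed orientation: R[(i, j)] = r_i - r_j if i is rated above j, else 0."""
    return {(i, j): max(ratings[i] - ratings[j], 0.0) for i in ratings for j in ratings}


def normalize(R):
    S = sum(R.values())  # sum of all distances in the matrix
    return {pair: value / S for pair, value in R.items()}


ratings_by_method = {
    "Massey": {"Duke": -24.8, "Miami": 18.2, "UNC": -8.0, "UVA": -3.4, "VT": 18.0},
    "Colley": {"Duke": 0.21, "Miami": 0.79, "UNC": 0.50, "UVA": 0.36, "VT": 0.65},
    "mHITS": {"Duke": 0.003, "Miami": 0.027, "UNC": 0.006, "UVA": 0.012, "VT": 0.041},
}
teams = ["Duke", "Miami", "UNC", "UVA", "VT"]

normalized = [normalize(distance_matrix(r)) for r in ratings_by_method.values()]
R_ave = {pair: sum(N[pair] for N in normalized) / len(normalized) for pair in normalized[0]}

o = {i: sum(R_ave[(i, j)] for j in teams) for i in teams}  # row sums: offense
d = {j: sum(R_ave[(i, j)] for i in teams) for j in teams}  # column sums: defense
r = {t: o[t] / d[t] for t in teams}                        # Method 1: r = o/d
order = sorted(teams, key=r.get, reverse=True)
print(order)
```

The sketch reproduces VT first, Miami second and Duke last (Duke dominates no one, so o = 0 and r = 0). UNC and UVA come out very close, so their relative order is sensitive to the rounding of the input ratings and may differ from the table.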




The result

Team    Method 1: r = o/d   Method 2: Markov r   Method 3: Perron r
Duke    0      5th          .020   5th           .27   5th
Miami   16.4   2nd          .465   2nd           .58   2nd
UNC     .4     3rd          .025   3rd           .34   3rd
UVA     .3     4th          .024   4th           .33   4th
VT      26.0   1st          .466   1st           .61   1st



7. Summary of Aggregation Methods

Aggregation can combine human-generated lists, or merge data from both human and
computer sources.

$Q_{worst} \le Q_{aggregate} \le Q_{best}$, where Q is a quality measure that can be used to
score a method.

The aggregated list is helpful in one-time predictive situations.



Thank you for your attention

