
Machine Learning in Industry

Ralf Herbrich
Amazon
Overview

Theory
Inference in Factor Graphs
Approximate Message Passing
Applications @ Microsoft
TrueSkill: Gamer Rating and Matchmaking
TrueSkill Through Time: History of Chess
Click-Through Rate Prediction in Online Advertising
Matchbox: Recommendation Systems
Applications @ Amazon
Background Material

http://www.coursera.org
http://www.cs.ubc.ca/~murphyk/MLbook/index.html
http://www.cs.ucl.ac.uk/staff/d.barber/brml/
http://research.microsoft.com/en-us/um/people/cmbishop/PRML/index.htm
Graphical Models

Definition: Graphical representation of a joint probability distribution
Nodes: Variables
Edges: Relationships between variables
Variables:
Observed Variables: Data
Unobserved Variables: Causes + Temporary/Latent
Key Questions:
(Conditional) Dependency:
Inference/Marginalisation:
Factor Graphs

Definition: Graphical representation of the product structure of a function (Wiberg, 1996)
Nodes: Factors and Variables
Edges: Dependencies of factors on variables
Semantics: Local variable dependency of factors
[Figure: example factor graph over the variables a, b and c]
Factor Graphs and Bayes Law

Bayes' law
Factorising prior
Factorising likelihood
Inference: Sum out latent variables
[Figure: factor graph with the variables s1, s2, s, t1, t2, d and y]
Factor Trees: Separation
[Figure: factor tree with variables v, w, x, y, z and factors f1(v,w), f2(w,x), f3(x,y), f4(x,z)]

Observation: The sum of products becomes a product of sums of all messages from the neighbouring factors to a variable!
Messages: From Factors To Variables
[Figure: factor tree fragment with variables w, x, y, z and factors f2(w,x), f3(x,y), f4(x,z)]

Observation: Factors only need to sum out all their local variables!
Messages: From Variables To Factors
[Figure: factor tree fragment with variables x, y, z and factors f2(w,x), f3(x,y), f4(x,z)]

Observation: Variables pass on the product of all incoming messages!
The Sum-Product Algorithm
Three update equations (Aji & McEliece, 1997)

Update equations can be directly derived from the distributive law.
Calculate all marginals at the same time!
Only need to pass messages twice along each edge!
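The message passing scheme can be checked numerically on the factor tree f1(v,w) f2(w,x) f3(x,y) f4(x,z) used on the earlier slides. The following is a minimal sketch, assuming discrete variables with K states each and random positive factor tables (both are assumptions made for illustration, as are the function names). It compares the marginal of x from a brute-force sum of products with the one from factor-to-variable messages.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
K = 3  # number of states per variable (an arbitrary choice)

# Random positive factor tables for the tree f1(v,w) f2(w,x) f3(x,y) f4(x,z).
f1 = rng.random((K, K))  # indexed [v, w]
f2 = rng.random((K, K))  # indexed [w, x]
f3 = rng.random((K, K))  # indexed [x, y]
f4 = rng.random((K, K))  # indexed [x, z]


def marginal_x_brute_force():
    """Sum the full product over v, w, y, z for every value of x."""
    p = np.zeros(K)
    for v, w, x, y, z in itertools.product(range(K), repeat=5):
        p[x] += f1[v, w] * f2[w, x] * f3[x, y] * f4[x, z]
    return p / p.sum()


def marginal_x_sum_product():
    """Same marginal via messages; each factor sums out only its local variables."""
    m_f1_w = f1.sum(axis=0)       # leaf v sends a constant message; f1 sums out v
    m_w_f2 = m_f1_w               # w passes on the product of its other incoming messages
    m_f2_x = m_w_f2 @ f2          # f2 sums out w
    m_f3_x = f3.sum(axis=1)       # f3 sums out y
    m_f4_x = f4.sum(axis=1)       # f4 sums out z
    p = m_f2_x * m_f3_x * m_f4_x  # marginal = product of all incoming messages
    return p / p.sum()
```

The brute-force version touches K^5 terms, whereas the message version only ever sums over one variable at a time, which is the saving the distributive law provides.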
Practical Considerations II

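Redundant computations: the variable-to-factor messages at a variable t share most of their terms.
Caching: Only store the marginal and the factor-to-variable messages; each variable-to-factor message is then the marginal divided by the message coming in from that factor.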
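One common way to avoid recomputing products of messages is to keep the marginal and divide out the message of the factor in question. The sketch below assumes one-dimensional Gaussian messages stored in natural parameters, where a product adds and a division subtracts the parameters. The class `Gaussian` and its field names `pi` and `tau` are hypothetical names chosen for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gaussian:
    """1-D Gaussian in natural parameters: pi = 1/variance, tau = mean/variance."""
    pi: float
    tau: float

    @staticmethod
    def from_moments(mean, var):
        return Gaussian(1.0 / var, mean / var)

    @property
    def mean(self):
        return self.tau / self.pi

    @property
    def var(self):
        return 1.0 / self.pi

    def __mul__(self, other):
        # Product of two Gaussian densities (up to normalisation).
        return Gaussian(self.pi + other.pi, self.tau + other.tau)

    def __truediv__(self, other):
        # Ratio of two Gaussian densities (up to normalisation).
        return Gaussian(self.pi - other.pi, self.tau - other.tau)


# Three factor-to-variable messages arriving at one variable.
incoming = [
    Gaussian.from_moments(0.0, 4.0),
    Gaussian.from_moments(1.0, 1.0),
    Gaussian.from_moments(3.0, 2.0),
]

# Cached marginal: product of all incoming messages.
marginal = incoming[0] * incoming[1] * incoming[2]

# Message from the variable back to the second factor, computed two ways.
by_product = incoming[0] * incoming[2]   # product of all other messages
by_division = marginal / incoming[1]     # cached marginal divided by that factor's message
```

With many neighbouring factors, the division needs one operation per outgoing message instead of a fresh product over all other neighbours.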


A Bayesian Interpretation

Recall Bayes' law: the posterior is proportional to the likelihood times the prior.
Prior and Data Messages at a variable t:
Message passing separates the likelihood and the prior into outgoing and incoming messages!
Approximate Message Passing
Problem: The exact messages from factors to variables may not be closed under products.
Solution: Approximate each marginal as well as possible within the approximating family, using a divergence measure on beliefs.
General Idea: Leave-one-out approximation


Approximate Message Passing

[Figure: two rows of density plots over the range -5 to 5, each showing a product of two functions (*) and its result (=)]
Divergence Measures
Kullback-Leibler Divergence: Expected log-odds ratio between two distributions
Minimizer for Exponential Families: Matching the moments of the distribution!
General α-Divergence:
Special Cases:
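The formulas on this slide did not survive extraction. For reference, the standard definitions are sketched below, assuming the deck follows Minka's convention for the α-divergence. That assumption is consistent with the later statements about α=0 and α=1, but the symbols p (true distribution), q (approximation) and φ (sufficient statistics) are chosen here and are not taken from the slides.

```latex
\begin{align*}
\mathrm{KL}(p \,\|\, q) &= \int p(x) \log \frac{p(x)}{q(x)} \, dx \\
% Moment matching: for q in an exponential family with sufficient statistics \phi
\arg\min_{q} \mathrm{KL}(p \,\|\, q) \;&\Rightarrow\; \mathbb{E}_{q}[\phi(x)] = \mathbb{E}_{p}[\phi(x)] \\
D_{\alpha}(p \,\|\, q) &= \frac{\int \alpha\, p(x) + (1-\alpha)\, q(x) - p(x)^{\alpha} q(x)^{1-\alpha} \, dx}{\alpha (1-\alpha)} \\
\lim_{\alpha \to 1} D_{\alpha}(p \,\|\, q) &= \mathrm{KL}(p \,\|\, q), \qquad
\lim_{\alpha \to 0} D_{\alpha}(p \,\|\, q) = \mathrm{KL}(q \,\|\, p)
\end{align*}
```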
α-Divergence in Pictures
When to use which α-Divergence?
[Figure: posterior over the variables x and y]
α=0 resolves multi-modality in the posterior at the expense of too much certainty!
When to use which α-Divergence?
[Figure: posterior over the variables w1 and w2]
α=1 captures all uncertainty for uni-modal posterior distributions!
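The trade-off can be illustrated with moment matching, which is what the KL minimiser (the α=1 case) reduces to for a Gaussian approximation. The sketch below is a toy example, assuming a bimodal target given as an equal mixture of two Gaussians; the mixture parameters and the function name `match_moments` are made up for illustration.

```python
import numpy as np

# A bimodal target: equal mixture of N(-2, 0.5^2) and N(+2, 0.5^2).
weights = np.array([0.5, 0.5])
means = np.array([-2.0, 2.0])
stds = np.array([0.5, 0.5])


def match_moments(weights, means, stds):
    """Mean and variance of the Gaussian that matches the mixture's first two moments."""
    mean = np.sum(weights * means)
    second_moment = np.sum(weights * (stds**2 + means**2))
    return mean, second_moment - mean**2


mean_q, var_q = match_moments(weights, means, stds)
```

Here the matched Gaussian has mean 0 and variance 4.25, far wider than either component (variance 0.25): it covers all of the uncertainty but places its mode where the target has almost no mass. An α=0 fit would instead lock onto one of the two modes and report too much certainty.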
TrueSkill
Joint work with Thore Graepel, Tom Minka & Phillip Trelford
Motivation

Competition is central to our lives
Innate biological trait
Driving principle of many sports
Chess Rating for fair competition
ELO: Developed in 1960 by Árpád Imre Élő
Matchmaking system for tournaments
Challenges of online gaming
Learn from few match outcomes efficiently
Support multiple teams and multiple players per team
The Skill Rating Problem

Given:
Match outcomes: Orderings among k teams consisting of n1, n2, ..., nk players, respectively
Questions:
Skill si for each player
Global ranking among all players
Fair matches between teams of players
Two Player Match Outcome Model

Latent Gaussian performance model for fixed skills
Possible outcomes: Player 1 wins over 2 (and vice versa)
[Figure: factor graph with skills s1, s2, performances p1, p2 and outcome y12]
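Under a latent Gaussian performance model, the probability that player 1 beats player 2 has a closed form. The sketch below assumes that each performance is the skill plus independent Gaussian noise with standard deviation `beta` (a name and parameterisation assumed here, since the slide's formula was lost), and that the player with the higher performance wins.

```python
from math import erf, sqrt


def normal_cdf(x):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def win_probability(s1, s2, beta):
    """P(p1 > p2) for performances p_i ~ N(s_i, beta^2) with fixed skills s1, s2."""
    # The difference p1 - p2 is Gaussian with mean s1 - s2 and variance 2 * beta^2.
    return normal_cdf((s1 - s2) / (sqrt(2.0) * beta))
```

For equal skills the result is exactly one half, and the two players' win probabilities always sum to one because draws are not modelled in this sketch.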
Two Team Match Outcome Model

Skill of a team is the sum of the skills of its members
[Figure: factor graph with skills s1, s2, s3, s4, team skills t1, t2 and outcome y12]
Multiple Team Match Outcome Model

Possible outcomes: Permutations of the teams
[Figure: factor graph with skills s1, s2, s3, s4, team skills t1, t2, t3 and outcome y]
Multiple Team Match Outcome Model

But we are interested in the (Gaussian) posterior!
[Figure: factor graph with skills s1, s2, s3, s4, team skills t1, t2, t3 and pairwise outcomes y12, y23]
Efficient Approximate Inference

Gaussian Prior Factors
Fast and efficient approximate message passing using Expectation Propagation
Ranking Likelihood Factors
[Figure: factor graph with skills s1, s2, s3, s4, team skills t1, t2, t3 and pairwise outcomes y12, y23]
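For the simplest case, the approximate message passing collapses to a closed-form update. The sketch below assumes two single-player teams, no draws and no skill drift over time, and follows the moment-matching update for a Gaussian truncated at zero. The function names and the tuple representation `(mean, variance)` are choices made for this illustration rather than anything shown on the slides.

```python
from math import erf, exp, pi, sqrt


def pdf(x):
    """Standard normal density."""
    return exp(-0.5 * x * x) / sqrt(2.0 * pi)


def cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def update_two_players(winner, loser, beta):
    """Return updated (mean, variance) beliefs for the winner and the loser.

    winner, loser: (mean, variance) of the Gaussian skill belief before the match.
    beta: standard deviation of the performance noise (assumed parameter).
    """
    (mu_w, var_w), (mu_l, var_l) = winner, loser
    c2 = 2.0 * beta**2 + var_w + var_l  # variance of the performance difference
    c = sqrt(c2)
    t = (mu_w - mu_l) / c
    v = pdf(t) / cdf(t)                 # additive correction to the means
    w = v * (v + t)                     # multiplicative correction to the variances
    new_winner = (mu_w + var_w / c * v, var_w * (1.0 - var_w / c2 * w))
    new_loser = (mu_l - var_l / c * v, var_l * (1.0 - var_l / c2 * w))
    return new_winner, new_loser


# Usage: two players with identical prior beliefs; the first one wins.
prior = (25.0, (25.0 / 3.0) ** 2)
new_winner, new_loser = update_two_players(prior, prior, beta=25.0 / 6.0)
```

An unexpected result (small or negative t) moves the means further than an expected one, and both variances shrink after every observed match.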
Applications to Online Gaming

Leaderboard
Global ranking of all players

Matchmaking
For gamers: Most uncertain outcome
For inference: Most informative
Both are equivalent!
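A simple stand-in for "most uncertain outcome" is to pick the opponent whose predicted win probability is closest to 50%. The sketch below assumes Gaussian skill beliefs given as `(mean, variance)` pairs and a performance noise `beta`; it is an illustration of the idea only, not the matchmaking criterion used in production, and all names in it are hypothetical.

```python
from math import erf, sqrt


def win_probability(mu1, var1, mu2, var2, beta):
    """P(player 1 beats player 2) under Gaussian skill beliefs and performance noise."""
    c = sqrt(2.0 * beta**2 + var1 + var2)
    return 0.5 * (1.0 + erf((mu1 - mu2) / (c * sqrt(2.0))))


def best_opponent(player, candidates, beta):
    """Name of the candidate whose predicted outcome is closest to a coin flip."""
    mu, var = player

    def distance_from_even(name):
        mu_c, var_c = candidates[name]
        return abs(win_probability(mu, var, mu_c, var_c, beta) - 0.5)

    return min(candidates, key=distance_from_even)


# Usage with made-up skill beliefs.
candidates = {"a": (30.0, 4.0), "b": (25.5, 4.0), "c": (10.0, 4.0)}
choice = best_opponent((25.0, 4.0), candidates, beta=4.0)
```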
Experimental Setup

Data Set: Halo 2 Beta
3 game modes
Free-for-All
Two Teams
1 vs. 1
> 60,000 match outcomes
6,000 players
6 weeks of game play
Publicly available
Convergence Speed

[Figure: level (0 to 40) against number of games (0 to 400) for the players char and SQLWildman, comparing TrueSkill with the Halo 2 rank]
Convergence Speed (ctd.)

[Figure: winning probability (0% to 100%) against number of games played (0 to 500), with curves for "char wins", "SQLWildman wins" and "Both players draw"; annotation: 5/8 games won by char]


Xbox 360 & Halo 3

Xbox 360 Live


Launched in September 2005
Every game uses TrueSkill to match players
> 10 million players
> 2 million matches per day
> 2 billion hours of gameplay
Halo 3
Launched on 25th September 2007
Largest entertainment launch in history
> 200,000 players concurrently (peak: 1,000,000)
Halo 3 in Action
Halo 3 Public Beta Analysis
Skill Distributions of Online Games

Golf (18 holes): 60 levels

Car racing (3-4 laps): 40 levels

UNO (chance game): 10 levels


TrueSkill™ Through Time: Chess

Model time-series of skills by smoothing across time
History of Chess
3.5M game outcomes (ChessBase)
20 million variables (each of 200,000 players in each year of lifetime + latent variables)
40 million factors
[Figure: factor graph linking the skills st,i, st,j and performances pt,i, pt,j of players i and j in year t to the skills st+1,i, st+1,j and performances pt+1,i, pt+1,j in year t+1]
ChessBase Analysis: 1850 - 2006

[Figure: skill estimate (1400 to 3000) against year (1850 to 2006) for Adolf Anderssen, Paul Morphy, Wilhelm Steinitz, Emanuel Lasker, Jose Raul Capablanca, Mikhail Botvinnik, Boris V Spassky, Robert James Fischer, Anatoly Karpov and Garry Kasparov]
Online Advertising
Joint work with Thore Graepel, Joaquin Quiñonero Candela, Onno Zoeter, Tom Borchert, Phillip Trelford
Why Predict Probability-of-Click?

Display (according to expected revenue) and charge (per click):

Bid      pClick    Expected revenue    Charge per click
$1.00  *  10%    =  $0.10              $0.80
$2.00  *   4%    =  $0.08              $1.25
$0.10  *  50%    =  $0.05              $0.05

Advantages of improved probability estimates:
Increase user satisfaction by better targeting
Fairer charges to advertisers
Increase revenue by showing ads with high click-thru rate
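The arithmetic on this slide can be reproduced in a few lines. The sketch below ranks the three ads from the slide by bid times click probability. The charging rule is an assumption: it charges each ad the smallest bid that would still keep its slot against the next ad, which happens to reproduce the first two charges shown ($0.80 and $1.25). The ad names are made up.

```python
# Bids and click probabilities taken from the slide; the names are hypothetical.
ads = [
    {"name": "ad_a", "bid": 1.00, "p_click": 0.10},
    {"name": "ad_b", "bid": 2.00, "p_click": 0.04},
    {"name": "ad_c", "bid": 0.10, "p_click": 0.50},
]


def rank_by_expected_revenue(ads):
    """Order ads by bid * p_click, highest expected revenue first."""
    return sorted(ads, key=lambda ad: ad["bid"] * ad["p_click"], reverse=True)


def charges_per_click(ranked):
    """Assumed rule: charge the smallest bid that keeps the slot against the next ad."""
    charges = []
    for ad, runner_up in zip(ranked, ranked[1:]):
        charges.append(runner_up["bid"] * runner_up["p_click"] / ad["p_click"])
    return charges


ranked = rank_by_expected_revenue(ads)
charges = charges_per_click(ranked)
```

Because the charge divides by the ad's own click probability, an overestimated probability undercharges the advertiser and an underestimated one overcharges, which is why better estimates give fairer charges.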
Uncertainty: Bayesian Probabilities
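[Figure: the features Client IP, Match Type and Position are combined (+) into a distribution p(pClick)]
Client IP: 102.34.12.201, 15.70.165.9, 221.98.2.187, 92.154.3.86
Match Type: Exact Match, Broad Match
Position: ML-1, SB-1, SB-2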
Training Algorithm in Action

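[Figure: factor graph with weights w1 and w2 summed (+) into the click variable c; panels for Prediction and Training/Update on Click and No Click events]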
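A minimal sketch of such a training step follows, assuming a Bayesian probit model with one independent Gaussian weight per feature value and an update obtained by moment matching after each impression. The feature strings reuse values from the earlier slide; the function names, the noise parameter `beta` and the dictionary layout are assumptions made for illustration.

```python
from math import erf, exp, pi, sqrt


def pdf(x):
    """Standard normal density."""
    return exp(-0.5 * x * x) / sqrt(2.0 * pi)


def cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def predict(weights, active, beta):
    """Predicted click probability; weights maps feature -> (mean, variance)."""
    total_mean = sum(weights[f][0] for f in active)
    total_var = beta**2 + sum(weights[f][1] for f in active)
    return cdf(total_mean / sqrt(total_var))


def update(weights, active, clicked, beta):
    """Update the beliefs of the active weights in place after one impression."""
    y = 1.0 if clicked else -1.0
    total_mean = sum(weights[f][0] for f in active)
    total_var = beta**2 + sum(weights[f][1] for f in active)
    s = sqrt(total_var)
    t = y * total_mean / s
    v = pdf(t) / cdf(t)
    w = v * (v + t)
    for f in active:
        mean, var = weights[f]
        weights[f] = (mean + y * var / s * v, var * (1.0 - var / total_var * w))


# Usage: two active features with standard normal priors, one observed click.
active = ["Match Type=Exact Match", "Position=ML-1"]
weights = {f: (0.0, 1.0) for f in active}
before = predict(weights, active, beta=1.0)
update(weights, active, clicked=True, beta=1.0)
after = predict(weights, active, beta=1.0)
```

Weights with large variance move the most, so rarely seen feature values adapt quickly while well-established ones stay stable.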
Inference: An Optimization View
Accuracy
MatchBox
Joint work with Thore Graepel, Joaquin Quiñonero Candela, David Stern, Ulrich Paquet
[Figure: user-item rating matrix]
Items 1 to 6: Crime (Tarantino, ID=4243), Drama (Mendes, ID=534), Action (Campbell, ID=9834), Comedy (Mitchell, ID=6345), Action (Donner, ID=2452), Action (Wachowski, ID=9864)
Users A to D: Programmer, Age<30, ID=33451 (A); Student, Age<30, ID=33431 (B); Shopkeeper, Age>45, ID=4321 (C); Student, Age<30, ID=5641 (D)
Matchbox With Metadata
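[Figure: factor graph. The user metadata (ID=234, Male, British) has weights u01, u11, u21 and u02, u12, u22, which are summed (+) into user trait 1 (s1) and user trait 2 (s2). The item metadata (Camera, SLR) has weights v11, v21 and v12, v22, which are summed (+) into the item traits t1 and t2. The traits feed the rating potential r.]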
Recommender System: MatchBox
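[Figure: graph of likes and dislikes between users (mark, ralf, tao, sheryl) and movies (Heat, The Rock, The Godfather), together with a social network among the users, the users' gender (Male, Female) and the movies' directors (R. Scott, C. Eastwood, Q. Tarantino, R. Howard)]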
Message Passing For Matchbox

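[Figure: the Matchbox factor graph. User weights u01, u11, u21 and u02, u12, u22 are summed (+) into the user traits s1 and s2; item weights v11, v21 and v12, v22 are summed (+) into the item traits t1 and t2; product factors (*) combine s1 with t1 and s2 with t2 into the rating r.]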
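The forward direction of this graph is a bilinear model: traits are sums of feature weights, and the rating potential is the inner product of user and item traits. The sketch below uses random point estimates in place of the Gaussian beliefs that message passing would maintain, so it only illustrates the structure. The matrix names `U` and `V`, the feature vectors `x` and `y`, and the function name are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_traits = 2

# Hypothetical point estimates of the per-feature trait weights.
U = rng.normal(size=(n_traits, 3))  # one column per user feature (e.g. ID, gender, nationality)
V = rng.normal(size=(n_traits, 2))  # one column per item feature (e.g. category, type)

x = np.ones(3)  # indicator vector of the active user features
y = np.ones(2)  # indicator vector of the active item features


def rating_potential(U, V, x, y):
    """Inner product of user traits s = U x and item traits t = V y."""
    s = U @ x            # user traits: sum of the weights of the active user features
    t = V @ y            # item traits: sum of the weights of the active item features
    return float(s @ t)  # r = s1 * t1 + s2 * t2


r = rating_potential(U, V, x, y)
```

Because a new user or item still has metadata features, this structure can produce a prediction before any rating for that user or item has been observed.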
User/Item Trait Space
[Figure: users and movies plotted in a two-dimensional trait space (both axes from -1.5 to 1.5), showing the preference cone for user 145035 and the movies 24: Season 2, 24: Season 3, Adaptation, A Clockwork Orange, A Knights Tale, AI: Artificial Intelligence and A Cinderella Story]
Incremental Training with ADF

[Figure: user-item matrix with the items 1 to 6 and the users B and D]
ADF: Message Passing Iterations 1 to 4
[Figure: four snapshots of the two-dimensional trait space (both axes from -1.5 to 1.5), one per message passing iteration]
Feedback Models
[Figure: the Matchbox factor graph (user weights u01 to u22, item weights v11 to v22, user traits s1, s2, item traits t1, t2, rating r) shown with three alternative feedback factors on r: an equality factor (=3), a comparison against the ordered thresholds t0, t1, t2, t3 (> > < <), and a sign factor (>0)]
Message Passing: Compositionality
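[Figure: composite factor graph. The User Model (weights u11, u21, u12, u22 summed into s1, s2) and the Item Model (weights v11, v21, v12, v22 summed into t1, t2) are multiplied (*); the Context Model (x1, x2, x3, x4) is added (+) to give r; the Feedback Model (>0) is applied to r.]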


ML Opportunities @ Amazon

Retail: Demand Forecasting, Vendor Lead Time Prediction, Pricing, Packaging, Substitute Prediction
Customers: Product Recommendation, Product Search, Visual Search, Product Ads, Shopping Advice, Customer Problem Detection
Seller: Fraud Detection, Predictive Help, Seller Search & Crawling
Catalog: Browse-Node Classification, Meta-data validation, Review Analysis
Digital: Named-Entity Extraction, XRay, Plagiarism Detection
XRay
Machine Translation
Machine Translation: Deep Dive

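$$p(\text{English} \mid \text{Chinese}) = \frac{p(\text{English})\, p(\text{Chinese} \mid \text{English})}{p(\text{Chinese})} \propto p(\text{English})\, p(\text{Chinese} \mid \text{English})$$

Here p(English) is the Language Model and p(Chinese | English) is the Translation Model.

Language Model: What are good English sentences?
Translation Model: What English sentences account well for a given Chinese sentence?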
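Decoding under this decomposition means picking the English sentence that maximises the product of the two models. The sketch below is a toy example for one fixed Chinese sentence: the candidate sentences and all probabilities are invented for illustration, and a real system would search a far larger candidate space.

```python
from math import log

# Hypothetical language model scores p(English) for three candidate translations.
language_model = {
    "the cat sleeps": 0.60,
    "cat the sleeps": 0.05,
    "the cat sleep": 0.35,
}

# Hypothetical translation model scores p(Chinese | English) for one fixed Chinese sentence.
translation_model = {
    "the cat sleeps": 0.30,
    "cat the sleeps": 0.40,
    "the cat sleep": 0.30,
}


def best_translation(candidates):
    """argmax over English of log p(English) + log p(Chinese | English)."""
    return max(
        candidates,
        key=lambda english: log(language_model[english]) + log(translation_model[english]),
    )


best = best_translation(list(language_model))
```

The translation model alone would prefer the ungrammatical "cat the sleeps"; the language model overrides it, which is the division of labour described on the slide. The normaliser p(Chinese) is the same for every candidate and can be dropped.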
Thanks!
