
Bayesian Based Bandit Solution For Music Recommendation

Barış Can Esmer, Beyza Gül Gürbüz, H. Esra Aydemir, Gökçe Uludoğan, Advisor: A. Taylan Cemgil
Bogazici University, Computer Engineering, CMPE547 - Bayesian Statistics

Introduction

Most music recommenders recommend the songs with the highest user ratings and do not explore user preferences, so they fail to recommend newly rising songs or to serve new users. However, a good recommender must balance exploring user preferences and exploiting user ratings. In this work, music recommendation is formulated as a multi-armed bandit problem.

Multi-Armed Bandit

A multi-armed bandit problem is a sequential allocation problem defined by a set of actions. At each time step, a unit resource is allocated to an action and some observable payoff is obtained. The goal is to maximize the total payoff obtained in a sequence of allocations. [1]

Model

Personalized, interactive, audio-content and novelty based user rating model [4]:

    U = Uc · Un = θ^T x (1 − e^(−t/s))

where θ represents the user preferences, x the song features, t the time elapsed since the last time the song was listened to and, finally, s the recovery speed of the novelty.

Figures: (a) Exact Model, (b) Approximate Model.

Approximate Model

    R | x, t, θ, β, σ^2 ~ N(θ^T x · β^T t, σ^2)
    θ | σ^2 ~ N(µ_θ0, σ^2 D_0)
    β | σ^2 ~ N(µ_β0, σ^2 E_0)
    τ = 1/σ^2 ~ G(a_0, b_0)

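The generative process of the approximate model can be sampled as follows; this is our illustrative reading of the equations (diagonal prior covariances D_0, E_0 for brevity), not code from the poster:

```python
import random

def sample_from_model(x, t, mu_theta0, D0, mu_beta0, E0, a0, b0):
    """One draw from the approximate generative model:
       tau ~ Gamma(a0, b0), sigma^2 = 1/tau,
       theta_i ~ N(mu_theta0_i, sigma^2 * D0_i)   (diagonal covariance),
       beta_j  ~ N(mu_beta0_j,  sigma^2 * E0_j),
       R ~ N(theta^T x * beta^T t, sigma^2).
    """
    tau = random.gammavariate(a0, 1.0 / b0)  # rate b0 -> scale 1/b0
    sigma = (1.0 / tau) ** 0.5
    theta = [random.gauss(m, sigma * d ** 0.5) for m, d in zip(mu_theta0, D0)]
    beta = [random.gauss(m, sigma * e ** 0.5) for m, e in zip(mu_beta0, E0)]
    mean = sum(a * b for a, b in zip(theta, x)) * sum(a * b for a, b in zip(beta, t))
    return random.gauss(mean, sigma)

random.seed(0)
print(sample_from_model([1.0], [1.0], [0.0], [1.0], [0.0], [1.0], a0=2.0, b0=2.0))
```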
Methods

Bayes-UCB

The posterior distribution of the user parameters, Ω = {θ, s}, is given by

    P(Ω | D_l) ∝ P(Ω) P(D_l | Ω)                          (1)

    λ_k^l = p(U_k | D_l) = ∫ p(U_k | Ω) p(Ω | D_l) dΩ     (2)

where D_l = {(x_i, t_i, r_i)}_{i=1..l}. Bayes-UCB then recommends the song k* that satisfies

    k* = arg max_{k=1..|S|} Q(α, λ_k^l)

where Q is the quantile (generalized inverse) function. To make the algorithm more responsive, a highly efficient variational inference algorithm is proposed for approximating the inference procedure.

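A minimal sketch of the selection rule k* = arg max_k Q(α, λ_k^l), assuming each song's rating posterior has been summarized as a normal distribution; the quantile schedule α = 1 − 1/l is one common Bayes-UCB choice, and the numbers below are illustrative:

```python
from statistics import NormalDist

def bayes_ucb_select(posteriors, l):
    """Pick the song whose posterior rating distribution has the highest
    alpha-quantile, with alpha = 1 - 1/l growing over rounds l >= 2.

    posteriors: list of (mean, stdev) pairs, one per song.
    Returns k* = arg max_k Q(alpha, N(mean_k, stdev_k)).
    """
    alpha = 1.0 - 1.0 / l
    scores = [NormalDist(mu, sd).inv_cdf(alpha) for mu, sd in posteriors]
    return max(range(len(scores)), key=scores.__getitem__)

# With few observations song 1's posterior is wide, so its upper
# quantile exceeds song 0's despite a lower mean: exploration.
print(bayes_ucb_select([(0.6, 0.05), (0.5, 0.40)], l=10))  # -> 1
# Once more ratings have tightened the posteriors, the higher
# mean wins: exploitation.
print(bayes_ucb_select([(0.6, 0.05), (0.5, 0.04)], l=10))  # -> 0
```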
Greedy & ε-greedy

• Recommends the song with the highest expected rating.
• Pure exploitation (exploration with probability ε in ε-greedy).
• Parameters fitted by L-BFGS-B optimization minimizing MSE.

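The ε-greedy policy can be sketched in one function (the ε value is an illustrative default):

```python
import random

def epsilon_greedy_select(expected_ratings, epsilon=0.1):
    """With probability epsilon pick a random song (exploration),
    otherwise the song with the highest expected rating (exploitation).
    Setting epsilon = 0 gives the pure greedy policy."""
    if random.random() < epsilon:
        return random.randrange(len(expected_ratings))
    return max(range(len(expected_ratings)), key=expected_ratings.__getitem__)

print(epsilon_greedy_select([0.2, 0.9, 0.4], epsilon=0.0))  # always 1
```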
Lin-UCB [3]

• Assumes the expected rating is a linear function of the features.
• Ridge regression with an upper confidence bound.
• Balances exploration and exploitation.

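A sketch of Lin-UCB's ridge-regression-with-UCB scoring in the spirit of Li et al. [3]; a single shared parameter vector is assumed, and α and the ridge penalty are illustrative defaults:

```python
import numpy as np

class LinUCB:
    """Ridge regression with an upper confidence bound.
    One shared linear model: expected rating ~ theta^T x."""

    def __init__(self, dim, alpha=1.0, ridge=1.0):
        self.alpha = alpha
        self.A = ridge * np.eye(dim)  # X^T X + ridge * I
        self.b = np.zeros(dim)        # X^T r

    def score(self, x):
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b                       # ridge estimate
        bonus = self.alpha * np.sqrt(x @ A_inv @ x)  # confidence width
        return theta @ x + bonus

    def update(self, x, reward):
        self.A += np.outer(x, x)
        self.b += reward * x

    def select(self, songs):
        return int(max(range(len(songs)), key=lambda k: self.score(songs[k])))
```

As observations accumulate, A grows and the confidence bonus shrinks, so the policy shifts from exploration to exploitation.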
Variational Bayes-UCB

Following the convention of mean-field approximation, we assume that the joint posterior distribution factorizes as follows:

    p(Ω | D) = p(θ, β, τ | D) ≈ q(θ, β, τ) = q(θ) q(β) q(τ)

Because of the choice of conjugate priors, each factor distribution q(θ), q(β) and q(τ) takes the same parametric form as the corresponding prior distribution. Specifically,

    q(θ) ∝ exp(−(1/2) θ^T Λ_θN θ + η_θN^T θ)
    q(β) ∝ exp(−(1/2) β^T Λ_βN β + η_βN^T β)
    q(τ) ∝ τ^(a_N − 1) exp(−b_N τ)

For optimization, we use the coordinate descent method to minimize

    KL( q(θ) q(β) q(τ) || p(θ, β, τ | D) )

Finally, since q(θ) and q(β) are normal distributions and a linear combination of normal random variables is again a normal random variable, we obtain

    p(θ^T x | x, t, D) ≈ N(x^T Λ_θN^(−1) η_θN, x^T Λ_θN^(−1) x)
    p(β^T t | x, t, D) ≈ N(t^T Λ_βN^(−1) η_βN, t^T Λ_βN^(−1) t)

and the posterior distribution in Equation 2 can be calculated as

    p(U | x, t, D) = p(θ^T x · β^T t | x, t, D) = ∫ p(θ^T x = a | x, t, D) p(β^T t = U/a | x, t, D) da

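Given converged variational parameters Λ_θN and η_θN, the approximate posterior of the content score θ^T x follows directly from these formulas; a small numpy sketch (the parameter values in the test are placeholders):

```python
import numpy as np

def content_score_posterior(Lambda_N, eta_N, x):
    """Mean and variance of the approximate posterior
    p(theta^T x | x, D) ~ N(x^T Lambda_N^{-1} eta_N, x^T Lambda_N^{-1} x)."""
    L_inv = np.linalg.inv(Lambda_N)
    mean = x @ L_inv @ eta_N
    var = x @ L_inv @ x
    return mean, var
```

The same readout applied to (Λ_βN, η_βN, t) gives the posterior of the novelty score β^T t.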
Experiments

We have simulated user actions with respect to the information from our dataset and used this simulation for evaluation purposes.

Figures: simulation results for (a) Random, (b) Greedy, (c) Lin-UCB, (d) Bayes-UCB.

Dataset

The MSD genre dataset is used for song contents [2]. We used a sample containing 6568 songs.

Features of songs are as follows:

• genre
• duration
• time signature
• key
• loudness
• avg timbres
• mode
• tempo
• var timbres

Conclusion

We have specifically investigated the Bayes-UCB method for the multi-armed bandit formulation of the music recommendation problem and compared it with other methods. We observed that Bayes-UCB surpasses the other methods in terms of cumulative rating and regret.

References

[1] Sébastien Bubeck, Nicolò Cesa-Bianchi, et al. Regret analysis of stochastic and nonstochastic multi-armed bandit problems.
[2] Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The million song dataset.
[3] Lihong Li, Wei Chu, John Langford, and Robert E. Schapire. A contextual-bandit approach to personalized news article recommendation.
[4] Xinxi Wang, Yi Wang, David Hsu, and Ye Wang. Exploration in interactive personalized music recommendation: a reinforcement learning approach.
