
Bayesian Based Bandit Solution For Music Recommendation

Barış Can Esmer, Beyza Gül Gürbüz, H. Esra Aydemir, Gökçe Uludoğan, Advisor: A. Taylan Cemgil
Bogazici University, Computer Engineering, CMPE547 - Bayesian Statistics

Introduction

Most music recommenders recommend the songs with the highest user ratings and do not explore user preferences, so they fail to recommend newly rising songs or to serve new users. However, a good recommender must balance exploring user preferences and exploiting user ratings. In this work, music recommendation is formulated as a multi-armed bandit problem.

Multi-Armed Bandit

A multi-armed bandit problem is a sequential allocation problem defined by a set of actions. At each time step, a unit resource is allocated to an action and some observable payoff is obtained. The goal is to maximize the total payoff obtained in a sequence of allocations. [1]

Model

Personalized, interactive, audio-content and novelty based user rating model [4]:

    U = Uc · Un = θ^T x (1 − e^(−t/s))

where θ represents the user preferences, x the song features, t the time elapsed since the last time the song was listened to and, finally, s the recovery speed of the novelty.

Figures: (a) Exact Model, (b) Approximate Model.

Approximate Model

    R | x, t, θ, β, σ^2 ~ N(θ^T x · β^T t, σ^2)
    θ | σ^2 ~ N(µ_θ0, σ^2 D_0)
    β | σ^2 ~ N(µ_β0, σ^2 E_0)
    τ = 1/σ^2 ~ G(a_0, b_0)

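The generative process of the approximate model can be sampled as follows; this is our illustrative reading of the equations (diagonal prior covariances D_0, E_0 for brevity), not code from the poster:

```python
import random

def sample_from_model(x, t, mu_theta0, D0, mu_beta0, E0, a0, b0):
    """One draw from the approximate generative model:
       tau ~ Gamma(a0, b0), sigma^2 = 1/tau,
       theta_i ~ N(mu_theta0_i, sigma^2 * D0_i)   (diagonal covariance),
       beta_j  ~ N(mu_beta0_j,  sigma^2 * E0_j),
       R ~ N(theta^T x * beta^T t, sigma^2).
    """
    tau = random.gammavariate(a0, 1.0 / b0)  # rate b0 -> scale 1/b0
    sigma = (1.0 / tau) ** 0.5
    theta = [random.gauss(m, sigma * d ** 0.5) for m, d in zip(mu_theta0, D0)]
    beta = [random.gauss(m, sigma * e ** 0.5) for m, e in zip(mu_beta0, E0)]
    mean = sum(a * b for a, b in zip(theta, x)) * sum(a * b for a, b in zip(beta, t))
    return random.gauss(mean, sigma)

random.seed(0)
print(sample_from_model([1.0], [1.0], [0.0], [1.0], [0.0], [1.0], a0=2.0, b0=2.0))
```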
Methods

Bayes-UCB

The posterior distribution of the user parameters, Ω = {θ, s}, is given by

    P(Ω | D_l) ∝ P(Ω) P(D_l | Ω)                          (1)

    λ_k^l = p(U_k | D_l) = ∫ p(U_k | Ω) p(Ω | D_l) dΩ     (2)

where D_l = {(x_i, t_i, r_i)}_{i=1..l}. Bayes-UCB then recommends the song k* that satisfies

    k* = arg max_{k=1..|S|} Q(α, λ_k^l)

where Q is the quantile (generalized inverse) function. To make the algorithm more responsive, a highly efficient variational inference algorithm is proposed for approximating the inference procedure.

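A minimal sketch of the selection rule k* = arg max_k Q(α, λ_k^l), assuming each song's rating posterior has been summarized as a normal distribution; the quantile schedule α = 1 − 1/l is one common Bayes-UCB choice, and the numbers below are illustrative:

```python
from statistics import NormalDist

def bayes_ucb_select(posteriors, l):
    """Pick the song whose posterior rating distribution has the highest
    alpha-quantile, with alpha = 1 - 1/l growing over rounds l >= 2.

    posteriors: list of (mean, stdev) pairs, one per song.
    Returns k* = arg max_k Q(alpha, N(mean_k, stdev_k)).
    """
    alpha = 1.0 - 1.0 / l
    scores = [NormalDist(mu, sd).inv_cdf(alpha) for mu, sd in posteriors]
    return max(range(len(scores)), key=scores.__getitem__)

# With few observations song 1's posterior is wide, so its upper
# quantile exceeds song 0's despite a lower mean: exploration.
print(bayes_ucb_select([(0.6, 0.05), (0.5, 0.40)], l=10))  # -> 1
# Once more ratings have tightened the posteriors, the higher
# mean wins: exploitation.
print(bayes_ucb_select([(0.6, 0.05), (0.5, 0.04)], l=10))  # -> 0
```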
Greedy & ε-greedy

• Recommends the song with the highest expected rating.
• Pure exploitation (exploration with probability ε in ε-greedy).
• Parameters fitted by L-BFGS-B optimization minimizing MSE.

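The ε-greedy policy can be sketched in one function (the ε value is an illustrative default):

```python
import random

def epsilon_greedy_select(expected_ratings, epsilon=0.1):
    """With probability epsilon pick a random song (exploration),
    otherwise the song with the highest expected rating (exploitation).
    Setting epsilon = 0 gives the pure greedy policy."""
    if random.random() < epsilon:
        return random.randrange(len(expected_ratings))
    return max(range(len(expected_ratings)), key=expected_ratings.__getitem__)

print(epsilon_greedy_select([0.2, 0.9, 0.4], epsilon=0.0))  # always 1
```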
Lin-UCB [3]

• Assumes the expected rating is a linear function of the features.
• Ridge regression with an upper confidence bound.
• Balances exploration and exploitation.

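A sketch of Lin-UCB's ridge-regression-with-UCB scoring in the spirit of Li et al. [3]; a single shared parameter vector is assumed, and α and the ridge penalty are illustrative defaults:

```python
import numpy as np

class LinUCB:
    """Ridge regression with an upper confidence bound.
    One shared linear model: expected rating ~ theta^T x."""

    def __init__(self, dim, alpha=1.0, ridge=1.0):
        self.alpha = alpha
        self.A = ridge * np.eye(dim)  # X^T X + ridge * I
        self.b = np.zeros(dim)        # X^T r

    def score(self, x):
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b                       # ridge estimate
        bonus = self.alpha * np.sqrt(x @ A_inv @ x)  # confidence width
        return theta @ x + bonus

    def update(self, x, reward):
        self.A += np.outer(x, x)
        self.b += reward * x

    def select(self, songs):
        return int(max(range(len(songs)), key=lambda k: self.score(songs[k])))
```

As observations accumulate, A grows and the confidence bonus shrinks, so the policy shifts from exploration to exploitation.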
Variational Bayes-UCB

Following the convention of mean-field approximation, we assume that the joint posterior distribution factorizes as follows:

    p(Ω | D) = p(θ, β, τ | D) ≈ q(θ, β, τ) = q(θ) q(β) q(τ)

Because of the choice of conjugate priors, each factor distribution q(θ), q(β) and q(τ) takes the same parametric form as the corresponding prior distribution. Specifically,

    q(θ) ∝ exp(−(1/2) θ^T Λ_θN θ + η_θN^T θ)
    q(β) ∝ exp(−(1/2) β^T Λ_βN β + η_βN^T β)
    q(τ) ∝ τ^(a_N − 1) exp(−b_N τ)

For optimization, we use the coordinate descent method to minimize

    KL( q(θ) q(β) q(τ) || p(θ, β, τ | D) )

Finally, since q(θ) and q(β) are normal distributions and a linear combination of normal random variables is again a normal random variable, we obtain

    p(θ^T x | x, t, D) ≈ N(x^T Λ_θN^(−1) η_θN, x^T Λ_θN^(−1) x)
    p(β^T t | x, t, D) ≈ N(t^T Λ_βN^(−1) η_βN, t^T Λ_βN^(−1) t)

and the posterior distribution in Equation 2 can be calculated as

    p(U | x, t, D) = p(θ^T x · β^T t | x, t, D) = ∫ p(θ^T x = a | x, t, D) p(β^T t = U/a | x, t, D) da

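Given converged variational parameters Λ_θN and η_θN, the approximate posterior of the content score θ^T x follows directly from these formulas; a small numpy sketch (the parameter values in the test are placeholders):

```python
import numpy as np

def content_score_posterior(Lambda_N, eta_N, x):
    """Mean and variance of the approximate posterior
    p(theta^T x | x, D) ~ N(x^T Lambda_N^{-1} eta_N, x^T Lambda_N^{-1} x)."""
    L_inv = np.linalg.inv(Lambda_N)
    mean = x @ L_inv @ eta_N
    var = x @ L_inv @ x
    return mean, var
```

The same readout applied to (Λ_βN, η_βN, t) gives the posterior of the novelty score β^T t.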
Experiments

We have simulated user actions with respect to the information from our dataset and used this simulation for evaluation purposes.

Figures: simulation results for (a) Random, (b) Greedy, (c) Lin-UCB, (d) Bayes-UCB.

Dataset

The MSD genre dataset is used for song contents [2]. We used a sample containing 6568 songs.

Features of songs are as follows:

• genre
• duration
• time signature
• key
• loudness
• avg timbres
• mode
• tempo
• var timbres

Conclusion

We have specifically investigated the Bayes-UCB method for the multi-armed bandit formulation of the music recommendation problem and compared it with other methods. We observed that Bayes-UCB surpasses the other methods in terms of cumulative rating and regret.

References

[1] Sébastien Bubeck, Nicolò Cesa-Bianchi, et al. Regret analysis of stochastic and nonstochastic multi-armed bandit problems.
[2] Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The million song dataset.
[3] Lihong Li, Wei Chu, John Langford, and Robert E. Schapire. A contextual-bandit approach to personalized news article recommendation.
[4] Xinxi Wang, Yi Wang, David Hsu, and Ye Wang. Exploration in interactive personalized music recommendation: a reinforcement learning approach.
