
Frontiers of

Computational Journalism
Columbia Journalism School

Week 6: Hybrid Filters
October 10, 2014



Filtering Comments
Thousands of comments: what are the "good" ones?
Comment voting
Problem: putting comments with most votes at top
doesn't work. Why?
Reddit Comment Ranking (old)
Up - down votes plus time decay
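A minimal sketch of "net votes plus time decay". The slide does not give the decay function, so this assumes a hypothetical exponential half-life (HALF_LIFE_HOURS); it is not Reddit's actual formula, and the comment names and vote counts are made up.

```python
# Hypothetical decay constant: a comment's score halves every HALF_LIFE_HOURS.
# Illustrative assumption only; not Reddit's actual formula.
HALF_LIFE_HOURS = 12.0


def decayed_score(up: int, down: int, age_hours: float) -> float:
    """Net votes (up minus down), discounted exponentially by the comment's age."""
    return (up - down) * 0.5 ** (age_hours / HALF_LIFE_HOURS)


# Made-up example comments: (upvotes, downvotes, age in hours).
comments = {
    "old_popular": (120, 20, 48.0),
    "new_modest": (15, 2, 1.0),
}

# Highest decayed score first: the fresh comment overtakes the older, more popular one.
ranked = sorted(comments, key=lambda name: decayed_score(*comments[name]), reverse=True)
print(ranked)
```

One quirk of this multiplicative form: negative net scores also shrink toward zero as a comment ages.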

Reddit Comment Ranking (new)
Hypothetically, suppose all N users voted on the
comment, and v out of N up-voted. Then we could sort
by the proportion p = v/N of upvotes.

N = 16
v = 11
p = 11/16 = 0.6875
Reddit Comment Ranking
Actually, only n users out of N vote, giving an observed
approximate proportion p' = v'/n
n = 3
v' = 1
p' = 1/3 = 0.333
Reddit Comment Ranking
Limited sampling can rank comments wrongly when we don't
have enough data.
p' = 0.333
p = 0.6875

p' = 0.75
p = 0.1875
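To make the mis-ranking concrete, here is a sketch using the two comments on this slide (observed p' = 0.333 with true p = 0.6875, and observed p' = 0.75 with true p = 0.1875). The labels "A" and "B" are hypothetical, added only so the comments can be named.

```python
# Observed (p') and true (p) upvote proportions from the slide.
# The labels "A" and "B" are hypothetical, added for illustration.
comments = {
    "A": {"observed": 1 / 3, "true": 0.6875},
    "B": {"observed": 0.75, "true": 0.1875},
}

# Sorting by the sampled proportion p' and by the true proportion p
# produces opposite orders.
by_observed = sorted(comments, key=lambda c: comments[c]["observed"], reverse=True)
by_true = sorted(comments, key=lambda c: comments[c]["true"], reverse=True)

print("ranked by observed p':", by_observed)
print("ranked by true p:     ", by_true)
```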

Random error in sampling
If we observe a proportion p' of upvotes from n random users,
what is the distribution of the true proportion p?
[Figure: distribution of p' when p = 0.3]
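The plot on this slide shows the forward direction: how p' scatters when the true p is fixed at 0.3. A quick simulation can reproduce that spread. It assumes each of n voters independently upvotes with probability p; the sample size N_VOTERS = 10 and the trial count are arbitrary choices for illustration.

```python
import random
from collections import Counter


def sample_proportion(p: float, n: int, rng: random.Random) -> float:
    """Observed upvote proportion p' when n random users each upvote with probability p."""
    upvotes = sum(rng.random() < p for _ in range(n))
    return upvotes / n


rng = random.Random(0)  # fixed seed so the run is reproducible
TRUE_P = 0.3
N_VOTERS = 10           # assumed sample size, chosen for illustration
TRIALS = 10_000

samples = [sample_proportion(TRUE_P, N_VOTERS, rng) for _ in range(TRIALS)]
histogram = Counter(samples)

# Print how often each possible p' (0.0, 0.1, ..., 1.0) occurred.
for value in sorted(histogram):
    print(f"p' = {value:.1f}: {histogram[value] / TRIALS:.3f}")
```

With only 10 voters, p' frequently lands well away from 0.3; increasing N_VOTERS narrows the spread.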
Confidence interval
Given the observed p', an interval that the true p has a
probability α of lying inside.
Rank comments by the lower bound
of the confidence interval
p' = observed proportion of upvotes
n = how many people voted
z_α = how certain we want to be before we assume that p' is
"close" to the true p
Analytic solution for the confidence interval, known as the "Wilson score"
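The standard lower bound of the Wilson score interval, with z standing for z_α, is

lower = (p' + z²/(2n) - z·sqrt(p'(1 - p')/n + z²/(4n²))) / (1 + z²/n)

A minimal sketch of ranking by it follows. The default z = 1.96 (a two-sided 95% interval) and the rule that comments with no votes score 0 are assumptions. The vote tallies and names are made up.

```python
import math


def wilson_lower_bound(upvotes: int, n: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for the true upvote proportion p.

    upvotes: number of upvotes observed (v')
    n:       number of people who voted
    z:       z_alpha; 1.96 gives a two-sided 95% interval (assumed default)
    """
    if n == 0:
        return 0.0  # no votes means no evidence: rank at the bottom (a design choice)
    p_obs = upvotes / n
    z2 = z * z
    centre = p_obs + z2 / (2 * n)
    margin = z * math.sqrt(p_obs * (1 - p_obs) / n + z2 / (4 * n * n))
    return (centre - margin) / (1 + z2 / n)


# Hypothetical vote tallies: (upvotes, total votes).
tallies = {
    "few_votes_high_share": (3, 4),
    "many_votes": (70, 100),
    "one_of_three": (1, 3),
}
ranking = sorted(tallies, key=lambda k: wilson_lower_bound(*tallies[k]), reverse=True)
print(ranking)
```

The comment with 3 upvotes out of 4 has the highest p', but its few votes leave a wide interval. As a result, it ranks below the comment with 70 out of 100.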

User-item matrix
Stores the "rating" of each user for each item. Could also
be a binary variable that says whether the user clicked, liked,
starred, shared, purchased...
User-item matrix
No content analysis. We know nothing about
what is "in" each item.
Typically very sparse: a user hasn't watched
even 1% of all movies.
The filtering problem is guessing the "unknown" entries
in the matrix. High guessed values are things the user
would want to see.
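Because the matrix is so sparse, it is usually stored as a map from each user to just the items they rated. A minimal sketch follows, with made-up users, items, and ratings. The ratings could equally be 1 for "clicked" and absent otherwise. The missing (user, item) pairs are exactly the entries the filter has to guess.

```python
# Hypothetical user-item matrix, stored sparsely: only known ratings are kept.
# Values could also be binary (1 = clicked/liked/purchased).
ratings: dict[str, dict[str, float]] = {
    "alice": {"m1": 5, "m2": 3},
    "bob": {"m1": 4, "m3": 2},
    "carol": {"m2": 4, "m3": 5, "m4": 1},
}

users = sorted(ratings)
items = sorted({item for row in ratings.values() for item in row})

# Density = fraction of cells in the full users x items matrix that are filled.
known = sum(len(row) for row in ratings.values())
density = known / (len(users) * len(items))

# The "unknown" entries: (user, item) pairs with no rating yet.
unknown = [(u, i) for u in users for i in items if i not in ratings[u]]

print(f"{known} known ratings, density {density:.0%}")
print("entries to guess:", unknown)
```

This toy matrix is more than half full. Real user-item matrices are typically far sparser, which is why the dense form is rarely stored.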

Filtering process
How to guess an unknown rating?
Basic idea: suggest "similar" items.

Similar items are rated in a similar way by many
different users.

Remember, "rating" could be a click, a like, a
purchase.
"Users who bought A also bought B..."
"Users who clicked A also clicked B..."
"Users who shared A also shared B..."
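A minimal sketch of "users who bought A also bought B...", assuming binary ratings (bought or not) and simple co-occurrence counting. The purchase histories are made up, and also_bought is a hypothetical helper name.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase histories: each user's set of items (binary "ratings").
baskets = {
    "u1": {"A", "B", "C"},
    "u2": {"A", "B"},
    "u3": {"A", "C"},
    "u4": {"B", "D"},
}

# For every pair of items, count how many users consumed both.
co_counts = Counter()
for items in baskets.values():
    for a, b in combinations(sorted(items), 2):
        co_counts[(a, b)] += 1


def also_bought(item: str) -> list[tuple[str, int]]:
    """Items most often co-consumed with `item`, most frequent first."""
    scores = Counter()
    for (a, b), count in co_counts.items():
        if a == item:
            scores[b] += count
        elif b == item:
            scores[a] += count
    return scores.most_common()


print(also_bought("A"))
```

Raw counts favour items that are popular overall. The similarity measures on the next slides normalise for that.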



Similar items
Item similarity
Cosine similarity!
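Cosine similarity treats each item as a column vector of ratings, indexed by user. It measures the cosine of the angle between two such columns. A minimal sketch follows, assuming missing ratings count as 0, which is a common simplification. The ratings are made up.

```python
import math

# Hypothetical ratings: user -> {item: rating}.
ratings = {
    "alice": {"m1": 5, "m2": 3, "m3": 4},
    "bob": {"m1": 4, "m3": 5},
    "carol": {"m2": 4, "m3": 1},
}


def item_vector(item: str) -> dict[str, float]:
    """The item's column of the user-item matrix: user -> rating, known entries only."""
    return {user: row[item] for user, row in ratings.items() if item in row}


def cosine_similarity(item_a: str, item_b: str) -> float:
    """Cosine of the angle between two item columns; missing ratings count as 0."""
    a, b = item_vector(item_a), item_vector(item_b)
    dot = sum(a[u] * b[u] for u in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


print(cosine_similarity("m1", "m3"), cosine_similarity("m1", "m2"))
```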
Other distance measures
"Adjusted cosine similarity"
Subtracts the average rating of each user, to compensate
for general enthusiasm ("most movies suck" vs. "most
movies are great")
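A minimal sketch of adjusted cosine similarity follows. Each user's mean rating is subtracted before taking the cosine. The sketch assumes the common convention of using only users who rated both items. In the made-up data, a harsh critic and an enthusiast both like m3 less than usual and m2 more than usual. Plain cosine on their raw scores still calls m2 and m3 similar, because every raw score is positive. After mean-centring, the two items come out dissimilar.

```python
import math

# Hypothetical ratings: a harsh critic and an enthusiast.
ratings = {
    "harsh": {"m1": 2, "m2": 1, "m3": 3},
    "fan": {"m1": 5, "m2": 4, "m3": 5},
}

# Each user's average rating: their general level of enthusiasm.
user_mean = {u: sum(row.values()) / len(row) for u, row in ratings.items()}


def cosine(xs: list[float], ys: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(xs, ys))
    norm = math.hypot(*xs) * math.hypot(*ys)
    return dot / norm if norm else 0.0


def co_raters(item_a: str, item_b: str) -> list[str]:
    """Users who rated both items (an assumed convention)."""
    return [u for u, row in ratings.items() if item_a in row and item_b in row]


def raw_cosine(item_a: str, item_b: str) -> float:
    users = co_raters(item_a, item_b)
    return cosine([ratings[u][item_a] for u in users], [ratings[u][item_b] for u in users])


def adjusted_cosine(item_a: str, item_b: str) -> float:
    """Cosine after subtracting each user's average rating."""
    users = co_raters(item_a, item_b)
    return cosine(
        [ratings[u][item_a] - user_mean[u] for u in users],
        [ratings[u][item_b] - user_mean[u] for u in users],
    )


print(raw_cosine("m2", "m3"), adjusted_cosine("m2", "m3"))
```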
Generating a recommendation
Weighted average of the user's ratings of other items, weighted by their similarity to the target item.
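A minimal sketch of the prediction step follows. The guessed rating of an item is a similarity-weighted average of the ratings the user gave to other items. It assumes plain cosine similarity with missing ratings treated as 0, and a denominator of summed absolute similarities. The ratings are made up.

```python
import math
from typing import Optional

# Hypothetical ratings: user -> {item: rating}.
ratings = {
    "alice": {"m1": 5, "m2": 3, "m3": 4},
    "bob": {"m1": 4, "m3": 5},
    "carol": {"m2": 4, "m3": 1},
}


def cosine(item_a: str, item_b: str) -> float:
    """Cosine similarity of two item columns, treating missing ratings as 0."""
    col_a = {u: r[item_a] for u, r in ratings.items() if item_a in r}
    col_b = {u: r[item_b] for u, r in ratings.items() if item_b in r}
    dot = sum(col_a[u] * col_b[u] for u in col_a.keys() & col_b.keys())
    norm = math.hypot(*col_a.values()) * math.hypot(*col_b.values())
    return dot / norm if norm else 0.0


def predict(user: str, item: str) -> Optional[float]:
    """Similarity-weighted average of the user's ratings of other items."""
    numerator = denominator = 0.0
    for other, rating in ratings[user].items():
        if other == item:
            continue
        sim = cosine(item, other)
        numerator += sim * rating
        denominator += abs(sim)
    return numerator / denominator if denominator else None


print(predict("bob", "m2"))  # bob has not rated m2
```

Using abs() in the denominator keeps the result a proper weighted average even when a similarity measure, such as adjusted cosine, can go negative.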
Different Filtering Systems
Pure algorithmic: Newsblaster analyzes the topics in the
documents. No concept of users.

Pure social: What I see on Twitter is determined by who I
follow. No content analysis.

Hybrid: Reddit comments filtered by an algorithm that
takes votes as input.

Hybrid: items recommended based on co-consumption by all
users.

What else is possible?
Item content: text analysis, topic modeling, clustering...
My data: who I follow, what I've read/liked
Other users' data: social network structure, other users' likes
How to evaluate/optimize the filter?
Netflix: try to predict the rating that the user
gives a movie after watching it.
Amazon: sell more stuff.
Google web search: human raters, A/B test
every change
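The Netflix-style objective can be scored by comparing predicted ratings with the ratings users actually gave afterwards. Root-mean-square error (RMSE) is the metric the Netflix Prize used for this. A minimal sketch follows. The rating lists are made up, standing in for held-out data.

```python
import math


def rmse(predicted: list[float], actual: list[float]) -> float:
    """Root-mean-square error between predicted and actual ratings (lower is better)."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(actual))


# Hypothetical held-out ratings and a filter's predictions for them.
actual = [4, 4, 1]
predicted = [3.5, 4.0, 2.0]
print(rmse(predicted, actual))
```

RMSE only captures the Netflix goal. Amazon's "sell more stuff" or an A/B test would instead compare a business or behaviour metric between two versions of the filter.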
How to evaluate/optimize the filter?
Does the user understand how the filter
works?
Can they configure it as desired?
Can they correctly predict what they will and
won't see?
How to evaluate/optimize the filter?
Can it be gamed? Spam, "user-generated
censorship," etc.
Ron Paul "LibertyBot" on Reddit
Filter design problem
Formally, given
U = user preferences, history, characteristics
S = current story
{P} = results of function on previous stories
{B} = background world knowledge (other users?)

Define
r(S, U, {P}, {B}) in [0, 1]

as the relevance of story S to user U
Filter design problem, restated
When should a user see a story?

Aspects to this question:
Normative
personal: what I want
societal: emergent group effects
UI
how do I tell the computer what I want?
Technical
constrained by algorithmic possibility
Economic
cheap enough to deploy widely
How to evaluate/optimize the filter?
Does it improve the user's life?
