Non-Negative Matrix Factorization
Problem Statement
Given a set of images:
1. Create a set of basis images that can be linearly combined to create new images
2. Find the set of weights that reproduce every input image from the basis images
• One set of weights for each input image
Three ways to do this are discussed:
• Vector Quantization
• Principal Components Analysis
• Non-negative Matrix Factorization

• Each method optimizes a different aspect
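As a sketch of the shapes involved (assuming n pixels per image, m input images, and r basis images, matching the i and μ indices used in the objective function later), the problem is a matrix factorization:

```latex
% V collects the input images, one per column; W holds the basis images,
% one per column; column \mu of H holds the weights that reconstruct image \mu.
V \approx W H,
\qquad V \in \mathbb{R}^{n \times m},
\quad W \in \mathbb{R}^{n \times r},
\quad H \in \mathbb{R}^{r \times m}
```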


Vector Quantization
• The reconstructed image is the basis image that is closest to the input image.
What’s wrong with VQ?
• Limited by the number of basis images
• Not very useful for analysis
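The slides give no code, but here is a minimal sketch of the VQ scheme in Python, assuming a k-means codebook plays the role of the basis images (the dataset and names are illustrative, not from the slides):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits

X = load_digits().data                      # each row is one 8x8 image, flattened
vq = KMeans(n_clusters=16, n_init=10, random_state=0).fit(X)

# VQ reconstruction: each image is replaced by its single closest basis image
labels = vq.predict(X)
X_rec = vq.cluster_centers_[labels]
print("reconstruction error:", np.linalg.norm(X - X_rec))
```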
PCA
• Find a set of orthogonal basis images
• The reconstructed image is a linear combination of the basis images
What don’t we like about PCA?
• PCA involves adding up some basis images, then subtracting others
• Basis images aren’t physically intuitive
• Subtracting doesn’t make sense in the context of some applications
– How do you subtract a face?
– What does subtraction mean in the context of document classification?
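A quick way to see this complaint concretely, assuming scikit-learn's PCA (an illustration, not from the slides):

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X = load_digits().data
pca = PCA(n_components=16).fit(X)
codes = pca.transform(X)

# Both the basis images and the per-image coefficients go negative,
# so reconstruction mixes addition and subtraction of basis images.
print("negative basis pixels: ", (pca.components_ < 0).any())
print("negative coefficients: ", (codes < 0).any())
```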
Non-negative Matrix Factorization
• Like PCA, except the entries of the basis images and the coefficients in the linear combination cannot be negative
NMF Basis Images
• Only allowing adding of basis images makes intuitive sense
– Has a physical analogue in neurons, whose firing rates cannot be negative
• Forcing the reconstruction coefficients to be positive leads to nice basis images
– To reconstruct images, all you can do is add in more basis images
– This leads to basis images that represent parts
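For a concrete feel, a small sketch assuming scikit-learn's NMF implementation (its default solver differs in detail from the multiplicative rules derived later in these slides):

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import NMF

X = load_digits().data                      # pixel values are >= 0, as NMF requires
model = NMF(n_components=16, init="nndsvda", max_iter=500, random_state=0)
codes = model.fit_transform(X)              # non-negative weights, one row per image
basis = model.components_                   # non-negative basis images, one per row

# Reconstruction only ever adds scaled basis images together
assert (codes >= 0).all() and (basis >= 0).all()
```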
PCA vs NMF
• PCA
– Designed for producing optimal (in some sense) basis images
– Just because it’s optimal doesn’t mean it’s good for your application
• NMF
– Designed for producing coefficients with a specific property
– Forcing coefficients to behave induces “nice” basis images
– No SI unit for “nice”
The cool idea
• By constraining the weights, we can control how the basis images wind up
• In this case, constraining the weights leads to “parts-based” basis images
Objective function
• Let V_iμ be the value of a pixel in an original input image, and let (WH)_iμ be the corresponding reconstructed pixel.
• If we consider V_iμ to be a noisy version of (WH)_iμ, then the Poisson probability of V_iμ is

$$P\bigl(V_{i\mu} \mid (WH)_{i\mu}\bigr) = \frac{(WH)_{i\mu}^{\,V_{i\mu}}\, e^{-(WH)_{i\mu}}}{V_{i\mu}!}$$

(recall the Poisson PMF: $f(k; \lambda) = \dfrac{\lambda^k e^{-\lambda}}{k!}$)
Objective function

$$P\bigl(V_{i\mu} \mid (WH)_{i\mu}\bigr) = \frac{(WH)_{i\mu}^{\,V_{i\mu}}\, e^{-(WH)_{i\mu}}}{V_{i\mu}!}$$

• Now we will maximize the log probability of this PDF over W and H, leaving the relevant objective function to be:

$$F = \sum_{i=1}^{n} \sum_{\mu=1}^{m} \Bigl[ V_{i\mu} \log (WH)_{i\mu} - (WH)_{i\mu} \Bigr]$$

(the $\log V_{i\mu}!$ term is dropped because it does not depend on W or H)
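A direct transcription of F as a sketch in NumPy; the small eps guarding log 0 is my addition, not part of the slides:

```python
import numpy as np

def poisson_objective(V, W, H, eps=1e-12):
    """F = sum over i, mu of [ V_imu * log (WH)_imu - (WH)_imu ], to be maximized."""
    WH = W @ H
    return np.sum(V * np.log(WH + eps) - WH)
```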
Learning rules?
• Error function

$$\|V - WH\|^2$$

• Use gradient descent to find a local minimum
• The gradient descent update rule is:

$$H_{a\mu} \leftarrow H_{a\mu} + \eta_{a\mu}\left[(W^T V)_{a\mu} - (W^T W H)_{a\mu}\right]$$
Deriving Update Rules
• Gradient descent rule:

$$H_{a\mu} \leftarrow H_{a\mu} + \eta_{a\mu}\left[(W^T V)_{a\mu} - (W^T W H)_{a\mu}\right]$$

• Set

$$\eta_{a\mu} = \frac{H_{a\mu}}{(W^T W H)_{a\mu}}$$

• The update rule becomes

$$H_{a\mu} \leftarrow H_{a\mu}\,\frac{(W^T V)_{a\mu}}{(W^T W H)_{a\mu}}$$
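A minimal sketch of the full algorithm for the squared-error objective. The H rule is the one derived above; the companion W update (which follows from the same argument with the roles of W and H swapped) and the eps guarding division by zero are my additions beyond the slides:

```python
import numpy as np

def nmf_multiplicative(V, r, n_iter=200, eps=1e-12, seed=0):
    """Factor V (n x m, non-negative) as W (n x r) @ H (r x m)."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, r))                     # non-negative initialization
    H = rng.random((r, m))
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # H <- H * (W^T V) / (W^T W H)
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # symmetric update for W
    return W, H

# Starting non-negative, the factors stay non-negative
W, H = nmf_multiplicative(np.random.rand(64, 100), r=16)
assert (W >= 0).all() and (H >= 0).all()
```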
What’s significant about this?
• This is a multiplicative update instead of an additive update.
– If the initial values of W and H are all non-negative, then W and H can never become negative.
• This lets us produce a non-negative factorization
Convergence
• If F is the objective function, let G be an auxiliary function satisfying

$$G(h, h') \ge F(h), \qquad G(h, h) = F(h)$$

• If G is an auxiliary function of F, then F is non-increasing under the update

$$h^{t+1} = \arg\min_h G(h, h^t)$$

• This is because $F(h^{t+1}) \le G(h^{t+1}, h^t) \le G(h^t, h^t) = F(h^t)$
Auxiliary Function Convergence
• Let the auxiliary function be

$$G(h, h^t) = F(h^t) + (h - h^t)^T \nabla F(h^t) + \frac{1}{2}(h - h^t)^T K(h^t)(h - h^t)$$

where $K(h^t)$ is the diagonal matrix with entries $K_{ab}(h^t) = \delta_{ab}\,(W^T W h^t)_a / h^t_a$

• Then the update is

$$h^{t+1} = h^t - K(h^t)^{-1} \nabla F(h^t)$$

• Which results in the update rule:

$$H_{a\mu} \leftarrow H_{a\mu}\,\frac{(W^T V)_{a\mu}}{(W^T W H)_{a\mu}}$$
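Filling in the algebra (a worked step, using the gradient of the per-column squared-error objective $F(h) = \frac{1}{2}\|v - Wh\|^2$, where v is one column of V and h the corresponding column of H):

```latex
% Gradient of F and the diagonal scaling matrix:
\nabla F(h^t) = W^T W h^t - W^T v,
\qquad
K_{ab}(h^t) = \delta_{ab}\,\frac{(W^T W h^t)_a}{h^t_a}

% Substituting into h^{t+1} = h^t - K(h^t)^{-1} \nabla F(h^t):
h^{t+1}_a
  = h^t_a - \frac{h^t_a}{(W^T W h^t)_a}\Bigl[(W^T W h^t)_a - (W^T v)_a\Bigr]
  = h^t_a\,\frac{(W^T v)_a}{(W^T W h^t)_a}
```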
Main Contributions
• Idea that representations which allow negative weights do not make sense in some applications
• A simple system for producing basis images with non-negative weights
• Points out that this leads to basis images that are based on parts
– A larger point here is that by constraining the problem in new ways, we can induce nice properties
