Non-Negative Matrix Factorization
Problem Statement
Given a set of images:
1. Create a set of basis images that can be linearly combined to create new images
2. Find the set of weights that reproduce every input image from the basis images
• One set of weights for each input image
Three ways to do this are discussed:
• Vector Quantization
• Principal Components Analysis
• Non-negative Matrix Factorization

• Each method optimizes a different aspect
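As a sketch of the shapes involved (assuming n pixels per image, m input images, and r basis images, matching the i and μ indices used in the objective function later), the problem is a matrix factorization:

```latex
% V collects the input images, one per column; W holds the basis images,
% one per column; column \mu of H holds the weights that reconstruct image \mu.
V \approx W H,
\qquad V \in \mathbb{R}^{n \times m},
\quad W \in \mathbb{R}^{n \times r},
\quad H \in \mathbb{R}^{r \times m}
```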


Vector Quantization
• The reconstructed image is the basis image that is closest to the input image.
What’s wrong with VQ?
• Limited by the number of basis images
• Not very useful for analysis
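The slides give no code, but here is a minimal sketch of the VQ scheme in Python, assuming a k-means codebook plays the role of the basis images (the dataset and names are illustrative, not from the slides):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits

X = load_digits().data                      # each row is one 8x8 image, flattened
vq = KMeans(n_clusters=16, n_init=10, random_state=0).fit(X)

# VQ reconstruction: each image is replaced by its single closest basis image
labels = vq.predict(X)
X_rec = vq.cluster_centers_[labels]
print("reconstruction error:", np.linalg.norm(X - X_rec))
```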
PCA
• Find a set of orthogonal basis images
• The reconstructed image is a linear combination of the basis images
What don’t we like about PCA?
• PCA involves adding up some basis images, then subtracting others
• Basis images aren’t physically intuitive
• Subtracting doesn’t make sense in the context of some applications
– How do you subtract a face?
– What does subtraction mean in the context of document classification?
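A quick way to see this complaint concretely, assuming scikit-learn's PCA (an illustration, not from the slides):

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X = load_digits().data
pca = PCA(n_components=16).fit(X)
codes = pca.transform(X)

# Both the basis images and the per-image coefficients go negative,
# so reconstruction mixes addition and subtraction of basis images.
print("negative basis pixels: ", (pca.components_ < 0).any())
print("negative coefficients: ", (codes < 0).any())
```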
Non-negative Matrix Factorization
• Like PCA, except the entries of the basis images and the coefficients in the linear combination cannot be negative
NMF Basis Images
• Only allowing adding of basis images makes intuitive sense
– Has a physical analogue in neurons, whose firing rates cannot be negative
• Forcing the reconstruction coefficients to be positive leads to nice basis images
– To reconstruct images, all you can do is add in more basis images
– This leads to basis images that represent parts
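For a concrete feel, a small sketch assuming scikit-learn's NMF implementation (its default solver differs in detail from the multiplicative rules derived later in these slides):

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import NMF

X = load_digits().data                      # pixel values are >= 0, as NMF requires
model = NMF(n_components=16, init="nndsvda", max_iter=500, random_state=0)
codes = model.fit_transform(X)              # non-negative weights, one row per image
basis = model.components_                   # non-negative basis images, one per row

# Reconstruction only ever adds scaled basis images together
assert (codes >= 0).all() and (basis >= 0).all()
```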
PCA vs NMF
• PCA
– Designed for producing optimal (in some sense) basis images
– Just because it’s optimal doesn’t mean it’s good for your application
• NMF
– Designed for producing coefficients with a specific property
– Forcing coefficients to behave induces “nice” basis images
– No SI unit for “nice”
The cool idea
• By constraining the weights, we can control how the basis images wind up
• In this case, constraining the weights leads to “parts-based” basis images
Objective function
• Let V_iμ be the value of a pixel in an original input image, and let (WH)_iμ be the corresponding reconstructed pixel.
• If we consider V_iμ to be a noisy version of (WH)_iμ, then the Poisson probability of V_iμ is

$$P\bigl(V_{i\mu} \mid (WH)_{i\mu}\bigr) = \frac{(WH)_{i\mu}^{\,V_{i\mu}}\, e^{-(WH)_{i\mu}}}{V_{i\mu}!}$$

(recall the Poisson PMF: $f(k; \lambda) = \dfrac{\lambda^k e^{-\lambda}}{k!}$)
Objective function

$$P\bigl(V_{i\mu} \mid (WH)_{i\mu}\bigr) = \frac{(WH)_{i\mu}^{\,V_{i\mu}}\, e^{-(WH)_{i\mu}}}{V_{i\mu}!}$$

• Now we will maximize the log probability of this PDF over W and H, leaving the relevant objective function to be:

$$F = \sum_{i=1}^{n} \sum_{\mu=1}^{m} \Bigl[ V_{i\mu} \log (WH)_{i\mu} - (WH)_{i\mu} \Bigr]$$

(the $\log V_{i\mu}!$ term is dropped because it does not depend on W or H)
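A direct transcription of F as a sketch in NumPy; the small eps guarding log 0 is my addition, not part of the slides:

```python
import numpy as np

def poisson_objective(V, W, H, eps=1e-12):
    """F = sum over i, mu of [ V_imu * log (WH)_imu - (WH)_imu ], to be maximized."""
    WH = W @ H
    return np.sum(V * np.log(WH + eps) - WH)
```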
Learning rules?
• Error function

$$\|V - WH\|^2$$

• Use gradient descent to find a local minimum
• The gradient descent update rule is:

$$H_{a\mu} \leftarrow H_{a\mu} + \eta_{a\mu}\left[(W^T V)_{a\mu} - (W^T W H)_{a\mu}\right]$$
Deriving Update Rules
• Gradient descent rule:

$$H_{a\mu} \leftarrow H_{a\mu} + \eta_{a\mu}\left[(W^T V)_{a\mu} - (W^T W H)_{a\mu}\right]$$

• Set

$$\eta_{a\mu} = \frac{H_{a\mu}}{(W^T W H)_{a\mu}}$$

• The update rule becomes

$$H_{a\mu} \leftarrow H_{a\mu}\,\frac{(W^T V)_{a\mu}}{(W^T W H)_{a\mu}}$$
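A minimal sketch of the full algorithm for the squared-error objective. The H rule is the one derived above; the companion W update (which follows from the same argument with the roles of W and H swapped) and the eps guarding division by zero are my additions beyond the slides:

```python
import numpy as np

def nmf_multiplicative(V, r, n_iter=200, eps=1e-12, seed=0):
    """Factor V (n x m, non-negative) as W (n x r) @ H (r x m)."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, r))                     # non-negative initialization
    H = rng.random((r, m))
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # H <- H * (W^T V) / (W^T W H)
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # symmetric update for W
    return W, H

# Starting non-negative, the factors stay non-negative
W, H = nmf_multiplicative(np.random.rand(64, 100), r=16)
assert (W >= 0).all() and (H >= 0).all()
```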
What’s significant about this?
• This is a multiplicative update instead of an additive update.
– If the initial values of W and H are all non-negative, then W and H can never become negative.
• This lets us produce a non-negative factorization
Convergence
• If F is the objective function, let G be an auxiliary function satisfying

$$G(h, h') \ge F(h), \qquad G(h, h) = F(h)$$

• If G is an auxiliary function of F, then F is non-increasing under the update

$$h^{t+1} = \arg\min_h G(h, h^t)$$

• This is because $F(h^{t+1}) \le G(h^{t+1}, h^t) \le G(h^t, h^t) = F(h^t)$
Auxiliary Function Convergence
• Let the auxiliary function be

$$G(h, h^t) = F(h^t) + (h - h^t)^T \nabla F(h^t) + \frac{1}{2}(h - h^t)^T K(h^t)(h - h^t)$$

where $K(h^t)$ is the diagonal matrix with entries $K_{ab}(h^t) = \delta_{ab}\,(W^T W h^t)_a / h^t_a$

• Then the update is

$$h^{t+1} = h^t - K(h^t)^{-1} \nabla F(h^t)$$

• Which results in the update rule:

$$H_{a\mu} \leftarrow H_{a\mu}\,\frac{(W^T V)_{a\mu}}{(W^T W H)_{a\mu}}$$
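Filling in the algebra (a worked step, using the gradient of the per-column squared-error objective $F(h) = \frac{1}{2}\|v - Wh\|^2$, where v is one column of V and h the corresponding column of H):

```latex
% Gradient of F and the diagonal scaling matrix:
\nabla F(h^t) = W^T W h^t - W^T v,
\qquad
K_{ab}(h^t) = \delta_{ab}\,\frac{(W^T W h^t)_a}{h^t_a}

% Substituting into h^{t+1} = h^t - K(h^t)^{-1} \nabla F(h^t):
h^{t+1}_a
  = h^t_a - \frac{h^t_a}{(W^T W h^t)_a}\Bigl[(W^T W h^t)_a - (W^T v)_a\Bigr]
  = h^t_a\,\frac{(W^T v)_a}{(W^T W h^t)_a}
```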
Main Contributions
• Idea that representations which allow negative weights do not make sense in some applications
• A simple system for producing basis images with non-negative weights
• Points out that this leads to basis images that are based on parts
– A larger point here is that by constraining the problem in new ways, we can induce nice properties
