
Lecture 20: Model Adaptation

Machine Learning, April 15, 2010

Today
Adaptation of Gaussian Mixture Models:
Maximum A Posteriori (MAP)
Maximum Likelihood Linear Regression (MLLR)

Application: Speaker Recognition

UBM-MAP + SVM

The Problem
I have a little bit of labeled data and a lot of unlabeled data. I can model the training data fairly well, but we always fit training data better than testing data. Can we use the wealth of unlabeled data to do better?

Let's use a GMM.


Use GMMs to model the labeled data. In the simplest form, one mixture component per class.

Labeled training of GMM


MLE estimators of parameters

$$w_i = \frac{n_i}{N}, \qquad n_i = \sum_t p(i \mid x_t)$$

$$\mu_i = \frac{\sum_t p(i \mid x_t)\, x_t}{n_i}$$

$$\Sigma_i = \frac{\sum_t p(i \mid x_t)\,(x_t - \mu_i)(x_t - \mu_i)^T}{n_i}$$

Or these can be used to seed EM.
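A minimal numpy sketch of these updates (the M-step), assuming the responsibilities p(i|x_t) have already been computed in an E-step; the function and variable names are illustrative:

```python
import numpy as np

def mle_update(X, resp):
    """M-step: MLE estimates of GMM parameters from responsibilities.
    X: (T, d) frames; resp: (T, K) posteriors p(i | x_t)."""
    T, K = X.shape[0], resp.shape[1]
    n = resp.sum(axis=0)                        # n_i = sum_t p(i|x_t)
    weights = n / T                             # w_i = n_i / N
    means = (resp.T @ X) / n[:, None]           # mu_i
    covs = []
    for i in range(K):
        diff = X - means[i]                     # x_t - mu_i
        covs.append((resp[:, i, None] * diff).T @ diff / n[i])
    return weights, means, np.stack(covs)
```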

Adapting the mixtures to new data

Essentially, let EM start with the MLE parameters as seeds. Expand the available data for EM and proceed until convergence.
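One way to realize this with scikit-learn's GaussianMixture, whose weights_init / means_init / precisions_init parameters let EM start from the labeled-data MLE; the data arrays here are placeholders:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

X_labeled = np.random.randn(200, 16)     # placeholder labeled frames
X_unlabeled = np.random.randn(2000, 16)  # placeholder unlabeled frames

# MLE parameters from the labeled data alone.
seed = GaussianMixture(n_components=4, covariance_type='full').fit(X_labeled)

# Seed EM with those parameters, then run it on the pooled data.
adapted = GaussianMixture(
    n_components=4, covariance_type='full',
    weights_init=seed.weights_, means_init=seed.means_,
    precisions_init=seed.precisions_,
).fit(np.vstack([X_labeled, X_unlabeled]))
```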


Problem with EM adaptation

The initial labeled seeds could contribute very little to the final model.


MAP Adaptation
Constrain the contribution of the unlabeled data:

$$\hat{w}_i = \alpha_i \frac{\sum_u p(i \mid x_u)}{U} + (1 - \alpha_i)\, w_i$$

$$\hat{\mu}_i = \alpha_i \frac{\sum_u p(i \mid x_u)\, x_u}{\sum_u p(i \mid x_u)} + (1 - \alpha_i)\, \mu_i$$

$$\hat{\Sigma}_i = \alpha_i \frac{\sum_u p(i \mid x_u)\,(x_u - \mu_i)(x_u - \mu_i)^T}{\sum_u p(i \mid x_u)} + (1 - \alpha_i)\, \Sigma_i$$

Let the alpha terms dictate how much weight to give to the new, unlabeled data compared to the existing estimates.
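A sketch of these updates in numpy, using a single scalar alpha for all parameters and components rather than the per-component alpha_i above; names are illustrative:

```python
import numpy as np

def map_adapt(weights, means, covs, X, resp, alpha=0.5):
    """MAP-adapt GMM parameters toward new (unlabeled) data X.
    resp[u, i] = p(i | x_u) under the current model."""
    U = X.shape[0]
    n = resp.sum(axis=0)                             # sum_u p(i|x_u)
    w_hat = alpha * n / U + (1 - alpha) * weights
    mu_data = (resp.T @ X) / n[:, None]
    mu_hat = alpha * mu_data + (1 - alpha) * means
    cov_hat = np.empty_like(covs)
    for i in range(len(weights)):
        diff = X - means[i]
        S = (resp[:, i, None] * diff).T @ diff / n[i]
        cov_hat[i] = alpha * S + (1 - alpha) * covs[i]
    return w_hat, mu_hat, cov_hat
```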

MAP adaptation
The movement of the parameters is constrained.

MLLR adaptation
Another idea: Maximum Likelihood Linear Regression. Apply an affine transformation to the means; don't change the covariance matrices.

$$\hat{\mu} = W \mu$$
MLLR adaptation
The new means are the MLE of the means with the new data:

$$\hat{\mu}_i = W_i \mu_i = \frac{\sum_x p(i \mid x, w_i, \mu_i, \Sigma_i)\, x}{\sum_x p(i \mid x, w_i, \mu_i, \Sigma_i)}$$
MLLR adaptation
The new means are the MLE of the means with the new data. Writing the extended mean vector as $\xi_i = (1\;\; \mu_i^T)^T$ (the extra 1 carries the affine offset):

$$\hat{\mu}_i = W_i \xi_i, \qquad W_i = \Big[\sum_x p(i \mid x, w_i, \mu_i, \Sigma_i)\, x\, \xi_i^T\Big]\Big[\sum_x p(i \mid x, w_i, \mu_i, \Sigma_i)\, \xi_i \xi_i^T\Big]^{-1}$$

Why MLLR?
We can tie the transformation matrices of mixture components. For example:
You know that the red and green classes are similar.
Assumption: their transformations should be similar.
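A sketch of estimating one tied transform per group by weighted least squares, assuming identity covariances for simplicity (full MLLR weights the statistics by each Sigma_i^{-1}); tie_groups and the other names are illustrative:

```python
import numpy as np

def estimate_mllr_transforms(means, resp, X, tie_groups):
    """One tied MLLR transform W per group of component indices.
    means: (K, d) current means; resp: (T, K) posteriors; X: (T, d)."""
    d = means.shape[1]
    Ws = []
    for group in tie_groups:
        G = np.zeros((d + 1, d + 1))    # sum_i n_i xi_i xi_i^T
        Z = np.zeros((d, d + 1))        # sum_i (sum_x p(i|x) x) xi_i^T
        for i in group:
            xi = np.concatenate(([1.0], means[i]))  # extended mean
            n_i = resp[:, i].sum()
            x_stat = resp[:, i] @ X                 # sum_x p(i|x) x
            G += n_i * np.outer(xi, xi)
            Z += np.outer(x_stat, xi)
        Ws.append(Z @ np.linalg.inv(G))  # invertible with enough tied comps
    return Ws                            # adapted mean: W @ xi
```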


Application of Model Adaptation

Speaker Recognition. Task: given speech from a known set of speakers, identify the speaker. Assume there is training data from each speaker.
Approach:
Model a generic speaker.
Identify a speaker by its difference from the generic speaker.
Measure this difference by adaptation parameters.

Speech Representation
Extract a feature representation of speech, sampled every 10 ms.

MFCC, 16 dimensions
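For example, with librosa (the file name and the 16 kHz sample rate are placeholders):

```python
import librosa

# Load an utterance and compute 16-dimensional MFCCs every 10 ms.
y, sr = librosa.load("utterance.wav", sr=16000)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=16,
                            hop_length=int(sr * 0.010))  # 10 ms steps
frames = mfcc.T   # shape (num_frames, 16): one vector per 10 ms
```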

[Figure: "Similarity of sounds" — scatter of speech frames in MFCC space (MFCC1 vs MFCC2), with clusters for the sounds /s/, /b/, /o/, /u/]

Universal Background Model

If we had labeled phone information, that would be great. But it's expensive and time-consuming. So just fit a GMM to the MFCC representation of all of the speech you have.
Generally all but one example, but we'll come back to this.
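A sketch with scikit-learn; the 512 diagonal-covariance components are a conventional choice in speaker recognition rather than something fixed by the slides, and frames_per_speaker is a placeholder:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Fit a UBM: one GMM over the pooled MFCC frames from all speakers.
all_frames = np.vstack(frames_per_speaker)   # (total_frames, 16)
ubm = GaussianMixture(n_components=512, covariance_type='diag',
                      max_iter=200).fit(all_frames)
```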

[Figure: "MFCC Scatter" — the MFCC1 vs MFCC2 scatter with clusters for /s/, /b/, /o/, /u/]

[Figure: "UBM fitting" — the mixture components of the UBM overlaid on the MFCC1 vs MFCC2 scatter with clusters /s/, /b/, /o/, /u/]

MAP adaptation
When we have a segment of speech to evaluate:
Generate MFCC features.
Use MAP adaptation on the UBM Gaussian Mixture Model.

MAP Adaptation
[Figure: the MFCC1 vs MFCC2 scatter with clusters /s/, /b/, /o/, /u/, repeated as a slide build showing the UBM components under MAP adaptation]

UBM-MAP
Claim: the differences between speakers can be represented by the movement of the mixture components of the UBM.

How do we train this model?

UBM-MAP training
[Diagram: Training Data → UBM Training → UBM; Held-out Speaker N → MAP → Supervector]
A supervector is a vector of the adapted means of the Gaussian mixture components:
$x_i = (\mu_0, \mu_1, \ldots, \mu_k)$, with label $t_i = \text{Speaker ID}$.

Train a supervised model with these labeled vectors.

UBM-MAP training
[Diagram: Training Data → UBM Training; Held-out Speaker N → MAP → Supervector $x_i = (\mu_0, \mu_1, \ldots, \mu_k)$, $t_i = \text{Speaker ID}$ → Multiclass SVM Training]

Repeat for all training data.
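A sketch of the whole pipeline, reusing a scalar-alpha, mean-only MAP adaptation (adapting only the means is common in UBM-MAP systems); ubm comes from the fitting step above, while train_utterances, train_speaker_ids, and test_frames are placeholders:

```python
import numpy as np
from sklearn.svm import SVC

def supervector(ubm, frames, alpha=0.5):
    """MAP-adapt only the UBM means to one utterance's frames and
    concatenate them into a supervector."""
    resp = ubm.predict_proba(frames)              # (T, K) posteriors
    n = resp.sum(axis=0) + 1e-10                  # avoid divide-by-zero
    data_means = (resp.T @ frames) / n[:, None]
    adapted = alpha * data_means + (1 - alpha) * ubm.means_
    return adapted.ravel()                        # (K * d,) supervector

# One labeled supervector per training utterance, then a multiclass SVM.
X_train = np.array([supervector(ubm, f) for f in train_utterances])
y_train = np.array(train_speaker_ids)
svm = SVC(kernel='linear').fit(X_train, y_train)

# Evaluation: adapt the UBM to the test segment, classify its supervector.
prediction = svm.predict([supervector(ubm, test_frames)])
```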

UBM-MAP Evaluation
[Diagram: Test Data → MAP (adapting the UBM) → Supervector → Multiclass SVM → Prediction]

Alternate View
Do we need all this? What if we just train an SVM on labeled MFCC data?
[Diagram: Labeled Training Data → Multiclass SVM Training; Test Data → Multiclass SVM → Prediction]
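A sketch of this baseline: classify each MFCC frame directly, then take a majority vote over the test segment (labeled_frames, frame_speaker_ids, and test_frames are placeholders):

```python
from collections import Counter
from sklearn.svm import SVC

# Train a multiclass SVM on individual labeled MFCC frames.
frame_svm = SVC(kernel='rbf').fit(labeled_frames, frame_speaker_ids)

# Predict a speaker per frame, then majority-vote over the segment.
votes = frame_svm.predict(test_frames)
prediction = Counter(votes).most_common(1)[0][0]
```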

Results
UBM-MAP (with some variants) is the state of the art in Speaker Recognition.
Current state-of-the-art performance is about 97% accuracy (~2.5% EER) with a few minutes of speech.

Direct MFCC modeling performs about half as well, ~5% EER.

Model Adaptation
Adaptation allows GMMs to be seeded with labeled data. Incorporation of unlabeled data gives a more robust model. The adaptation process can be used to differentiate members of the population (UBM-MAP).

Next Time
Spectral Clustering
