
Lecture 20: Model Adaptation

Machine Learning, April 15, 2010

Today
Adaptation of Gaussian Mixture Models:
Maximum A Posteriori (MAP)
Maximum Likelihood Linear Regression (MLLR)

Application: Speaker Recognition

UBM-MAP + SVM

The Problem
I have a little bit of labeled data and a lot of unlabeled data. I can model the training data fairly well, but we always fit training data better than testing data. Can we use the wealth of unlabeled data to do better?

Let's use a GMM.


Use GMMs to model the labeled data. In the simplest form, one mixture component per class.

Labeled training of GMM


MLE estimators of parameters

$$w_i = \frac{n_i}{N}, \qquad n_i = \sum_t p(i \mid x_t)$$

$$\mu_i = \frac{\sum_t p(i \mid x_t)\, x_t}{n_i}$$

$$\Sigma_i = \frac{\sum_t p(i \mid x_t)\,(x_t - \mu_i)(x_t - \mu_i)^T}{n_i}$$

Or these can be used to seed EM.
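A minimal numpy sketch of these updates (the M-step), assuming the responsibilities p(i|x_t) have already been computed in an E-step; the function and variable names are illustrative:

```python
import numpy as np

def mle_update(X, resp):
    """M-step: MLE estimates of GMM parameters from responsibilities.
    X: (T, d) frames; resp: (T, K) posteriors p(i | x_t)."""
    T, K = X.shape[0], resp.shape[1]
    n = resp.sum(axis=0)                        # n_i = sum_t p(i|x_t)
    weights = n / T                             # w_i = n_i / N
    means = (resp.T @ X) / n[:, None]           # mu_i
    covs = []
    for i in range(K):
        diff = X - means[i]                     # x_t - mu_i
        covs.append((resp[:, i, None] * diff).T @ diff / n[i])
    return weights, means, np.stack(covs)
```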

Adapting the mixtures to new data

Essentially, let EM start with the MLE parameters as seeds. Expand the available data for EM and proceed until convergence.
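One way to realize this with scikit-learn's GaussianMixture, whose weights_init / means_init / precisions_init parameters let EM start from the labeled-data MLE; the data arrays here are placeholders:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

X_labeled = np.random.randn(200, 16)     # placeholder labeled frames
X_unlabeled = np.random.randn(2000, 16)  # placeholder unlabeled frames

# MLE parameters from the labeled data alone.
seed = GaussianMixture(n_components=4, covariance_type='full').fit(X_labeled)

# Seed EM with those parameters, then run it on the pooled data.
adapted = GaussianMixture(
    n_components=4, covariance_type='full',
    weights_init=seed.weights_, means_init=seed.means_,
    precisions_init=seed.precisions_,
).fit(np.vstack([X_labeled, X_unlabeled]))
```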


Problem with EM adaptation

The initial labeled seeds could contribute very little to the final model.


MAP Adaptation
Constrain the contribution of the unlabeled data:

$$\hat{w}_i = \alpha_i \frac{\sum_u p(i \mid x_u)}{U} + (1 - \alpha_i)\, w_i$$

$$\hat{\mu}_i = \alpha_i \frac{\sum_u p(i \mid x_u)\, x_u}{\sum_u p(i \mid x_u)} + (1 - \alpha_i)\, \mu_i$$

$$\hat{\Sigma}_i = \alpha_i \frac{\sum_u p(i \mid x_u)\,(x_u - \mu_i)(x_u - \mu_i)^T}{\sum_u p(i \mid x_u)} + (1 - \alpha_i)\, \Sigma_i$$

Let the alpha terms dictate how much weight to give to the new, unlabeled data compared to the existing estimates.
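A sketch of these updates in numpy, using a single scalar alpha for all parameters and components rather than the per-component alpha_i above; names are illustrative:

```python
import numpy as np

def map_adapt(weights, means, covs, X, resp, alpha=0.5):
    """MAP-adapt GMM parameters toward new (unlabeled) data X.
    resp[u, i] = p(i | x_u) under the current model."""
    U = X.shape[0]
    n = resp.sum(axis=0)                             # sum_u p(i|x_u)
    w_hat = alpha * n / U + (1 - alpha) * weights
    mu_data = (resp.T @ X) / n[:, None]
    mu_hat = alpha * mu_data + (1 - alpha) * means
    cov_hat = np.empty_like(covs)
    for i in range(len(weights)):
        diff = X - means[i]
        S = (resp[:, i, None] * diff).T @ diff / n[i]
        cov_hat[i] = alpha * S + (1 - alpha) * covs[i]
    return w_hat, mu_hat, cov_hat
```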

MAP adaptation
The movement of the parameters is constrained.

MLLR adaptation
Another idea: Maximum Likelihood Linear Regression. Apply an affine transformation to the means; don't change the covariance matrices.

$$\hat{\mu} = W \mu$$
MLLR adaptation
The new means are the MLE of the means with the new data:

$$\hat{\mu}_i = W_i \mu_i = \frac{\sum_x p(i \mid x, w_i, \mu_i, \Sigma_i)\, x}{\sum_x p(i \mid x, w_i, \mu_i, \Sigma_i)}$$
MLLR adaptation
The new means are the MLE of the means with the new data. Writing the extended mean vector as $\xi_i = (1\;\; \mu_i^T)^T$ (the extra 1 carries the affine offset):

$$\hat{\mu}_i = W_i \xi_i, \qquad W_i = \Big[\sum_x p(i \mid x, w_i, \mu_i, \Sigma_i)\, x\, \xi_i^T\Big]\Big[\sum_x p(i \mid x, w_i, \mu_i, \Sigma_i)\, \xi_i \xi_i^T\Big]^{-1}$$

Why MLLR?
We can tie the transformation matrices of mixture components. For example:
You know that the red and green classes are similar.
Assumption: their transformations should be similar.
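A sketch of estimating one tied transform per group by weighted least squares, assuming identity covariances for simplicity (full MLLR weights the statistics by each Sigma_i^{-1}); tie_groups and the other names are illustrative:

```python
import numpy as np

def estimate_mllr_transforms(means, resp, X, tie_groups):
    """One tied MLLR transform W per group of component indices.
    means: (K, d) current means; resp: (T, K) posteriors; X: (T, d)."""
    d = means.shape[1]
    Ws = []
    for group in tie_groups:
        G = np.zeros((d + 1, d + 1))    # sum_i n_i xi_i xi_i^T
        Z = np.zeros((d, d + 1))        # sum_i (sum_x p(i|x) x) xi_i^T
        for i in group:
            xi = np.concatenate(([1.0], means[i]))  # extended mean
            n_i = resp[:, i].sum()
            x_stat = resp[:, i] @ X                 # sum_x p(i|x) x
            G += n_i * np.outer(xi, xi)
            Z += np.outer(x_stat, xi)
        Ws.append(Z @ np.linalg.inv(G))  # invertible with enough tied comps
    return Ws                            # adapted mean: W @ xi
```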


Application of Model Adaptation

Speaker Recognition. Task: given speech from a known set of speakers, identify the speaker. Assume there is training data from each speaker.
Approach:
Model a generic speaker.
Identify a speaker by its difference from the generic speaker.
Measure this difference by adaptation parameters.

Speech Representation
Extract a feature representation of speech, sampled every 10 ms.

MFCC, 16 dimensions
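For example, with librosa (the file name and the 16 kHz sample rate are placeholders):

```python
import librosa

# Load an utterance and compute 16-dimensional MFCCs every 10 ms.
y, sr = librosa.load("utterance.wav", sr=16000)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=16,
                            hop_length=int(sr * 0.010))  # 10 ms steps
frames = mfcc.T   # shape (num_frames, 16): one vector per 10 ms
```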

[Figure: "Similarity of sounds" — scatter of speech frames in MFCC space (MFCC1 vs MFCC2), with clusters for the sounds /s/, /b/, /o/, /u/]

Universal Background Model

If we had labeled phone information, that would be great. But it's expensive and time-consuming. So just fit a GMM to the MFCC representation of all of the speech you have.
Generally all but one example, but we'll come back to this.
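A sketch with scikit-learn; the 512 diagonal-covariance components are a conventional choice in speaker recognition rather than something fixed by the slides, and frames_per_speaker is a placeholder:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Fit a UBM: one GMM over the pooled MFCC frames from all speakers.
all_frames = np.vstack(frames_per_speaker)   # (total_frames, 16)
ubm = GaussianMixture(n_components=512, covariance_type='diag',
                      max_iter=200).fit(all_frames)
```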

[Figure: "MFCC Scatter" — the MFCC1 vs MFCC2 scatter with clusters for /s/, /b/, /o/, /u/]

[Figure: "UBM fitting" — the mixture components of the UBM overlaid on the MFCC1 vs MFCC2 scatter with clusters /s/, /b/, /o/, /u/]

MAP adaptation
When we have a segment of speech to evaluate:
Generate MFCC features.
Use MAP adaptation on the UBM Gaussian Mixture Model.

MAP Adaptation
[Figure: the MFCC1 vs MFCC2 scatter with clusters /s/, /b/, /o/, /u/, repeated as a slide build showing the UBM components under MAP adaptation]

UBM-MAP
Claim: the differences between speakers can be represented by the movement of the mixture components of the UBM.

How do we train this model?

UBM-MAP training
[Diagram: Training Data → UBM Training → UBM; Held-out Speaker N → MAP → Supervector]
A supervector is a vector of the adapted means of the Gaussian mixture components:
$x_i = (\mu_0, \mu_1, \ldots, \mu_k)$, with label $t_i = \text{Speaker ID}$.

Train a supervised model with these labeled vectors.

UBM-MAP training
[Diagram: Training Data → UBM Training; Held-out Speaker N → MAP → Supervector $x_i = (\mu_0, \mu_1, \ldots, \mu_k)$, $t_i = \text{Speaker ID}$ → Multiclass SVM Training]

Repeat for all training data.
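A sketch of the whole pipeline, reusing a scalar-alpha, mean-only MAP adaptation (adapting only the means is common in UBM-MAP systems); ubm comes from the fitting step above, while train_utterances, train_speaker_ids, and test_frames are placeholders:

```python
import numpy as np
from sklearn.svm import SVC

def supervector(ubm, frames, alpha=0.5):
    """MAP-adapt only the UBM means to one utterance's frames and
    concatenate them into a supervector."""
    resp = ubm.predict_proba(frames)              # (T, K) posteriors
    n = resp.sum(axis=0) + 1e-10                  # avoid divide-by-zero
    data_means = (resp.T @ frames) / n[:, None]
    adapted = alpha * data_means + (1 - alpha) * ubm.means_
    return adapted.ravel()                        # (K * d,) supervector

# One labeled supervector per training utterance, then a multiclass SVM.
X_train = np.array([supervector(ubm, f) for f in train_utterances])
y_train = np.array(train_speaker_ids)
svm = SVC(kernel='linear').fit(X_train, y_train)

# Evaluation: adapt the UBM to the test segment, classify its supervector.
prediction = svm.predict([supervector(ubm, test_frames)])
```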

UBM-MAP Evaluation
[Diagram: Test Data → MAP (adapting the UBM) → Supervector → Multiclass SVM → Prediction]

Alternate View
Do we need all this? What if we just train an SVM on labeled MFCC data?
[Diagram: Labeled Training Data → Multiclass SVM Training; Test Data → Multiclass SVM → Prediction]
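A sketch of this baseline: classify each MFCC frame directly, then take a majority vote over the test segment (labeled_frames, frame_speaker_ids, and test_frames are placeholders):

```python
from collections import Counter
from sklearn.svm import SVC

# Train a multiclass SVM on individual labeled MFCC frames.
frame_svm = SVC(kernel='rbf').fit(labeled_frames, frame_speaker_ids)

# Predict a speaker per frame, then majority-vote over the segment.
votes = frame_svm.predict(test_frames)
prediction = Counter(votes).most_common(1)[0][0]
```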

Results
UBM-MAP (with some variants) is the state of the art in Speaker Recognition.
Current state-of-the-art performance is about 97% accuracy (~2.5% EER) with a few minutes of speech.

Direct MFCC modeling performs about half as well, ~5% EER.

Model Adaptation
Adaptation allows GMMs to be seeded with labeled data. Incorporation of unlabeled data gives a more robust model. The adaptation process can be used to differentiate members of the population (UBM-MAP).

Next Time
Spectral Clustering
