
Dimensionality Reduction:
Linear Discriminant Analysis (LDA)

Aly A. Farag

Shireen Y. Elhabian

CVIP Lab

University of Louisville

www.cvip.uofl.edu

October 2, 2008

Outline

LDA objective

Recall PCA

Now LDA

LDA Two Classes

Counter example

LDA C Classes

Illustrative Example

Limitations of LDA

LDA Objective

The objective of LDA is to perform dimensionality reduction while preserving as much of the class discriminatory information as possible.

So what? PCA does this too.

OK, that's new. Let's delve deeper.

Recall PCA

In PCA, the main idea is to re-express the available dataset to extract the relevant information by reducing the redundancy and minimizing the noise.

We didn't care whether the dataset represents features from one or more classes; i.e., the discrimination power was not taken into consideration in PCA.

The dataset was arranged as a matrix X of n sample vectors, whose columns represent different data samples.

We first started by subtracting the mean to obtain a zero-mean dataset, then we computed the covariance matrix S_x = XX^T.

The eigenvalues and eigenvectors of S_x were then computed. The new basis vectors are the eigenvectors with the highest eigenvalues, where the number of those vectors was our choice.

Thus, using the new basis, we can project the dataset onto a lower-dimensional space with a more powerful data representation.
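The PCA steps recalled above can be sketched in a few lines of NumPy. This is a minimal illustration with made-up data; the variable names are not from the slides, and production PCA code typically uses the SVD of X instead of forming XX^T explicitly.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 10))           # m = 3 features, n = 10 samples (columns)

X = X - X.mean(axis=1, keepdims=True)  # subtract the mean: zero-mean dataset
Sx = X @ X.T                           # covariance (scatter) matrix Sx = X X^T

vals, vecs = np.linalg.eigh(Sx)        # eigenvalues/eigenvectors of symmetric Sx
order = np.argsort(vals)[::-1]         # sort by descending eigenvalue
k = 2                                  # number of new basis vectors: our choice
W = vecs[:, order[:k]]                 # basis = eigenvectors with highest eigenvalues

Y = W.T @ X                            # project onto the lower-dimensional space
print(Y.shape)                         # (2, 10)
```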

Now LDA

Consider a pattern classification problem with C classes, e.g. sea bass, tuna, salmon.

Each class has N_i m-dimensional samples, where i = 1, 2, ..., C. Hence we have a set of m-dimensional samples {x_1, x_2, ..., x_{N_i}} belonging to class \omega_i.

We stack these samples from the different classes into one big matrix X such that each column represents one sample.

We seek a transformation of X to Y by projecting the samples in X onto a hyperplane of dimension C-1.

Let's see what this means.

LDA Two Classes

Assume we have m-dimensional samples {x_1, x_2, ..., x_N}, N_1 of which belong to class \omega_1 and N_2 of which belong to class \omega_2.

We seek a scalar y obtained by projecting the samples x onto a line (a (C-1)-dimensional space; here C = 2):

    y = w^T x, \quad\text{where}\quad x = [x_1, \dots, x_m]^T \quad\text{and}\quad w = [w_1, \dots, w_m]^T

Of all possible lines, we would like to select the one that maximizes the separability of the projected scalars.

[Figure: the two classes are not well separated when projected onto one candidate line, while another line succeeds in separating the two classes and, in the meantime, reduces the dimensionality of the problem from two features (x_1, x_2) to a single scalar value y.]

LDA Two Classes

In order to find a good projection vector, we need to define a measure of separation between the projections.

The mean vector of each class in the x-space and in the y-space is:

    \mu_i = \frac{1}{N_i} \sum_{x \in \omega_i} x
    \quad\text{and}\quad
    \tilde{\mu}_i = \frac{1}{N_i} \sum_{y \in \omega_i} y
                  = \frac{1}{N_i} \sum_{x \in \omega_i} w^T x
                  = w^T \mu_i

We could then choose the distance between the projected means as our objective function:

    J(w) = |\tilde{\mu}_1 - \tilde{\mu}_2| = |w^T \mu_1 - w^T \mu_2| = |w^T (\mu_1 - \mu_2)|

LDA Two Classes

However, the distance between the projected means is not a very good measure, since it does not take into account the standard deviation within the classes.

LDA Two Classes

The solution proposed by Fisher is to maximize a function that represents the difference between the means, normalized by a measure of the within-class variability, or the so-called scatter.

For each class we define the scatter, an equivalent of the variance, as:

    \tilde{s}_i^2 = \sum_{y \in \omega_i} (y - \tilde{\mu}_i)^2

\tilde{s}_i^2 measures the variability within class \omega_i after projecting it onto the y-space. Thus \tilde{s}_1^2 + \tilde{s}_2^2 measures the variability within the two classes at hand after projection; hence it is called the within-class scatter of the projected samples.

LDA Two Classes

The Fisher linear discriminant is defined as the linear function w^T x that maximizes the criterion function:

    J(w) = \frac{|\tilde{\mu}_1 - \tilde{\mu}_2|^2}{\tilde{s}_1^2 + \tilde{s}_2^2}

We are therefore looking for a projection where examples from the same class are projected very close to each other and, at the same time, the projected means are as far apart as possible.
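The criterion can be evaluated directly in the projected space: project, then compare the squared distance of the projected means with the summed scatters. A minimal sketch; the two Gaussian classes and the `fisher_criterion` helper are made up for illustration.

```python
import numpy as np

def fisher_criterion(w, X1, X2):
    """J(w) = |mu1~ - mu2~|^2 / (s1~^2 + s2~^2), computed in the y-space.
    X1, X2: (m, Ni) arrays whose columns are class samples."""
    w = np.asarray(w, dtype=float)
    y1, y2 = w @ X1, w @ X2                  # projected scalars y = w^T x
    num = (y1.mean() - y2.mean()) ** 2       # squared distance of projected means
    den = ((y1 - y1.mean()) ** 2).sum() + ((y2 - y2.mean()) ** 2).sum()
    return num / den                         # within-class scatter in the denominator

# Two made-up Gaussian classes, elongated along x1 and separated along x2:
rng = np.random.default_rng(1)
X1 = rng.normal([0, 0], [3, 0.5], size=(200, 2)).T
X2 = rng.normal([0, 2], [3, 0.5], size=(200, 2)).T

# The direction joining the means (the x2-axis) scores far higher than x1:
print(fisher_criterion([0, 1], X1, X2) > fisher_criterion([1, 0], X1, X2))  # True
```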

LDA Two Classes

In order to find the optimum projection w*, we need to express J(w) as an explicit function of w.

We define a measure of the scatter in the multivariate feature space x, the so-called scatter matrices:

    S_i = \sum_{x \in \omega_i} (x - \mu_i)(x - \mu_i)^T

    S_W = S_1 + S_2

where S_i is the covariance matrix of class \omega_i, and S_W is called the within-class scatter matrix.

LDA Two Classes

Now, the scatter of the projection y can be expressed as a function of the scatter matrices in the feature space x:

    \tilde{s}_i^2 = \sum_{y \in \omega_i} (y - \tilde{\mu}_i)^2
                  = \sum_{x \in \omega_i} (w^T x - w^T \mu_i)^2
                  = \sum_{x \in \omega_i} w^T (x - \mu_i)(x - \mu_i)^T w
                  = w^T S_i w

    \tilde{s}_1^2 + \tilde{s}_2^2 = w^T S_1 w + w^T S_2 w = w^T (S_1 + S_2) w = w^T S_W w = \tilde{S}_W

LDA Two Classes

Similarly, the difference between the projected means (in y-space) can be expressed in terms of the means in the original feature space (x-space):

    (\tilde{\mu}_1 - \tilde{\mu}_2)^2 = (w^T \mu_1 - w^T \mu_2)^2
                                      = w^T \underbrace{(\mu_1 - \mu_2)(\mu_1 - \mu_2)^T}_{S_B} \, w
                                      = w^T S_B w = \tilde{S}_B

The matrix S_B is called the between-class scatter of the original feature vectors, while \tilde{S}_B is the between-class scatter of the projected samples y. Since S_B is the outer product of two vectors, its rank is at most one.

LDA Two Classes

We can finally express the Fisher criterion in terms of S_W and S_B as:

    J(w) = \frac{|\tilde{\mu}_1 - \tilde{\mu}_2|^2}{\tilde{s}_1^2 + \tilde{s}_2^2} = \frac{w^T S_B w}{w^T S_W w}

Hence J(w) is a measure of the difference between the class means (encoded in the between-class scatter matrix) normalized by a measure of the within-class scatter matrix.

LDA Two Classes

To find the maximum of J(w), we differentiate and equate to zero:

    \frac{d}{dw} J(w) = \frac{d}{dw} \left( \frac{w^T S_B w}{w^T S_W w} \right) = 0

    \Rightarrow (w^T S_W w) \frac{d}{dw}(w^T S_B w) - (w^T S_B w) \frac{d}{dw}(w^T S_W w) = 0

    \Rightarrow (w^T S_W w) \, 2 S_B w - (w^T S_B w) \, 2 S_W w = 0

Dividing by 2 w^T S_W w:

    \frac{w^T S_W w}{w^T S_W w} S_B w - \frac{w^T S_B w}{w^T S_W w} S_W w = 0

    \Rightarrow S_B w - J(w) \, S_W w = 0

    \Rightarrow S_W^{-1} S_B w - J(w) \, w = 0

LDA Two Classes

Solving the generalized eigenvalue problem

    S_W^{-1} S_B w = \lambda w, \quad \text{where } \lambda = J(w) = \text{scalar}

yields

    w^* = \arg\max_w J(w) = \arg\max_w \frac{w^T S_B w}{w^T S_W w} = S_W^{-1} (\mu_1 - \mu_2)

This is known as Fisher's Linear Discriminant, although it is not a discriminant but rather a specific choice of direction for the projection of the data down to one dimension.

Using the same notation as PCA, the solution will be the eigenvector(s) of S_X = S_W^{-1} S_B.
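The closed-form solution above is straightforward to compute. A sketch with made-up data; the `fisher_direction` helper is illustrative, not from the slides.

```python
import numpy as np

def fisher_direction(X1, X2):
    """w* = Sw^{-1} (mu1 - mu2), Fisher's linear discriminant direction.
    X1, X2: (m, Ni) arrays whose columns are class samples."""
    mu1, mu2 = X1.mean(axis=1), X2.mean(axis=1)
    # Within-class scatter matrices Si = sum (x - mu_i)(x - mu_i)^T:
    S1 = (X1 - mu1[:, None]) @ (X1 - mu1[:, None]).T
    S2 = (X2 - mu2[:, None]) @ (X2 - mu2[:, None]).T
    w = np.linalg.solve(S1 + S2, mu1 - mu2)  # solve Sw w = mu1 - mu2
    return w / np.linalg.norm(w)             # scale is irrelevant; normalize

rng = np.random.default_rng(2)
X1 = rng.normal([0, 0], 1.0, size=(100, 2)).T
X2 = rng.normal([4, 4], 1.0, size=(100, 2)).T
w = fisher_direction(X1, X2)
print(w.round(2))     # roughly +/-(0.71, 0.71): the direction joining the means
```

With isotropic class covariances, as here, w* simply points along \mu_1 - \mu_2; the S_W^{-1} factor only matters when the within-class scatter is anisotropic.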

LDA Two Classes - Example

Compute the Linear Discriminant projection for the following two-dimensional dataset:

Samples for class \omega_1: X_1 = (x_1, x_2) = {(4,2), (2,4), (2,3), (3,6), (4,4)}

Samples for class \omega_2: X_2 = (x_1, x_2) = {(9,10), (6,8), (9,5), (8,7), (10,8)}

[Figure: scatter plot of the two classes in the (x_1, x_2) plane, with both axes running from 0 to 10.]

LDA Two Classes - Example

The class means are:

    \mu_1 = \frac{1}{N_1} \sum_{x \in \omega_1} x
          = \frac{1}{5} \left( \begin{bmatrix} 4 \\ 2 \end{bmatrix} + \begin{bmatrix} 2 \\ 4 \end{bmatrix} + \begin{bmatrix} 2 \\ 3 \end{bmatrix} + \begin{bmatrix} 3 \\ 6 \end{bmatrix} + \begin{bmatrix} 4 \\ 4 \end{bmatrix} \right)
          = \begin{bmatrix} 3 \\ 3.8 \end{bmatrix}

    \mu_2 = \frac{1}{N_2} \sum_{x \in \omega_2} x
          = \frac{1}{5} \left( \begin{bmatrix} 9 \\ 10 \end{bmatrix} + \begin{bmatrix} 6 \\ 8 \end{bmatrix} + \begin{bmatrix} 9 \\ 5 \end{bmatrix} + \begin{bmatrix} 8 \\ 7 \end{bmatrix} + \begin{bmatrix} 10 \\ 8 \end{bmatrix} \right)
          = \begin{bmatrix} 8.4 \\ 7.6 \end{bmatrix}

LDA Two Classes - Example

Covariance matrix of the first class (normalized by N_1 - 1 = 4; note the negative off-diagonal term):

    S_1 = \frac{1}{4} \sum_{x \in \omega_1} (x - \mu_1)(x - \mu_1)^T
        = \begin{bmatrix} 1 & -0.25 \\ -0.25 & 2.2 \end{bmatrix}

LDA Two Classes - Example

Covariance matrix of the second class:

    S_2 = \frac{1}{4} \sum_{x \in \omega_2} (x - \mu_2)(x - \mu_2)^T
        = \begin{bmatrix} 2.3 & -0.05 \\ -0.05 & 3.3 \end{bmatrix}

LDA Two Classes - Example

Within-class scatter matrix:

    S_W = S_1 + S_2
        = \begin{bmatrix} 1 & -0.25 \\ -0.25 & 2.2 \end{bmatrix} + \begin{bmatrix} 2.3 & -0.05 \\ -0.05 & 3.3 \end{bmatrix}
        = \begin{bmatrix} 3.3 & -0.3 \\ -0.3 & 5.5 \end{bmatrix}

LDA Two Classes - Example

Between-class scatter matrix:

    S_B = (\mu_1 - \mu_2)(\mu_1 - \mu_2)^T
        = \left( \begin{bmatrix} 3 \\ 3.8 \end{bmatrix} - \begin{bmatrix} 8.4 \\ 7.6 \end{bmatrix} \right) \left( \begin{bmatrix} 3 \\ 3.8 \end{bmatrix} - \begin{bmatrix} 8.4 \\ 7.6 \end{bmatrix} \right)^T
        = \begin{bmatrix} -5.4 \\ -3.8 \end{bmatrix} \begin{bmatrix} -5.4 & -3.8 \end{bmatrix}
        = \begin{bmatrix} 29.16 & 20.52 \\ 20.52 & 14.44 \end{bmatrix}

LDA Two Classes - Example

The LDA projection is then obtained as the solution of the generalized eigenvalue problem:

    S_W^{-1} S_B w = \lambda w \quad\Rightarrow\quad \left| S_W^{-1} S_B - \lambda I \right| = 0

    \left| \begin{bmatrix} 3.3 & -0.3 \\ -0.3 & 5.5 \end{bmatrix}^{-1} \begin{bmatrix} 29.16 & 20.52 \\ 20.52 & 14.44 \end{bmatrix} - \lambda \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \right| = 0

    \left| \begin{bmatrix} 0.3045 & 0.0166 \\ 0.0166 & 0.1827 \end{bmatrix} \begin{bmatrix} 29.16 & 20.52 \\ 20.52 & 14.44 \end{bmatrix} - \lambda I \right| = 0

    \begin{vmatrix} 9.2213 - \lambda & 6.489 \\ 4.2339 & 2.9794 - \lambda \end{vmatrix}
    = (9.2213 - \lambda)(2.9794 - \lambda) - 6.489 \times 4.2339 = 0

    \lambda^2 - 12.2007 \lambda = 0 \quad\Rightarrow\quad \lambda(\lambda - 12.2007) = 0

    \lambda_1 = 0, \quad \lambda_2 = 12.2007

LDA Two Classes - Example

Hence

    \begin{bmatrix} 9.2213 & 6.489 \\ 4.2339 & 2.9794 \end{bmatrix} w_1 = \lambda_1 w_1 = 0
    \quad\text{and}\quad
    \begin{bmatrix} 9.2213 & 6.489 \\ 4.2339 & 2.9794 \end{bmatrix} w_2 = \lambda_2 w_2 = 12.2007 \, w_2

Thus:

    w_1 = \begin{bmatrix} 0.5755 \\ -0.8178 \end{bmatrix}
    \quad\text{and}\quad
    w_2 = \begin{bmatrix} 0.9088 \\ 0.4173 \end{bmatrix} = w^*

The optimal projection is the one with maximum \lambda = J(w).

LDA Two Classes - Example

Or directly:

    w^* = S_W^{-1} (\mu_1 - \mu_2)
        = \begin{bmatrix} 3.3 & -0.3 \\ -0.3 & 5.5 \end{bmatrix}^{-1} \left( \begin{bmatrix} 3 \\ 3.8 \end{bmatrix} - \begin{bmatrix} 8.4 \\ 7.6 \end{bmatrix} \right)
        = \begin{bmatrix} 0.3045 & 0.0166 \\ 0.0166 & 0.1827 \end{bmatrix} \begin{bmatrix} -5.4 \\ -3.8 \end{bmatrix}

which, after normalizing to unit length (the sign of the direction is arbitrary), gives

    w^* = \begin{bmatrix} 0.9088 \\ 0.4173 \end{bmatrix}
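The worked example can be reproduced numerically; a short NumPy check using the sample values listed above:

```python
import numpy as np

# Samples of the two classes from the example (columns are samples):
X1 = np.array([[4, 2, 2, 3, 4],
               [2, 4, 3, 6, 4]], dtype=float)
X2 = np.array([[9, 6, 9, 8, 10],
               [10, 8, 5, 7, 8]], dtype=float)

mu1, mu2 = X1.mean(axis=1), X2.mean(axis=1)           # (3, 3.8) and (8.4, 7.6)
S1 = (X1 - mu1[:, None]) @ (X1 - mu1[:, None]).T / 4  # normalized by N1 - 1 = 4
S2 = (X2 - mu2[:, None]) @ (X2 - mu2[:, None]).T / 4
Sw = S1 + S2                                          # within-class scatter
Sb = np.outer(mu1 - mu2, mu1 - mu2)                   # between-class scatter

vals, vecs = np.linalg.eig(np.linalg.inv(Sw) @ Sb)
print(vals.round(4))                 # the eigenvalues are 0 and 12.2007
w = vecs[:, np.argmax(vals)]
print(np.abs(w).round(4))            # |w*| = (0.9088, 0.4173), up to sign
```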

LDA - Projection

[Figure: class-conditional PDFs p(y|\omega_i) of the projected data using the projection vector corresponding to the smallest eigenvalue (\lambda_1 = 8.8818e-16, numerically zero), together with the corresponding projection line in the (x_1, x_2) plane. Using this vector leads to bad separability between the two classes.]

LDA - Projection

[Figure: class-conditional PDFs p(y|\omega_i) of the projected data using the LDA projection vector with the highest eigenvalue (\lambda_2 = 12.2007), together with the corresponding projection line in the (x_1, x_2) plane. Using this vector leads to good separability between the two classes.]

LDA C-Classes

Now we have C classes instead of just two.

We now seek (C-1) projections [y_1, y_2, ..., y_{C-1}] by means of (C-1) projection vectors w_i, which can be arranged by columns into a projection matrix W = [w_1 | w_2 | ... | w_{C-1}], such that:

    y_i = w_i^T x \quad\Rightarrow\quad y = W^T x

where x is an m x 1 sample vector, y is a (C-1) x 1 projected vector, and W is m x (C-1).

If we have n feature vectors, we can stack them into one matrix as follows:

    Y = W^T X

where X is m x n (each column is one sample vector), Y is (C-1) x n (each column is one projected vector), and W = [w_1 | w_2 | ... | w_{C-1}] is m x (C-1).
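A quick dimension check of the stacked projection Y = W^T X, with made-up sizes:

```python
import numpy as np

m, n, C = 4, 30, 3                       # m features, n samples, C classes
rng = np.random.default_rng(3)
X = rng.normal(size=(m, n))              # columns are the n sample vectors
W = rng.normal(size=(m, C - 1))          # C-1 projection vectors as columns
Y = W.T @ X                              # each column of Y is a projected sample
print(X.shape, W.shape, Y.shape)         # (4, 30) (4, 2) (2, 30)
```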

LDA C-Classes

[Figure: example of two-dimensional features (m = 2) with three classes (C = 3), showing the per-class scatters S_{W1}, S_{W2}, S_{W3} in the (x_1, x_2) plane.]

Recall that in the two-classes case the within-class scatter was computed as:

    S_W = S_1 + S_2

This can be generalized to the C-classes case as:

    S_W = \sum_{i=1}^{C} S_i,
    \quad\text{where}\quad
    S_i = \sum_{x \in \omega_i} (x - \mu_i)(x - \mu_i)^T
    \quad\text{and}\quad
    \mu_i = \frac{1}{N_i} \sum_{x \in \omega_i} x

with N_i the number of data samples in class \omega_i.

LDA C-Classes

[Figure: the same three-class example (m = 2, C = 3), showing the between-class scatters S_{B1}, S_{B2}, S_{B3} measured from the overall mean \mu.]

Recall that in the two-classes case the between-class scatter was computed as:

    S_B = (\mu_1 - \mu_2)(\mu_1 - \mu_2)^T

For C classes, we instead measure the between-class scatter with respect to the mean of all classes as follows:

    S_B = \sum_{i=1}^{C} N_i (\mu_i - \mu)(\mu_i - \mu)^T

where

    \mu = \frac{1}{N} \sum_{x} x = \frac{1}{N} \sum_{i=1}^{C} N_i \mu_i
    \quad\text{and}\quad
    \mu_i = \frac{1}{N_i} \sum_{x \in \omega_i} x

with N the total number of data samples and N_i the number of data samples in class \omega_i.

LDA C-Classes

Similarly, we can define the mean vectors for the projected samples y as:

    \tilde{\mu}_i = \frac{1}{N_i} \sum_{y \in \omega_i} y
    \quad\text{and}\quad
    \tilde{\mu} = \frac{1}{N} \sum_{y} y

while the scatter matrices for the projected samples y will be:

    \tilde{S}_W = \sum_{i=1}^{C} \tilde{S}_i = \sum_{i=1}^{C} \sum_{y \in \omega_i} (y - \tilde{\mu}_i)(y - \tilde{\mu}_i)^T

    \tilde{S}_B = \sum_{i=1}^{C} N_i (\tilde{\mu}_i - \tilde{\mu})(\tilde{\mu}_i - \tilde{\mu})^T

LDA C-Classes

Recall that in the two-classes case we expressed the scatter matrices of the projected samples in terms of those of the original samples as:

    \tilde{S}_W = W^T S_W W
    \quad\text{and}\quad
    \tilde{S}_B = W^T S_B W

This still holds in the C-classes case.

Recall that we are looking for a projection that maximizes the ratio of between-class to within-class scatter. Since the projection is no longer a scalar (it has C-1 dimensions), we use the determinants of the scatter matrices to obtain a scalar objective function:

    J(W) = \frac{|\tilde{S}_B|}{|\tilde{S}_W|} = \frac{|W^T S_B W|}{|W^T S_W W|}

And we will seek the projection W* that maximizes this ratio.

LDA C-Classes

To find the maximum of J(W), we differentiate with respect to W and equate to zero. For the C-classes case we have C-1 projection vectors, so the eigenvalue problem generalizes as follows: it can be shown that the optimal projection matrix W* is the one whose columns are the eigenvectors corresponding to the largest eigenvalues of the generalized eigenvalue problem:

    S_W^{-1} S_B w_i^* = \lambda_i w_i^*

where \lambda_i = J(w_i^*) is a scalar and W^* = [w_1^* | w_2^* | ... | w_{C-1}^*].
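A sketch of the C-classes procedure end to end, with made-up three-class data. The `lda_projections` helper is illustrative; a production implementation would solve the generalized eigenvalue problem S_B w = \lambda S_W w directly rather than forming an explicit inverse.

```python
import numpy as np

def lda_projections(X, labels):
    """Columns of the result are the leading eigenvectors of Sw^{-1} Sb.
    X: (m, n) data matrix (columns are samples); labels: length-n class ids."""
    classes = np.unique(labels)
    m = X.shape[0]
    mu = X.mean(axis=1)                                  # overall mean
    Sw = np.zeros((m, m))
    Sb = np.zeros((m, m))
    for c in classes:
        Xc = X[:, labels == c]
        mu_c = Xc.mean(axis=1)
        D = Xc - mu_c[:, None]
        Sw += D @ D.T                                    # within-class scatter
        Sb += Xc.shape[1] * np.outer(mu_c - mu, mu_c - mu)  # between-class scatter
    vals, vecs = np.linalg.eig(np.linalg.inv(Sw) @ Sb)
    order = np.argsort(vals.real)[::-1]                  # largest eigenvalues first
    return vecs[:, order[:len(classes) - 1]].real        # at most C-1 projections

rng = np.random.default_rng(5)
means = ([0, 0], [5, 0], [0, 5])
X = np.hstack([rng.normal(mu, 1.0, size=(50, 2)).T for mu in means])
labels = np.repeat([0, 1, 2], 50)
W = lda_projections(X, labels)
print(W.shape)            # (2, 2): C-1 = 2 projection vectors for m = 2 features
```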

Illustration 3 Classes

[Figure: the three simulated classes in the (x_1, x_2) plane.]

Let's generate a dataset for each class to simulate the three classes shown. For each class, do the following:

- Use the random number generator to generate a uniform stream of 500 samples that follows U(0,1).
- Using the Box-Muller approach, convert the generated uniform stream to N(0,1).
- Then use the method of eigenvalues and eigenvectors to manipulate the standard normal to have the required mean vector and covariance matrix.
- Estimate the mean and covariance matrix of the resulting dataset.
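The generation steps above can be sketched as follows; the target mean and covariance used here are illustrative placeholders, not the slide's parameters.

```python
import numpy as np

rng = np.random.default_rng(6)

# Box-Muller: two U(0,1) streams -> a (2, 500) stream of N(0,1) samples.
u1 = rng.uniform(size=(2, 500))
u2 = rng.uniform(size=(2, 500))
Z = np.sqrt(-2 * np.log(1 - u1)) * np.cos(2 * np.pi * u2)

# Use the eigenvalues/eigenvectors of the target covariance S to shape the
# standard normal: with S = V diag(d) V^T and A = V diag(sqrt(d)),
# X = A Z + mu has mean ~ mu and covariance ~ A A^T = S.
mu = np.array([3.0, 7.0])                        # illustrative target mean
S = np.array([[4.0, 1.5], [1.5, 2.0]])           # illustrative target covariance
d, V = np.linalg.eigh(S)
X = V @ np.diag(np.sqrt(d)) @ Z + mu[:, None]

# Estimate the mean and covariance matrix of the resulting dataset:
print(X.mean(axis=1).round(1))                   # close to mu
print(np.cov(X).round(1))                        # close to S
```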

Dataset Generation

By visual inspection of the figure, the classes' parameters (means and covariance matrices) can be given as follows:

    \text{Overall mean: } \mu = \begin{bmatrix} 5 \\ 5 \end{bmatrix}

    \mu_1 = \mu + \begin{bmatrix} 3 \\ 7 \end{bmatrix}, \quad
    \mu_2 = \mu + \begin{bmatrix} 2.5 \\ 3.5 \end{bmatrix}, \quad
    \mu_3 = \mu + \begin{bmatrix} 7 \\ 5 \end{bmatrix}

    S_1 = \begin{bmatrix} 5 & -1 \\ -1 & 3 \end{bmatrix}: negative covariance, to lead to data samples distributed along the y = -x line.

    S_2 = \begin{bmatrix} 4 & 0 \\ 0 & 4 \end{bmatrix}: zero covariance, to lead to data samples distributed horizontally.

    S_3 = \begin{bmatrix} 3.5 & 1 \\ 1 & 2.5 \end{bmatrix}: positive covariance, to lead to data samples distributed along the y = x line.

In Matlab

It's working!

[Figure: Matlab scatter plot of the three generated classes, X_1 (the first feature) versus X_2 (the second feature), with axes running roughly from -5 to 20.]

Computing LDA Projection Vectors

Recall:

    S_W = \sum_{i=1}^{C} S_i,
    \quad\text{where}\quad
    S_i = \sum_{x \in \omega_i} (x - \mu_i)(x - \mu_i)^T
    \quad\text{and}\quad
    \mu_i = \frac{1}{N_i} \sum_{x \in \omega_i} x

    S_B = \sum_{i=1}^{C} N_i (\mu_i - \mu)(\mu_i - \mu)^T,
    \quad\text{where}\quad
    \mu = \frac{1}{N} \sum_{x} x = \frac{1}{N} \sum_{i=1}^{C} N_i \mu_i

The projection vectors are then the eigenvectors of S_W^{-1} S_B.

Let's visualize the projection vectors W:

[Figure: the two LDA projection vectors drawn over the scatter plot of the three classes in the (x_1, x_2) plane.]

Projection y = W^T x, along the first projection vector:

[Figure: class-conditional PDFs p(y|\omega_i) of the projected data using the first projection vector, with eigenvalue \lambda_1 = 4508.2089.]

Projection y = W^T x, along the second projection vector:

[Figure: class-conditional PDFs p(y|\omega_i) of the projected data using the second projection vector, with eigenvalue \lambda_2 = 1878.8511.]

Which is Better?!!!

Apparently, the projection vector that has the highest eigenvalue provides the higher discrimination power between classes.

[Figure: side-by-side class-conditional PDFs p(y|\omega_i) for the first projection vector (eigenvalue 4508.2089) and the second projection vector (eigenvalue 1878.8511).]

PCA vs LDA

Limitations of LDA

LDA produces at most C-1 feature projections. If the classification error estimates establish that more features are needed, some other method must be employed to provide those additional features.

If the distributions are significantly non-Gaussian, the LDA projections will not be able to preserve any complex structure of the data, which may be needed for classification.

LDA will also fail when the discriminatory information is not in the mean but rather in the variance of the data.
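A minimal sketch of this failure mode, assuming two made-up classes with identical means but very different variances:

```python
import numpy as np

rng = np.random.default_rng(7)

# Two classes with the same mean but very different variances:
X1 = rng.normal(0.0, 0.5, size=(2, 1000))   # tight cluster around the origin
X2 = rng.normal(0.0, 5.0, size=(2, 1000))   # wide cloud around the same origin

mu1, mu2 = X1.mean(axis=1), X2.mean(axis=1)
Sb = np.outer(mu1 - mu2, mu1 - mu2)

# The between-class scatter is essentially zero, so the Fisher criterion has
# nothing to maximize, although the classes differ clearly in their variance.
print(np.abs(Sb).max())
```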

Thank You
