Linear Discriminant Analysis (LDA)
By Kardi Teknomo, PhD.

Purpose

The purpose of Discriminant Analysis is to classify objects (people, customers, things, etc.)
into one of two or more groups based on a set of features that describe the objects (e.g.
gender, age, income, weight, preference score, etc.). In general, we assign an object to one
of a number of predetermined groups based on observations made on the object.

Note that the groups are known or predetermined and have no order (i.e. nominal scale).
The classification problem gives several objects, each with a set of features measured from
that object. We are looking for two things:

1. Which set of features can best determine group membership of the object?
2. What is the classification rule or model to best separate those groups?

(Check the difference between discriminant analysis and cluster analysis.)

The first purpose is feature selection and the second purpose is classification. In this tutorial
we will not cover the first purpose (readers interested in this stepwise approach can use
statistical software such as SPSS, SAS, or the statistics package of MATLAB). However, we do
cover the second purpose: obtaining the classification rule and predicting the group of a new
object based on that rule.

Linear Discriminant Analysis

For example, suppose we want to know whether a soap product is good or bad based on several
measurements on the product, such as weight, volume, people's preference score, smell,
color contrast, etc. The object here is soap. The class category or group ("good" and "bad")
is what we are looking for (it is also called the dependent variable). Each measurement on the
product is called a feature that describes the object (it is also called an independent variable).

Thus, in discriminant analysis, the dependent variable (Y) is the group and the independent
variables (X) are the object features that might describe the group. The dependent variable is
always a categorical (nominal scale) variable, while the independent variables can be on any
measurement scale (i.e. nominal, ordinal, interval or ratio).

If we can assume that the groups are linearly separable, we can use the linear discriminant
model (LDA). Linearly separable means that the groups can be separated by a linear combination
of the features that describe the objects. If there are only two features, the separators
between the groups are lines. If there are three features, the separator is a plane, and if the
number of features (i.e. independent variables) is more than 3, the separators become hyperplanes.
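
To make the idea of a linear separator concrete, here is a minimal sketch in Python. The function name `side_of_hyperplane` and the weights and bias are hypothetical values chosen for illustration; they are not taken from the tutorial. The same function covers a line (two features), a plane (three features), or a hyperplane (more than three), because only the length of the feature list changes.

```python
def side_of_hyperplane(x, weights, bias):
    """Return 1 if the linear combination w.x + b is positive, else 0."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score > 0 else 0


# Hypothetical separator in two features: the line x1 + x2 - 5 = 0.
line_weights, line_bias = [1.0, 1.0], -5.0
below = side_of_hyperplane([1.0, 2.0], line_weights, line_bias)
above = side_of_hyperplane([4.0, 3.0], line_weights, line_bias)

# The same rule with four features separates by a hyperplane.
hyper = side_of_hyperplane([1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0], -3.0)
```

LDA, described next, is one way of choosing such weights and bias from data instead of setting them by hand.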

LDA Formula

Using the classification criterion of minimizing the total error of classification (TEC), we try
to make the proportion of objects that the rule misclassifies as small as possible. TEC is the
performance of the rule in the 'long run' on a random sample of objects. Thus, TEC should be
thought of as the probability that the rule under consideration will misclassify an object. The
classification rule is to assign an object to the group with the highest conditional probability.
This is called Bayes' rule. This rule also minimizes the TEC. If there are $g$ groups, Bayes'
rule is to assign the object to group $i$ where

$$P(i \mid x) > P(j \mid x) \quad \text{for all } j \neq i.$$

We want to know the probability $P(i \mid x)$ that an object belongs to group $i$, given a set of
measurements $x$. In practice, however, $P(i \mid x)$ is difficult to obtain. What we can get is
$P(x \mid i)$. This is the probability of getting a particular set of measurements $x$ given
that the object comes from group $i$. For example, once we know that a soap is good or bad,
we can measure the object (weight, smell, color, etc.). What we want, however, is to
determine the group of the soap (good or bad) based on the measurements only.

Fortunately, there is a relationship between the two conditional probabilities, well known
as Bayes' Theorem:

$$P(i \mid x) = \frac{P(x \mid i)\,P(i)}{\sum_{j} P(x \mid j)\,P(j)}$$

The prior probability $P(i)$ is the probability of group $i$ known without making any
measurement. In practice we can assume the prior probability is equal for all groups, or base
it on the number of samples in each group.
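
The following Python sketch illustrates Bayes' Theorem and the rule of assigning an object to the group with the highest posterior probability. The function names and all numbers are made up for illustration; in particular, the likelihood values P(x|i) are simply assumed to be given.

```python
def priors_from_counts(counts):
    """Estimate the prior P(i) from the number of samples in each group."""
    total = sum(counts)
    return [c / total for c in counts]


def posteriors(likelihoods, priors):
    """Return P(i|x) for every group i using Bayes' Theorem.

    likelihoods[i] is P(x|i) and priors[i] is P(i).
    """
    joint = [lik * p for lik, p in zip(likelihoods, priors)]
    evidence = sum(joint)  # P(x) = sum over j of P(x|j) P(j)
    return [j / evidence for j in joint]


# Made-up example: groups 0 ("good") and 1 ("bad") with 3 and 1 samples.
priors = priors_from_counts([3, 1])
# Assumed likelihoods: P(x|good) = 0.2 and P(x|bad) = 0.4.
post = posteriors([0.2, 0.4], priors)
# Bayes' rule: pick the group with the highest posterior probability.
best_group = max(range(len(post)), key=lambda i: post[i])
```

Although the measurement is more likely under the "bad" group, the larger prior of the "good" group makes it the more probable group in this example.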

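In practice, however, using Bayes' rule directly is impractical because obtaining $P(x \mid i)$
requires a great deal of data to get the relative frequencies of each group for each
measurement. It is more practical to assume a distribution and obtain the probability
theoretically. If we assume that each group has a multivariate Normal distribution and that all
groups have the same covariance matrix, we get what is called the Linear Discriminant Analysis
formula (see the derivation of this formula):

$$f_i = \mu_i C^{-1} x_k^T - \frac{1}{2}\,\mu_i C^{-1} \mu_i^T + \ln(p_i)$$

Assign object $k$ to the group $i$ that has the maximum $f_i$. Here $x_k$ is the row vector of
features of object $k$, $\mu_i$ is the mean feature vector of group $i$, $C$ is the covariance
matrix shared by all groups, and $p_i$ is the prior probability of group $i$.

If you look carefully, the second term ($\mu_i C^{-1} \mu_i^T$) has the form of a squared
Mahalanobis distance, a distance used to measure dissimilarity between groups.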
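
As a rough illustration of how this rule can be applied, the sketch below evaluates the discriminant function f_i = x C^-1 mu_i - 0.5 mu_i C^-1 mu_i + ln(p_i) for each group and picks the maximum. It is a minimal pure-Python sketch under several assumptions that are not from the tutorial: exactly two features (so the 2x2 inverse can be written by hand), a pooled within-group covariance divided by (n - g), priors taken from the group sample sizes, and made-up training data. All function names are hypothetical.

```python
import math


def mean(rows):
    """Mean vector of a list of two-feature samples."""
    n = len(rows)
    return [sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n]


def pooled_covariance(groups, means):
    """Pooled within-group 2x2 covariance, divided by (n - g)."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    n = 0
    for rows, mu in zip(groups, means):
        for r in rows:
            d = [r[0] - mu[0], r[1] - mu[1]]
            for a in range(2):
                for b in range(2):
                    s[a][b] += d[a] * d[b]
        n += len(rows)
    dof = n - len(groups)
    return [[v / dof for v in row] for row in s]


def inverse_2x2(m):
    """Inverse of a 2x2 matrix (assumes a non-zero determinant)."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


def discriminant(x, mu, c_inv, prior):
    """f_i = x C^-1 mu_i - 0.5 mu_i C^-1 mu_i + ln(p_i)."""
    w = [c_inv[0][0] * mu[0] + c_inv[0][1] * mu[1],
         c_inv[1][0] * mu[0] + c_inv[1][1] * mu[1]]  # C^-1 mu_i
    x_term = x[0] * w[0] + x[1] * w[1]
    mu_term = mu[0] * w[0] + mu[1] * w[1]
    return x_term - 0.5 * mu_term + math.log(prior)


def classify(x, groups):
    """Return (index of the group with maximum f_i, list of all f_i)."""
    means = [mean(rows) for rows in groups]
    c_inv = inverse_2x2(pooled_covariance(groups, means))
    n = sum(len(rows) for rows in groups)
    scores = [discriminant(x, mu, c_inv, len(rows) / n)
              for rows, mu in zip(groups, means)]
    return scores.index(max(scores)), scores


# Made-up two-feature training data for two groups.
group_a = [[1.0, 2.0], [2.0, 3.0], [3.0, 3.0], [2.0, 1.0]]
group_b = [[6.0, 5.0], [7.0, 8.0], [8.0, 6.0], [7.0, 7.0]]
label_a, scores_a = classify([2.0, 2.0], [group_a, group_b])
label_b, scores_b = classify([7.0, 6.0], [group_a, group_b])
```

With more than two features, a linear algebra library would normally replace the hand-written inverse; the structure of the calculation stays the same.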

Any standard textbook on data mining, pattern recognition, or classification can give you a
more detailed derivation of this formula. The meaning of each variable is explained in the next
section, the numerical example.
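
As a brief sketch of where such a formula comes from (this is the standard textbook argument, not necessarily the derivation the tutorial links to), take the logarithm of the numerator of Bayes' Theorem under the assumption that group $i$ is multivariate Normal with mean $\mu_i$ and a covariance matrix $C$ shared by all groups. The notation here is an assumption: $x$ and $\mu_i$ are row vectors and $p_i$ is the prior of group $i$.

```latex
\begin{align*}
\ln\bigl[P(x \mid i)\,P(i)\bigr]
  &= -\tfrac{1}{2}\,(x-\mu_i)\,C^{-1}(x-\mu_i)^T + \ln p_i + \text{const} \\
  &= \underbrace{\mu_i C^{-1} x^T - \tfrac{1}{2}\,\mu_i C^{-1}\mu_i^T + \ln p_i}_{f_i}
     \;-\; \tfrac{1}{2}\,x\,C^{-1}x^T + \text{const}
\end{align*}
```

The constant and the term $-\tfrac{1}{2}\,x\,C^{-1}x^T$ are the same for every group, so the group with the highest posterior probability is the group with the largest $f_i$, which is linear in $x$.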


This tutorial is copyrighted.

The preferred reference for this tutorial is

Teknomo, Kardi (2015) Discriminant Analysis Tutorial. http://people.revoledu.com/kardi/tutorial/LDA/

© 2015 Kardi Teknomo. All Rights Reserved.
