Linear Discriminant Analysis (LDA)
By Kardi Teknomo, PhD.
Purpose
The purpose of Discriminant Analysis is to classify objects (people, customers, things, etc.) into one of two or more groups based on a set of features that describe the objects (e.g. gender, age, income, weight, preference score, etc.). In general, we assign an object to one of a number of predetermined groups based on observations made on the object.
Note that the groups are known or predetermined and do not have an order (i.e. nominal scale).
The classification problem gives several objects, each with a set of features measured from it. We are looking for two things:
1. Which set of features can best determine the group membership of an object?
2. What is the classification rule or model that best separates those groups?
(Check the difference between discriminant analysis and cluster analysis.)
The first purpose is feature selection and the second purpose is classification. In this tutorial we will not cover the first purpose (readers interested in this stepwise approach can use statistical software such as SPSS, SAS or the statistics package of Matlab). However, we do cover the second purpose: obtaining the classification rule and predicting the group of a new object based on that rule.
Linear Discriminant Analysis
For example, we want to know whether a soap product is good or bad based on several measurements on the product such as weight, volume, people's preferential score, smell, color contrast, etc. The object here is soap. The class category or group ("good" and "bad") is what we are looking for (it is also called the dependent variable). Each measurement on the product is called a feature that describes the object (it is also called an independent variable).
Thus, in discriminant analysis, the dependent variable (Y) is the group and the independent variables (X) are the object features that might describe the group. The dependent variable is always a categorical (nominal scale) variable, while the independent variables can be on any measurement scale (i.e. nominal, ordinal, interval or ratio).
If we can assume that the groups are linearly separable, we can use the linear discriminant model (LDA). Linearly separable means that the groups can be separated by a linear combination of the features that describe the objects. With only two features, the separators between groups are lines. With three features, the separator is a plane, and when the number of features (i.e. independent variables) is more than three, the separators become hyperplanes.
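As a small illustration of a linear separator for two features, the sketch below classifies a point by the sign of a linear combination of its features. The weights `w`, the offset `b`, and the sample points are hypothetical values chosen for illustration; they are not taken from the tutorial.

```python
def linear_score(x, w, b):
    """Linear combination of the features: w[0]*x[0] + w[1]*x[1] + ... + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(x, w, b):
    """Assign a group according to which side of the separator x falls on."""
    return "good" if linear_score(x, w, b) > 0 else "bad"


# Hypothetical separator for two features: the line x1 + x2 - 5 = 0.
w = [1.0, 1.0]
b = -5.0

print(classify([4.0, 3.0], w, b))  # score is 4 + 3 - 5 = 2, positive side
print(classify([1.0, 2.0], w, b))  # score is 1 + 2 - 5 = -2, negative side
```

With three features the same code applies unchanged, with `w` holding three weights; the separator is then a plane rather than a line.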
LDA Formula
Using the classification criterion of minimizing the total error of classification (TEC), we try to make the proportion of objects that the rule misclassifies as small as possible. TEC is the performance of the rule in the 'long run' on a random sample of objects. Thus, TEC should be thought of as the probability that the rule under consideration will misclassify an object. The classification rule is to assign an object to the group with the highest conditional probability given its measurements $x$. This is called Bayes' rule, and it also minimizes the TEC. If there are $g$ groups, Bayes' rule is to assign the object to group $i$ where $P(i \mid x) > P(j \mid x)$ for all $j \neq i$.
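We want to know the probability $P(i \mid x)$ that an object belongs to group $i$, given a set of measurements $x$. This quantity is difficult to obtain directly; what we can estimate from data is $P(x \mid i)$, the probability of observing the measurements $x$ given that the object comes from group $i$.
Fortunately, there is a relationship between the two conditional probabilities, well known as Bayes' Theorem:
$$P(i \mid x) = \frac{P(x \mid i)\,P(i)}{\sum_{j} P(x \mid j)\,P(j)}$$
where $P(i)$ is the prior probability of group $i$.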
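To make the relationship concrete, here is a minimal sketch that turns the likelihoods $P(x \mid i)$ and priors $P(i)$ into posteriors $P(i \mid x)$ and then applies Bayes' rule. The likelihood and prior values are made-up numbers for a single hypothetical measurement, and the function names are illustrative only.

```python
def posteriors(likelihoods, priors):
    """Bayes' theorem: P(i|x) = P(x|i) P(i) / sum_j P(x|j) P(j)."""
    joint = {g: likelihoods[g] * priors[g] for g in likelihoods}
    evidence = sum(joint.values())  # the denominator, shared by all groups
    return {g: joint[g] / evidence for g in joint}


def bayes_rule(likelihoods, priors):
    """Assign the object to the group with the highest posterior probability."""
    post = posteriors(likelihoods, priors)
    return max(post, key=post.get)


# Hypothetical values of P(x|i) for one observed measurement x, with equal priors.
likelihoods = {"good": 0.30, "bad": 0.10}
priors = {"good": 0.5, "bad": 0.5}

post = posteriors(likelihoods, priors)
print(post)                             # "good" gets about 0.75, "bad" about 0.25
print(bayes_rule(likelihoods, priors))  # the group with the larger posterior
```

Because the denominator is the same for every group, the assignment depends only on the products $P(x \mid i)\,P(i)$.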
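In practice, however, using Bayes' rule directly is impractical, because obtaining $P(x \mid i)$ requires a great deal of data to get the relative frequencies of each group for each measurement. It is more practical to assume a distribution and get the probability theoretically. If we assume that each group has a multivariate Normal distribution and that all groups have the same covariance matrix, we get what is called the Linear Discriminant Analysis formula (see: Derivation of this formula):
$$f_i = \mu_i^T C^{-1} x - \tfrac{1}{2}\,\mu_i^T C^{-1} \mu_i + \ln P(i)$$
Here $\mu_i$ is the mean of group $i$, $C$ is the common covariance matrix, and $P(i)$ is the prior probability of group $i$, with vectors written as columns. The object is assigned to the group with the largest $f_i$.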
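If you look carefully, the second term ($\mu_i^T C^{-1} \mu_i$) has the form of a squared Mahalanobis distance, which is a distance used to measure dissimilarity between groups while accounting for the covariance of the features.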
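The discriminant score $f_i = \mu_i^T C^{-1} x - \tfrac{1}{2}\mu_i^T C^{-1}\mu_i + \ln P(i)$ can be sketched in plain Python for the two-feature case. Everything specific here is an assumption made for illustration: the training data are invented, the priors are taken as group proportions, and the common covariance matrix is estimated by summing within-group scatter and dividing by the total number of objects.

```python
import math


def mean_vector(rows):
    """Component-wise mean of a list of 2-feature rows."""
    n = len(rows)
    return [sum(r[k] for r in rows) / n for k in range(2)]


def pooled_covariance(groups, means):
    """Common 2x2 covariance matrix C shared by all groups.

    Assumption: within-group scatter is summed over groups and divided by
    the total number of objects N (dividing by N - g is also common).
    """
    total = sum(len(rows) for rows in groups.values())
    c = [[0.0, 0.0], [0.0, 0.0]]
    for name, rows in groups.items():
        for r in rows:
            d = [r[0] - means[name][0], r[1] - means[name][1]]
            for a in range(2):
                for b in range(2):
                    c[a][b] += d[a] * d[b] / total
    return c


def inverse_2x2(m):
    """Inverse of a 2x2 matrix (raises ZeroDivisionError if singular)."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


def discriminant(x, mu, c_inv, prior):
    """f_i = mu_i^T C^-1 x - 0.5 * mu_i^T C^-1 mu_i + ln P(i)."""
    m = [mu[0] * c_inv[0][k] + mu[1] * c_inv[1][k] for k in range(2)]  # mu^T C^-1
    return (m[0] * x[0] + m[1] * x[1]
            - 0.5 * (m[0] * mu[0] + m[1] * mu[1])
            + math.log(prior))


def fit(groups):
    """Estimate group means, priors (n_i / N) and the inverse pooled covariance."""
    total = sum(len(rows) for rows in groups.values())
    means = {name: mean_vector(rows) for name, rows in groups.items()}
    priors = {name: len(rows) / total for name, rows in groups.items()}
    c_inv = inverse_2x2(pooled_covariance(groups, means))
    return means, priors, c_inv


def classify(x, means, priors, c_inv):
    """Assign x to the group with the largest discriminant score f_i."""
    scores = {name: discriminant(x, means[name], c_inv, priors[name])
              for name in means}
    return max(scores, key=scores.get)


# Hypothetical training data: two features per soap, two known groups.
groups = {
    "good": [[2.0, 6.0], [3.0, 7.0], [4.0, 6.5], [3.0, 5.5]],
    "bad": [[6.0, 2.0], [7.0, 3.0], [8.0, 2.5], [7.0, 1.5]],
}

means, priors, c_inv = fit(groups)
print(classify([3.5, 6.0], means, priors, c_inv))  # a point near the "good" mean
print(classify([7.0, 2.0], means, priors, c_inv))  # a point near the "bad" mean
```

With equal priors, a point exactly halfway between the two group means receives the same score for both groups, which is where the linear separator lies.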
Any standard textbook on data mining, pattern recognition or classification can give you a more detailed derivation of this formula. The meaning of each variable is explained in the next section, which works through a numerical example.
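For orientation, a brief sketch of how such a derivation usually goes is given here. The notation is an assumption chosen for illustration (column vectors, $n$ features, group means $\mu_i$, common covariance matrix $C$, priors $P(i)$) and may differ from the convention used in the numerical example.

```latex
\begin{aligned}
P(x \mid i) &= \frac{1}{(2\pi)^{n/2}\,|C|^{1/2}}
  \exp\!\left(-\tfrac{1}{2}(x-\mu_i)^T C^{-1}(x-\mu_i)\right) \\
\ln\big(P(x \mid i)\,P(i)\big)
  &= -\tfrac{1}{2}(x-\mu_i)^T C^{-1}(x-\mu_i) + \ln P(i) + \text{const} \\
  &= \mu_i^T C^{-1} x - \tfrac{1}{2}\,\mu_i^T C^{-1}\mu_i + \ln P(i)
     \;\underbrace{-\,\tfrac{1}{2}\,x^T C^{-1} x + \text{const}}_{\text{same for every group}}
\end{aligned}
```

The denominator of Bayes' Theorem and the underbraced terms do not depend on $i$, so choosing the group with the largest $P(i \mid x)$ is equivalent to choosing the group with the largest $f_i = \mu_i^T C^{-1} x - \tfrac{1}{2}\mu_i^T C^{-1}\mu_i + \ln P(i)$. The quadratic term in $x$ cancels only because all groups share the same covariance matrix, which is why the resulting rule is linear in $x$.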
This tutorial is copyrighted.
The preferred reference for this tutorial is
Teknomo, Kardi (2015) Discriminant Analysis Tutorial. http://people.revoledu.com/kardi/tutorial/LDA/
© 2015 Kardi Teknomo. All Rights Reserved.