
Henry Kissinger's sentiments are not an exception!

WALID S. SABA, PhD
CIO, Pragmatech
walid.saba@pragmatech.com
http://goo.gl/cFwdJ

Background
I have received many comments on the short article "Henry Kissinger vs. Sentiment Analysis" (Saba, 2012, http://goo.gl/5AOsc). The main point of that article was that the sentence in (1) conveys an extremely positive sentiment about the United States, although a purely quantitative (statistical or machine learning) approach cannot make this inference due to the surrounding negative words in the context.

1. The US is the worst place to live in, until you try living anywhere else

The argument made in (Saba, 2012) was this: sentiment analysis presupposes understanding ordinary spoken language, and is in fact much harder, as it also requires understanding metaphor, sarcasm, irony, etc. Since we do not yet have systems that can understand even simple and ordinary spoken language, there cannot be any system that does serious and meaningful sentiment analysis. While most comments received were in agreement, some comments questioned the percentage of such examples in everyday language use. These (noteworthy) comments can be summarized as follows:

If the (Henry Kissinger, the corner table, and the iPhone) examples I provided are not an exception but actually constitute a large percentage of everyday language use, then the argument that there cannot be sentiment analysis before we have full natural language understanding is accepted. If, on the other hand, these examples are a small percentage of everyday language use, then statistical and machine learning approaches can indeed do a decent job in inferring positive and negative sentiment in text.

This, I concede, is a sound observation that is worthy of discussion.

In what follows I hope to show that examples involving sarcasm, metonymy, irony, metaphor, etc. are not rare or exceptional, but are in fact quite common in everyday language use. Programs that process and comprehend such sentences must therefore have access to a massive amount of commonsense knowledge and will need to make very complex inferences to decode what is hidden or missing from the utterances people make.
Once the argument is made that such utterances are not an exception in everyday language use, but are in fact quite common, there remains one argument that can support the claim that there are currently systems that can do meaningful sentiment analysis: to prove that there are systems that can understand one simple question posed in ordinary spoken language, which is a challenge that I will gladly accept.
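To make the point about sentence (1) concrete, here is a minimal sketch of a word-counting scorer, the simplest stand-in for a purely quantitative approach. The lexicon `TOY_LEXICON`, its weights, and the function names are hypothetical and chosen only for illustration; they do not represent any particular published lexicon or deployed system.

```python
import re

# Hypothetical toy lexicon (an assumption for illustration only):
# each word maps to a hand-picked polarity weight.
TOY_LEXICON = {
    "worst": -2,
    "terrible": -2,
    "bad": -1,
    "good": 1,
    "great": 2,
    "best": 2,
}


def lexicon_score(sentence: str) -> int:
    """Sum the polarity weights of the words that appear in the lexicon."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    return sum(TOY_LEXICON.get(token, 0) for token in tokens)


def label(score: int) -> str:
    """Map a numeric score to a coarse sentiment label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


SENTENCE_1 = (
    "The US is the worst place to live in, "
    "until you try living anywhere else"
)

score_1 = lexicon_score(SENTENCE_1)
label_1 = label(score_1)
print(score_1, label_1)
```

Under these assumptions the only lexicon word in sentence (1) is "worst", so the scorer labels the sentence negative. Nothing in a word count represents the clause "until you try living anywhere else", which is what reverses the sentiment for a human reader.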

Non-Literal Meaning in Everyday Language Use
The sentence in (1), which expresses a positive sentiment towards the US despite the apparent use of negative terminology, uses what is technically referred to in linguistics as sarcasm. Other examples that require deep natural language understanding (involving the use of commonsense and logical reasoning) involve the use of metonymy. For example, consider the following utterances, which are quite common and not at all exotic or rare:

2. I don't like Barcelona; Real Madrid was always my favorite
3. Americans do not like to talk about Vietnam, it brings back so many bad memories

Clearly, the (very ordinary and common!) utterance in (2) is not in any way conveying a negative sentiment about the city of Barcelona, but about the football (soccer!) team of that city. The speaker could very well like the city of Barcelona, but it happens that his favorite team is Real Madrid. Similarly for (3). The negative context surrounding Vietnam is no indication at all of American sentiment towards the country, but towards a particular event, namely the long and very bloody Vietnam War.

This use of one entity to refer to another is quite common. In fact, some researchers claim that metonymy alone (!) accounts for somewhere between 17% and 20% of ordinary language use (for an example, see Markert and Nissim, 2006). This percentage is much higher in other forms of text, such as political talk, poetry, the arts, etc. The same applies to sarcasm and irony, which Chin (2011) says is practically the primary language in modern society. Like metonymy, sarcasm is quite common, yet it presents sentiment analysis with a monumental challenge. Consider these very ordinary and quite common utterances:

4. Yes, Porsche is too expensive, but let's face it, it is one hell of a car
5. There's so much to curse and nag about in New York, but if I leave for one week, I miss it

The context surrounding Porsche and New York in (4) and (5) seems to be quite negative, yet a full understanding that relies on background and commonsense knowledge would allow us to make the right inferences: that Porsche is a great car and that, despite all the negatives, New York City has a lot to offer!
Besides sarcasm, irony and metonymy, which alone account for more than 30% of ordinary language use, metaphor is also quite common in text, amounting to more than 50% of ordinary language use, according to some researchers (see, for example, Lakoff and Johnson, 1980; Paloma Úbeda Mansilla, 2003) [1]. Sentences that are full of metaphorical use, such as that in (6), are also quite common and are also beyond any quantitative and machine learning approaches to sentiment analysis.

6. Man, look at that wild and crazy thing, she is a knockout!
[1] Note that some forms of metonymy are special cases of metaphor, so these two sets are not mutually exclusive.

Concluding Remarks
To summarize, studies indicate that, collectively, metonymy, metaphor, sarcasm, irony, and other forms of non-literal use of words in ordinary spoken language are not an exception but the norm in ordinary language use. In fact, ordinary, simple and straight-to-the-point language is the exception (unless the target audience is young children who have not yet mastered the interpretation of such language). This is especially the case when it comes to social media jargon, where sarcasm, irony and other forms of non-literal language are the norm. Assuming these forms of language use are an integral part of ordinary text (say 60% of language use), a sentiment analysis system that relies on a statistical and machine learning component with 80% accuracy can at best make around 60% accurate inferences regarding sentiment [2]. This is only slightly better than making random picks between heads (negative sentiment) and tails (positive sentiment)!

One final word regarding this subject: as we mentioned in an earlier article, we are not in any way questioning the value of work in natural language processing. We ourselves are actively working in this field and we have recently developed a semantic engine (http://ctrlsearch.com) that is excellent at inferring the aboutness of a piece of text, producing an intelligent summary, identifying the key topics, extracting entities and inferring their type, as well as relating textual objects semantically.

Inferring the aboutness of a piece of text, and semantically/topically relating text, can be done, although it has not been perfected. However, saying that there are currently systems that can infer from what we write how we feel about certain entities is not only inaccurate, but harmful. When expectations are not met, it will not be easy to recover when the time comes to do real sentiment analysis.
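The text does not spell out how the "around 60%" figure in the concluding remarks is obtained from the 80% and 60% inputs. One plausible reading, sketched below, assumes that the 80% accuracy holds only on literal text and that the system does no better than a coin flip (50%) on non-literal text. Both assumptions, and the function name `expected_accuracy`, are illustrative and not taken from the article.

```python
def expected_accuracy(
    nonliteral_share: float,
    literal_accuracy: float,
    nonliteral_accuracy: float = 0.5,  # assumption: coin flip on non-literal text
) -> float:
    """Overall accuracy as a share-weighted mix of two per-class accuracies."""
    for value in (nonliteral_share, literal_accuracy, nonliteral_accuracy):
        if not 0.0 <= value <= 1.0:
            raise ValueError("all inputs must lie in [0, 1]")
    literal_share = 1.0 - nonliteral_share
    return (
        literal_share * literal_accuracy
        + nonliteral_share * nonliteral_accuracy
    )


# Figures quoted in the text: 60% non-literal use, 80% accuracy.
overall = expected_accuracy(nonliteral_share=0.60, literal_accuracy=0.80)
print(f"{overall:.2f}")
```

With these inputs the mix is 0.40 × 0.80 + 0.60 × 0.50 = 0.62, which matches the "around 60%" quoted above. A different assumption about performance on non-literal text would shift the result accordingly.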

References
Chin, R. (2011), The Science of Sarcasm? Yeah, Right, Science and Nature, Nov. 4, 2011.
Gibbs, R. W. and Colston, H. L. (2001), The Risks and Rewards of Ironic Communication, In L. Anolli, R. Ciceri and G. Riva (Eds.), New Perspectives on Miscommunication, IOS Press, 2001.
Lakoff, G. and Johnson, M. (1980), Metaphors We Live By.
Markert, K. and Nissim, M. (2006), Metonymic Proper Names: A Corpus-based Account, In A. Stefanowitsch, editor, Corpora in Cognitive Linguistics. Vol. 1: Metaphor and Metonymy. Mouton de Gruyter, 2006.
Paloma Úbeda Mansilla (2003), Metaphor at work: a study of metaphors used by European architects when talking about their projects, IBÉRICA 5.
Saba, W. (2012), Henry Kissinger vs. Sentiment Analysis. September 2012, SEO Journalist, http://goo.gl/1eQqJ

[2] Note that we have ignored the errors encountered in entity extraction, word sense disambiguation, etc. That is, we have assumed that these functions have perfect accuracy, which is obviously not the case.
