Prae 1

Literature Review and General Observation of Recent Research in the Emerging Field of
Sentiment Analysis

By Paul Prae
October 5th, 2010

The recent data explosion has spawned an incredible increase in innovation. While many new
fields are emerging, many old fields have been redefined. The internet is the catalyst for these
changes. This massive network holds the data that some of these new fields are focused on
leveraging. Much of this data is organized and retrieved through methods that focus on definitions
and context. However, these methods leave out one of the most important aspects of the creators of
this data: emotion. The emotional subjectivity of human beings drives the choices we make. Sentiment
analysis is the area that brings together this growth in digital data and the emotions of the users
and creators of the data. This paper will cover the general concepts behind sentiment analysis and
the uses of the concept in current society. It will also focus on the benefits of sentiment analysis
for corporations and consumers.
Sentiment analysis is a newer field that has only recently moved from the academic realm to
corporate use. Much of the current published research on the subject was developed by research
facilities strongly associated with companies such as IBM. The sentiment detection of texts has
witnessed a booming interest in recent years (Tang et al., 2009) with "[t]he emergence of new social
media such as blogs, message boards, news, and web content dramatically changing the ecosystems of
corporations" (Cai et al., 2010). The academic contributors to the subject have combined many
specific areas of linguistics, computer science, artificial intelligence, and psychology. More
specifically, it is "a discipline at the crossroads of NLP [natural language processing] and IR
[information retrieval], and as such it shares a number of characteristics with other tasks such as
information extraction and text-mining" (Tang et al., 2009). Machine learning techniques, basic
statistical analysis, and linguistic semantic representation are also well represented in the designs
of the field. As with many new fields, sentiment analysis is a combination of a few novel concepts
reapplied to a wide range of specific aspects of other older fields.
Sentiment analysis is a system of techniques that are organized and applied differently depending
on the designer. Before looking at how scientists and developers are currently searching for
sentiment in text, it is best to understand where they search and why. The internet is an
ever-expanding search space. Searching and analyzing all possible sources of relevant information
would be enormously complex. Companies, scientists, and software developers must choose a subset of
this massive search space to apply their software. It is important that a search space is chosen
that will have the highest concentration of easily accessible relevant data. This paper will discuss
some of the problems that highly unstructured text and noisy, useless, or irrelevant text can cause.
The following graph from Alta Plana's Text Analytics 2009 research study, which surveyed 116
companies that use text analytics software, lists some of the top areas that companies use as the
source of the text. Notice that content generated by general user discussion in open social settings
dominates the list.

The importance of the data generated by the Web 2.0 phenomenon is readily apparent. Cai (2010)
describes this importance: "The widespread availability of consumer generated media (CGM) such as
blogs, message boards, and news articles post great opportunities as well as risks to today's
enterprises." As of 2009, companies had already begun acting on this realization. The complexity
issue is still relevant even when narrowing the search space to a single source of information.
Facebook is a good example of an extraordinarily popular social media platform that generates a
large amount of text that could be analyzed through its API. The search space here involves "[m]ore
than 500 million active users," over 900 million [Facebook specific] objects (pages, groups, events
and community pages), and the "[m]ore than 30 billion pieces of content (web links, news stories,
blog posts, notes, photo albums, etc.) shared each month"
(http://www.facebook.com/press/info.php?statistics, 2010). The useful data is just as plentiful as
the irrelevant. There are endless amounts of both being produced in outlets across the internet. It
is the relevant subjective human opinion that is a rich and useful source for marketing
intelligence, social psychologists, and others interested in extracting and mining opinions, views,
moods, and attitudes (Tang et al., 2009). With this information sentiment analysis can begin.
The challenge that exists after the search space is established is to locate the relevant data.
After the relevant data is established, it can then be assessed for sentiment. These two stages are
commonly referred to as subjectivity classification and sentiment classification. "Subjectivity
classification is a task to investigate whether a paragraph presents the opinion of its author or
reports facts... Subjectivity classification can prevent the polarity [i.e. sentiment] classifier
from considering irrelevant or even potentially misleading text" (Tang et al., 2009). Depending on
the application, contextual matching or a similar technique may be applied to the resulting data
that is already deemed subjective. Ensuring that the sections extracted from the original document
are contextually relevant guarantees that the topics being discussed in the text are those that are
important to the results the designer is examining. This concept is common in automated advertising
displays: "Contextual advertising is a major type of online advertising, in which ads are placed on
Web pages according to their content" (Qiu et al., 2010). After the process has narrowed the initial
data down to the relevant snippets, the application of sentiment can begin.
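The two stages described above can be illustrated with a minimal Python sketch. It is not taken from any of the cited papers: the word lists are tiny hypothetical lexicons invented for illustration, and a real system would more likely use trained classifiers for both stages. The sketch only shows the ordering of the stages, with factual sentences filtered out before polarity is assigned.

```python
import re

# Hypothetical toy lexicons (assumptions for illustration only).
SUBJECTIVE_CUES = {"hate", "love", "frustrating", "fun", "great", "terrible", "think", "feel"}
POSITIVE = {"love", "fun", "great"}
NEGATIVE = {"hate", "frustrating", "terrible"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def is_subjective(sentence):
    """Stage 1 (subjectivity classification): keep sentences with an opinion cue."""
    return any(tok in SUBJECTIVE_CUES for tok in tokenize(sentence))


def polarity(sentence):
    """Stage 2 (sentiment classification): label a sentence by counting lexicon hits."""
    toks = tokenize(sentence)
    score = sum(t in POSITIVE for t in toks) - sum(t in NEGATIVE for t in toks)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def analyze(sentences):
    """Run stage 1 as a filter, then stage 2 on whatever survives."""
    return [(s, polarity(s)) for s in sentences if is_subjective(s)]


sentences = [
    "The phone was released in June.",   # factual, dropped by stage 1
    "I love the screen.",
    "The battery life is terrible.",
]
results = analyze(sentences)
for sentence, label in results:
    print(label, "|", sentence)
```

Keeping the subjectivity filter separate from the polarity step mirrors the point made by Tang et al. (2009): text that reports facts never reaches the polarity classifier, so it cannot mislead it.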
Sentiment classification has some variation among designers of each approach but ultimately
serves the same abstract purpose. "Sentiment analysis traditionally emphasizes on classification of
web comments into positive, neutral, and negative categories" (Cai et al., 2010). There are several
variations of this tradition. A more common trend in recent research is to get more specific in
defining the sentiment spectrum. "Sentiment classification includes two kinds of classification
forms, i.e., binary sentiment classification and multi-class sentiment classification" (Tang et al.,
2009). This multi-class sentiment approach will likely be the standard of the future. Human emotion
spans a much more complicated spectrum than the simple black and white notions of positive and
negative. Human beings have the strange capability to love and to hate something at the same time.
Take this simulated example that I may hear from a roommate who is a new user and just purchased a
recent video game: "I hate that I am not acquiring the same kill to death ratio in the new Call of
Duty. The new user interface is quite frustrating. I love the challenge though. It will be fun to
learn a new UI." Here the user portrays negative and positive sentiments on the same product. This
is easy for humans to decipher but much more complicated for a machine. This and many other problems
are being addressed in current research.
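To see why the simulated review is awkward for a binary classifier, consider the following small Python sketch. It is an illustration under stated assumptions, not a method from the cited research: the four-word lexicon is hypothetical and chosen only to cover the example, and the "mixed" label is one possible extra class among many a multi-class scheme could define.

```python
import re

# Hypothetical lexicon covering only the words in the example review.
POSITIVE = {"love", "fun"}
NEGATIVE = {"hate", "frustrating"}


def sentence_score(sentence):
    """Positive hits minus negative hits for one sentence."""
    toks = re.findall(r"[a-z']+", sentence.lower())
    return sum(t in POSITIVE for t in toks) - sum(t in NEGATIVE for t in toks)


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def document_label(text):
    """Multi-class label that keeps 'mixed' distinct from 'neutral'."""
    scores = [sentence_score(s) for s in split_sentences(text)]
    has_pos = any(s > 0 for s in scores)
    has_neg = any(s < 0 for s in scores)
    if has_pos and has_neg:
        return "mixed"
    if has_pos:
        return "positive"
    if has_neg:
        return "negative"
    return "neutral"


review = (
    "I hate that I am not acquiring the same kill to death ratio in the new Call of Duty. "
    "The new user interface is quite frustrating. I love the challenge though. "
    "It will be fun to learn a new UI."
)
scores = [sentence_score(s) for s in split_sentences(review)]
print(scores)                  # per-sentence scores
print(sum(scores))             # a single net score hides the conflict
print(document_label(review))  # the multi-class label preserves it
```

Summing the sentence scores yields a net score of zero, which a binary or three-way scheme would report as neutral even though the reviewer expresses strong feelings in both directions.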
A few different approaches have been developed to create more accurate results. General
polarity-based sentiment classification is a great step forward from the previous contextual-only
approaches. Cai (2010) mentions that "[s]uch analysis is useful, but it lacks insights on the
drivers behind the sentiments." Cai's group developed a better solution: "To address this problem,
we introduce our sentiment analysis approach which combines a unique sentiment classification
approach with a topic detection approach that discovers terms that are highly correlated to
different sentiment classification categories." This produces results that point to the original
reasons for the given sentiment. There are more elaborate designs that break down the content into
greater detail, allowing for more results that are more specific.
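One simple way to find terms associated with a sentiment category is sketched below in Python. This is not the algorithm of Cai et al. (2010), whose details are not given here; it swaps in a basic lift statistic, P(term | category) / P(term), computed over comments that are assumed to be already labeled. The sample comments, the stopword list, and the function names are all hypothetical.

```python
import re
from collections import Counter, defaultdict

# Hypothetical stopword list, kept short for the example.
STOPWORDS = {"the", "is", "a", "and", "i", "it", "to", "of", "this", "but", "my"}


def terms(text):
    """Distinct non-stopword terms in one comment."""
    return {t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS}


def top_terms_by_category(labeled_comments, k=2):
    """Rank terms per category by lift = P(term | category) / P(term)."""
    doc_freq = Counter()             # comments containing each term
    cat_freq = defaultdict(Counter)  # same count, split by category
    cat_size = Counter()             # comments per category
    for text, label in labeled_comments:
        cat_size[label] += 1
        for t in terms(text):
            doc_freq[t] += 1
            cat_freq[label][t] += 1
    total = sum(cat_size.values())
    ranked = {}
    for label, counts in cat_freq.items():
        lift = {
            t: (c / cat_size[label]) / (doc_freq[t] / total)
            for t, c in counts.items()
        }
        # Ties on lift are broken by frequency, then alphabetically.
        ranked[label] = sorted(lift, key=lambda t: (-lift[t], -counts[t], t))[:k]
    return ranked


comments = [
    ("The battery drains fast", "negative"),
    ("Battery drains overnight and the screen is dim", "negative"),
    ("Great camera and a bright screen", "positive"),
    ("The camera is great", "positive"),
]
ranked = top_terms_by_category(comments)
print(ranked)
```

In this toy data "screen" appears in both categories and so has a lift of 1, while "battery" and "camera" appear in only one category each, which is the kind of signal that hints at what is driving a sentiment.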
After the sentiments are established, each sentiment analysis system will then use the results in
ways appropriate to the application. Qiu (2010) developed an idea titled Dissatisfaction-oriented
Advertising based on Sentiment Analysis, or DASA, that combines traditional sentiment analysis with
basic keyword matching. In this approach the software detects negative sentiment toward certain
products. The advertising on the web page that contains the text then displays a product that has
the positive attributes whose absence the original text complained about. The example used in the
paper by Qiu et al. (2010) is one in which the writer on the forum complains about the safety of a
car. After the comment is posted and a new user loads the forum page, the advertisements are
reselected based on the new comment. The new advertisements now include a Volvo ad that highlights
new safety features and a history of safe production standards. This process is shown in the
following diagram from the same research paper.
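The general idea behind DASA, as summarized above, can be approximated in a few lines of Python. The sketch is a simplification and not the implementation of Qiu et al. (2010): the negative cue words, the attribute keyword sets, and the ad inventory are all hypothetical, and negative sentiment is detected by plain word matching rather than by a trained model.

```python
import re

# Hypothetical cue words that signal dissatisfaction.
NEGATIVE_CUES = {"unsafe", "worried", "dangerous", "poor", "terrible", "failed"}

# Hypothetical mapping from product attributes to keywords that mention them.
ATTRIBUTE_KEYWORDS = {
    "safety": {"safety", "unsafe", "crash", "brakes", "airbag"},
    "mileage": {"mileage", "fuel", "gas"},
}

# Hypothetical ad inventory; each ad lists the attributes it promotes.
ADS = [
    {"name": "Volvo sedan", "strengths": {"safety"}},
    {"name": "Compact hybrid", "strengths": {"mileage"}},
]


def complained_attributes(comment):
    """Attributes mentioned in sentences that also contain a negative cue."""
    found = set()
    for sentence in re.split(r"(?<=[.!?])\s+", comment.lower()):
        toks = set(re.findall(r"[a-z]+", sentence))
        if toks & NEGATIVE_CUES:
            for attribute, keywords in ATTRIBUTE_KEYWORDS.items():
                if toks & keywords:
                    found.add(attribute)
    return found


def select_ads(comment, ads=ADS):
    """Pick ads whose strengths answer the complaints in the comment."""
    attributes = complained_attributes(comment)
    return [ad["name"] for ad in ads if ad["strengths"] & attributes]


comment = "My car feels unsafe. The brakes failed twice last winter. Gas mileage is fine."
chosen = select_ads(comment)
print(chosen)
```

The mileage sentence contains no negative cue, so the hybrid ad is not selected. Topic relevance alone would have matched it, which is the distinction between contextual advertising and the dissatisfaction-oriented approach.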

Sentiment analysis can be applied in many industries. Any company under the scrutiny of public
opinion should be analyzing all relevant data it can obtain. As Nick Bilton of the New York Times
mentions, "When people want to know how the media business will deal with the internet, the best
way to begin to understand the sweeping changes is to recognize that the consumer of entertainment
and information is now in the center." Current applications take this into account and focus on the
subjective user or consumer views of certain areas that the enterprises will generally be interested
in surveying. The most popular and basic use of sentiment analysis involves "mining text of written
reviews from customers for certain products or services, and classifying the reviews into positive
or negative opinions" (Ye et al., 2009). It is this type of classification that has become one of
the foci of recent research endeavors sponsored by companies that realize the potential value of
sentiment analysis on their data (Ye et al., 2009). Companies with a heavy online presence have a
myriad of data to which this research could easily be applied.
These same companies can choose to use text analytics software in different ways to meet
different goals. Another graph from Alta Plana's Text Analytics 2009 research study shows the wide
array of end goals that companies may be looking to meet when using text analytics software.

The highest use percentage shown above involves branding and reputation management. Most
applications of sentiment analysis in recent research represent a similar trend. The technologies
surrounding text analytics will be desired by many industries and for different applications in
each industry. Taking this into consideration, different algorithms, techniques, and sometimes just
small alterations will be required before sentiment analysis software from one industry can be
applied to another. This may also foreshadow that the text analytics software industry will be able
to create lucrative consulting firms similar to those that are currently faring well in the general
management information systems sector.
The massive information systems that corporations already have could integrate aspects of
existing processes with sentiment analysis. Newly refined systems could extend the capabilities of
search engines, classify reviews, summarize reviews, track opinions in online discussions, analyze
survey responses, implement online message sentiment filtering, create email message classification
systems, and support many more yet-to-be-discovered techniques (Tang et al., 2009). This may result
in more efficient communication for the public relations departments and better products created by
the development teams. Companies will be able to navigate through all available data and find
comparisons of specific product features from competitors. "For a product manufacturer, the
comparison enables it to easily gather marketing intelligence and product benchmarking information"
(Tang et al., 2009). Sentiment analysis will give businesses the ability to use their preexisting
text data in ways that benefit several departments within the traditional business structure.
Businesses only require new software plus the necessary hardware to handle the new processing
techniques and storage of the results.
Marketing companies and advertising branches of businesses are obvious beneficiaries of the
conclusions derived from sentiment analysis. Major search engines and email hosts such as Yahoo and
Google, as well as social media companies such as Facebook, have been implementing contextually
relevant advertising to users for years now. The current web landscape demands relevancy and
personalized information for users and potential consumers. "The tradeoff between financial revenue
and market share triggers the emergence of relevant advertising to emphasize the relevance between
ads and Web pages for the sake of consumers" (Qiu et al., 2010). Qiu (2010) goes on to mention that
"[t]argeted advertising is of great importance for internet companies to gain revenue from both
advertisers and consumers. Previous approaches focus only on the topical relevance while the
consumers' attitudes are ignored. These approaches fail to meet the actual needs of consumers
especially when they may have negative attitudes towards the mentioned topics." Cai (2010) adds that
a company's resistance to these new trends could have serious impact on their competitive market
advantages. Leveraging the massive amount of data that is produced by the consumer voice could
catalyze the growth of a company. The opposing danger is that ignoring the ever-increasing volume of
public opinion could result in a company being socially outcast. It is to the pure benefit of
companies to implement sentiment analysis if these companies have the relevant information available
for such a process. The branding and marketing aspects of businesses revolve around consumer
psychology. Sentiment analysis could reveal this psychology in a form that could be used for further
analysis and study.
It is important to notice that the implementations of such technology on the business side also
benefit the consumer. Depending on the industry and the manner in which sentiment analysis is being
applied, a system for presenting the results and organized conclusions from the analysis could be
created. Ye (2009) mentions a relationship here: "With the results of sentiment classification,
consumers would know the necessary information to determine which products to purchase and sellers
would know the response from their customers and the performances of their competitors." It then
turns into a cyclical system that should result in higher quality products over time. It is an
efficient way to crowdsource useful data without the users putting forth any extra effort. The users
could even be unaware that they are improving their future shopping experiences. The users and
creators of the text to be analyzed will silently be benefiting two parties while expressing their
natural opinions.
Sentiment analysis is a useful tool for all users of the internet. Emotional classification and
organization of content will be a beneficial contribution to the vast reservoir of data the
internet holds. The field has made steady progress over the last twenty years but still has much
room to grow and improve. This is an exciting pursuit for those involved. The companies and
researchers supporting the improvement of sentiment analysis will be contributing to an improved
environment for all users. Users should enjoy communicating with a machine that understands the
emotional needs of the users and can offer effective solutions to the users' problems. This is the
basic result of sentiment analysis even in the current form. It will only evolve to learn how to
meet our needs more effectively.

List of references
Bilton, N. (2010, September 13). A Tech World that Centers on the User. New York Times: New
York Edition. p. B1.
Cai, K., Spangler, S., Chen, Y., & Zhang, L. (2010). Leveraging sentiment analysis for topic
detection. Web Intelligence and Agent Systems: An International Journal, 8(2010), 291-302.
Grimes, S. (2009). Text Analytics 2009: User Perspectives on Solutions and Providers. Alta Plana.
Published under the Creative Commons Attribution 3.0 License.
Kho, N. D. (2010). Customer experience and sentiment analysis. KMWorld, February 2010, 10-20.
Li, N., Liang, X., Li, X., Wang, C., & Wu, D. (2009). Network Environment and Financial Risk Using
Machine Learning and Sentiment Analysis. Human and Ecological Risk Assessment, 15,
227-252.
Qiu, G., He, X., Zhang, F., Shi, Y., Bu, J., & Chen, C. (2010). DASA: Dissatisfaction-oriented
Advertising based on Sentiment Analysis. Expert Systems with Applications, 37(2010),
6182-6191.
Tang, H., Tan, S., & Cheng, X. (2009). A survey on sentiment detection of reviews. Expert Systems
with Applications, 36(2009), 10760-10773.
Ye, Q., Zhang, Z., & Law, R. (2009). Sentiment classification of online reviews to travel
destinations by supervised machine learning approaches. Expert Systems with Applications, 36(2009),
6527-6535.
