
AUTOMATIC CLASSIFICATION AND SUMMARIZATION

OF SENTIMENT IN DOCUMENTS
By
Kiran Sarvabhotla
200402038
A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE
REQUIREMENTS FOR THE DEGREE OF
Master of Science (by Research)
in
Computer Science & Engineering
Search and Information Extraction Lab
Language Technologies Research Center
International Institute of Information Technology
Hyderabad, India
May 2010
Copyright © 2010 Kiran Sarvabhotla
All Rights Reserved
Dedicated to my family and friends.
INTERNATIONAL INSTITUTE OF INFORMATION TECHNOLOGY
Hyderabad, India
CERTIFICATE
It is certified that the work contained in this thesis, titled Automatic Classification and Summarization of Sentiment in Documents by Kiran Sarvabhotla (200402038), submitted in partial fulfillment for the award of the degree of Master of Science (by Research) in Computer Science & Engineering, has been carried out under my supervision and has not been submitted elsewhere for a degree.
Date Advisor :
Dr. Vasudeva Varma
Associate Professor
IIIT, Hyderabad
Acknowledgements
I would like to thank my advisor Dr. Vasudeva Varma for his continual guidance during my master's degree. He gave me the freedom to explore different topics and helped me zero in on one of the hot topics of research. His sincere efforts and valuable comments were key factors in getting my work published in one of the reputed journals in IR. I would also like to thank him for his support in writing my master's thesis.
I would like to express my sincere gratitude to Dr. Prasad Pingali, who was the first person I met in the IE Lab. His passion towards research motivated a happy-go-lucky guy like me towards research. I would not have been what I am today if I had not met him. My special thanks to Mr. Babji, a guy who never says no, for his help ranging from providing infrastructure and eatables to some fitness tips.
Special thanks to my friend P V Sai Krishna, who made me come to the IE Lab and motivated me in very troubled times. My special thanks to K.P Raja Sekhar, my first project partner. I would like to thank Surya Ganesh, Kranthi Reddy, Rohit Bharadwaj and Swathi, who helped me during the early stages of my research.
I take this chance to thank all my batch mates for making my life in IIIT so enjoyable and memorable. I will always relish the fun we had in IIIT throughout my life.
I would like to thank my father, Lakshmi Narasimham, my mother, Ramalakshmi, and my sister Sravani for their love and support.
Last but not least, I thank my dear friend Vijaya Kumari for her support and motivation, constantly reminding me of my capabilities.
Abstract
Today's World Wide Web has become a major source of information for people. With the advent of customer reviews, blogs and the growth of e-commerce in this decade, user-generated content has grown rapidly on the web. It has an inherent property called sentiment, which plays an important role in people's decision-making. In order to provide better information access, analysing sentiments and rating them in terms of satisfaction has become an essential characteristic of the web.
Sentiment analysis or opinion mining is a Web 2.0 problem that aims to determine the attitude of a speaker or writer towards a particular topic by classifying the polarity of the text. Sentiment classification can be viewed as a special case of topical classification applied to the subjective portions (sources of sentiment) of a document. Hence, the key task in sentiment classification is extracting subjectivity. In this thesis, we classify the overall sentiment of a document using supervised learning approaches. We focus on extracting subjective features, current approaches to extracting them and their limitations.
Existing approaches for extracting subjective features rely heavily on linguistic resources such as sentiment lexicons, and on complex subjective patterns based on Part-Of-Speech (POS) information, thus making the task resource dependent. Since regional language content is growing gradually on the web and people are interested in expressing their thoughts in their local language, extending these resource-based approaches to various languages is a tedious job. It requires a lot of human effort to build sentiment lexicons and frame rules for detecting subjective patterns. To make the task of subjective feature extraction more feasible, approaches that reduce the use of linguistic resources are needed. In this thesis, we attempt to address the problem of resource dependency in subjective feature extraction. We assume that a document is not entirely subjective: it also contains misleading text in the form of objective information. We propose a method called RSUMM that filters objective content from a document. We explore the use of classic information retrieval models in RSUMM for estimating the subjectivity of each sentence.
We follow a two-step filtering methodology to extract subjectivity. We estimate subjectivity at the sentence level and retain the most subjective sentences from each document. In this way, we obtain an excerpt of each document that preserves subjectivity at a level comparable to or better than the full document, for efficient sentiment classification. Then, we apply well-known feature selection techniques on the subjective extract to obtain the final subjective feature set. We evaluate our methodology on two supervised customer review datasets, using standard classification evaluation metrics such as accuracy and mean absolute error.

Our results on those datasets prove the effectiveness of the proposed filtering methodology. Based on the results, we conclude that subjective feature extraction is possible with minimal use of linguistic resources.
Although ratings convey the sentiment at a glance, the real essence of it is contained in the text itself. The second part of this thesis explains our approach to summarizing the sentiments of multiple users towards a topic. We produce an extract-based summary from multiple documents related to a topic while preserving the sentiment in it. We focus on relating sentiment classification and sentiment summarization, and show how classification helps in summarizing sentiments. We evaluate our approach on a standard web blog dataset using standard evaluation metrics.
Publications
Kiran Sarvabhotla, Prasad Pingali and Vasudeva Varma, "A Lexical Similarity Based Approach for Extracting Subjectivity in Documents", published in the Journal of Information Retrieval, Special Issue on Web Mining for Search, Vol 14(3), 2011.
Kiran Sarvabhotla, Prasad Pingali and Vasudeva Varma, "Supervised Learning Approaches for Rating Customer Reviews", published in the Journal of Intelligent Systems, Vol 19(1), 2010.
Kiran Sarvabhotla, Kranthi Reddy .B and Vasudeva Varma, "Classification Based Approach for Summarizing Opinions in Blog Posts", in the Proceedings of the Indian International Conference on Artificial Intelligence (IICAI-09), Special Track on Web 2.0 and Natural Language Processing, Tumkur, December 2009.
Vasudeva Varma, Prasad Pingali, Rahul Katragadda, Sai Krishna, Surya Ganesh, Kiran Sarvabhotla, Harish Garapati, Hareen Gopisetty, Vijay Bharath Reddy, Kranthi Reddy, Praveen Bysani and Rohit Bharadwaj, "IIIT Hyderabad at TAC 2008", in the Working Notes of the Text Analysis Conference (TAC) at the joint meeting of the annual conferences of TAC and TREC, USA, November 2008.
Contents
Table of Contents ix
List of Tables xii
List of Figures xiii
1 Introduction 1
1.1 Introduction to Sentiment Analysis . . . . . . . . . . . . . . . . . . . . . . 2
1.1.1 Sentiment Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.1.2 Rating Sentiments . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.1.3 A Generic Approach to Document Sentiment Analysis . . . . . . . 4
1.2 Extracting Subjectivity . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2.1 Challenges in Extracting Subjectivity . . . . . . . . . . . . . . . . 7
1.2.2 Existing Approaches . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.3 Problem Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.3.1 Problem Scope . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.3.2 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.3.3 Problem Statement . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.4 Overview of the Proposed Methodology . . . . . . . . . . . . . . . . . . . 11
1.4.1 RSUMM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.4.2 Evaluation and Comparisons . . . . . . . . . . . . . . . . . . . . . 13
1.5 Thesis Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2 Related Work 15
2.1 Sentiment Classification at Different Levels . . . . . . . . . . . . . . . . . 16
2.1.1 Word or Phrase Sentiment Classification . . . . . . . . . . . . . . . 16
2.1.2 Document Sentiment Classification . . . . . . . . . . . . . . . . . 18
2.1.3 Sentiment Classification at Sentence Level . . . . . . . . . . . . . 20
2.2 Subjectivity Classification . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.2.1 Min-cut based Subjectivity Classification . . . . . . . . . . . . . . 21
2.3 State-of-the-art Approaches and Benchmarks . . . . . . . . . . . . . . . . 22
3 Subjective Feature Extraction 24
3.1 Information Retrieval Models and SVM . . . . . . . . . . . . . . . . . . . 25
3.1.1 Vector Space Model . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.1.2 Unigram Language Model . . . . . . . . . . . . . . . . . . . . . . 26
3.1.3 Support Vector Machines . . . . . . . . . . . . . . . . . . . . . . . 27
3.2 RSUMM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.2.1 Lexical Similarity . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.2.2 Probabilistic Estimate . . . . . . . . . . . . . . . . . . . . . . . . 31
3.2.3 Term Co-occurrence . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.3 Feature Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.3.1 Mutual Information . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.3.2 Fisher Discriminant Ratio . . . . . . . . . . . . . . . . . . . . . . 34
3.3.3 Final Subset Selection . . . . . . . . . . . . . . . . . . . . . . . . 35
3.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
4 Evaluation 37
4.1 Experimental Setting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4.1.1 Datasets and Classifiers . . . . . . . . . . . . . . . . . . . . . . . . 37
4.1.2 Evaluation Metrics . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.1.3 Estimating the Parameter X . . . . . . . . . . . . . . . . . . . . . 40
4.2 Binary Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
4.2.1 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
4.2.2 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
4.3 Multi-variant Classification . . . . . . . . . . . . . . . . . . . . . . . . . . 48
4.3.1 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
4.3.2 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
4.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
5 Sentiment Summarization 55
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
5.2 Classification Based Approach . . . . . . . . . . . . . . . . . . . . . . . . 57
5.2.1 Training the Classifier . . . . . . . . . . . . . . . . . . . . . . . . 59
5.2.2 Polarity Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . 59
5.2.3 Final Ranking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
5.3 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
5.3.1 Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
5.3.2 Evaluation Metrics . . . . . . . . . . . . . . . . . . . . . . . . . . 61
5.3.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
5.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
6 Conclusion 63
6.1 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
6.2 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
6.2.1 Products comparison . . . . . . . . . . . . . . . . . . . . . . . . . 66
6.2.2 Sentiment summarization . . . . . . . . . . . . . . . . . . . . . . 66
6.2.3 Opinion reason mining . . . . . . . . . . . . . . . . . . . . . . . . 67
6.2.4 Other Applications . . . . . . . . . . . . . . . . . . . . . . . . . . 67
6.3 Future Directions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
Bibliography 69
List of Tables
4.1 Statistics of the dataset PDS2 . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.2 Results showing CV accuracies for baseline BL and top half TH and bottom
half BH on PDS1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
4.3 Results showing CV accuracies for RSUMM_LS, RSUMM_LS+MI and RSUMM_LS+FDR
on PDS1 over BL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
4.4 Results showing CV accuracies for RSUMM_PE, RSUMM_PE+MI and RSUMM_PE+FDR
on PDS1 over BL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
4.5 Results showing CV accuracies for RSUMM_CO, RSUMM_CO+MI and RSUMM_CO+FDR
on PDS1 over BL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
4.6 State-of-the-art Accuracy Values on PDS1 . . . . . . . . . . . . . . . . . . 47
4.7 Table showing the results obtained by Stefano et al. on PDS2 for their
different feature representations with MV as the feature selection method . . 49
4.8 Table showing CV accuracies on PDS2 for different feature representations
using total review with LR as the classication method . . . . . . . . . . . 50
4.9 Table showing CV accuracies on PDS2 for different feature representations
using RSUMM_CO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
4.10 Table showing CV accuracies on PDS2 for different feature representations
using ADF metric . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
4.11 Table showing CV accuracies on PDS2 for different feature representations
using RSUMM_CO with MI and FDR . . . . . . . . . . . . . . . . . . . . 51
4.12 Table showing CV accuracies on PDS2 for different feature representations
using a naive-Bayes classifier and MI as the feature selection method . . . . 52
5.1 Results showing average NR, NP and F-Measure values for 22 topics . . . 62
List of Figures
1.1 General methodology adopted in document sentiment analysis . . . . . . . 4
1.2 A sample movie review . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
3.1 Logit Curve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
4.1 Plot showing the effect of X on accuracy with RSUMM_LS . . . . . . . . 41
4.2 Plot showing the effect of X on accuracy with RSUMM_PE . . . . . . . . 41
4.3 Plot showing the effect of X on accuracy with RSUMM_CO . . . . . . . . 42
5.1 Sample TAC Queries and Targets . . . . . . . . . . . . . . . . . . . . . . 57
5.2 Architecture of our sentiment summarization system . . . . . . . . . . . . 58
Chapter 1
Introduction
Today's World Wide Web has become a major source of information for people. The textual information on the web can be broadly categorized into two kinds: facts and opinions. Facts are objective expressions about entities, events, etc., whereas opinions are subjective expressions that describe people's feelings, sentiments or apprehensions about entities, events and others [60, 71]. Until the early part of this decade, most research work in the areas of natural language processing, text mining and information retrieval focused on facts, particularly news stories. One such application of this research is the classification of news content into politics, movies, sports, etc. This can be attributed to the abundance of news content and the scarcity of opinion content on the web at that time.
With the advent of customer reviews, blogs and the growth of e-commerce in this decade, user-generated content has grown rapidly on the web. Reviews posted by people on e-commerce websites, views of people in blogs, discussion forums, etc. can collectively be called user-generated content. It has an inherent property called sentiment. Analysing these sentiments has received much attention among the research community and market analysts in recent years because of its potential business applications, such as improving customer satisfaction with a product. Hence, sentiment analysis has become one of the hot topics of research in this decade.
1.1 Introduction to Sentiment Analysis
1.1.1 Sentiment Analysis
Sentiment analysis is an area of computational study of opinions, sentiments or emotions [71]. The word sentiment is defined as:
1. A personal belief or judgement that is not founded on proof or certainty.
2. A general thought, feeling or sense.
3. A cognitive perception or emotional feeling towards a particular brand or product (market perspective).
Sentiment analysis or opinion mining is thus a paradigm of natural language processing, text mining and information retrieval that aims to determine the attitude of a speaker or writer towards a particular topic. The basic task in sentiment analysis is to identify the polarity of the given text. The analysis is done at different text levels — word, phrase, aspect, sentence and document level — and a semantic orientation is predicted.
As mentioned earlier, textual information contains both facts (objective information) and opinions (subjective information). Objective information does not convey the sentiments of people. Hence, for any sentiment analysis task, the key is to extract the subjective information robustly and analyse it.
1.1.2 Rating Sentiments
Sentiment analysis is a Web 2.0 problem that attracted attention with the growth of social media. Social media (blogs, customer reviews, discussion forums, etc.) plays a prominent role in people's decision-making. It has become customary for them to know what others are saying about a particular product or service they wish to buy or avail.
avail. According to a survey performed on TripAdvisor
1
, among the users who use the web-
1
http://www.tripadvisor.com
2
CHAPTER 1. INTRODUCTION
site, 97% are inuenced by other travellers opinions [38]. Popular websites like IMDb
2
,
Amazon
3
are encouraging users to post reviews so that they can be useful to others.
Since, popular products or services are often commented by the people and also some
reviews have large content, it is very difcult for customers or manufacturers to go through
the entire content for arriving at a decision. To better facilitate them and to provide better
information access, several websites are encouraging people to quantify a particular prod-
uct or service in terms of their satisfaction. Hence, labeling a review with a rating has
become a crucial characteristic on the web. The labeling is generally done on the basis of
overall satisfaction. The ratings convey the summary of the text in a glance and they are of
immense help.
The rating functionality is provided on some popular websites. Most of the blogs, dis-
cussion forums and reviews do not have explicit ratings. Yet, they are valuable and useful
sources of information. But, there is every chance that a customer might skip them because
of their large content (the problem of information overload). Hence, systems that analyse
the sentiments of people in the given text and predict the polarity are gaining popularity.
The polarity orientation is also referred as semantic orientation, sentiment orientation or
opinion orientation.
In our work, we attempt to rate sentiments on two popular scales on the web; posi-
tive/negative or thumbs up/thumbs down
4
and on a ve point scale
5
for the overall doc-
ument
6
. There are also very few systems that rate sentiments at a very ne level (word,
phrase or aspect).
2
http://www.imdb.com
3
http://www.amazon.com
4
A binary scale of polarity
5
A multi-variant scale of 1-5: starred(*) rating with 5 being the top and 1 being the worst.
6
In our case, a document refers to a review or a blog
Figure 1.1 General methodology adopted in document sentiment analysis
1.1.3 A Generic Approach to Document Sentiment Analysis
The general methodology adopted for document sentiment analysis is depicted in Fig-
ure 1.1.
The first phase is preprocessing, which includes typical text processing methods like tokenization of the text, stopword removal and stemming. The crucial part of a sentiment analysis system is identifying the subjective expressions in the text. This phase is generally called subjective feature extraction. The efficiency of the system is highly dependent on the robustness of this phase. Since the key task in sentiment analysis is extracting subjectivity, we focus on this area in our work.
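The preprocessing phase above can be sketched as follows. This is an illustrative sketch, not the implementation used in the thesis: the stopword list is a tiny stand-in for a real one, and the suffix stripper is a crude stand-in for a proper stemmer such as Porter's.

```python
import re

# Illustrative stopword list; a real system would use a much fuller list.
STOPWORDS = {"the", "a", "an", "is", "was", "it", "in", "of", "and", "to"}

def preprocess(text):
    """Tokenize, remove stopwords and apply a crude suffix stemmer."""
    tokens = re.findall(r"[a-z']+", text.lower())        # tokenization
    tokens = [t for t in tokens if t not in STOPWORDS]   # stopword removal
    stems = []
    for t in tokens:                                     # naive stemming
        for suffix in ("ing", "ed", "s"):
            if t.endswith(suffix) and len(t) > len(suffix) + 2:
                t = t[: -len(suffix)]
                break
        stems.append(t)
    return stems

print(preprocess("The plot was predictable and it disappointed viewers"))
# → ['plot', 'predictable', 'disappoint', 'viewer']
```

The output of this phase is the token stream that the subjective feature extraction phase operates on.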
The third phase is analysing the extracted subjective expressions and predicting their polarity. Most sentiment analysis systems use either linguistic resources, especially sentiment lexicons like SentiWordNet (http://sentiwordnet.isti.cnr.it/) and General Inquirer (http://www.wjh.harvard.edu/~inquirer/), or unsupervised/supervised approaches to predict the semantic orientation.
Some of the popular sentiment analysis systems are:
OPINE (http://www.cs.washington.edu/research/knowitall/opine/)
Opinion Observer
Red Opal
Social Media Analytics (SAS)
SinoBuzz
1.2 Extracting Subjectivity
As mentioned earlier, the critical step in any sentiment analysis task is extracting subjectivity. For example, consider the movie review in Figure 1.2 for the movie Iron Man 2. The overall sentiment of the author towards the movie is negative. Let's examine the features that prompted the author to arrive at this decision.
Explicit subjective features:
Negative semantic orientation:
- disappointed, What the hell, wasn't even epic, Stupidity, boring, no sense, worst, blah.
Positive semantic orientation:
- liked, great special effects, sophisticated, better, good.
Implicit subjective features:
- Basically all the action you will see in this movie is what you saw in the trailer
(nothing new) - beat 20 guards? you bet, hack system made by a guy who hacked into
important military system in 10 sec
Figure 1.2 A sample movie review
There is more usage of negative features by the writer compared to the positive aspects of the movie (both implicit and explicit). Hence, the overall sentiment of the author towards it is negative. The features need not be mere unigrams; higher-order n-grams (larger text units) like "great special effects" are also subjective.
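The intuition above — the overall label follows whichever orientation dominates the extracted features — can be sketched with a toy lexicon. The word lists here are illustrative, taken from the review's explicit features; they are not a real sentiment lexicon.

```python
# Toy lexicons built from the explicit features above; a real system
# would use a resource such as SentiWordNet or learned feature weights.
POSITIVE = {"liked", "great special effects", "sophisticated", "better", "good"}
NEGATIVE = {"disappointed", "boring", "worst", "stupidity", "blah"}

def overall_polarity(features):
    """Label a document by comparing counts of positive and negative features."""
    pos = sum(1 for f in features if f in POSITIVE)
    neg = sum(1 for f in features if f in NEGATIVE)
    return "positive" if pos > neg else "negative"

review_features = ["liked", "great special effects", "disappointed",
                   "boring", "worst", "blah"]
print(overall_polarity(review_features))  # → negative
```

Note that this count-based view only works if the features fed to it are genuinely subjective, which is why extracting them robustly is the key task.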
1.2.1 Challenges in Extracting Subjectivity
There are some interesting observations from the sentiment-bearing document in Figure 1.2.
1. A document is a mixture of subjective and objective information.
2. There is a subtle difference between expressing sentiment and expressing a topic.
3. A document is either structured or unstructured.
4. It can mix information with different orientations, but the overall sentiment is biased towards one label.
If we observe the review above, there are sentences like "There is an agent played by Scarlett Johanson" that don't describe the feelings of the author. They are facts related to the cast and plot of the movie. There can be sentences or paragraphs in a document that only convey factual information. This information can be regarded as potential noise or misleading text, and hence needs to be filtered.
There are more ways to express sentiment than to express a topic. News stories related to movies, politics, sports, etc. can be categorized using keywords alone. In the above review, however, there are sentences with no sentiment-bearing word that still implicitly convey the sentiment of the author. Capturing this subtle difference in expressing sentiment is a challenging task.
There is no restriction on the part of the user to follow a certain pattern while expressing his thoughts. Hence, a document can be treated as unstructured data, and mining such data is difficult. However, there are some documents written by professionals that are structured. For example, a movie reviewer for a popular site will follow a pattern like explaining the plot first, discussing the aspects, and conveying his/her overall sentiment at the end.
Also, the pros and cons of a product may be mentioned in the same document, so it contains contradictory information. But the overall sentiment of the document is biased towards one label, and analysing such contradictory patterns is challenging.
1.2.2 Existing Approaches
Most of the existing approaches for extracting subjective features from documents rely heavily on linguistic resources. A subjective feature can be a word or a phrase, as shown in Figure 1.2. Popular linguistic resources researchers use for subjective feature extraction are:
Sentiment lexicons
POS tagger
Sentiment lexicons are dictionaries prepared by researchers for analysing sentiments. They contain words and phrases that are subjective, with their corresponding orientations [28, 31, 27] as given by them. SentiWordNet and General Inquirer are examples of such lexicons. Using a Part-Of-Speech (POS) tagger, researchers frame rules based on textual patterns. These patterns are considered to be subjective, and the text units that follow them are extracted. The patterns vary from a simple noun phrase (NP) or verb phrase (VP) to very complex patterns. Brill's tagger (http://www.cs.cmu.edu/afs/cs/project/airepository/ai/areas/nlp/parsing/taggers/brill/0.html) and the NLTK toolkit (http://nltk.sourceforge.net/index.php/Main_Page) are examples of resources that are used to extract the POS information [5, 64, 72, 92].
Some of the subjective patterns are:
- NP, VP, JJ NN, RB JJ not NN, JJ JJ not NN, RB VB, NN JJ not NN, etc., where NN stands for noun, RB for adverb, JJ for adjective and VB for verb.
The text units that match the above patterns are considered subjective features. There are also other techniques, like clausal extraction and dependency parsing, for extracting subjective features [64]. Using clausal extraction tools, researchers extract clauses from the text; they then use a POS tagger and framed patterns to discard from the extracted clauses the text units that don't contribute to subjectivity. These techniques will be discussed in detail in Chapter 2.
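A minimal sketch of this pattern-based extraction: given already tagged (word, tag) pairs — the tagging itself would come from a tool like Brill's tagger or NLTK, which is not invoked here — bigrams whose tag pairs match a subjective pattern such as JJ NN or RB JJ are kept as subjective features. Only a small subset of the patterns is shown.

```python
# A small subset of the subjective bigram tag patterns listed above.
PATTERNS = {("JJ", "NN"), ("RB", "JJ"), ("RB", "VB")}

def extract_subjective(tagged):
    """Return word bigrams whose POS tag pairs match a subjective pattern."""
    features = []
    for (w1, t1), (w2, t2) in zip(tagged, tagged[1:]):
        if (t1, t2) in PATTERNS:
            features.append(f"{w1} {w2}")
    return features

# Pre-tagged sentence: "the special effects were really great"
tagged = [("the", "DT"), ("special", "JJ"), ("effects", "NN"),
          ("were", "VBD"), ("really", "RB"), ("great", "JJ")]
print(extract_subjective(tagged))  # → ['special effects', 'really great']
```

Real rule sets are far more complex (negation, longer patterns, clause structure), which is precisely the resource dependency this thesis tries to avoid.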
1.3 Problem Description
From the above sections, it is clear that analysing sentiments in documents is a challenging task. There is a lot of misleading text in the form of objective information, and there are subtle variations in how writers express sentiment. A sentiment-bearing document may contain contradictory information, and such information is difficult to analyse. We also looked at the existing approaches for extracting subjective features from sentiment-bearing documents. In this thesis, we propose methods to extract subjective features from a document with the help of simple frequency-based approaches using information retrieval models.
1.3.1 Problem Scope
Current research work in subjective feature extraction relies on linguistic resources like lexicons and POS taggers. Lexicons are very generic: they cannot capture subtle variations in expressing sentiment from context to context and from domain to domain, and they contain subjective features with binary orientations, so using them for multi-variate analysis is not possible. Using POS taggers and other tools, researchers frame complex rules to extract subjective features. Hence, the task of subjective feature extraction has become resource dependent. Regional language content is growing gradually on the web, and people are increasingly interested in expressing their thoughts and feelings in their local language. Extending the current approaches to subjective feature extraction across several languages is therefore a tedious job, requiring a lot of human effort to build such tools in each language. So, to make the task of subjective feature extraction more feasible, we need approaches that require minimal use of linguistic resources and yet achieve significant results.
1.3.2 Motivation
We identified two major problems with the existing approaches to subjective feature extraction:
1. They rely heavily on linguistic resources, thus making the task resource dependent.
2. They use complex patterns to extract subjectivity.
This motivated us to investigate approaches that make the task of subjective feature extraction simpler and more generic. In our work, we rely on corpus statistics rather than complex textual patterns or linguistic resources to extract subjective features. Since sentiment analysis addresses the problem of predicting the polarity of a given text unit, it is often referred to as sentiment classification, and we use this term in the rest of the thesis as a synonym for sentiment analysis.
1.3.3 Problem Statement
Sentiment classification addresses the problem of predicting the polarity of text. It can be viewed as a special case of topical classification applied to subjective portions. Hence, the key task in sentiment classification is subjective feature extraction. Existing approaches for extracting subjectivity rely heavily on linguistic resources and complex rule-based patterns of subjectivity, thus making the task very resource dependent and complex. With regional language content growing on the web, the scarcity of such resources should not prevent people from conducting research on those languages. Hence, approaches are needed that require minimal use of language resources yet perform at a level comparable to using them. In this way, we can solve the problem of resource dependency prevalent in sentiment analysis.
1.4 Overview of the Proposed Methodology
We use supervised learning approaches to classify the overall polarity of a document. We focus on extracting subjective features from it and representing them as a feature vector for classification. We approach the problem of removing resource dependency in subjective feature extraction by making two claims:
1. Not all of a review contains subjective information.
2. If we can successfully filter out the objective information, then subjective feature extraction is achievable with minimal use of linguistic resources and no complex patterns.
We follow a filtering strategy at the sentence and word levels to extract subjective features from the document. We view each review as a mixture of objective and subjective sentences, where the former convey nothing about the feelings of the author. If you observe the sample review in Fig 1.2, the subjective features are few in number compared to the entire content. In our manual analysis of reviews on the web, we found that most reviews have the same pattern. Hence, we need to discard the potential noise in the form of objective sentences before converting the document's text units into a feature vector for the classifier.
1.4.1 RSUMM
We propose a method called RSUMM to extract subjective sentences from a review. It is based on Information Retrieval (IR) models like the vector space model, the language model and the term co-occurrence model. We use techniques similar to these IR models to filter out the objective information in a review. We call the excerpt of a review with the objective information filtered out its subjective extract.
Our subjective feature extraction occurs in two steps:
RSUMM estimates the subjectivity of each sentence and returns the most subjective sentences as the subjective extract.
Then we apply feature selection techniques on the subjective extract to obtain the final feature set.
In this thesis, we propose three variants of RSUMM to obtain the subjective extract. The
first method is based on the lexical similarity between each sentence and two term vectors;
we call it RSUMM_LS. We define two metrics, average document frequency (ADF) and average
subjective measure (ASM), in RSUMM_LS to score the lexical similarity. The first metric
intuitively extracts important terms from a given collection; the ASM metric intuitively
selects the most subjective terms from a given collection. We use both metrics to estimate
the subjectivity of a sentence and retain the more subjective sentences in the subjective
extract.
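The exact forms of ADF and ASM are defined later in the thesis; as a rough illustration of the idea, a sentence can be scored by averaging per-term corpus statistics. The sketch below uses assumed forms of both metrics (mean relative document frequency, and mean relative frequency in a subjectivity-annotated corpus); the function shapes and all counts are illustrative, not our actual definitions.

```python
from collections import Counter

def adf(sentence_terms, doc_freq, n_docs):
    """Average document frequency of a sentence's terms (assumed form)."""
    if not sentence_terms:
        return 0.0
    return sum(doc_freq.get(t, 0) / n_docs for t in sentence_terms) / len(sentence_terms)

def asm(sentence_terms, subj_freq, total_subj_terms):
    """Average subjective measure: mean relative frequency of each term in a
    subjectivity-annotated corpus (assumed form)."""
    if not sentence_terms:
        return 0.0
    return sum(subj_freq.get(t, 0) / total_subj_terms for t in sentence_terms) / len(sentence_terms)

# Toy statistics, for illustration only.
doc_freq = Counter({"movie": 90, "great": 40, "plot": 60, "boring": 25})
subj_freq = Counter({"great": 30, "boring": 20, "movie": 5})
s1 = ["great", "movie"]   # subjective-leaning sentence
s2 = ["plot", "movie"]    # objective-leaning sentence

score1 = adf(s1, doc_freq, 100) * asm(s1, subj_freq, 60)
score2 = adf(s2, doc_freq, 100) * asm(s2, subj_freq, 60)
assert score1 > score2    # the opinionated sentence scores higher
```

Combining the two averages multiplicatively here is only one plausible way to fuse importance with subjectivity into a single sentence score.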
The second method is based on probabilistic estimates rather than raw term similarity.
In this method, we estimate the subjectivity of a sentence based on how likely its terms
are to be observed from a subjective model. We call this method RSUMM_PE. Its basis is
the unigram language modeling used in information retrieval. In the third method, we use
meta information such as the title, pros, cons and aspects of a product available with the
review. We frame target words from the available meta information and use a term
co-occurrence model to estimate subjectivity. We call this method RSUMM_CO. It assumes
that authors build subjective expressions around entities such as the aspects, pros and
cons of a product.
We retain the top X% of sentences as the subjective extract in each method. We estimate
the best value of X for each method such that the subjective extract preserves sentiment at
a level comparable to or better than the full review.
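The retention step itself is straightforward once sentences have been scored. A minimal sketch, assuming an arbitrary sentence scorer (here a toy seed-word counter, not one of our actual metrics):

```python
def subjective_extract(sentences, score_fn, x_percent):
    """Keep the top X% of sentences by subjectivity score, preserving their
    original order (a sketch of the retention step only)."""
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score_fn(sentences[i]), reverse=True)
    k = max(1, round(len(sentences) * x_percent / 100))
    keep = set(ranked[:k])
    return [s for i, s in enumerate(sentences) if i in keep]

# Toy scorer: count of words from a small subjective seed list (assumption).
SEED = {"great", "awful", "love", "boring"}
score = lambda s: sum(w in SEED for w in s.lower().split())

review = ["The film runs 120 minutes.",
          "I love the great soundtrack.",
          "It was released in 2009.",
          "The plot is awful and boring."]
print(subjective_extract(review, score, 50))
# keeps the two opinionated sentences, in their original order
```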
After obtaining the subjective extract, we need to convert it into a feature vector. We use
n-gram models to represent it as a feature vector for the classifier. As n-gram modeling is
done on sentences, which are relatively large text units, there can be a large number of
irrelevant features. Hence, for faster learning and better classification, a feature selection
phase is needed in our case. We employ two state-of-the-art feature selection methods,
Mutual Information (MI) and Fisher Discriminant Ratio (FDR). We use support vector
machines (SVM) as the classifier in our work. We view the problem of predicting the
sentiment on a binary scale as a problem of support vector classification (SVC) and rating
it on a multi-variate scale as a problem of logistic regression (LR).
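As an illustration of this pipeline stage, the sketch below builds unigram-plus-bigram features and scores each by mutual information with the class label. This is the standard presence-based MI formulation, which may differ in detail from the variant used in our experiments; the toy documents and labels are assumptions.

```python
import math
from collections import Counter

def ngrams(tokens, n_max=2):
    """All n-grams of the token list up to length n_max (unigrams + bigrams)."""
    feats = []
    for n in range(1, n_max + 1):
        feats += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return feats

def mutual_information(docs, labels):
    """MI between binary feature presence and the binary class label.
    docs: list of token lists; labels: list of 0/1."""
    N = len(docs)
    presence = [set(ngrams(d)) for d in docs]
    scores = {}
    for f in set().union(*presence):
        mi = 0.0
        for fv in (0, 1):                      # feature absent / present
            for cv in (0, 1):                  # class 0 / class 1
                n = sum(1 for p, y in zip(presence, labels)
                        if (f in p) == fv and y == cv)
                if n == 0:
                    continue
                pf = sum((f in p) == fv for p in presence) / N
                pc = sum(y == cv for y in labels) / N
                mi += (n / N) * math.log((n / N) / (pf * pc))
        scores[f] = mi
    return scores

docs = [["great", "movie"], ["boring", "movie"],
        ["great", "cast"], ["boring", "plot"]]
labels = [1, 0, 1, 0]
mi = mutual_information(docs, labels)
assert mi["great"] > mi["movie"]   # class-correlated feature beats class-neutral one
```

Keeping only the highest-MI features yields the reduced feature set that the classifier is trained on.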
1.4.2 Evaluation and Comparisons
We conduct experiments on customer reviews, one of the major entities of social media.
The customer review datasets are from the movie and hotel review domains, both popular
among the sentiment classification research community. We evaluate our system using
standard classification evaluation metrics such as accuracy and mean absolute error, and
support the claims made above with the results on these datasets. Throughout our
subjective extraction phase, we depend on corpus statistics; we don't use any complex
patterns or rule-based approaches, but rather simple frequency-based metrics. Hence, we
make the task of subjective feature extraction simple and resource independent, so that it
can be extended easily.
There have been approaches in the literature that minimize or avoid the use of linguistic
resources. Pang et al. in [72] used unigrams, bigrams, POS information and sentence
position as features. They viewed sentiment classification as a special case of topical
classification and used standard machine learning techniques like naive Bayes and support
vector machines. Pang et al. in [69] extended their work in [72] by focusing on sentence-level
subjectivity detection: they filtered out the objective information using a min-cut based
classification method to obtain a subjective extract, and showed that sentence-level
subjectivity detection indeed helps document-level sentiment classification, using unigrams
as features on the subjective extract. Cui et al. in [16] did not use any linguistic
resource but focused on an n-gram model to represent each review and compared the
performance of different classifiers.
Our RSUMM is inspired by the work of Pang et al. in [69], but the methodology we adopt
to filter out the objective information is entirely different from theirs. Also, in addition
to filtering out the objective information, we apply feature selection techniques to obtain
the final subjective feature set. We also rate the sentiments of customers on a multi-variate
scale, which is a fairly new application in sentiment analysis; most of the work in the
literature has focused on predicting the binary orientation of sentiment.
1.5 Thesis Organization
The rest of the thesis is organized as follows:
In chapter 2, we discuss the related work in sentiment classification at different levels
and the different techniques used to classify sentiment. We also describe how research in
this area has evolved over the past decade.
Chapter 3 describes our methodology for extracting subjective features using information
retrieval approaches. We first present an overview of the information retrieval models and
the supervised learning methods we use. Then, we describe RSUMM for obtaining a subjective
extract. We also discuss the feature selection methods we employ on the subjective extract.
In chapter 4, we describe the experiments conducted on movie and hotel review datasets
to validate our methodology. We report the results using our methodology and compare
them with existing state-of-the-art approaches. We discuss the results and present our
observations in detail in this chapter.
We describe our approach to multi-document sentiment summarization in chapter 5.
What makes sentiment summarization different from automatic text summarization? We
focus on this aspect and explain how we can use sentiment classification to summarize
sentiments. Finally, we conclude the thesis by outlining our contributions and providing
some insights on how to extend this work in the future.
Chapter 2
Related Work
In this chapter, we discuss the literature related to sentiment classification, subjectivity
extraction, and unsupervised and supervised approaches for classifying sentiment. We discuss
sentiment classification at different levels of text units on both binary and multi-variate
scales. We also discuss the existing literature on subjectivity detection at the sentence
level.
Classification is an age-old problem, and several classifiers have been suggested over the
last few decades. Among them, naive Bayes, support vector machines, decision trees and
rule-based classifiers are important and widely used in several applications. A good review
of classification methods can be found in [39, 58]. Until the early part of this decade,
most classification tasks focused on classifying news stories. With the advent of
customer reviews and the growth of e-commerce in the early part of this decade, sentiment
classification has become an emerging and hot area of research for its potential business
applications and market intelligence. As discussed in Chapter 1, analysing sentiments and
predicting their orientation poses very challenging research issues.
2.1 Sentiment Classification at Different Levels
Sentiment classification dates back to the late 1990s [2, 51, 83], but in the early part of
this decade it became an important discipline in the areas of natural language processing,
text mining and information retrieval [14, 21, 22, 25, 33, 35, 37, 42, 45, 54, 62, 76, 79, 84,
91, 96, 98, 107]. Until the early 2000s, the two main approaches to sentiment classification
were based on machine learning and semantic analysis techniques. Later, shallow natural
language processing techniques were also used, especially in overall document sentiment
classification.
2.1.1 Word or Phrase Sentiment Classification
In the early stages of research, word sentiment classification was considered to be the basis
for phrase and document sentiment classification. Lexicons of words and their semantic
orientations were constructed manually or semi-manually [40, 41, 59, 73]. The words in
them were mostly adjectives or adverbs that have semantic orientation [1, 28, 34, 88, 94],
with the orientation defined by researchers. The approaches to classifying sentiment at
the word level can be grouped into two: 1) corpus-based approaches and 2) dictionary-based
approaches.
The first group included methods that depend on the syntactic and co-occurrence patterns
of words in large texts to determine their sentiment [40, 93, 109]. The second group used
WordNet (http://wordnet.princeton.edu/) information, especially synsets and hierarchies,
to acquire sentiment-bearing words or to measure the similarity between candidate words
and sentiment-bearing words like good or bad [44, 52, 49].
Analysis by Conjunctions
In this method, the semantic orientation of adjectives was predicted using conjunctive words
like and, or, but, either-or and neither-nor. The intuition was that the act of conjoining
adjectives is governed by linguistic constraints (and always conjoins two adjectives with
the same orientation, whereas but contrasts them) [40]. The steps followed to predict the
semantic orientation of adjectives using conjunctive analysis are:
1. Extract adjective pairs along with their conjunctive words.
2. Train a log-linear regression classifier, then classify pairs of adjectives as having the
same orientation or opposite orientations.
3. Apply a clustering algorithm to partition the set into positive- and negative-orientation
terms.
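A toy sketch of step 1 and the same/opposite-orientation heuristic, using a fixed adjective list in place of the POS tagger the original work relied on (the word list and the heuristic encoding are illustrative assumptions):

```python
import re

# Approximate adjectives with a small word list (a POS tagger stand-in).
ADJECTIVES = {"simple", "elegant", "slow", "cheap", "reliable"}
PATTERN = re.compile(r"\b(\w+)\s+(and|or|but)\s+(\w+)\b")

def conjoined_pairs(text):
    """Extract (adjective, adjective, same_orientation) triples, where 'and'/'or'
    suggest the same orientation and 'but' suggests opposite orientations."""
    pairs = []
    for a, conj, b in PATTERN.findall(text.lower()):
        if a in ADJECTIVES and b in ADJECTIVES:
            pairs.append((a, b, conj != "but"))
    return pairs

print(conjoined_pairs("The interface is simple and elegant, but the app is slow."))
# [('simple', 'elegant', True)]
```

The pairs collected this way feed the classifier in step 2 and the clustering in step 3.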
Analysis by Lexical Relations
This method used semantic association to determine orientation. It followed the intuition
that two words or phrases tend to have the same semantic orientation if they have a strong
association [92, 94, 49]. To determine the degree of semantic association, researchers have
used WordNet or web search. The entire process occurs as follows:
1. Construct relations using WordNet, especially synsets.
2. Define a distance between two text units.
3. Calculate the semantic orientation of a word or phrase from its relative distance to two
seed words of known orientation, such as good and bad or excellent and poor.
4. The semantic orientation of the word or phrase is positive if the relative distance is
greater than zero, and negative otherwise.
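A minimal sketch of steps 3 and 4, with a hand-made distance table standing in for WordNet-derived distances (every value in the table is an assumption for illustration):

```python
# Toy semantic-distance table; smaller means more closely related.
DIST = {
    ("amazing", "good"): 1, ("amazing", "bad"): 4,
    ("terrible", "good"): 4, ("terrible", "bad"): 1,
}

def distance(w1, w2):
    # Symmetric lookup with a default for unrelated pairs.
    return DIST.get((w1, w2), DIST.get((w2, w1), 5))

def semantic_orientation(word, pos_seed="good", neg_seed="bad"):
    """Relative distance: positive when the word sits closer to the positive seed."""
    return distance(word, neg_seed) - distance(word, pos_seed)

assert semantic_orientation("amazing") > 0    # closer to 'good' -> positive
assert semantic_orientation("terrible") < 0   # closer to 'bad'  -> negative
```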
Analysis by Glosses
This method followed the assumption that if a term is semantically oriented in one
direction, then the terms in its gloss tend to have the same semantic orientation [28, 27, 26].
The process occurred in the following steps:
1. A seed set representing the two categories, positive and negative, is provided as input.
2. Expand the seed set to accommodate new terms using lexical relations.
3. Each term t in the expanded set is represented by the collation of all of t's glosses,
and this textual representation is converted into a vector representation for classification.
4. Train a binary classifier on the term representations in the expanded seed set and then
apply it to the terms in the test set.
Analysis by General Inquirer
General Inquirer (GI) is a system that contains a list of terms with their different senses.
For each sense of a term, it provides a short definition as well as other information, and
terms are tagged as positive or negative. In addition, the GI dictionary contains negations,
intensifiers and diminishers like not, fantastic and barely. The occurrence probability
of each sense of a term is also provided. Hence, it is widely used by researchers in
subjective feature extraction [5, 50].
2.1.2 Document Sentiment Classification
Supervised machine learning approaches are popular among researchers for predicting the
overall sentiment of a document [3, 7, 50, 64, 67, 72, 69, 100, 81]. Most of them focused
on labeling a new sample as positive or negative based on previously seen samples annotated
by humans; grading a review on a multi-variate scale is a fairly new application in this
area. The entire process is typically composed of two steps: 1) extracting the subjective
features from the training data and converting them into feature vectors, and 2) training
the classifier on the feature vectors and applying it to a new sample. The raw documents
are also preprocessed before the subjective features are extracted; preprocessing includes
removing HTML tags and tokenizing the documents.
Subjective Feature Extraction
To extract subjective features, researchers have used lexicons like SentiWordNet and General
Inquirer [27, 5]. Most of these resources contain words and (more rarely) phrases.
In sentiment classification, larger text units also play an important role in predicting
semantic orientation, as shown in Fig. 1.2. Hence, researchers framed rules using POS
information to extract text units larger than simple unigrams that were considered to be
subjective [72, 5, 92, 27, 28].
Researchers used lexical filtering techniques based on hypernymy in WordNet [11, 17,
20, 23, 24, 31, 36, 48, 77, 85] and patterns based on a POS tagger [10, 63, 78, 82, 104,
105, 106]. The WordNet filter substitutes words by a set of likely synonyms and hypernymy
generalizations, because identical words are unlikely to be repeated in the text. POS
filters were used to extract the patterns that do not contribute to subjectivity, as in [64];
these patterns are considered noise, and POS filters remove them before the text units are
converted into feature vectors for classification.
The appraisal adjective method [99, 100] focused on the extraction and analysis of appraisal
adjective groups, optionally modified by enhancers or diminishers. Coherent groups that
together express a particular attitude are extracted; examples of appraisal adjective groups
are extremely boring and not really very good.
The steps followed in this method were:
1. Build a lexicon using semi-automatic techniques, gathering and classifying adjectives
and modifiers into categories in several taxonomies of appraisal attributes.
2. Extract adjectival appraisal groups from texts and compute their attribute values
according to this lexicon.
3. Represent documents as vectors of relative-frequency features using these groups.
4. Train a support vector machine to discriminate positively from negatively oriented
test documents.
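Step 2 can be sketched as follows. The lexicon values and the multiplicative combination scheme are assumptions for illustration, not the actual scheme of [99, 100]:

```python
# Small hand-built lexicon standing in for the semi-automatically built one.
MODIFIERS = {"extremely": 2.0, "really": 1.5, "very": 1.5, "not": -1.0, "barely": 0.5}
APPRAISAL = {"boring": -1.0, "good": 1.0, "brilliant": 2.0}

def appraisal_groups(text):
    """Extract runs of optional modifiers followed by an appraisal adjective
    and combine their values multiplicatively (one plausible scheme)."""
    tokens = text.lower().replace(",", " ").replace(".", " ").split()
    groups = []
    i = 0
    while i < len(tokens):
        j, factor = i, 1.0
        while j < len(tokens) and tokens[j] in MODIFIERS:
            factor *= MODIFIERS[tokens[j]]
            j += 1
        if j < len(tokens) and tokens[j] in APPRAISAL:
            groups.append((" ".join(tokens[i:j + 1]), factor * APPRAISAL[tokens[j]]))
            i = j + 1
        elif j > i:          # modifiers not followed by an appraisal adjective
            i = j
        else:
            i += 1
    return groups

print(appraisal_groups("The lecture was extremely boring, not really very good."))
# [('extremely boring', -2.0), ('not really very good', -2.25)]
```

The relative frequencies of such groups then become the document features of step 3.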
Training the Classifier
Researchers viewed document sentiment classification as a special case of topical
classification and conducted experiments with machine learning algorithms like naive Bayes,
kNN and support vector machines. Pang et al. [72] used machine learning techniques to
classify the overall sentiment of movie reviews; their best accuracy (82.9%) was reported
using unigrams as features and SVM as the classifier. Among all the classifiers, SVM and
naive Bayes were the most widely used to predict sentiment orientation. The features used
include n-grams, lexical information, POS information, sentence position, adjectives and
appraisal adjectives.
2.1.3 Sentiment Classication at Sentence Level
Since researchers thought it too coarse to compute the sentiment at the document level,
they investigated approaches to determine the focus of each sentence and computed the
semantic orientation at the sentence level [19, 29, 53]. They extracted opinion-bearing
terms, opinion holders and opinion-product-aspect associations in each sentence and then
analysed the semantic orientation. There is also an area of research called aspect-based
sentiment classification, in which the aspects of a product are extracted and people's
sentiments are rated on each aspect. Thet et al. in [90] conducted experiments on
aspect-based classification of movie reviews. They used information extraction techniques
like pronoun resolution, entity extraction and co-referencing to segment each sentence, and
predicted the sentiment of users towards the cast (producers, directors) as well as the
overall sentiment.
2.2 Subjectivity Classification
Subjectivity detection is the task of investigating whether a text unit presents the opinion
of the author or conveys facts. The text unit is typically a sentence or a paragraph.
Researchers have shown that subjectivity detection at the sentence level has a very tight
relation to document sentiment classification [69, 102, 103, 101, 109]. Subjectivity
detection keeps the sentiment classifier from considering irrelevant or potentially misleading
text. Pang and Lee in [69] compressed reviews into much shorter extracts, preserving their
sentiment content at a level comparable to the full review. Naive Bayes and min-cut
classification are two popular classifiers used in subjectivity detection. We briefly discuss
min-cut based classification, the state-of-the-art approach in subjectivity detection [69, 4].
2.2.1 Min-cut based Subjectivity Classification
The cut-based classification method assumes that text units that occur near each other
(within discourse boundaries) share the same subjectivity status [69]. In [69], pair-wise
interaction information was used, and the algorithm relied on an efficient and intuitive
graph-based formulation based on finding minimum cuts.
Suppose there are n items x_1, x_2, \ldots, x_n to be divided into two classes C_1 and C_2.
Then there are two types of penalties, individual and association, for x_i and x_j to be in
the same class.

Individual scores ind_j(x_i): a non-negative estimate of x_i's preference for being in
class C_j, using x_i alone.

Association scores assoc(x_i, x_j): a non-negative estimate of how strongly x_i and x_j
prefer to be in the same class.

The algorithm finds a solution to the following optimization problem: assign the x_i to
C_1 and C_2 so as to minimize the partition cost

\sum_{x \in C_1} \mathrm{ind}_2(x) + \sum_{x \in C_2} \mathrm{ind}_1(x) + \sum_{x_i \in C_1,\, x_k \in C_2} \mathrm{assoc}(x_i, x_k) \qquad (2.1)

The situation is represented in an undirected graph G with vertices \{v_1, v_2, \ldots, v_n, s, t\};
the last two are the source and sink respectively. Add n edges (s, v_i) with weight
ind_1(x_i) and n edges (v_i, t) with weight ind_2(x_i). Finally, add edges (v_i, v_k) with
weight assoc(x_i, x_k). A cut (S, T) of G is a partition of its nodes into sets
S = \{s\} \cup S' and T = \{t\} \cup T', where s \notin S' and t \notin T'. Its cost,
cost(S, T), is the sum of the weights of all edges crossing from S to T. A minimum cut of
G is one with minimum cost.
In [69], each vertex was a sentence. The individual penalties were obtained from a naive
Bayes classifier, per sentence and per class, and the association penalties were obtained
using proximity relations based on sentence position.
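The partition cost of Eq. 2.1 for a candidate assignment can be computed directly. The sketch below evaluates it on a toy three-sentence instance (all penalty values are assumed); it only scores partitions and does not solve the min-cut itself:

```python
def partition_cost(ind1, ind2, assoc, c1, c2):
    """Cost of assigning items in c1 to class 1 and c2 to class 2 (Eq. 2.1):
    each item pays its preference for the *other* class, plus an association
    penalty for every pair that the partition separates."""
    cost = sum(ind2[x] for x in c1) + sum(ind1[x] for x in c2)
    cost += sum(assoc.get((xi, xk), 0) + assoc.get((xk, xi), 0)
                for xi in c1 for xk in c2)
    return cost

# Toy instance: sentences 0 and 1 prefer class 1 and are strongly associated;
# sentence 2 prefers class 2.
ind1 = {0: 0.9, 1: 0.8, 2: 0.1}
ind2 = {0: 0.1, 1: 0.2, 2: 0.9}
assoc = {(0, 1): 1.0, (1, 2): 0.1}

good = partition_cost(ind1, ind2, assoc, c1={0, 1}, c2={2})
bad = partition_cost(ind1, ind2, assoc, c1={0}, c2={1, 2})
assert good < bad   # splitting the associated pair 0-1 is penalized
```

A minimum cut of the graph described above finds the assignment of lowest such cost efficiently.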
2.3 State-of-the-art Approaches and Benchmarks
We have seen in the sections above sentiment classification at different levels, subjectivity
detection, and the resources used in subjective feature extraction. For word- or phrase-level
sentiment classification, researchers used seed lists of words and lexical resources
like WordNet. Document sentiment classification was, and still is, done using supervised
learning approaches. Most document sentiment classification approaches predicted the
sentiment on a binary scale, whereas multi-variate classification is a fairly new application.
Document sentiment classification is highly domain specific [15, 68, 57, 4].
Since our focus is on extracting subjective features and presenting them as feature
vectors to the classifier, we do not discuss the work related to the domain transfer problem
in sentiment classification. The popular domains among researchers in document sentiment
classification are: 1) the movie review domain and 2) the hotel review domain. The movie
review domain is highly popular in the sentiment classification research community, and
many have focused on predicting the polarity of movie reviews [72, 69, 64, 67]. This is due
to the popularity of movies, the abundance of information about movies on the web, and
the challenging nature of the reviews [92]: movie reviews mix objective and subjective
information, and mining subjective features from them is a challenging task. The hotel
review domain is popular because travellers tend to enquire about various hotels on the
web. We conduct experiments on reviews in both domains. Among the classifiers, SVM is
favoured by researchers because of its better performance compared to others.
Pang et al. in [72] used supervised machine learning techniques like naive Bayes and
support vector machines, with unigrams, bigrams, POS information and sentence position as
features. They concluded that machine learning approaches outperformed human-produced
baselines, and that SVM performed better than naive Bayes with unigrams as features.
Mullen and Collier [67] used diverse information scores that assign a value to each word
or phrase using WordNet, topic proximity and syntactic relations. They also used an SVM
classifier on the same movie review dataset and reported an accuracy of 86%.
Matsumoto et al. [64] observed that word order and syntactic relations play an important
role in sentiment classification. They proposed a method based on word sub-sequence
mining and dependency parsing to extract word order and relations. To generate word
sub-sequences, they used clausal extraction tools and an n-gram model with n up to 6,
setting a support threshold of 2 for unigrams and bigrams and 10 for the others to be
considered potential features for classification. They used dependency parsing techniques
to extract larger text units and combined them with a POS tagger to remove the
non-subjective items. They reported an accuracy of 88.3% on the same movie review dataset
using an SVM classifier. All these methods predicted the orientation of polarity at the
document level on a binary scale (positive/negative).
Quantifying sentiment with a satisfaction score is a fairly recent application in sentiment
classification. Pang and Lee [70] conducted experiments on scoring movie reviews on an
ordinal scale of four values; their focus was on different learning algorithms: simple
multi-label classification, SVR and a meta-algorithm. In [110], a new task was proposed:
predicting the utility of product reviews rather than scoring a review. They used linear
regression techniques to compute the utility.
Stefano et al. [5] viewed the problem of grading reviews on a scale of one to five as
a problem of ordinal regression. They focused on deriving subjective features and selecting
them, using a POS tagger and the GI lexicon to extract subjective patterns. They used
feature pruning techniques based on minimum variance (MV) and a variant of MV called
round-robin minimum variance (RRMV), since their dataset was highly skewed; this was
one of the important observations presented in their work. Overall, support vector
regression (SVR) techniques are popular among researchers for grading reviews.
Chapter 3
Subjective Feature Extraction
In this chapter, we describe our approach to mining subjective features from a review. We
view each review r as a combination of subjective and objective sentences,
r = S_{subj} \cup S_{obj}. We propose a method called RSUMM to score each sentence for
its subjective nature and extract the set S_{subj} from r. Our subjective feature extraction
approach follows two steps:
1. We score each sentence and obtain an extract of the review that preserves subjectivity at
a level comparable to or even better than the full review. We call this method RSUMM.
2. Then, we apply feature selection techniques on the extract of the review to obtain the
final subjective feature set.
Our RSUMM is based on information retrieval techniques such as the vector space model,
the language model and the word co-occurrence model, widely used in document retrieval and
other applications. In this thesis, we propose three variants of RSUMM for estimating the
subjectivity of each sentence:
1. Lexical similarity (RSUMM_LS).
2. Probabilistic estimates (RSUMM_PE).
3. Co-occurrence statistics (RSUMM_CO).
After extracting subjective features using the above methodology, we use supervised
learning approaches to predict the overall sentiment of a document. We use n-gram models
to represent the extracted subjective sentences as feature vectors for classification. We use
an SVM classifier and view the problem of predicting sentiment on a binary scale as a
problem of support vector classification (SVC). We use logistic regression (LR) to grade
reviews on a multi-variate scale.
3.1 Information Retrieval Models and SVM
In this section, we briefly describe the information retrieval models, SVC and LR.
3.1.1 Vector Space Model
The vector space model assigns weights to index terms [6]. It is widely used in information
retrieval to determine the relevance of a document to a given query. Both documents and
the query are represented as weighted term vectors, and these weights are used to compute
the degree of similarity between the query and a document: the higher the similarity, the
more relevant the document is to the query.
Formal definition: both a query q and a document d are represented as weighted vectors of
terms. The query vector is defined as q := (w_{1,q}, w_{2,q}, \ldots, w_{t,q}) and the
document vector as d := (w_{1,d}, w_{2,d}, \ldots, w_{t,d}), where t is the total number
of index terms.
Then, the degree of similarity between the document d and the query q is the correlation
between the two vectors. The correlation is quantified by a variety of similarity measures,
for instance the cosine of the angle between the two vectors. The weighting measure
typically used in the vector space model is tf-idf:
\mathrm{tfidf}(t, d) = \mathrm{tf}(t, d) \times \log\left(\frac{N}{n}\right) \qquad (3.1)

where tf(t, d) denotes the frequency of the term t in the given document d, N denotes the
total number of documents in the collection C, and n denotes the number of documents in
C containing term t.
\cos\theta = \frac{\vec{d} \cdot \vec{q}}{|\vec{d}|\,|\vec{q}|} = \frac{\sum_{i=1}^{t} w_{i,d} \times w_{i,q}}{\sqrt{\sum_{i=1}^{t} w_{i,d}^2} \times \sqrt{\sum_{i=1}^{t} w_{i,q}^2}} \qquad (3.2)
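Equations 3.1 and 3.2 can be combined into a small ranking sketch (toy tokenized documents; no stemming or stop-word removal; every term is assumed to occur somewhere in the collection so the idf is defined):

```python
import math
from collections import Counter

def tfidf_vector(doc_tokens, docs):
    """Weight each term of the document by tf * log(N / n), following Eq. 3.1."""
    N = len(docs)
    tf = Counter(doc_tokens)
    return {t: tf[t] * math.log(N / sum(1 for d in docs if t in d))
            for t in tf}

def cosine(u, v):
    """Cosine of the angle between two sparse term vectors (Eq. 3.2)."""
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    norm = lambda w: math.sqrt(sum(x * x for x in w.values()))
    return dot / (norm(u) * norm(v)) if u and v else 0.0

docs = [["good", "plot"], ["good", "cast"], ["dull", "plot"]]
q = ["good", "plot"]
qv = tfidf_vector(q, docs)
ranked = sorted(docs, key=lambda d: cosine(tfidf_vector(d, docs), qv), reverse=True)
assert ranked[0] == ["good", "plot"]   # the document matching both query terms ranks first
```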
3.1.2 Unigram Language Model
A statistical language model is a probabilistic model for generating text. It was introduced
to information retrieval in the late 1990s. It estimates the probability of generating a
query from a document model. The basic assumption of this model is that users have a
reasonable idea of the terms they want to see in relevant documents, and it exploits this
idea directly. Language modeling treats documents as models and queries as strings of text
randomly sampled from these models [6]. It ranks documents according to the probability
that a query Q would be observed during repeated random sampling from the document
model M_D: P(Q \mid M_D).
The unigram language model is the simplest form of language model: it discards the
conditioning on context and estimates the probability of each term independently. It is
used in information retrieval more often than other types of language models because of
its simplicity.
P(Q \mid M_d) = \prod_{t \in Q} P(t \mid M_d) \approx \prod_{t \in Q} \frac{\mathrm{tf}(t, d)}{\mathrm{dl}_d} \qquad (3.3)

where M_d denotes the language model of document d, tf(t, d) denotes the frequency of
term t in document d, and dl_d denotes the total number of tokens in document d.
The language modeling approach suffers from sparseness in the data: we may not wish to
assign zero probability to a query term t that is missing from document d. Many smoothing
techniques are available to address this problem. The model is fairly recent and is used in
information retrieval, machine translation, speech recognition, etc. It shares some
characteristics with the vector space model: both use term frequencies to estimate the
importance of a term, and both often treat terms as independent. Language modeling,
however, is based on probability estimates rather than similarity.
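A sketch of Eq. 3.3 with Jelinek-Mercer interpolation as one of the possible smoothing techniques (the interpolation weight 0.9 is an arbitrary choice, not a tuned value):

```python
from collections import Counter

def lm_score(query, doc, collection, lam=0.9):
    """P(Q|M_d) under a unigram language model (Eq. 3.3), smoothed with the
    collection model so unseen terms do not zero out the whole product."""
    tf_d, dl_d = Counter(doc), len(doc)
    tf_c = Counter(t for d in collection for t in d)
    cl = sum(tf_c.values())
    score = 1.0
    for t in query:
        p_doc = tf_d[t] / dl_d       # maximum-likelihood estimate from the document
        p_coll = tf_c[t] / cl        # background estimate from the collection
        score *= lam * p_doc + (1 - lam) * p_coll
    return score

docs = [["great", "acting", "great", "plot"], ["dull", "plot", "weak", "cast"]]
q = ["great", "plot"]
scores = [lm_score(q, d, docs) for d in docs]
assert scores[0] > scores[1]   # the document containing both query terms wins
```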
3.1.3 Support Vector Machines
Support vector machines (SVM) are a useful technique for text classification. A
classification task usually involves training the classifier on data instances whose labels
are known, and predicting the labels of unseen instances. Like all supervised learning
approaches, SVM involves a training and a testing phase [13, 95]. Each sample in the
training set has one target value (label or class) and several attributes (features). Given
a set of instance-label pairs (feature vectors) (x_i, y_i), i = 1, 2, \ldots, l, where
x_i \in R^n and y_i \in \{+1, -1\}, SVM requires the solution of the following optimization
problem:
\min_{w,\, b,\, \xi} \;\; \frac{1}{2} w^T w + C \sum_{i=1}^{l} \xi_i \qquad (3.4)

subject to y_i (w^T \phi(x_i) + b) \geq 1 - \xi_i and \xi_i \geq 0.
In general, the training vectors are mapped to a high-dimensional space by the function
\phi. SVM then finds a separating hyperplane with maximum margin in this higher-dimensional
space; C > 0 is the penalty parameter. Several kernel functions are used with SVM, such
as linear, polynomial, radial basis function and sigmoid kernels.
In this thesis, we do not go into the details of supervised learning, such as the best
kernel function or the best learning parameters. We focus on deriving subjective features
and representing them as feature vectors for the learning algorithm, with existing kernel
functions and default parameters.
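To make Eq. 3.4 concrete, the sketch below evaluates the primal objective for a given linear hyperplane; it does not solve the optimization, it only scores candidate (w, b) solutions:

```python
def svm_objective(w, b, X, y, C=1.0):
    """Primal objective of Eq. 3.4 with a linear kernel: the regularizer plus
    C times the total slack, where slack_i = max(0, 1 - y_i (w.x_i + b))."""
    dot = lambda u, v: sum(a * b_ for a, b_ in zip(u, v))
    slack = sum(max(0.0, 1.0 - yi * (dot(w, xi) + b)) for xi, yi in zip(X, y))
    return 0.5 * dot(w, w) + C * slack

# Two separable points; a separating hyperplane with margin >= 1 has zero slack.
X = [(2.0, 0.0), (-2.0, 0.0)]
y = [+1, -1]
assert svm_objective((1.0, 0.0), 0.0, X, y) == 0.5   # zero slack, ||w||^2 / 2
assert svm_objective((0.0, 0.0), 0.0, X, y) == 2.0   # slack of 1 per point, times C
```

An SVM solver searches for the (w, b, xi) minimizing exactly this quantity subject to the stated constraints.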
Logistic Regression
In this work, we use logistic regression (LR) to rate sentiments on an ordinal scale of one
to five, alongside simple binary classification. This problem is called ordinal regression
in machine learning [5]; it lies between simple binary classification and metric regression.
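One common way to view ordinal regression is as a single real-valued score cut by ordered thresholds; the sketch below illustrates that view (the threshold values are arbitrary assumptions, not learned):

```python
import bisect

def to_rating(score, thresholds=(-1.0, -0.25, 0.25, 1.0)):
    """Map a real-valued sentiment score to a 1-5 rating via ordered thresholds.
    This is the threshold view of ordinal regression: binary classification has
    one threshold, metric regression none, and the ordinal case sits in between."""
    return bisect.bisect_right(thresholds, score) + 1

assert to_rating(-2.0) == 1   # strongly negative score -> lowest rating
assert to_rating(0.0) == 3    # neutral score -> middle rating
assert to_rating(1.5) == 5    # strongly positive score -> highest rating
```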
CHAPTER 4. EVALUATION
Table 4.6 State-of-the-art Accuracy Values on PDS1
Author and Literature Classifier Accuracy
Pang and Lee SVM 87.2
Garg et al. SVM 90.2
Matsumoto et al. SVM 93.7
Aue and Gamon SVM 90.5
Kennedy and Inkpen SVM 86.2
RSUMM_LS. It could be due to the fact that SDS was also from the movie domain, so using
lexical similarity indeed helped.
Among the three variants, RSUMM_CO did not fare well, either in isolation or in
combination with MI and FDR, for each feature representation of reviews. Its accuracy in
isolation was 3.4% below RSUMM_LS and 2% below RSUMM_PE. In this method too, applying MI
and FDR increased the accuracy, but not to the extent of RSUMM_LS and RSUMM_PE. This could
be due to the inadequacy of the meta information we incorporated to score subjectivity;
mining such information from the review itself might improve the results of this method,
but as we mentioned earlier, mining the meta information is beyond the scope of this
thesis.
RSUMM_LS in combination with FDR as the feature selection method and bigrams as
features reported the maximum accuracy on PDS1 in our methodology. Using unigrams as
features, RSUMM_PE performed better than the other variants with FDR as the feature
selection method (89.2%). The best accuracy using both unigrams and bigrams as features
was reported by RSUMM_PE in combination with FDR. Table 4.6 shows the accuracy values
reported by researchers to date on PDS1.
To the best of our knowledge, the highest accuracy previously reported on PDS1 was 93.7%,
by Matsumoto et al. [64]. We report a maximum accuracy of 94.9%, an increase of
1.2%, using a combination of lexical similarity (RSUMM_LS) and FDR as the feature selection
method. Their methodology included extracting clauses and generating word sub-sequences
from them. They then used a POS tagger to prune the sub-sequences, dropping patterns that
do not contribute to subjectivity, with a minimum support threshold of two for unigrams
and bigrams and ten for higher-order sub-sequences. In addition to sub-sequences, they
used dependency parsing techniques to extract phrases and used them as features for
classification.
Our methodology is much simpler by comparison. We first decompose a review into a
subjective extract and then apply feature selection methods to n-gram models as feature
vectors for classification. Throughout our approach, we depend on frequency-based
approaches and corpus statistics, yet we report an accuracy comparable to that of
Matsumoto et al. [64]. This supports our other claim: sentiment classification can be done
with simple approaches rather than complex patterns and linguistic resources.
Pang et al. [69] reported an accuracy of 87.2% on PDS1 using unigrams as features and SVM as the classifier. Their assumption was that sentence-level subjectivity detection improves document sentiment classification, and they confirmed it with their results (82.6% to 87.2%). But their approach was more inclined towards extracting subjectivity using contextual information: they assumed that sentences in proximity share the same subjectivity status. Their subjectivity estimation was based on the individual probabilities of each sentence from a naive Bayes classifier trained on SDS; in addition, contextual information that scores proximity between sentences was also used. Hence, our approach is clearly different from theirs. Our maximum accuracy was 94.9% using bigrams as features, an increase of about 7% over the accuracy they reported.
4.3 Multi-variant Classification
The baseline (BL) for our multi-variant classification system uses a unigram representation of the full review, with no RSUMM and no feature selection methods. Stefano et al. [5]
Table 4.7 Table showing the results obtained by Stefano et al. on PDS2 for their different feature representations with MV as the feature selection method

Features              MAE_μ   MAE_M
BOW                   0.682   1.141
BOW+Expr              0.456   0.830
BOW+Expr+sGI          0.448   1.165
BOW+Expr+sGI+eGI      0.437   0.942
emphasized that it may not be raw unigrams; sometimes larger text units play a major role in determining the orientation of sentiment. From the above experiments, it was evident that using larger text units like bigrams (BI) as features for classification made the system perform better. Hence, we stick to the assumption, as stated in [5], that larger text units enhance classification performance. They used the GI lexicon and a POS tagger to extract larger text units, but we use RSUMM with MI and FDR as an alternative to linguistic resources.
They extracted text units that contribute to subjectivity using a rule-based approach with the help of a POS tagger. Some of the patterns include Art JJ NN, NN VB JJ, etc. They called the text units that follow these patterns expressions (Expr). Aggregation of patterns was done using the GI lexicon; for example, text units like "great location" and "good location" are aggregated as [Positive] location. They called this way of aggregating text units a simple GI expression (sGI). There was also a more complex way of aggregating text units, called an enriched GI expression (eGI), in which the above text units are aggregated as [Strong] [Positive] location and [Virtue] [Positive] location respectively.
They used minimum variance (MV) as the feature selection method to select important features. The results obtained in [5] are reported in Table 4.7; lower values indicate more accurate prediction, with bold values being the best. The baseline for their system used bag-of-words (BOW) as the feature vector representation of the review, fed to the ε-SVR method.
Table 4.8 Table showing CV accuracies on PDS2 for different feature representations using the total review with LR as the classification method

Features       MAE_μ   MAE_M
BL             0.580   0.807
BL+BI          0.540   0.897
BL+BI+TRI      0.528   0.969
Table 4.9 Table showing CV accuracies on PDS2 for different feature representations using RSUMM_CO

Features               MAE_μ   MAE_M
BL+RSUMM_CO            0.598   0.898
BL+RSUMM_CO+BI         0.495   0.921
BL+RSUMM_CO+BI+TRI     0.473   0.992
They divided PDS2 randomly into 75% for training and 25% for testing. No cross-validation test was done; hence the values they reported are not statistically significant.
4.3.1 Results
We limited ourselves to using up to trigrams (TRI) in multi-variate classification. Table 4.8 shows our results for various feature representations on PDS2 using the full review and logistic regression as the classification method.
We did not apply the RSUMM_LS and RSUMM_PE methods for scoring subjectivity, as SDS contains subjective and objective sentences from the movie review domain; sentiment analysis is highly domain dependent, and features from one domain would not work in other domains, as already discussed in Chapter 2. Because of the meta information available along with the reviews in PDS2, we used RSUMM_CO to score the subjectivity of each sentence in
Table 4.10 Table showing CV accuracies on PDS2 for different feature representations using the ADF metric

Features           MAE_μ   MAE_M
BL+ADF             0.585   0.758
BL+BI+ADF          0.531   0.776
BL+BI+TRI+ADF      0.532   0.705
Table 4.11 Table showing CV accuracies on PDS2 for different feature representations using RSUMM_CO with MI and FDR

                          MI               FDR
Features              MAE_μ   MAE_M    MAE_μ   MAE_M
BL+RSUMM_CO           0.569   0.827    0.560   0.847
BL+RSUMM_CO+BI        0.431   0.781    0.435   0.822
BL+RSUMM_CO+BI+TRI    0.444   0.842    0.477   0.870
PDS2. We set X as 80% in our case to obtain the subjective extract. Then we applied MI and FDR on the subjective extract.
We used the ADF metric as a conditional criterion: we associated two or more words only if each had a document frequency greater than the ADF of the collection PDS2 (ADF_PDS2). We applied the ADF metric on unigrams as a feature selection method. For example, consider text units like "had a great time", "decent location" and "hotel was very nice". We extracted features like [great time], [decent location] and [hotel very nice], provided each unigram had a document frequency greater than ADF_PDS2. Table 4.10 shows the effect of applying the ADF metric on unigrams and as a conditional criterion for bigrams and trigrams. We reported the accuracy values of RSUMM_CO in Table 4.9. Results after applying MI and FDR on the extract of RSUMM_CO are reported in Table 4.11.
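The ADF filter described above can be made concrete with a short sketch (the function names and the toy document-frequency counts are ours, not from the system): bigrams and trigrams are admitted as features only when every constituent unigram has a document frequency above the collection's average document frequency.

```python
def average_document_frequency(df):
    # ADF of the collection: mean document frequency over the vocabulary
    return sum(df.values()) / len(df)

def ngram_features(tokens, df, adf):
    """Unigrams above the ADF threshold, plus bigrams/trigrams whose
    constituent unigrams all clear the same threshold."""
    feats = [t for t in tokens if df.get(t, 0) > adf]
    for n in (2, 3):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if all(df.get(t, 0) > adf for t in gram):
                feats.append("_".join(gram))
    return feats

# hypothetical document-frequency counts for a tiny collection
df = {"great": 50, "time": 45, "decent": 40, "location": 60, "the": 90, "xyzzy": 1}
adf = average_document_frequency(df)
print(ngram_features("great location".split(), df, adf))
# ['great', 'location', 'great_location']
```

Rare tokens such as "xyzzy" above fall below the threshold and drag any bigram containing them out of the feature set, which is the intended pruning effect.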
Table 4.12 Table showing CV accuracies on PDS2 for different feature representations using the naive Bayes classifier and MI as the feature selection method

Features       MAE_μ   MAE_M
UNI            0.496   0.524
UNI+BI         0.439   0.503
UNI+BI+TRI     0.440   0.444
In addition to LR, we also used a naive Bayes classifier with different feature representations and MI as the feature selection method. Bigram and trigram features were obtained using the ADF conditional criterion as described above. The results of this experiment are reported in Table 4.12.
4.3.2 Discussion
We used unigrams, bigrams, trigrams and combinations of them as features for rating reviews on a scale of one to five. We obtained a best MAE_μ value of 0.431, very much comparable to the 0.437 that Stefano et al. [5] obtained using their subjective feature extraction method based on linguistic resources; a relative improvement of 1.5% in MAE_μ using our approach. The best MAE_μ value was reported using RSUMM_CO for obtaining the subjective extract, in combination with MI as the feature selection method. The baseline MAE_μ value using unigrams as features on the total review was 0.580, so there was a relative improvement of 25.7% over BL with unigrams as features, which is significant. But unigrams in combination with MI, FDR or ADF as the feature selection method performed slightly below the baseline; this could be due to the aggressive threshold of 10% that we used to select the final feature set. It also supports our assumption that larger text units like bigrams and trigrams sometimes enhance classification performance: in each case, bigrams in combination with unigrams, and trigrams in combination with unigrams and bigrams, performed better than BL from the MAE_μ evaluation perspective. Since the dataset
was fairly large compared to PDS1, we went to the extent of trigrams.
The best value obtained by Stefano et al. [5] on the MAE_M metric was 0.830. We did not obtain significant results in this regard using RSUMM_CO with MI and FDR as feature selection methods; moreover, combining bigrams and trigrams with unigrams declined classification performance. This strongly suggests that the usefulness of higher-order n-grams depends on the size of the dataset. As PDS2 is highly skewed towards labels four and five, our filtering methodology based on co-occurrence did not classify the samples with labels one, two and three accurately. But using MAE_μ we were able to produce good results, which suggests that the dense labels were classified better by our methodology.
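The contrast between the two metrics can be made concrete. The sketch below uses the standard micro/macro definitions we assume throughout (MAE_μ averages the absolute error over all test samples; MAE_M averages the per-class MAEs so rare labels count equally); the toy ratings are invented to mimic PDS2's skew towards labels four and five.

```python
def mae_micro(y_true, y_pred):
    # MAE_mu: mean absolute error over all samples (dominated by dense labels)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mae_macro(y_true, y_pred, labels=(1, 2, 3, 4, 5)):
    # MAE_M: mean of the per-class MAEs (rare labels weigh as much as dense ones)
    per_class = []
    for c in labels:
        errs = [abs(t - p) for t, p in zip(y_true, y_pred) if t == c]
        if errs:
            per_class.append(sum(errs) / len(errs))
    return sum(per_class) / len(per_class)

# skewed toward labels 4 and 5; the single label-1 sample is badly misrated
y_true = [5, 5, 5, 4, 4, 4, 4, 1]
y_pred = [5, 5, 4, 4, 4, 4, 5, 4]
print(mae_micro(y_true, y_pred))  # 0.625
print(mae_macro(y_true, y_pred))  # ≈ 1.194
```

A classifier can thus look good under MAE_μ while MAE_M exposes its failure on the sparse labels, which is exactly the pattern we observed on PDS2.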
We obtained better MAE_M values using ADF as the feature selection method for unigrams and as a conditional criterion for obtaining bigrams and trigrams; a relative improvement of about 14.4% over BL was obtained using a combination of unigrams, bigrams and trigrams as features. The naive Bayes classifier in combination with MI as the feature selection method performed better at classifying labels one, two and three, obtaining better (lower) MAE_M values compared to the LR method, with a best value of 0.444. This suggests that naive Bayes, though popular in topical classification, can still be applied to multi-variate sentiment classification.
4.4 Conclusion
In this chapter, we explained how we evaluated our system. We described the statistics of the datasets, the cross-validation tests and the evaluation metrics. We used standard evaluation metrics, namely accuracy and mean absolute error. We implemented RSUMM_LS, RSUMM_PE and RSUMM_CO on PDS1, and only RSUMM_CO on PDS2 because of the domain dependency problem in sentiment analysis. Our experimental results showed that subjective feature extraction is achievable while minimizing the use of linguistic resources. Using our methodology, we were able to achieve significant improvements on both PDS1
and PDS2 from the baseline and existing state-of-the-art approaches.
Chapter 5
Sentiment Summarization
In this chapter, we discuss how to summarize the sentiments of different users towards a particular topic. Here, we focus on summarizing the sentiments of users across multiple documents, unlike RSUMM, which focused on a single-document subjective summary. We also relate sentiment classification and sentiment summarization, and show how the former helps the latter; sentiment summarization is one application where sentiment classification can be applied.
5.1 Introduction
Automated text summarization addresses the problem of information overload by condensing the essence of a text to a level comparable to that of the original document. A summary can be either an abstract or an extract, based on single or multiple documents. Sentiment summarization differs from traditional document summarization [80, 87] in that it has to optimize an extra property: sentiment. Although a rating is a form of summary for the text, the real essence of the sentiment is contained in the text itself. In our work, we developed a system that summarizes the sentiments of different users towards a particular topic from multiple blog posts. We view the problem of sentiment summarization as a two-stage classification problem at the sentence level: first we estimate the subjectivity, and then the polarity, of each sentence.
Most of the existing work in multi-document sentiment summarization focused on generating an aspect-based summary of a product. Aspect-based summarization follows two steps:
1. Extract product feature-opinion associations from sentences.
2. Prune them to generate the summary.
Hu and Liu [43] assumed that product features are nouns and noun phrases and extracted them using a POS tagger. They then used frequent itemset mining to prune the product features, and classified a sentence as opinion or fact based on them: if a sentence has more than two features, it is likely to contain the sentiment of the user. They determined the polarity orientation of a sentence using a manual seed list of opinion-bearing words, and produced a summary for each feature of a product, providing evidence in the form of opinion sentences from reviews. Note that a feature of a product here is different from the n-gram features discussed earlier. Researchers followed the above methodology, with different ways of extracting feature-opinion associations from customer reviews, until 2008 [111, 112, 30].
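The decision rule just described can be sketched in a few lines. This is only an illustration of the Hu-and-Liu-style heuristic: the feature set and seed lists below are toy stand-ins, whereas the real system mined features with a POS tagger and frequent itemsets.

```python
def classify_sentence(tokens, product_features, pos_seeds, neg_seeds):
    """A sentence mentioning more than two product features is treated as an
    opinion; its polarity is read off a manual seed list of opinion words."""
    hits = [t for t in tokens if t in product_features]
    if len(hits) <= 2:
        return "fact", None
    pos = sum(t in pos_seeds for t in tokens)
    neg = sum(t in neg_seeds for t in tokens)
    return "opinion", "positive" if pos >= neg else "negative"

features = {"battery", "screen", "camera", "zoom"}
sentence = "the battery screen and camera are great".split()
print(classify_sentence(sentence, features, {"great", "good"}, {"poor", "bad"}))
# ('opinion', 'positive')
```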
With the introduction of the opinion summarization track in the Text Analysis Conference (TAC) 2008 (http://www.nist.gov/tac/tracks/2008/index.html), extract-based opinion summarization gained popularity. The track focused on query-based opinion summarization of blog posts rather than customer reviews, and researchers developed systems and evaluated them using the TAC data, as in [46, 9].
Task Definition: The TAC 2008 opinion summarization task is defined as the automatic generation of well-organized, fluent summaries of opinions about specified targets, as found in a set of blog documents. Each summary has to address a set of complex questions about the target, where the question cannot be answered simply with a named entity (or even a list of named entities). The input to the summarization task comprises a target, some opinion-related questions about the target (see Figure 5.1) and a set of documents that contain answers to the questions. The output is a summary for each target.
Figure 5.1 Sample TAC Queries and Targets
Our summarization system is illustrated in Figure 5.2. The input to the system is a query and a set of blog posts (documents) from which the sentiment summary has to be generated. We assume that each query has a polarity orientation and predict the orientation as positive/negative, which is then used as a filter. For example, the query "What features do people like about Vista?" expects the positive comments that writers expressed on the product Windows Vista to be returned in the summary. We do not look into complex queries but instead focus on simple queries that have either positive or negative orientation, as in the TAC dataset.
5.2 Classification Based Approach
We view the problem of summarizing sentiments as a two-stage classification problem at the sentence level. We split each document in the document set into sentences and predict whether each sentence is opinion or fact. We then determine the polarity of the opinionated sentences returned by this step on a binary scale, as positive or negative.
Figure 5.2 Architecture of our sentiment summarization system
5.2.1 Training the Classifier
For training the opinion/fact classifier, we used a set of 10,000 sentences with equal numbers of sentences labeled as opinions and facts. For training the polarity classifier, we crawled about 128,000 reviews on various topics, each rated manually on a scale of one to five, and used this as the training set for classifying each opinionated sentence as positive or negative. We tagged each sentence in a review as positive or negative based on the rating given at the end of the review: reviews with a rating of four or five are considered positive and the others negative. We used the rainbow text classifier implemented in [65] and built classification models with it; it has several built-in methods such as naive Bayes, kNN, TFIDF and probabilistic indexing. We trained the classifier using unigrams and word associations as features, with probabilistic indexing [8] as the method, since it performed better than the other methods on the training data.
Word association is a simple variant of the bigram; it has nothing to do with association rule mining. We tokenize each sentence in the opinion/fact and polarity training data into words and associate each token with every other token in the sentence. The motivation behind this approach is that the opinion or polarity character of a sentence is not determined by a single token; rather, it is the combination of tokens that determines it. We limit ourselves to text units of maximum size two while training the classifier.
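The word-association features described above can be sketched as follows (the function name and the `|` separator are our own choices): every unordered pair of tokens in the sentence becomes a feature, regardless of adjacency, which is what distinguishes it from a true bigram.

```python
from itertools import combinations

def word_association_features(sentence):
    """Unigrams plus an association feature for every unordered pair of
    tokens in the sentence (capped at text units of size two)."""
    tokens = sentence.lower().split()
    feats = list(tokens)
    feats.extend(f"{a}|{b}" for a, b in combinations(tokens, 2))
    return feats

print(word_association_features("battery lasts long"))
# ['battery', 'lasts', 'long', 'battery|lasts', 'battery|long', 'lasts|long']
```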
5.2.2 Polarity Estimation
We define the metric polarity estimation (PE), which estimates the polarity score of a sentence for a particular orientation. The orientation of the query is used as a filter: if the query focuses on positive aspects of a product, then we estimate polarity for the sentences labeled positive by the polarity classifier, and vice versa. We use the scores returned by the rainbow classifier to compute the PE of a sentence for the query orientation, as shown in eqn. 5.1. The polarity orientation of the query itself is also determined using the polarity classifier. The smoothing parameters in eqn. 5.1 are intuitively set to 0.3 and 0.7 respectively.

PE(S|C) = 0.3 × P_PI(S|O) + 0.7 × P_PI(S|C)    (5.1)

where PE(S|C) denotes the polarity estimate of an opinion sentence S for class C, P_PI(S|O) denotes the probability of the sentence being an opinion, P_PI(S|C) denotes the probability of the sentence belonging to class C, returned by the opinion/fact and polarity classifiers respectively, and C can be positive or negative depending on the query.
5.2.3 Final Ranking
In addition to the polarity estimate metric, we rank sentences using two other metrics: a query-dependent (QD) metric and a query-independent (QI) metric. The query-dependent metric boosts the sentences that are more relevant to the query, as described in [75]. The query-independent metric picks the most informative sentences using relevance-based language modeling [47]; it uses KL divergence [56] to estimate the importance of a sentence by observing its likelihood in the relevant and irrelevant distributions respectively. The final score of a sentence is a linear combination of the above three metrics, as shown in eqn. 5.2.

FS(S) = λ1 × QI(S) + λ2 × QD(S) + λ3 × PE(S|C)    (5.2)

where QI(S), QD(S) and PE(S|C) are the query-independent, query-dependent and polarity scores of sentence S respectively, and λ1, λ2 and λ3 are the smoothing parameters for each metric.
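Eqns. 5.1 and 5.2 amount to a few lines of code. In this sketch the classifier probabilities are toy numbers (in our system they come from the rainbow classifier); the weights are the values used in our experiments.

```python
def polarity_estimate(p_opinion, p_class, w_opinion=0.3, w_class=0.7):
    # Eqn. 5.1: blend the opinion/fact probability with the polarity
    # classifier's probability for the query's orientation C
    return w_opinion * p_opinion + w_class * p_class

def final_score(qi, qd, pe, l1=0.35, l2=0.25, l3=0.4):
    # Eqn. 5.2: linear combination of the query-independent, query-dependent
    # and polarity scores (weights as tuned in Section 5.3.3)
    return l1 * qi + l2 * qd + l3 * pe

pe = polarity_estimate(p_opinion=0.9, p_class=0.8)
print(final_score(qi=0.6, qd=0.5, pe=pe))  # ≈ 0.667
```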
5.3 Experiments
5.3.1 Dataset
We evaluated our approach to summarizing sentiments on the TAC 2008 opinion summarization task dataset. The dataset has 25 topics; each topic has one or two squishy list questions and a set of documents (blog posts) where the answers are likely to be found. A descriptive answer is expected for a squishy list question. In this task, we have to preserve an extra property, sentiment, in the summary to the maximum extent. The questions focus on either positive or negative aspects of a topic.
5.3.2 Evaluation Metrics
We evaluated our system using the Nugget Judgements provided for each topic in TAC. Each judgement has a nugget score or weight that is used to judge the quality of the summary; the nugget judgement with the maximum weight is considered the most relevant. Judgements were provided for only 22 of the 25 topics, hence we evaluated our system on those 22 judgements only. We used the evaluation metrics Nugget Recall (NR), Nugget Precision (NP) and F-measure, conforming to standard TAC practice. Sentences with an overlap of at least 40% with an already selected sentence are considered redundant and subsequently discarded.
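The text above fixes only the 40% threshold; how overlap is measured is our assumption. A minimal sketch of such a redundancy filter, using token overlap relative to the shorter sentence:

```python
def overlap(a, b):
    # fraction of the shorter sentence's distinct tokens shared with the other
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / min(len(sa), len(sb))

def drop_redundant(sentences, threshold=0.4):
    # keep a sentence only if it overlaps < threshold with every kept sentence
    kept = []
    for s in sentences:
        if all(overlap(s, k) < threshold for k in kept):
            kept.append(s)
    return kept

print(drop_redundant([
    "the camera zoom is excellent",
    "excellent zoom on the camera",   # near-duplicate of the first, discarded
    "battery life is disappointing",
]))
```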
Nugget Recall (NR) = (sum of weights of nuggets returned in the summary) /
                     (sum of weights of all nuggets related to the topic)    (5.3)

Nugget Precision (NP) = Allowance / Length    (5.4)

where Allowance = 100 × (number of nuggets returned in the summary) and Length = number of non-whitespace characters in the summary.

F-measure = (1 + β²) × (NP · NR) / (β² · NP + NR), with β = 1    (5.5)
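The three metrics follow directly from eqns. 5.3 to 5.5 (function names are ours; matching nuggets against the summary is assumed to have been done already):

```python
def nugget_recall(returned_weights, all_weights):
    # Eqn. 5.3: weight of nuggets captured by the summary / total nugget weight
    return sum(returned_weights) / sum(all_weights)

def nugget_precision(num_nuggets_returned, summary):
    # Eqn. 5.4: allowance of 100 non-whitespace characters per returned nugget
    allowance = 100 * num_nuggets_returned
    length = len("".join(summary.split()))  # non-whitespace characters only
    return allowance / length

def f_measure(np_score, nr_score, beta=1.0):
    # Eqn. 5.5 with beta = 1 (balanced F-measure)
    if np_score + nr_score == 0:
        return 0.0
    return (1 + beta**2) * (np_score * nr_score) / (beta**2 * np_score + nr_score)

nr = nugget_recall([3, 2], [3, 2, 4, 1])   # 0.5
np_ = nugget_precision(1, "x " * 125)      # 100 / 125 = 0.8
print(f_measure(np_, nr))                  # ≈ 0.615
```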
5.3.3 Results
We chose the values of λ1, λ2 and λ3 in eqn. 5.2 as 0.35, 0.25 and 0.4 respectively, after some manual tuning of the weights. The values presented in Table 5.1 are averages over the 22 topics. The average F-measure score obtained using our approach is better than that of many of the systems submitted to TAC 2008: out of the thirty-six runs submitted to the task, only nine performed better than ours, the best being 0.489.
Table 5.1 Results showing average NR, NP and F-measure values for the 22 topics

Evaluation Metric    Avg. Score
NR                   0.287
NP                   0.164
F-measure            0.209
5.4 Conclusion
In this work, we presented a general overview of a sentiment summarization system and showed how sentiment classification helps in summarizing sentiments. Our summarization system focused on extract-based summaries, unlike previous systems, which were aspect based. We built two classifiers that classify each sentence in a document as opinion or fact and as positive or negative respectively, using unigrams and word associations as features. We estimated the polarity of a sentence using the classifier scores and combined it with the QI and QD metrics in the final scoring of a sentence. We also took care of redundancy while generating the summary, based on an overlap threshold. Sentiment summarization, particularly extract based, is still at an early stage of research, and we believe our approach is a right step towards exploring more novel methods.
Chapter 6
Conclusion
Sentiment classification can be treated as a special case of topical classification applied to the subjective portions of a document. In this thesis, we discussed the problem of document sentiment classification and its key component, subjective feature extraction. We discussed the challenges in extracting subjectivity, existing approaches and their limitations. Although many techniques were proposed in the last decade for extracting subjective features from a document, there are still many open problems that need to be addressed. Most of the proposed methods relied heavily on linguistic resources such as sentiment lexicons and complex patterns based on POS information, making the task resource dependent and complex. It requires a lot of human effort to develop such tools to analyse sentiments in various domains and languages; hence, extending these resource-dependent approaches to new domains and languages is not a feasible solution. This motivated us to conduct research on methodologies that require minimal use of linguistic resources and yet achieve comparable or better results.
Also, most existing sentiment analysis systems predict polarity on a binary scale. However, in real-world applications, expressing sentiment is too complex to be simply binary. Hence, we conducted experiments to predict sentiment on a multi-variant scale of one to five, popularly known as star rating on the web. We adopted a filtering methodology to derive subjective features and used supervised learning approaches to analyse the overall sentiment of a document. We proposed a method called RSUMM, in combination with well-known feature selection techniques, to extract subjective features. RSUMM is based on information retrieval models: techniques similar to the vector space model, the unigram language model and a term co-occurrence model were used to estimate the subjectivity of each sentence.
6.1 Contributions
Current approaches in sentiment analysis lie at the crossroads of NLP and IR, and subjective feature extraction has been highly dominated by linguistic resources. In this thesis, we attempted to move away from using language resources and investigated approaches that make the task of subjective feature extraction resource independent; this is the major contribution of the thesis. We approached the problem with a two-step filtering methodology and conducted experiments to predict the sentiment in customer reviews. The basis for this methodology was our manual analysis of the web, where we had seen many reviews with little subjective content compared to the total content.

- We proposed a method called RSUMM to extract subjective sentences from a document. We estimated the subjectivity of each sentence using three variants of RSUMM: RSUMM_LS, RSUMM_PE and RSUMM_CO. All the variants are based on information retrieval models; we used techniques similar to the vector space model, the unigram language model and a term co-occurrence model to estimate subjectivity. We obtained an extract of each review retaining its most subjective sentences, and used this subjective extract, rather than the full review, to predict sentiment orientation.

- We used n-gram models to convert a sentence into a feature vector for classification. As n-gram modeling was done on sentences, there could be many irrelevant features that need to be filtered. We used two state-of-the-art feature selection methods, mutual information and Fisher discriminant ratio, to remove them.

- The logistic regression (LR) method has been widely used in patent information retrieval and in applications where human preferences play a major role, such as grading a student. To the best of our knowledge, it was used here for the first time in sentiment classification, for calibrating customer satisfaction. We did not go into the internals of this method, focusing instead on deriving features from the document and presenting them as feature vectors.

- We also worked on an application of sentiment classification: sentiment summarization. We summarized the sentiment in multiple blog posts related to a topic following a classification-based approach, adopting a two-stage classification procedure at the sentence level. Each sentence was scored for its subjectivity and polarity based on the classifier scores, and a linear combination of the scores was used for final ranking.
We conducted experiments on standard datasets used by many researchers in sentiment classification and summarization. The classification datasets were from the hotel and movie review domains; the dataset we used for evaluating our summarization system contained blog posts on different topics. We evaluated our methodology conforming to standard evaluation metrics in both classification and summarization: accuracy and mean absolute error (both the micro and macro versions) for classification, and precision, recall and F-measure for evaluating the opinion summaries.
We reported the results of the experiments on sentiment classification and summarization in Chapter 4 and Chapter 5 respectively. We achieved good accuracy values while classifying sentiments on both binary and multi-variant scales; our results were on par with or better than the state-of-the-art approaches that used linguistic resources for extracting subjective features. We evaluated our summarization system on the TAC 2008 blog dataset and obtained better performance using our classification methodology than many of the systems that participated in TAC.
6.2 Applications
In this section, we discuss some real-world applications of sentiment classification.
6.2.1 Product comparison
Many people use the web to decide whether or not to recommend a product. Online merchants ask customers to review their products and are very interested in their judgements. Researchers are also focusing on automatically classifying people's views on a product as recommended or not [72, 18, 89]. A product has several aspects on which people comment, and it may have shortcomings in one aspect and merits in another [66, 86]. Analysing these sentiments in the text and producing a comparison of customer opinions on different products at a single glance (a rating) can really facilitate better information access for merchants and others. Comparison of products on the web can enable people to easily gather marketing intelligence and product benchmarking information.
Liu et al. [61] proposed a novel framework to analyse and compare customer opinions on several competing products, implemented in a prototype system called Opinion Observer. The process involves two steps: 1) identify the product features or aspects that users have commented upon; 2) for each extracted feature, identify the semantic orientation of the sentiment. They presented the comparison output in the form of a visual summary for each feature (aspect) of the product.
6.2.2 Sentiment summarization
The number of reviews that a product receives is increasing rapidly on the web, and popular products attract many comments. Some of the reviews are long, contain little opinion information, or are redundant, which makes it hard for a potential customer, and also for product manufacturers, to make an informed decision. Sentiment summarization summarizes the opinions by predicting the polarity of the sentiment in the text, quantifying the sentiment, and capturing the relations between entities [55, 74]. With a sentiment summary, a customer or a manufacturer gets a complete overview of what different people are saying about a particular product. We conducted experiments on this application of sentiment classification.
6.2.3 Opinion reason mining
Opinion reason mining is another area where sentiment classification can be applied. In this area of research, a critical, in-depth analysis of opinion assessment is performed. Consider, for example, the query "What are the reasons for the popularity of Windows 7?". For such queries, simply returning some 150 positive reviews of Windows 7 and some 50 reviews with negative polarity is not sufficient. Reasons such as "The product is popular for its look and feel, and boot time" convey an in-depth assessment to the customer. In this application, sentiment classification is used to produce a general overview of the pros and cons of a product, along with the exact reasons for them.
6.2.4 Other Applications
Online message sentiment filtering, sentiment web search engines, e-mail sentiment detection and web blog author sentiment prediction are other applications of sentiment classification.
6.3 Future Directions
Our approach can be considered a building block for investigating subjective feature extraction methods that require minimal use of linguistic resources. In this thesis, we explored two simple metrics (ADF, ASM) and methods based on information retrieval models (probabilistic estimate, term co-occurrence) for estimating subjectivity. In future, one can explore more novel metrics and models for subjective feature extraction and conduct experiments with them. We employed two state-of-the-art feature selection methods but did not explore other feature selection methods further; this can be treated as one of the possible future directions, particularly in multi-variate classification.
From the results reported in the above chapters, we believe that our methodology is a right step in the direction of investigating subjective feature extraction approaches that use statistical means. Due to the unavailability of large, standard annotated datasets in different languages, we conducted our experiments on datasets used by many researchers in sentiment classification so that our results could be compared. Based on the accuracy values we obtained, we are fairly confident that our approach reduced the resource dependency problem in subjective feature extraction. In future, one can extend this methodology to experiments analysing sentiments in regional languages and on very large datasets.
We followed a naive classification-based approach at the sentence level for summarizing opinions in blog posts. In sentiment summarization, aspect-based summarization has been far more popular than extract-based summarization, though the latter is now gaining popularity. We focused on establishing the need for a sentiment summarization system, the difference between normal text summarization and sentiment summarization, and the relation between sentiment classification and summarization. Our methodology can be further improved in future to include more novel techniques for summarizing sentiments.
Bibliography
[1] A. Andreevskaia and S. Bergler. Mining WordNet for a fuzzy sentiment: Sentiment tag extraction from WordNet glosses. In Proceedings of EACL, 2006.
[2] S. Argamon, M. Koppel, and G. Avneri. Routing documents according to style. In Proceedings of the First International Workshop on Innovative Information Systems, 1998.
[3] A. Aue and M. Gamon. Customizing sentiment classifiers to new domains: A case study. In Proceedings of RANLP, 2005.
[4] A. Blum and S. Chawla. Learning from labeled and unlabeled data using graph mincuts. In Proceedings of the 18th ICML, pages 19-26, 2001.
[5] S. Baccianella, A. Esuli, and F. Sebastiani. Multi-facet rating of product reviews. In Proceedings of the European Conference on Information Retrieval (ECIR), pages 461-472, 2009.
[6] Ricardo Baeza-Yates and Berthier Ribeiro-Neto. Modern Information Retrieval. Addison-Wesley Longman Publishing Co., 2002.
[7] P. Beineke, T. Hastie, and S. Vaithyanathan. The sentimental factor: Improving review classification via human-provided information. In Proceedings of the 42nd ACL, 2004.
[8] A. Bookstein and D. R. Swanson. Probabilistic methods for automatic indexing. Journal of ASIS, Vol. 25:312-319, 1974.
[9] A. Bossard, M. Genereux, and T. Poibeau. CBSEAS, a summarization system: integration of opinion mining techniques to summarize blogs. In Proceedings of EACL, pages 5-8, 2009.
[10] E. Brill. Transformation-based error-driven learning and natural language processing. Computational Linguistics, Vol. 21, pages 543-565, 1995.
[11] A. Budanitsky and G. Hirst. Semantic distance in WordNet: An experimental, application-oriented evaluation of five measures. In Proceedings of the NAACL Workshop on WordNet and Other Lexical Resources, 2001.
[12] J.A. Bullinaria. Semantic categorization using simple word co-occurrence statistics. In proceedings of ESSLLI workshop on Distributional Lexical Semantics., pages 1–8, 2008.
[13] Chih-Chung Chang and Chih-Jen Lin. LIBSVM: a library for support vector machines, 2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
[14] P. Chaovalit and L. Zhou. Movie review mining: A comparison between supervised and unsupervised classification approaches. In proceedings of 38th Hawaii international conference on system sciences., pages 1–9, 2005.
[15] E. Charlotta. Topic dependence in sentiment classification., 2004.
[16] H. Cui, V. Mittal, and M. Datar. Comparative experiments on sentiment classification for online product reviews. In proceedings of American Association for Artificial Intelligence, AAAI., pages 1265–1270, 2006.
[17] J.R. Curran. Ensemble methods for automatic thesaurus extraction. In proceedings of EMNLP., pages 222–229, 2002.
[18] S.R. Das and M. Chen. Yahoo! for amazon: Sentiment parsing from small talk on the web. In proceedings of 8th Asia Pacific finance association annual conference., 2001.
[19] K. Dave, S. Lawrence, and D. Pennock. Mining the peanut gallery: Opinion ex-
traction and semantic classication of product reviews. In proceedings of WWW.,
2003.
[20] A. Devitt and C. Vogel. The topology of wordnet: Some metrics. In proceedings of
global WordNet conference, GWC., 2004.
[21] M. Dimitrova, A. Finn, N. Kushmerick, and B. Smyth. Web genre visualization. In proceedings of conference on human factors in computing systems., 2002.
[22] S.D. Durbin, J. Neal Richter, and D. Warner. A system for effective rating of texts.
In proceedings of OTC-3, workshop on operational text classication., 2003.
[23] P. Edmonds. Semantic representations of near-synonyms for automatic lexical
choice. PhD thesis, University of Toronto, 1999.
[24] P. Edmonds and G. Hirst. Near-synonymy and lexical choice. Computational Linguistics, Vol. 28:pp. 105–144, 2002.
[25] M. Efron. Cultural orientation: Classifying subjective documents by cocitation analysis. In proceedings of AAAI fall symposium on style and meaning in language., pages 41–48, 2004.
[26] A. Esuli and F. Sebastiani. Determining the semantic orientation of terms through
gloss classication. In proceedings of CIKM., 2005.
[27] A. Esuli and F. Sebastiani. Determining the term subjectivity and term orientation for opinion mining. In proceedings of 11th conference of the european chapter of the association for computational linguistics, EACL., 2006.
[28] A. Esuli and F. Sebastiani. Sentiwordnet: A publicly available lexical resource for opinion mining. In proceedings of LREC 2006., 2006.
[29] Z. Fei, J. Liu, and G. Wu. Sentiment classication using phrase patterns. In pro-
ceedings of 4th international conference on computer and information technology,
CIT., 2004.
[30] O. Feiguina and G. Lapalme. Query based summarization of customer reviews. In proceedings of Canadian AI., pages 452–463, 2007.
[31] C. Fellbaum. WordNet: An electronic lexical database. MIT Press, 1998.
[32] R.M. French and C. Labiouse. Four problems with extracting human semantics from large text corpora. In proceedings of 24th annual conference of the cognitive science society., pages 316–322, 2002.
[33] M. Gamon. Sentiment classification on customer feedback data: Noisy data, large feature vectors and the role of linguistic analysis. In proceedings of 20th international conference on computational linguistics., pages 841–847, 2004.
[34] M. Gamon and A. Aue. Automatic identification of sentiment vocabulary exploiting low association with known sentiment terms. In proceedings of ACL workshop on feature engineering in machine learning in NLP., 2005.
[35] N.S. Glance, M. Hurst, and T. Tomokiyo. Blog pulse: Automatic trend discovery
for weblogs. In proceedings of WWW workshop on the weblogging ecosystem: Ag-
gregation, analysis and dynamics., 2004.
[36] G. Grefenstette. Explorations in automatic thesaurus discovery. Kluwer Academic
Press, 1994.
[37] G. Grefenstette, Y. Qu, J.G. Shanahan, and D.A. Evans. Coupling niche browsers and affect analysis for an opinion mining application. In proceedings of RIAO-04., pages 186–194, 2004.
[38] U. Gretzel and K.Y. Yoo. Use and impact of online travel reviews. In proceedings of the 2008 International Conference on Information and Communication Technology., pages 35–46, 2008.
[39] Jiawei Han and Micheline Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann, 2001.
[40] V. Hatzivassiloglou and K.R. McKeown. Predicting the semantic orientation of ad-
jectives. In proceedings of 35th ACL., 1997.
[41] V. Hatzivassiloglou and J. Wiebe. Effects of adjective orientation and gradability on
sentence subjectivity. In proceedings of 18th international conference on computa-
tional linguistics., 2000.
[42] D. Hillard, M. Ostendorf, and E. Shriberg. Detection of agreement vs disagreement
in meetings: Training with unlabeled data. In proceedings of HLT/NAACL., 2004.
[43] M. Hu and B. Liu. Mining and summarizing customer reviews. In proceedings of
SIGKDD., 2004.
[44] M. Hu and B. Liu. Mining opinion features in customer reviews. In proceedings of AAAI., pages 755–760, 2004.
[45] D.J. Inkpen, O. Feiguina, and G. Hirst. Generating more positive and more negative text. Computing attitude and affect in text: Theory and applications. The information retrieval series., Vol. 20:pp. 187–196, 2004.
[46] G.C. Jack, L.L. Jochen, F. Schilder, and K. Ravi. Query-based opinion summariza-
tion for legal blog entries. In proceedings of ICAIL., 2009.
[47] J. Jagadeesh, P. Prasad, and Vasudeva Varma. A relevance-based language modeling approach to DUC 2005. In working notes of DUC., 2005.
[48] J. Justeson and S. Katz. Technical terminology: some linguistic properties and an algorithm for identification in text. Natural Language Engineering., Vol. 1:pp. 9–27, 1993.
[49] J. Kamps, M. Marx, R.J. Mokken, and M. de Rijke. Using wordnet to measure semantic orientation of adjectives. In proceedings of LREC., pages 1115–1118, 2004.
[50] A. Kennedy and D. Inkpen. Sentiment classification of movie reviews using contextual valence shifters. Computational Intelligence, Vol. 22:pp. 110–125, 2006.
[51] B. Kessler, G. Nunberg, and H. Schütze. Automatic detection of text genre. In proceedings of 35th ACL., pages 32–38, 1997.
[52] S.-M. Kim and E. Hovy. Determining the sentiment of opinions. In proceedings of COLING., pages 1363–1373, 2004.
[53] S-M. Kim and E. Hovy. Automatic detection of opinion bearing words and sentences. In proceedings of IJCNLP., 2005.
[54] N. Kobayashi, T. Inui, and K. Inui. Dictionary based acquisition of the lexical knowledge for p/n analysis (in japanese). In proceedings of Japanese society for artificial intelligence., pages 45–50, 2001.
[55] L-W. Ku, L-Y. Lee, T. Wu, and H-H. Chen. Major topic detection and its application to opinion summarization. In proceedings of SIGIR., pages 627–628, 2005.
[56] S. Kullback and R.A. Leibler. On information and sufficiency. Annals of Mathematical Statistics., Vol. 22, 1951.
[57] J. Lafferty, A. McCallum, and F. Pereira. Conditional random fields: Probabilistic models for segmenting and labeling of sequence data. In proceedings of ICML., 2001.
[58] Tjen-Sien Lim, Wei-Yin Loh, and Yu-Shan Shih. A comparison of prediction accuracy, complexity, and training time of thirty-three old and new classification algorithms. In Machine Learning., pages 203–228, 2000.
[59] D. Lin. Automatic retrieval and clustering of similar words. In proceedings of COLING-ACL., pages 768–774, 1998.
[60] B. Liu. Sentiment analysis and subjectivity. Handbook of Natural Language Pro-
cessing, 2010.
[61] B. Liu, M. Hu, and J. Cheng. Opinion observer: Analyzing and summarizing opinions on the web. In proceedings of WWW., pages 10–14, 2005.
[62] H. Liu, H. Lieberman, and T. Selker. A model of textual affect sensing using real-world knowledge. In proceedings of 8th international conference on intelligent user interfaces., pages 125–132, 2003.
[63] R. Losee. Natural language processing in support of decision-making: phrases and part-of-speech tagging. Information processing and management., Vol. 37:pp. 769–787, 2001.
[64] S. Matsumoto, H. Takamura, and M. Okumura. Sentiment classification using word sub-sequences and dependency sub-trees. In proceedings of Pacific Asia Conference on Knowledge Discovery and Data Management, PAKDD., pages 301–311, 2005.
[65] Andrew Kachites McCallum. Bow: A toolkit for statistical language modeling, text retrieval, classification and clustering. http://www.cs.cmu.edu/~mccallum/bow, 1996.
[66] S. Morinaga, K. Yamanishi, K. Tateishi, and T. Fukushima. Mining product reputations on the web. In proceedings of ACM SIGKDD., pages 341–349, 2002.
[67] T. Mullen and N. Collier. Sentiment analysis using support vector machines with diverse information sources. In proceedings of EMNLP., pages 412–418, 2004.
[68] K. Nigam, A. McCallum, and S. Thrun. Text classification from labeled and unlabeled documents using EM. Machine Learning., Vol. 39:pp. 103–134, 2000.
[69] B. Pang and L. Lee. A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts. In proceedings of Association for Computational Linguistics, ACL., pages 271–278, 2004.
[70] B. Pang and L. Lee. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In proceedings of 43rd ACL., pages 115–124, 2005.
[71] B. Pang and L. Lee. Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval, Vol. 2:pp. 1–135, 2008.
[72] B. Pang, L. Lee, and S. Vaithyanathan. Thumbs up? sentiment classification using machine learning techniques. In proceedings of Association for Computational Linguistics, ACL., pages 79–86, 2002.
[73] F.C.N. Pereira, N. Tishby, and L. Lee. Distributional clustering of english words. In proceedings of ACL., pages 183–190, 1993.
[74] B. Philip, T. Hastie, M. Christopher, and V. Shivakumar. Exploring sentiment summarization. In proceedings of AAAI symposium on exploring attitude and affect in text., 2004.
[75] P. Prasad, K. Rahul, and Vasudeva Varma. IIIT Hyderabad at DUC 2007. In working notes of DUC., 2007.
[76] A. Rauber and A. Müller-Kögler. Integrating automatic genre analysis into digital libraries. In proceedings of 1st ACM-IEEE joint conference on digital libraries., 2001.
[77] R. Rapp. A freely available automatically generated thesaurus of related words. In proceedings of LREC., 2004.
[78] A. Ratnaparkhi. A maximum entropy model for part-of-speech tagging. In proceedings of EMNLP., pages 133–142, 1996.
[79] E. Riloff and J. Wiebe. Learning extraction patterns for subjective expressions. In proceedings of EMNLP., pages 105–112, 2003.
[80] G. Salton, A. Singhal, C. Buckley, and M. Mitra. Automatic text decomposition us-
ing text segments and text themes. In proceedings of ACM conference on Hypertext.,
1996.
[81] F. Salvetti, S. Lewis, and C. Reichenbach. Automatic opinion polarity classification of movie reviews. Colorado Research in Linguistics., Vol. 17, 2003.
[82] H. Schmid. Probabilistic part-of-speech tagging using decision trees. In proceedings
of international conference on new methods in language processing., 1994.
[83] E. Spertus. Automatic recognition of hostile messages. In proceedings of IAAI.,
1997.
[84] P. Subasic and J. Huettner. Affect analysis of text using fuzzy semantic typing. IEEE Transactions on Fuzzy Systems, Vol. 9:pp. 483–496, 2001.
[85] M. Taboada, A. Caroline, and V. Kimberly. Creating semantic orientation dictionaries. In proceedings of 5th LREC., 2006.
[86] M. Taboada, M.A. Gillies, and P. McFetridge. Sentiment classification techniques for tracking literary reputation. In proceedings of LREC Workshop Towards Computational Models of Literary Analysis., pages 36–43, 2006.
[87] J. Tait. Automatic Summarizing of English Texts. PhD thesis, University of Cam-
bridge, 1983.
[88] H. Takamura, T. Inui, and M. Okumura. Extracting semantic orientation of words
using spin model. In proceedings of 43rd ACL., 2005.
[89] L. Terveen, W. Hill, B. Amento, and J. Creter. PHOAKS: A system for sharing recommendations. In Communications of the ACM., pages 59–62, 1997.
[90] T.T. Thet, J-C. Na, and S.G. Christopher Khoo. Sentiment classification of movie reviews using multiple perspectives. In proceedings of ICADL., pages 184–193, 2008.
[91] R.M. Tong. An operational system for detecting and tracking opinions in online
discussion. In proceedings of SIGIR workshop on operational text classication.,
2001.
[92] P.D. Turney. Thumbs up or thumbs down? semantic orientation applied to unsupervised classification of reviews. In proceedings of Association for Computational Linguistics, ACL., pages 417–424, 2002.
[93] P.D. Turney and M.L. Littman. Unsupervised learning of semantic orientation from a hundred-billion-word corpus. Technical report: National research council, Canada., 2002.
[94] P.D. Turney and M.L. Littman. Measuring praise and criticism: Inference of semantic orientation from association. In ACM transactions on information systems., pages 315–346, 2003.
[95] V. Vapnik. The Nature of Statistical Learning Theory. Springer-Verlag, New York.,
1995.
[96] S. Vegnaduzzo. Acquisition of subjective adjectives with limited resources. In proceedings of AAAI spring symposium on exploring attitude and affect in text., 2004.
[97] S. Wang, D. Li, Y. Wei, and H. Li. A feature selection method based on Fisher's discriminant ratio for sentiment classification. In proceedings of WISM., pages 88–97, 2009.
[98] J. Wiebe and E. Riloff. Creating subjective and objective sentence classifiers from unannotated texts. In proceedings of 6th international conference on intelligent text processing and computational linguistics., 2005.
[99] C. Whitelaw, S. Argamon, and N. Garg. Using appraisal taxonomies for sentiment analysis. In proceedings of first computational systemic functional grammar conference., 2005.
[100] C. Whitelaw, N. Garg, and S. Argamon. Using appraisal groups for sentiment analysis. In proceedings of CIKM., pages 625–631, 2005.
[101] J. Wiebe. Learning subjective adjectives from the corpora. In proceedings of AAAI., pages 735–740, 2000.
[102] J. Wiebe, R. Bruce, and T. O'Hara. Development and use of gold standard data for subjectivity classifications. In proceedings of 37th ACL., pages 246–253, 1999.
[103] J. Wiebe, R. Bruce, and T. O'Hara. Development and use of gold standard data for subjectivity classifications. In proceedings of 37th ACL., pages 246–253, 1999.
[104] J. Wiebe, T. Wilson, and M. Bell. Identifying collocations for recognizing opinions.
In proceedings of ACL/EACL workshop on collocation., 2001.
[105] J. Wiebe, T. Wilson, and C. Cardie. Annotating expressions of opinions and emo-
tions in language. In proceedings of LREC., 2005.
[106] Y. Wilks and M. Stevenson. The grammar of sense: Using part-of-speech tags as a first step in semantic disambiguation. Natural language engineering., Vol. 4:pp. 135–144, 1998.
[107] T. Wilson, J. Wiebe, and P. Hoffmann. Recognizing contextual polarity in phrase-level sentiment analysis. In proceedings of HLT/EMNLP., 2005.
[108] Y. Yang and J.O. Pedersen. A comparative study on feature selection methods in text categorization. In proceedings of ICML., pages 412–420, 1997.
[109] H. Yu and V. Hatzivassiloglou. Towards answering opinion questions: Separating facts from opinions and identifying the polarity of opinion sentences. In proceedings of EMNLP., pages 129–136, 2003.
[110] Z. Zhang and B. Varadarajan. Utility scoring of product reviews. In proceedings of 15th CIKM., pages 51–57, 2006.
[111] L. Zhuang, F. Jing, and Y.X. Zhu. Movie review mining and summarization. In
proceedings of CIKM., 2006.
[112] I. Titov and R. McDonald. A joint model of text and aspect ratings for sentiment summarization. In proceedings of ACL., pages 308–316, 2008.