An Approach to Extract Aspects and Sentence Level Sentiment From User Feedback
Using a Rule Based Approach in conjunction with SentiWordNet and POS-Tagging
Aaruna G {aaruna.g@imaginea.com}, Ramachandra Kousik A.S {kousik.r@imaginea.com}
Imaginea Technologies, a BU of Pramati Technologies.
Abstract
The central goal of our work is to extract Aspects from User Feedback and to associate Sentiment
and Opinion terms with them. The dataset at our disposal is a set of feedback documents, in XML
format, for various departments of a Hospital, with comments represented in tags. It contains about
65,000 responses to a survey taken in the Hospital. Every response or comment is treated as a
sentence or a set of sentences. We perform sentence-level aspect and sentiment extraction: we mine
the User Feedback data to gather aspects, extract the sentiment mentions, evaluate them contextually
for sentiment, and associate those sentiment mentions with the corresponding aspects. We first clean
up the User Feedback data, then perform aspect extraction and sentiment polarity calculation with
the help of POS tagging and SentiWordNet[1] filters respectively. The obtained sentiments are further
classified according to a set of Linguistic rules, and the scores are normalized to nullify any noise
that might be present. We lay emphasis on a rule-based approach, the rules being Linguistic rules
that correspond to the positioning of various parts-of-speech words in a sentence.
Keywords : Aspect Mining, Opinion Mining, Sentiment Analysis, Polarity Classification.
1. Introduction:
The primary focus area of our work is on Aspect Extraction and grouping. Aspects form an important
part of any classification and they essentially also define the context in which a certain opinion or a
response is expressed. We perform grouping on aspects in order to achieve closeness i.e. to put aspects
that are linguistically related to each other together in a common bucket. Our work also focuses on
extracting relevant sentiment for an aspect. We perform this analysis at the sentence level using a rule
based approach, where the rules are English language rules.
Recently there has been a change of attitude in the field, from plainly extracting positive or negative
opinions to introducing opinion weights and classifying opinions as neutral; the focus is no longer on
the binary classification of positive or negative referred to by Turney in [3]. Dictionary-based
approaches depend on existing lexicographical resources (such as WordNet) to provide semantic data
regarding individual senses and words [4]. We lay emphasis on Language rules rather than a plain
look-up from sources like WordNet or SentiWordNet, because we base our work on the fact that the
meaning of a word is relevant only in a context: the presence of other words alongside a particular
word in an expression changes the intensity of the whole expression, as when an adjective or an
adverb intensifies or deprecates the intensity of a noun or a verb. This is also the distinguishing
factor between text mining and Information Retrieval (IR), where the latter is only information access
while the former involves pruning the complete text data, as argued by Hearst in [2]. In contrast to
other works, ours presents a sentence-level lexical/dictionary knowledge-base method to tackle the
domain adaptability problem for different types of data, as shown by Khan et al. in [5].
The dataset used for our work is a set of documents that are responses to a survey conducted in a
Hospital, and these responses have been categorized by department. Overall, we have conducted our
study on about 65,000 responses. We did not choose to do any subjectivity analysis because on a
feedback form or a 'Post your comment' section the number of objective expressions is negligible and
would not constitute any significant noise. One of the main reasons we have limited it to Sentence
Level Classification is that an aspect appearing in a response predominantly contributes to that
response alone, not to the whole document; hence there is no need for document-level (paragraph-level)
analysis, and should document-level analysis be required, a simple aggregation of all the sentence-level
results would be quite accurate. We also wanted to remain as domain independent as possible, which
could only be achieved by sentence-level classification. We analyze the performance of our approach
by comparing our results (shown in Section 4) against the manually annotated results.
Section 2 presents the approach we followed to solve the problem of aspect extraction and sentiment
association and classification. Section 3 will detail our score calculation metrics along with the pseudo-
code followed by performance analysis in Section 4 and conclusion and future work in Section 5, with
references in Section 6.
2. Our Approach
Each response is about an aspect or a group of aspects and these aspects form the theme of a comment
or a response. As the comments or responses were obtained through a survey, it is intuitive that a
certain aspect would appear in a response only when the user who has written that response thinks
that particular aspect is deemed fit. And every aspect that appears in a sentence has a part to play in
the overall sentence's expression quotient. Keeping this in mind, certain metrics like the TF-IDF were
ruled out because a) A response is not as elaborate as a document and b) When two aspects appear in
a response, we treat them both equally important to that response and we completely rule out the
concept of relative importance (where TF-IDF would've come in handy) as a response is typically not
more than a couple of sentences long and the chances of an aspect appearing multiple times in such a
response are negligible. We lay emphasis on sentence level classification to be able to increase the
efficiency of the model and to keep it as generic as possible.
Aspects that are one-hop neighbors of each other in a dictionary more or less mean the same thing.
We use WordNet[8] to gather one-hop neighbors for an aspect. The co-occurrence of aspects in a
linguistic sense implies that two aspects could mean the same thing, but in no way suggests that they
should always co-occur in that context. We use these neighbors only to group aspects together, which
helps us filter out redundant aspects when calculating the top aspects.
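The one-hop grouping can be sketched as follows. In the paper WordNet [8] supplies the one-hop neighbors; here a small hand-built synonym table stands in for the WordNet lookup so the example stays self-contained, and the entries in it are illustrative only.

```python
# Stand-in for a WordNet one-hop neighbour lookup (illustrative entries).
SYNONYMS = {
    "doctor":    {"physician"},
    "physician": {"doctor"},
    "room":      {"chamber"},
    "chamber":   {"room"},
}

def one_hop(aspect):
    """An aspect together with its one-hop dictionary neighbours."""
    return {aspect} | SYNONYMS.get(aspect, set())

def group_aspects(aspects):
    """Bucket aspects whose one-hop neighbourhoods overlap."""
    groups = []
    for aspect in aspects:
        hood = one_hop(aspect)
        for group in groups:
            if group & hood:      # shared neighbour -> same bucket
                group |= hood
                break
        else:
            groups.append(hood)
    return groups

# "doctor" and "physician" collapse into one bucket; "billing" stays alone.
groups = group_aspects(["doctor", "physician", "room", "billing"])
```

A real implementation would replace the `SYNONYMS` table with synset lookups, but the bucketing logic is the same.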
We follow a four-step process to accomplish our objective.
1) Extracting the entities from the corpus/text-base that are potential sentiment holders and are
objects for potential sentiment - which we term as Aspects.
2) Filtering out noise by separating stop words and other irrelevant terms in a User comment
using the Parts of Speech Tagger of [6].
3) Associating respective sentiment terms to the corresponding Aspect.
4) Assigning normalized sentiment scores to a feature/entity to keep the output unaffected from
changes in the algorithm or the weights we assign through our rules.
Another case in point in our work is the user profile. It is not imperative that everyone records
their response in adherence to correct grammar and usage. So we have chosen our rules (explained in
Section 3) in such a way that we only consider rules that are generic rather than extremely specific
language rules. For instance, there is a rule to be applied when intensifiers are present
('Very' good), but we ignore the usage of exclamation marks and emoticons (! or :D :P etc.)
because of the tendency to use them at will and sometimes rather arbitrarily.
Our approach essentially consists of two agents, and these agents operate serially:
1) The Aspect Extraction Agent (AEA)
2) The Sentiment Association Agent (SAA)
Before AEA takes over, we remove noise from the user feedback. By that we mean, we remove all the
special symbols, stop words like {a, hmm, is, yea etc} and blank spaces and we feed a filtered dataset
to the AEA.
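This clean-up pass might look like the following sketch; the stop-word list here is a small sample, not the full list used in the paper.

```python
import re

# Small sample of stop words; the paper's full list would be larger.
STOP_WORDS = {"a", "an", "the", "is", "are", "hmm", "yea"}

def clean(response):
    # Replace special symbols with spaces, keeping letters, digits, whitespace.
    text = re.sub(r"[^A-Za-z0-9\s]", " ", response.lower())
    # Drop stop words and collapse runs of blank space.
    return " ".join(w for w in text.split() if w not in STOP_WORDS)

cleaned = clean("Hmm, the doctors were great!!")  # -> "doctors were great"
```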
The Aspect Extraction Agent : The Aspect Extraction Agent (AEA) makes use of a POS tagger [6] to
separate the subjects from the opinion part of the comment. The aspects form the subjective entities
about which a respective sentiment is expressed. The AEA also filters out noise by not ranking any
special symbols left over or the obviously objective features; the assumption is that, on a feedback
form, the comments are predominantly subjective. The POS tagger also helps us identify the opinion
words present in the sentence. The AEA then constructs a {key: value} pair, where the key is the
aspect (or the context of the expression, or a set of aspects/contexts of an expression) and the
value is the list of opinion words related to that aspect, i.e. those used to express that particular
aspect. This output is then supplied to the SAA. The aspects are further grouped to remove
redundant aspects and to re-adjust the weights of the aspects in the context of the department, which
will further aid us in prioritizing the aspects for a department.
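The {key: value} construction can be sketched as below. The POS tags are hard-coded here for self-containment; in the paper they come from the tagger of [6], and pairing every aspect with all of the sentence's opinion words is a simplifying assumption of this sketch.

```python
# Pre-tagged tokens standing in for the output of a POS tagger.
tagged = [("doctors", "NNS"), ("were", "VBD"),
          ("very", "RB"), ("approachable", "JJ")]

def build_pairs(tagged_tokens):
    aspects, opinions = [], []
    for word, tag in tagged_tokens:
        if tag.startswith("NN"):              # nouns -> candidate aspects
            aspects.append(word)
        elif tag.startswith(("JJ", "RB")):    # adjectives/adverbs -> opinion words
            opinions.append(word)
    # Each aspect in the sentence is paired with the sentence's opinion words.
    return {aspect: list(opinions) for aspect in aspects}

pairs = build_pairs(tagged)  # {"doctors": ["very", "approachable"]}
```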
The Sentiment Association Agent : The Sentiment Association Agent (SAA) receives a set of {key: value}
pairs from the AEA. It then makes use of SentiWordNet, detailed in [1], to compute the opinion score
of each word and produces an aggregate opinion score for that particular feature. The scores thus
output are collected and pruned according to a set of language rules, such as the presence or absence
of intensifiers, negations, adverbs and adjectives, to obtain the sentiment score for that particular
feature. The sentiment score thus obtained has three components, namely Positive, Negative and
Neutral. It is assumed with considerable conviction that almost every positive opinion term, phrase
or entity will have some sort of negative and neutral score when used in different senses, and
vice-versa.
Eg: - It's incredible. I absolutely love it. [incredible is positive here]
Ah, what incredibly awful stuff that is! [incredible is negative here]
The reason for normalizing is that any rescaling of an input vector can be effectively undone by
changing the corresponding weights and biases, leaving us with exactly the same outputs as before.
However, there are a variety of practical reasons why standardizing the inputs can make training
faster and reduce the chances of getting stuck in local optima; weight decay and Bayesian estimation
can also be done more conveniently with standardized inputs, and we can always tell the system by
how much the value has changed since the previous input. Moreover, for each entry in SentiWordNet
the positive, negative and objective scores sum to 1, so it makes sense to normalize our aggregate of
opinion word scores to sum to 1, ensuring consistency regardless of changes in the input vector
scales. The score calculation metrics and pseudo code are further detailed in Section 3.
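A minimal sketch of this normalization, mirroring the SentiWordNet convention that the positive, negative and objective components sum to 1 (the zero-evidence fallback is an assumption of this sketch):

```python
def normalize(pos, neg, obj):
    """Rescale an aggregate (positive, negative, objective) triple so the
    components sum to 1, mirroring the SentiWordNet convention."""
    total = pos + neg + obj
    if total == 0:
        return 0.0, 0.0, 1.0     # no evidence at all -> treat as fully objective
    return pos / total, neg / total, obj / total

p, n, o = normalize(0.9, 0.3, 0.3)  # -> (0.6, 0.2, 0.2)
```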
However, there is a catch with the way sentiment scores are organized on SentiWordNet. Every term
in the SentiWordNet database is classified into a number of senses, each sense ranked according to the
frequency of its usage in general (with the help of a sense-number), indicating in how many different
contexts that particular term could be used. There might be cases where a term could carry
ambiguous scores in the same sense. Table 1 illustrates this case.
Synset                                       SentiWordNet Score (pos, neg)   Gloss
huffy, mad#1, sore (roused to anger)         (0.0, 0.125)                    "she gets mad when you wake her up so early"; "mad at his friend"
brainsick, crazy, demented, disturbed,       (0.0, 0.5)                      "a man who had gone mad"
  mad#2, sick, unbalanced, unhinged
delirious, excited, frantic, mad#3,          (0.375, 0.125)                  "a crowd of delirious baseball fans"; "a mad whirl of pleasure"
  unrestrained
harebrained, insane, mad#4 (very foolish)    (0.0, 0.25)                     "harebrained ideas"; "took insane risks behind the wheel"; "a completely mad scheme to build a bridge between two mountains"

Table 1 : Example of Multiple Scores for a same term from SentiWordNet
In Table 1 the word mad, belonging to the adjective part of speech, has ambiguous positive and
negative senses, and the disambiguation becomes a non-trivial problem, related in some sense to the
Word Sense Disambiguation (WSD) problem. Due to the limited time and the complexity of
introducing WSD into this model, a simple approach is proposed to solve this problem:
• Evaluate scores for each term in a given sentence.
• If there are conflicting scores, i.e. different sense scores for the same word, calculate the
weighted average of all positive scores and of all negative scores. By doing so, we deprecate the
individual sense scores as the sense number increases.
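The weighted average above can be sketched as follows. The 1/sense_number weights are one plausible reading of "deprecate the individual sense scores as the sense number increases"; the paper does not spell out the exact weighting, so treat it as an assumption.

```python
def sense_average(scores):
    """Weighted average over SentiWordNet senses.
    scores: list of (pos, neg) pairs ordered by sense number (1-based)."""
    weights = [1.0 / k for k in range((1), len(scores) + 1)]  # 1, 1/2, 1/3, ...
    total = sum(weights)
    pos = sum(w * p for w, (p, _) in zip(weights, scores)) / total
    neg = sum(w * q for w, (_, q) in zip(weights, scores)) / total
    return pos, neg

# The four senses of "mad" from Table 1, ordered mad#1..mad#4.
pos, neg = sense_average([(0.0, 0.125), (0.0, 0.5),
                          (0.375, 0.125), (0.0, 0.25)])
```

With these weights the negative reading of "mad" dominates, which matches the intuition that the most frequent senses of "mad" are negative.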
3. Score Calculation and Pseudo Code
Our model is a blend of a traditional bag-of-words approach and intelligent look-up and priority
evaluation using a set of Language rules. The simplest rule to start with is 'The Negation Rule':
when a negation or a negative word like "not", "neither" etc. is found in a response, the polarity of
the opinion associated with the aspect in context is reversed. If R is a response, {A} is the set of
aspects associated with that response in the context Θ_S, and {N} denotes the set of negative words,
then:
If ∃n ∈ Θ_S such that n ∈ N, then invert the polarities of every aspect in {A}.
Rule 1 : The Negation Rule
The second rule is 'The Modal Rule'. Modals are the trickiest to deal with. To understand why Modals
are important, consider the following cases.
Case 1 : “The doctor could have been more positive”
Without handling the modal, a response like Case 1 would be tagged as a positive response, for there
is no explicit negative term in it.
Similarly,
Case 2 : “I would have not gotten as much attention in any other hospital”
Without handling the modal, a response like Case 2 would be tagged as negative, for the same reason
mentioned in Case 1.
Given the extensive use of modals in spoken and written language, it makes a huge difference to the
results if they are not handled appropriately. So the following rule is proposed to deal with Modals.
If R is a response, {A} is the set of aspects associated with that response in the context Θ_S, and
{M} denotes the set of constructions like 'would have', 'could have' etc., which we term Modals,
then:
If ∃m ∈ Θ_S such that m ∈ M, then invert the polarities of every aspect in {A}.
Rule 2 : The Modal Rule
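Rules R1 and R2 can be sketched together as below. Since both rules invert polarities, a response containing both a negation and a modal is inverted twice and keeps its original orientation. The single-word modal triggers and the small negation list are simplifications of this sketch, not the paper's exact sets.

```python
NEGATIONS = {"not", "neither", "never"}
MODALS = {"could", "would", "should"}   # simplified stand-in for the
                                        # 'would have' / 'could have' set

def apply_r1_r2(text, pos, neg):
    words = text.lower().split()
    if any(w in NEGATIONS for w in words):   # Rule R1: negation inverts
        pos, neg = neg, pos
    if any(w in MODALS for w in words):      # Rule R2: modal inverts again
        pos, neg = neg, pos
    return pos, neg

# Both rules fire, so the scores end up back at their original orientation.
p, n = apply_r1_r2("Could not ask for any better place or doctor",
                   0.30319, 0.00736)
```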
The adjustment of polarity with respect to adjectives and adverbs is a very important aspect of
sentence-level sentiment extraction. We take the intensifier into account and re-prune our polarity
scores according to the score of the intensifier. Let the score of the intensifier be I = [I_p, I_n, I_o],
where I_p, I_n, I_o denote its positive, negative and objectiveness scores, and let [Ψ_p, Ψ_n, Ψ_o]
denote the positive, negative and objectiveness scores of the quantity being intensified or reduced
by I. Writing ΣΨ_k = Ψ_p + Ψ_n + Ψ_o, the re-pruned values are given by rules R3 and R4 for
intensifiers and R5 and R6 for reducers.
If I_p > I_n and Ψ_p > Ψ_n, then the resultant re-pruned score for Ψ is given by
Ψ_newNegative = (I_p * Ψ_n) / (ΣΨ_k − Ψ_o)
Ψ_newPositive = (ΣΨ_k − Ψ_o) − Ψ_newNegative
Rule R3 : Rule to intensify the positive quotient
If I_p > I_n and Ψ_n > Ψ_p, then the resultant re-pruned score for Ψ is given by
Ψ_newPositive = (I_p * Ψ_p) / (ΣΨ_k − Ψ_o)
Ψ_newNegative = (ΣΨ_k − Ψ_o) − Ψ_newPositive
Rule R4 : Rule to intensify the negative quotient
If I_n > I_p and Ψ_p > Ψ_n, then the resultant re-pruned score for Ψ is given by
Ψ_newPositive = (I_n * Ψ_p) / (ΣΨ_k − Ψ_o)
Ψ_newNegative = (ΣΨ_k − Ψ_o) − Ψ_newPositive
Rule R5 : Rule to reduce the positive quotient
If I_n > I_p and Ψ_n > Ψ_p, then the resultant re-pruned score for Ψ is given by
Ψ_newNegative = (I_n * Ψ_n) / (ΣΨ_k − Ψ_o)
Ψ_newPositive = (ΣΨ_k − Ψ_o) − Ψ_newNegative
Rule R6 : Rule to reduce the negative quotient
The above division rules are valid only when Ψ_p + Ψ_n > Ψ_o; otherwise, the value of the denominator
in the division rules is set to 1. The intuition is to reduce the impact of the opinion word in the
case of a reducer, and vice-versa for an intensifier, while the denominator in the division ensures
that the values are not scaled down or up by a huge margin. We set the denominator to 1 when the sum
of the positive and negative scores of an opinionated term is less than its objective score, to
tackle the problem of Polarity Inversion. The above rules amplify or reduce the impact of an
intensifier or a reducer, respectively, on an opinionated word, and sufficient care is taken that the
values converge to 1, to ensure domain and overall consistency. The following cases illustrate our
Rules R1..R6.
Eg 1 : Could not ask for any better place or doctor for any cancer patient.
Aspects : doctor, place, cancer, patient
Positive Sentiment Value: 0.30319
Negative Sentiment Value: 0.00736
(Before R1 and R2)
Positive Sentiment Value: 0.00736
Negative Sentiment Value: 0.30319
(After R1)
Positive Sentiment Value: 0.30319
Negative Sentiment Value: 0.00736
(After R1 and R2)
Eg 2 : The doctors were very approachable and easy to talk to, understood my problem, and I could
clearly understand them.
Aspects: doctor, problem
Positive Sentiment Value: 0.21824
Negative Sentiment Value: 0.09563
Eg 4 : Effect of Adverbs on Adjectives

Term        Positive Sentiment Val    Negative Sentiment Val          Neutral Sentiment Val
good        0.5                       0.25                            1-(0.5+0.25)=0.25
very        0.25                      0.17                            1-(0.25+0.17)=0.58
very good   (0.5+0.25)-0.083=0.667    (0.25*0.25)/(0.5+0.25)=0.083    0.25
(values approximated to two decimal places)

TABLE 2 : Demonstrating Effect of Adverbs on Adjectives
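The "very good" row can be reproduced with a small sketch of Rule R3 (intensifier applied to a predominantly positive term); the other rules follow the same pattern with the roles of the positive and negative components swapped.

```python
def reprune_r3(intensifier, target):
    """Rule R3: intensify the positive quotient of `target`.
    Both arguments are (positive, negative, objective) triples summing to 1."""
    i_p, i_n, _ = intensifier
    t_p, t_n, t_o = target
    assert i_p > i_n and t_p > t_n, "R3 applies to an intensified positive term"
    # Denominator is (t_p + t_n) only while it exceeds t_o, else 1.
    denom = (t_p + t_n) if (t_p + t_n) > t_o else 1.0
    new_neg = (i_p * t_n) / denom
    new_pos = (t_p + t_n) - new_neg
    return new_pos, new_neg

good = (0.5, 0.25, 0.25)     # SentiWordNet-style scores for "good"
very = (0.25, 0.17, 0.58)    # ...and for the intensifier "very"
new_pos, new_neg = reprune_r3(very, good)  # -> (0.667, 0.083) as in Table 2
```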
The following algorithm, Algorithm 1, shows the pseudo code of the implementation.
1.  Start
2.  Map sentimentMap = Load SentiWordNet
3.  For each comment c in Comments C:
4.      boolean hasNegation = false
5.      boolean hasModals = false
6.      NounBag = {}
7.      SentimentValue = {}
8.      For each word w in comment c:
9.          If negation(w):
10.             hasNegation = not hasNegation
11.         EndIf
12.         If modal(w):
13.             hasModals = true
14.         EndIf
15.         If Pos(w) == NOUN:
16.             NounBag.append(w)
17.             SentimentValue = getSentimentValue(sentimentMap, w)
18.         Elif Pos(w) == ADJ or Pos(w) == VERB:
19.             SentimentValue = getSentimentValue(sentimentMap, w)
20.             Reprune()
21.         Elif Pos(w) == ADVERB:
22.             SentimentValue = getSentimentValue(sentimentMap, w)
23.             RepruneAdverb()
24.         EndIf
25.     EndFor
26.     If hasNegation: InversePolarities()
27.     If hasModals: PruneModals()
28. EndFor
29. End
ALGORITHM 1 : Semi-Rule Based Model for Sentence Level Sentiment Extraction.
The steps show the order of operations: extract every comment and, in turn, every opinion word in
the comment, find its part of speech, then detect the presence of intensifiers and perform the
subsequent re-pruning. A lot of work still has to be done in evaluating Modals like 'would've been,
could've been' and conjunctions. Currently the sentiment score on both sides of a conjunction is
aggregated, but effort has to be put into finding a metric to efficiently evaluate the opinion
weights. Current efforts are directed at disambiguating senses and finding linguistic rules, in the
presence of conjunctions and Modals, to prune opinion weights.
4. Performance Analysis
For sentiment term association and classification we ran our algorithm in four iterations, each
achieving better results than the previous one. From the dataset we have, we considered four
departments {SECTSCHE, SECTCARE, SECTFACI, SECTOVER} as our data sample. The reason for running
four iterations is to understand how the algorithm produced better results as rules were added. The
following table illustrates our strategy at each Iteration.
Iteration 1 Gathering Adjectives and Adverbs from a response using POS Tagging and using
SentiWordNet to look up for the scores and associating those scores with the
corresponding aspects.
Iteration 2 1) The impact of different senses was realized and nouns, adjectives, adverbs and
verbs have been separated from the rest.
2) Each of the above four POS categories has been looked up separately in SentiWordNet,
and the impact of one POS term on another is considered {Rules R3..R6}
3) The notion of positive, negative and neutral scores to each entity was introduced
which essentially means, every positive word has some amount of negative sense and
vice-versa. These scores have been normalized to adjust to the changes in input
scales.
Iteration 3 Similar to Iteration 2, but nouns have been ignored during the Sentiment
Association and classification process, owing to the notion that nouns usually form
the Aspect/Context of a Response but do not greatly influence its Opinion
Quotient.
Iteration 4 Rule R2 to deal with Modals has been introduced.
TABLE 3 : List of Iterations for Sentiment Extraction and Classification.
The tabular representations in FIG 1 and FIG 2 show the results of all four iterations. At each
iteration the false positives and false negatives have been calculated: false positives are negative
opinions mis-classified as positive, and false negatives are the converse. These are denoted by FPOS
and FNEG in FIG 1. At each iteration the percentage errors in positive and negative are also shown.
The false positives and false negatives are mainly due to the fact that every opinion, however
negative, is expressed more with positive words than with negative words. The columns T-Positive and
T-Negative denote the total numbers of originally positive and negative responses as manually
annotated. We check our calculated results against these manually annotated results to determine the
accuracy and efficiency of our model. The results are detailed in the following figures.
FIG 1 : The sentiment classification results at various iterations.
As can be seen from the table, with every iteration the numbers of false positives and false
negatives have come down (refer to TABLE 3 for iteration details). The following figure, FIG 2,
gives the percentage errors in every iteration after the first.
FIG 2 : Percentage Errors with each iteration
The above figure shows the percentage errors at each iteration. % Error in Positive is the number of
positive opinions mis-classified as negative (i.e. the false negatives), and % Error in Negative is
the converse. The percentage errors could be brought down further by improving the rules for Modals
and by using, or building, a domain-specific lexicon. As can be seen from FIG 2, the negative error
percentage has to do with people expressing weak negatives with the help of positive qualifiers. The
increase in positive error from Iteration 3 to Iteration 4 could be tackled by fine-tuning Rule R2.
Part of the mis-classifications can be attributed to limitations of the POS Tagger [6], which in turn
are related to our datasets not having responses in a proper grammatical structure.
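The error percentages reported above are derived from the FPOS/FNEG and T-Positive/T-Negative counts of FIG 1 as sketched below; the counts used here are illustrative only, not the paper's actual figures.

```python
def percentage_errors(t_positive, t_negative, fpos, fneg):
    """% Error in Positive and Negative from annotated totals and mistakes.
    False negatives are positives mis-classified as negative, so they drive
    % Error in Positive; false positives drive % Error in Negative."""
    err_positive = 100.0 * fneg / t_positive
    err_negative = 100.0 * fpos / t_negative
    return err_positive, err_negative

# Illustrative counts: 800 annotated positives, 200 negatives,
# 30 false positives, 40 false negatives.
err_pos, err_neg = percentage_errors(t_positive=800, t_negative=200,
                                     fpos=30, fneg=40)
```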
5. Conclusion and Future Work
Our model is a rule based approach proposed for aspect extraction and the association of opinion to
those respective aspects. The contextual information and the sense of each individual sentence are
extracted according to the pattern structure of the sentence using a Parts of Speech Tagger. The
first-stage opinion score for the extracted sense is assigned to the sentence using SentiWordNet.
The eventual opinion score is calculated after checking the linguistic orientation of each term in a
sentence with the help of Rules R1..R6 explained in Section 3, and the results are normalized to
ensure that the eventual score of an aspect in a response converges to 1, irrespective of the number
of opinion words associated with that aspect.
The accuracy of our model could be improved by having a lexicon that is specific to the domain or by
employing a learning mechanism with the help of a feedback loop which could also be manual.
However, natural language processing is just not black and white. A lot of work still has to be done
in disambiguating the word senses and the weights associated with the subjects and objects in a
sentence, as the two do not necessarily have the same impact on the sentence's sentiment. Work is
also being carried out to separate weak positives and negatives from strong positives and negatives,
to provide the customer with potential standpoints for improving their product quality. An approach
to deal with conjunctions also has to be worked out for better accuracy.
6. References:
[1] S. Baccianella, A. Esuli, and F. Sebastiani, “SentiWordNet 3.0: An Enhanced Lexical Resource for
Sentiment Analysis and Opinion Mining,” Proceedings of LREC'10, 2010.
[2] M.A. Hearst, “Untangling text data mining,” Proceedings of the 37th annual meeting of the
Association for Computational Linguistics on Computational Linguistics, 1999, pp. 3-10.
[3] P. Turney, “Thumbs up or thumbs down? Semantic orientation applied to unsupervised
classification of reviews,” Proceedings of the 40th Annual Meeting of the Association for
Computational Linguistics (ACL'02), 2002, pp. 417-424.
[4] A. Andreevskaia and S. Bergler, “When specialists and generalists work together: Overcoming
domain dependence in sentiment tagging,” Proceedings of ACL-08: HLT, 2008, pp. 290-298.
[5] A. Khan, B. Baharudin, and K. Khan, “Sentiment Classification from Online Customer Reviews
Using Lexical Contextual Sentence Structure,” Communications in Computer and Information
Science, Software Engineering and Computer Systems, Springer Verlag, 2011, pp. 317-331.
[6] Kristina Toutanova, Dan Klein, Christopher Manning, and Yoram Singer. 2003. Feature Rich Part-
of-Speech Tagging with a Cyclic Dependency Network. In Proceedings of HLT-NAACL 2003, pp. 252-
259.
[7] Pang B., Lee L., and Vaithyanathan, S. (2002). Thumbs up? Sentiment Classification using
Machine Learning Techniques. Proceedings of EMNLP, 2002.
[8] Miller G. A., Beckwith R., Fellbaum C, Gross D, Miller K. J. (1990). Introduction to WordNet: An
On-line Lexical Database. International Journal of Lexicography. Vol. 3, No. 4 (Jan. 1990), 235-244.