This document summarizes a research paper that proposes an unsupervised approach for adapting existing sentiment lexicons to the context and language used on Twitter. The approach captures the contextual semantics of words from their surrounding context in tweets and uses them to update the prior sentiment orientation and strength of words in an existing Twitter sentiment lexicon, Thelwall-Lexicon. Experiments show that the adapted lexicons improve sentiment classification performance over the original lexicon on two of the three Twitter datasets evaluated.
Adapting Sentiment Lexicons using Contextual Semantics
1. Adapting Sentiment Lexicons using Contextual Semantics for Sentiment Analysis of Twitter
Hassan Saif, Yulan He, Miriam Fernandez and Harith Alani
Knowledge Media Institute, The Open University,
Milton Keynes, United Kingdom
1st Workshop on Semantic Sentiment Analysis
Crete, Greece, 2014
3. “Sentiment analysis is the task of identifying positive and negative opinions, emotions and evaluations in text”
[Figure: Sentiment Analysis. Three example tweets, labelled Opinion, Opinion and Fact on the slide:]
- yes, It is sunny, but also very humid :(
- The weather is great today :)
- I think its almost 30 degrees today
6. Sentiment Analysis: The Lexicon-based Approach
[Figure: the tweet "I had nightmares all night long last night :(" is passed to a text processing algorithm that uses a sentiment lexicon (great, sad, down, wrong, horrible, love) and is labelled Negative.]
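The lexicon-based approach on this slide can be sketched in a few lines. This is a minimal illustration, not the method used in the paper: the words come from the slide, while the +1/-1 polarities, the tokenizer and the function name `classify` are assumptions made for the example.

```python
import re

# Hypothetical toy lexicon: the words are taken from the slide,
# the +1 / -1 polarities are assumed for illustration.
LEXICON = {"great": 1, "love": 1, "sad": -1, "down": -1, "wrong": -1, "horrible": -1}

def classify(text):
    """Sum the prior polarity of every known word and map the total to a label."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(LEXICON.get(tok, 0) for tok in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(classify("The weather is great today"))
print(classify("I feel sad and down"))
print(classify("great problem"))
```

Because each word carries a fixed prior, "great problem" is scored exactly like "great" on its own. The sketch has no way to take the surrounding words into account.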
7. Sentiment Lexicons
- Lists of opinionated:
  - Words and phrases (MPQA, SentiWordNet, etc.)
  - Common sense concepts (SenticNet)
- Built:
  - Manually
  - Dictionary-based approach
  - Corpus-based approach
- Applied to conventional text
  - Movie reviews, news, blogs, open forums, etc.
8. Sentiment Lexicons on Twitter
Twitter Data
- Language Variations
- New Words
- Noisy Nature
- lol, gr8, :), :P
Traditional Lexicons
- Not tailored to Twitter noisy data
- Fixed number of words
9. Twitter-specific Sentiment Lexicons
- Such as: Thelwall-Lexicon
- Built specifically to work on social data
- Contain lists of emoticons, slang, abbreviations, etc.
- Coupled with a rule-based method, SentiStrength
- Apply a text pre-processing routine to tweets
10. Twitter-specific Sentiment Lexicons (..and Traditional Lexicons)
Offer context-insensitive prior sentiment orientations and strengths of words
[Figure: the word "Great" shown next to "Problem" and next to "Smile"; the sentiment lexicon (great, sad, down, wrong, horrible, love) labels it Positive.]
12. Contextual Semantic Adaptation Approach
- Unsupervised approach
- Captures the contextual semantics of words
- To assign contextual sentiment
13. Contextual Semantics of Words
“Words that occur in similar context tend to have similar meaning”
Wittgenstein (1953)
[Figure: example words shown with their contexts: Great, Problem, Look, Smile, Concert, Song, Weather, Loss, Game, Taylor Swift, Amazing.]
14. Capturing Contextual Semantics
SentiCircles Model
(1) Context-term vector: for a term m, its context terms C1, C2, ..., Cn
(2) For each context term: its degree of correlation with m and its prior sentiment from the sentiment lexicon
(3) Output: contextual sentiment strength in [-1 (very negative), +1 (very positive)] and contextual sentiment orientation (Positive, Negative, Neutral)
[Figure: the term "Great" with the context terms "Smile" and "Look".]
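15. Capturing Contextual Semantics
SentiCircles Model
Each context term Ci of a term m (for example, "Smile" in the context of "Great") is placed as a point (xi, yi) in the circle of m, using its degree of correlation and its prior sentiment:
xi = ri * cos(θi)
yi = ri * sin(θi)
ri = TDOC(Ci)
θi = Prior_Sentiment(Ci) * π
[Figure: SentiCircle of "Great" with the point for "Smile" at radius ri and angle θi; axes running from -1 to +1; regions Very Positive, Positive, Very Negative, Negative and a Neutral Region.]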
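The polar-to-Cartesian mapping on this slide can be sketched as follows. This is a minimal illustration under stated assumptions: the TDOC value is taken as an already computed number scaled to [0, 1], the prior sentiment is assumed to be scaled to [-1, 1], and the function name `senticircle_point` and the sample values are hypothetical.

```python
import math

def senticircle_point(tdoc, prior_sentiment):
    """Return the (x, y) position of one context term.

    tdoc: correlation with the target term, assumed already scaled to [0, 1].
    prior_sentiment: lexicon score of the context term, assumed scaled to [-1, 1].
    """
    theta = prior_sentiment * math.pi  # theta_i = Prior_Sentiment(Ci) * pi
    return tdoc * math.cos(theta), tdoc * math.sin(theta)

# Hypothetical context terms of "great": (TDOC, prior sentiment)
context = {"smile": (0.8, 0.5), "problem": (0.6, -0.5)}
points = {term: senticircle_point(r, s) for term, (r, s) in context.items()}
for term, (x, y) in points.items():
    print(f"{term}: x={x:.2f}, y={y:.2f}")
```

With this mapping a context term with a positive prior lands in the upper half of the circle (y > 0) and one with a negative prior lands in the lower half, while the radius reflects how strongly the context term is tied to the target term.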
17. Overall Contextual Sentiment
[Figure: SentiCircle of term m showing a context term Ci at point (xi, yi) with radius ri and angle θi; axes running from -1 to +1; regions Very Positive, Positive, Very Negative, Negative and a Neutral Region.]
Sentiment Function: Senti-Median of SentiCircle
To compute the new sentiment of a term from its SentiCircle, the Senti-Median metric is used. The SentiCircle is composed of a set of (x, y) Cartesian coordinates, where the y value represents the sentiment and the x value represents the sentiment strength. An effective way to approximate the overall sentiment is to calculate the geometric median of all its points. Formally, for the points (p1, p2, ..., pn) in a SentiCircle Ω, the 2D geometric median g is:
$$g = \arg\min_{g \in \mathbb{R}^2} \sum_{i=1}^{n} \lVert p_i - g \rVert_2 \quad (5)$$
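The geometric median behind the Senti-Median has no closed form in general, so it is usually approximated iteratively. The sketch below uses Weiszfeld's iteration, which is one common choice; the slides do not say which solver the authors used, and the function name `geometric_median` and the sample points are hypothetical.

```python
import math

def geometric_median(points, max_iter=1000, tol=1e-9):
    """Approximate the 2D geometric median with Weiszfeld's iteration."""
    # Start from the centroid of the points.
    gx = sum(x for x, _ in points) / len(points)
    gy = sum(y for _, y in points) / len(points)
    for _ in range(max_iter):
        num_x = num_y = denom = 0.0
        for x, y in points:
            d = math.hypot(x - gx, y - gy)
            if d < 1e-12:
                # Simplification: skip a point the estimate sits on,
                # to avoid dividing by zero.
                continue
            num_x += x / d
            num_y += y / d
            denom += 1.0 / d
        if denom == 0.0:
            break  # every point coincides with the estimate
        new_gx, new_gy = num_x / denom, num_y / denom
        shift = math.hypot(new_gx - gx, new_gy - gy)
        gx, gy = new_gx, new_gy
        if shift < tol:
            break
    return gx, gy

# Hypothetical SentiCircle points (x, y) for one term.
circle = [(0.0, 0.8), (0.1, 0.6), (-0.2, 0.7), (0.5, -0.3)]
print(geometric_median(circle))
```

Unlike the centroid, the geometric median is pulled less by a single outlying context term, which is one reason it is a reasonable summary of a noisy set of points.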
18. Lexicon Adaptation Method
• A set of antecedent-consequent rules
• Decides on the new sentiment of a term based on:
– How weak/strong its prior sentiment is
– How weak/strong its contextual sentiment is
• The contextual sentiment is based on the position of the term’s Senti-Median
19. Thelwall-Lexicon (Case Study)
Sample entries: fiery* -2, vex* -3, witch -1, inspir* 3, trite* -3, cunt* -4, intelligent* 2, joll* 3, suffers -4, loved 4, insidious* -3, despis* -4, hehe* 2
[Chart: number of terms per class: Positive 398, Negative 1919, Neutral 229.]
• Consists of 2546 terms
• Coupled with prior sentiment strength between |1| and |5|
[-2, -5] negative term
[2, 5] positive term
[-1, 1] neutral term
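The strength bands above can be turned into a small lookup. This is a minimal sketch: the entries are copied from the slide, but treating a trailing `*` as "matches any suffix" is an assumption about the lexicon format, and the function names `prior_strength` and `orientation` are hypothetical.

```python
# Entries copied from the slide. A trailing * is assumed to mark a stem
# that matches any word starting with it.
LEXICON = {"inspir*": 3, "loved": 4, "witch": -1, "vex*": -3, "hehe*": 2, "suffers": -4}

def prior_strength(word):
    """Look up a word: exact match first, then the longest matching stem."""
    word = word.lower()
    if word in LEXICON:
        return LEXICON[word]
    stems = [k[:-1] for k in LEXICON if k.endswith("*") and word.startswith(k[:-1])]
    if stems:
        return LEXICON[max(stems, key=len) + "*"]
    return None  # word is not covered by the lexicon

def orientation(strength):
    """Map a prior strength to the class bands listed on the slide."""
    if strength >= 2:
        return "positive"
    if strength <= -2:
        return "negative"
    return "neutral"

for w in ["inspiring", "witch", "vexed"]:
    s = prior_strength(w)
    print(w, s, orientation(s))
```

Words that return `None` are the ones a fixed lexicon cannot score at all.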
24. Adapted Lexicons on HCR
[Chart: Performance, Positive Sentiment Detection. Precision, Recall and F1 (axis from 35 to 45) for the Original, Updated and Updated+Expanded lexicons.]
[Chart: Sentiment Class Distribution. Positive to Negative Ratio (axis from 0.35 to 0.6) for OMD, HCR and STS-Gold.]
[Chart: Impact on Thelwall-Lexicon. New Words Added To Thelwall-Lexicon (axis from 10 to 30) for OMD, HCR and STS-Gold.]
25. Conclusion
• We proposed an unsupervised approach for sentiment lexicon adaptation from Twitter data.
• It updates the words’ prior sentiment orientations and/or strengths based on their contextual semantics in tweets.
• The evaluation was done on Thelwall-Lexicon using three Twitter datasets.
• Results showed that lexicons adapted by our approach improved sentiment classification performance in both accuracy and F1 on two out of three datasets.
Early work on sentiment analysis focused mainly on extracting sentiment from conventional text such as movie reviews, blogs, news articles and open forums.
Textual content in these types of media sources is linguistically rich, consists of well-structured and formal sentences, and discusses a specific topic or domain (e.g., movie reviews).
However, with the emergence of social media networks and microblogging platforms, especially Twitter, research interest shifted to analyzing and extracting sentiment from these new sources.
Nevertheless, one of the key challenges that Twitter sentiment analysis methods have to confront is the noisy nature of Twitter-generated data. Twitter allows only 140 characters in each post, which encourages the use of abbreviations, irregular expressions and infrequent words.
This phenomenon increases the level of data sparsity, affecting the performance of Twitter sentiment classifiers.
There are several approaches to sentiment analysis.
One common approach is the lexicon-based approach. This approach assumes that the sentiment orientations of a given
Words in the lexicons have fixed prior sentiment orientations, i.e. each term always has the same associated sentiment orientation, independently of the context in which the term is used.
SentiCircles
To build rules we need to look at the characteristics of the sentiment lexicon that we want to adapt.
In our work we use Thelwall-Lexicon as a case study and, therefore, we built our adaptation rules based on the characteristics of this lexicon.
As a case study