OEG at TASS 2017: Spanish Sentiment Analysis of tweets at document level

María Navas-Loro, Víctor Rodríguez-Doncel
Universidad Politécnica de Madrid
mnavas@fi.upm.es
TASS - SEPLN, 19th September 2017

Outline
1. Background
2. Our approach
   1. Labeling Strategy
   2. Linguistic Considerations
   3. Means
3. Final systems
4. Conclusions

Analysis proposal
[Diagram: a post is analyzed along four dimensions]
• Purchase Funnel: Awareness, Evaluation, Purchase, Postpurchase, Review
• Meaningful Brands: Marketplace, Personal Wellbeing, Collective Wellbeing
• Marketing Mix: Product, Price, Promotion, Place
• Sentiment Analysis: Hate / Love, Satisfaction / Dissatisfaction, Happiness / Sadness, Trust / Fear

Analysis proposal
[Same diagram, highlighting the Purchase Funnel]
When? Awareness, Evaluation, Purchase, Postpurchase, Review

Analysis proposal
[Same diagram, highlighting Meaningful Brands]
Where? Marketplace, Personal Wellbeing, Collective Wellbeing

Analysis proposal
[Same diagram, highlighting the Marketing Mix]
What? Product, Price, Promotion, Place

Analysis proposal
[Same diagram, highlighting Sentiment Analysis]
Emotion? Hate / Love, Happiness / Sadness, Trust / Fear


LABELING STRATEGY
Our approach

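Labeling Strategy
• Sentiment Analysis is usually a binary classification task.
  → Our case is more complex (P, N, NEU and NONE).
• We focused on testing different strategies:
1) Scores for P, N, NEU and NONE → Max
2) Scores for P and N → F(x)
\[
F(x) = \begin{cases}
\mathrm{NEU} & \text{if } P - N < t_d \\
\mathrm{NEU} & \text{if } 0 < t_{n_{min}} < P, N < t_{n_{max}} \\
\mathrm{NONE} & \text{if } 0 \le P, N \le t_{n_{min}}
\end{cases}
\]
3) Emotion map → P, N, NEU
   • satisfaction, trust, happiness, love → P
   • dissatisfaction, fear, sadness, hate → N
4) G(x) → N, NEU, P
\[
G(x) = \begin{cases}
\mathrm{N} & \text{if } 0 \le x < t_- \\
\mathrm{NEU} & \text{if } t_- \le x \le t_+ \\
\mathrm{P} & \text{if } t_+ < x \le 1
\end{cases}
\]
   → NONE is just 14% of the InterTASS corpus!
5) Two stages:
   1. Is there any sentiment? If not: NONE.
   2. If so, which one? P, N or NEU.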
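The two thresholding functions can be sketched in Python as follows. This is only an illustration under several assumptions that the slide does not state: P and N are scores in [0, 1]; the difference in the first rule is taken in absolute value; the NONE rule is checked first (otherwise two very low scores would always fall under the first NEU rule); and any remaining case goes to the larger of the two scores. The function names, the default values `t_minus=0.4` and `t_plus=0.6`, and the callables passed to `two_stage` are hypothetical; the defaults for F(x) are the values shown later for laOEG.

```python
def label_from_pn(p, n, t_d=0.10, t_n_min=0.10, t_n_max=0.15):
    """Strategy 2: turn a positive and a negative score into one label."""
    # Both scores are very low: no sentiment at all (checked first, assumption).
    if p <= t_n_min and n <= t_n_min:
        return "NONE"
    # Both scores are present but weak.
    if t_n_min < p < t_n_max and t_n_min < n < t_n_max:
        return "NEU"
    # Scores too close to each other (absolute difference is an assumption).
    if abs(p - n) < t_d:
        return "NEU"
    # Fallback not shown on the slide: the stronger score wins (assumption).
    return "P" if p > n else "N"


def label_from_score(x, t_minus=0.4, t_plus=0.6):
    """Strategy 4: map a single score in [0, 1] to N, NEU or P."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    if x < t_minus:
        return "N"
    if x <= t_plus:
        return "NEU"
    return "P"


def two_stage(text, has_sentiment, which_sentiment):
    """Strategy 5: first decide whether there is sentiment, then which one."""
    return which_sentiment(text) if has_sentiment(text) else "NONE"


if __name__ == "__main__":
    print(label_from_pn(0.80, 0.20))
    print(label_from_score(0.5))
```

Keeping the thresholds as parameters makes it easy to tune them on a development set, which matches the range of values reported for the submitted systems.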

LINGUISTIC CONSIDERATIONS
Our approach

Linguistic Considerations
Features:
• Tokens
• Lemmas
• Words from lexicons
Negation treatment:
• Presence of NEG constituents (“no”, “nunca”…)
  → If present within a verbal group, polarity is inverted.
• Double negation is not considered.
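A rough Python sketch of this negation treatment is given below. It is a simplification, not the system's implementation: the real verbal groups come from a constituent analysis, whereas here the first non-clitic token after a negation word stands in for the verbal group. The `CLITICS` set, the `NEG_` prefix, the function names and the toy lexicon are all assumptions made for illustration.

```python
NEG_WORDS = {"no", "nunca"}  # the two examples given on the slide
# Hypothetical list of clitic pronouns that may sit between "no" and the verb.
CLITICS = {"me", "te", "se", "lo", "la", "le", "nos", "os", "los", "las", "les"}


def mark_negation(tokens):
    """Fuse a negation word with the next non-clitic token into one feature."""
    features = []
    pending = False  # True while a negation waits for a token to attach to
    for tok in tokens:
        low = tok.lower()
        if low in NEG_WORDS:
            # Repeated negations do not toggle back: double negation is ignored.
            pending = True
        elif pending and low in CLITICS:
            features.append(tok)
        elif pending:
            features.append("NEG_" + tok)
            pending = False
        else:
            features.append(tok)
    # A trailing negation with nothing to attach to is simply dropped.
    return features


def polarity(feature, lexicon):
    """Look up a feature in a polarity lexicon, inverting negated features."""
    if feature.startswith("NEG_"):
        return -lexicon.get(feature[4:], 0)
    return lexicon.get(feature, 0)


if __name__ == "__main__":
    toy_lexicon = {"gusta": 1}  # hypothetical lexicon entry
    feats = mark_negation(["no", "me", "gusta"])
    print(feats, [polarity(f, toy_lexicon) for f in feats])
```

Emitting the negated token as its own feature lets a bag-of-words classifier learn separate weights for the plain and the negated form.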

Linguistic Considerations
Preprocessing:
• Laugh patterns
• URLs
• Slang expressions:
  • Q, k, qu, ke, qe
  • d, tb, lol
  • Xq, pq, porq
• Repeated letters
• Suppression of numbers
• Emoticons
• Stopword list:
  • Manual
  • TF-IDF
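Some of these preprocessing steps can be sketched with regular expressions in Python. The patterns, the `URL` and `LAUGH` placeholder tokens and the slang expansions below are assumptions chosen for illustration (the slides list the slang forms but not what they are replaced with); emoticon handling and stopword filtering are left out.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Laughs such as "jajaja" or "jejeje": the same vowel repeated after j/h.
LAUGH_RE = re.compile(r"\b[jh]+([aeiou])(?:[jh]+\1)+[jh]*\b", re.IGNORECASE)
NUMBER_RE = re.compile(r"\d+")
REPEAT_RE = re.compile(r"(\w)\1{2,}")  # three or more identical characters

# Hypothetical expansions for the slang forms listed on the slide.
SLANG = {
    "q": "que", "k": "que", "qu": "que", "ke": "que", "qe": "que",
    "d": "de", "tb": "también", "lol": "LAUGH",
    "xq": "porque", "pq": "porque", "porq": "porque",
}


def preprocess(text):
    """Normalize a tweet and return its tokens."""
    text = URL_RE.sub(" URL ", text)      # replace URLs before touching digits
    text = LAUGH_RE.sub(" LAUGH ", text)  # unify laugh patterns
    text = NUMBER_RE.sub(" ", text)       # suppress numbers
    text = REPEAT_RE.sub(r"\1", text)     # "buenooo" -> "bueno"
    return [SLANG.get(tok.lower(), tok) for tok in text.split()]


if __name__ == "__main__":
    print(preprocess("jajaja q buenooo http://t.co/abc 123"))
```

The order matters in this sketch: URLs are replaced first so that digits and repeated characters inside them are not altered by the later steps.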

MEANS: Resources and Algorithms
Our approach

Means
External resources:
• IXA-Pipes for NLP
• WEKA for Machine Learning
• Different lexicons.
Algorithms:
• MNB (Multinomial Naïve Bayes).
• SMO (Sequential Minimal Optimization for SVMs).
* Previous efforts: Logistic Regression, trees, word embeddings…
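For readers unfamiliar with MNB, the following is a minimal pure-Python stand-in for a Multinomial Naïve Bayes classifier over tokens. The systems in this deck used WEKA, not this code; the class name `TinyMNB`, the smoothing parameter `alpha` and the toy training tweets are hypothetical and only illustrate how per-class probabilities such as P and N can be obtained from token counts.

```python
import math
from collections import Counter, defaultdict


class TinyMNB:
    """Multinomial Naive Bayes with additive (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # hypothetical smoothing value

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.token_counts[label].update(tokens)
        self.vocab = {t for c in self.token_counts.values() for t in c}
        return self

    def predict_proba(self, tokens):
        total_docs = sum(self.class_counts.values())
        log_scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n_docs / total_docs)  # class prior
            for tok in tokens:
                if tok in self.vocab:  # tokens unseen in training are ignored
                    score += math.log((counts[tok] + self.alpha) / denom)
            log_scores[label] = score
        # Normalize the log scores into probabilities that sum to 1.
        top = max(log_scores.values())
        exp = {k: math.exp(v - top) for k, v in log_scores.items()}
        z = sum(exp.values())
        return {k: v / z for k, v in exp.items()}


if __name__ == "__main__":
    docs = [["gran", "dia"], ["me", "encanta"], ["que", "asco"], ["odio", "esto"]]
    labels = ["P", "P", "N", "N"]
    model = TinyMNB().fit(docs, labels)
    print(model.predict_proba(["me", "encanta", "esto"]))
```

Working in log space avoids numerical underflow when tweets contain many tokens.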

Our systems
laOEG
• 2nd labeling strategy.
• MNB (slightly better than SMO), tokens.
• Parameters: 0.3 for MNB (0.01-0.5); 0.10-0.15 for the F(x) thresholds.
• PRO: it is versatile and fast.
• CONS: it is the simplest and shallowest approach; it does not even consider negation.
[Diagram: tokens → P, N → F(x)]
\[
F(x) = \begin{cases}
\mathrm{NEU} & \text{if } P - N < 0.10 \\
\mathrm{NEU} & \text{if } 0 < 0.10 < P, N < 0.15 \\
\mathrm{NONE} & \text{if } 0 \le P, N \le 0.10
\end{cases}
\]

Our systems
victor0
• Same as above (2nd labeling strategy, MNB), but:
  • Using lemmas.
  • Handling negation at the verbal group at different constituent levels, new features:
    E.g.: ‘don’t like’ is now considered as a single feature.
• PRO: it is versatile and detects negation.
• CONS: much slower than laOEG.
[Diagram: lemmas → P, N → F(x)]

Our systems
victor2
• Same as above, but:
  • With stopwords.
  • Using a special dataset (Hinojosa et al. 2016) to better distinguish NEU and NONE.
[Diagram: lemmas and Hinojosa et al. 2016 → P, N → F(x)]
victor3b
• victor2 combined with the IBM Watson NLU module when confidence is good (0.75).
[Diagram: victor2 and W NLU, branching on confidence > 0.75 / < 0.75]
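The victor3b combination can be pictured as a simple confidence gate. The sketch below assumes that the confidence is the one reported by Watson NLU and that Watson's label is preferred above the threshold, with victor2 as the fallback; the slide does not spell out either point, and the function and parameter names are hypothetical. No real Watson API call is shown.

```python
def combine_labels(victor2_label, watson_label, watson_confidence, threshold=0.75):
    """Prefer the external label only when its confidence exceeds the threshold."""
    # Assumption: the gate uses Watson's confidence and favors Watson above it.
    if watson_confidence > threshold:
        return watson_label
    # At or below the threshold, fall back to victor2 (assumption).
    return victor2_label


if __name__ == "__main__":
    print(combine_labels("N", "P", 0.90))
    print(combine_labels("N", "P", 0.50))
```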

Our systems
[Diagram: overview of the systems: laOEG, victor0, victor2 (with the Hinojosa et al. 2016 dataset) and victor3b (victor2 with W NLU)]

Results
Corpus        System       M-P    M-R    M-F1   Acc
InterTASS     victor2      0.400  0.389  0.395  0.451
InterTASS     victor0      0.388  0.378  0.383  0.433
InterTASS     laOEG        0.383  0.370  0.377  0.505
InterTASS     Max. result  0.497  0.490  0.493  0.607
InterTASS     Min. result  0.291  0.322  0.306  0.479
General 1k    victor3b     0.402  0.337  0.367  0.486
General 1k    victor2      0.361  0.370  0.366  0.412
General 1k    laOEG        0.348  0.345  0.346  0.448
General 1k    Max. result  0.559  0.595  0.577  0.645
General 1k    Min. result  0.302  0.348  0.324  0.434
General TASS  victor2      0.395  0.384  0.389  0.496
General TASS  laOEG        0.350  0.342  0.346  0.407
General TASS  Max. result  0.559  0.595  0.577  0.645
General TASS  Min. result  0.302  0.348  0.324  0.434
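As a reading aid for the column headers, the sketch below computes macro-averaged precision (M-P), macro-averaged recall (M-R), M-F1 and accuracy from gold and predicted labels. It assumes that M-F1 is the harmonic mean of M-P and M-R, which is consistent with the rows of the table (for example the victor0 row on InterTASS), but the slides do not state the formula. The function names and the toy labels are hypothetical.

```python
def f1_from(m_p, m_r):
    """Harmonic mean of macro precision and macro recall."""
    return 2 * m_p * m_r / (m_p + m_r) if (m_p + m_r) else 0.0


def macro_scores(gold, predicted, labels=("P", "N", "NEU", "NONE")):
    """Return (M-P, M-R, M-F1, Acc) for two equally long label sequences."""
    precisions, recalls = [], []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, predicted))
        n_pred = sum(p == label for p in predicted)
        n_gold = sum(g == label for g in gold)
        # A class that is never predicted (or never occurs) contributes 0.
        precisions.append(tp / n_pred if n_pred else 0.0)
        recalls.append(tp / n_gold if n_gold else 0.0)
    m_p = sum(precisions) / len(labels)
    m_r = sum(recalls) / len(labels)
    acc = sum(g == p for g, p in zip(gold, predicted)) / len(gold)
    return m_p, m_r, f1_from(m_p, m_r), acc


if __name__ == "__main__":
    gold = ["P", "N", "NEU", "NONE"]
    pred = ["P", "N", "N", "NONE"]
    print(macro_scores(gold, pred))
    print(round(f1_from(0.388, 0.378), 3))  # victor0 on InterTASS
```

Macro averaging weights the four classes equally, so a system can reach a high accuracy and still score low on M-F1 if it neglects the rarer classes, as the laOEG row on InterTASS suggests.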

CONCLUSIONS
Contributions and future work

Conclusions and future lines
Conclusions on this first participation:
• There is room for improvement, but…
• … we had stable results and versatile systems.
• … out-of-the-box external software (Watson) did not work any better.
Future lines:
• Fine-grained emotions are not the right way.
• More resources.
• Use of concepts instead of simple words.

Bibliography
IXA pipes: Rodrigo Agerri, Josu Bermudez and German Rigau (2014). "IXA pipeline: Efficient and Ready to Use Multilingual NLP tools". In: Proceedings of the 9th Language Resources and Evaluation Conference (LREC2014), 26-31 May 2014, Reykjavik, Iceland.
WEKA: Eibe Frank, Mark A. Hall and Ian H. Witten (2016). The WEKA Workbench. Online Appendix for "Data Mining: Practical Machine Learning Tools and Techniques", Morgan Kaufmann, Fourth Edition.
(Hinojosa et al. 2016) dataset: J. A. Hinojosa, N. Martínez-García, C. Villalba-García, U. Fernández-Folgueiras, A. Sánchez-Carmona, M. A. Pozo and P. Montoro (2016). Affective norms of 875 Spanish words for five discrete emotional categories and two emotional dimensions. Behavior Research Methods, 48(1):272-284.
IBM Watson NLU: https://www.ibm.com/watson/developercloud/doc/natural-language-understanding/
