OEG at TASS 2017: Spanish Sentiment Analysis of tweets at document level
1. OEG at TASS 2017: Spanish
Sentiment Analysis of tweets
at document level
María Navas-Loro, Víctor Rodríguez-Doncel
Universidad Politécnica de Madrid
mnavas@fi.upm.es
TASS - SEPLN, 19th September 2017
12. Labeling Strategy
• Sentiment analysis is usually a binary classification task; our case is more complex (P, N, NEU and NONE).
• We focused on testing different strategies:
1) Max over P, N, Neu and None.
2) F(x) over P and N.
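\[
F(x) =
\begin{cases}
\mathrm{NEU} & \text{if } |P - N| < t_d \\
\mathrm{NEU} & \text{if } 0 < t_{n_{min}} < P, N < t_{n_{max}} \\
\mathrm{NONE} & \text{if } 0 \le P, N \le t_{n_{min}}
\end{cases}
\]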
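\[
G(x) =
\begin{cases}
\mathrm{N} & \text{if } 0 \le X < t_- \\
\mathrm{NEU} & \text{if } t_- \le X \le t_+ \\
\mathrm{P} & \text{if } t_+ < X \le 1
\end{cases}
\]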
3) Emotion map onto P, N, Neu: satisfaction, trust, happiness, love (positive); dissatisfaction, fear, sadness, hate (negative).
4) G(x) over N, Neu, P.
5) Two-stages:
1. Is there any sentiment? If not: None.
2. If so, which one? P, N, Neu.
NONE is just 14% of the InterTASS corpus!
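A minimal Python sketch of two of the strategies on this slide: the threshold function G(x) and the two-stage decision. The default values for t− and t+ are illustrative assumptions (the slides do not give them), and `has_sentiment` and `which_sentiment` are hypothetical stand-ins for whatever classifiers are used at each stage.

```python
from typing import Callable


def g(x: float, t_minus: float = 0.4, t_plus: float = 0.6) -> str:
    """Map a score X in [0, 1] to N, NEU or P, following G(x).

    The default thresholds are assumed values for illustration only.
    """
    if not 0.0 <= x <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    if x < t_minus:
        return "N"
    if x <= t_plus:
        return "NEU"
    return "P"


def two_stage(tweet: str,
              has_sentiment: Callable[[str], bool],
              which_sentiment: Callable[[str], str]) -> str:
    """Stage 1: is there any sentiment? Stage 2: if so, which one?"""
    if not has_sentiment(tweet):
        return "NONE"
    return which_sentiment(tweet)
```

Keeping the two stages as separate callables lets the first one specialise on the small NONE class while the second one only has to separate P, N and NEU.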
14. Linguistic Considerations
Features:
• Tokens
• Lemmas
• Words from lexicons
Negation treatment:
• Presence of NEG constituents (“no”, “nunca”…): if present within a verbal group, polarity is inverted.
• Double negation is not considered.
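The negation rule can be sketched in Python as follows. This is only an illustration under stated assumptions: the input is assumed to be already lemmatised and split into chunks flagged as verbal groups or not, and the tiny polarity lexicon is invented for the example.

```python
# NEG constituents named on the slide.
NEG_WORDS = {"no", "nunca"}

# Toy polarity lexicon; the entries are illustrative assumptions.
LEXICON = {"gustar": 1, "bueno": 1, "odiar": -1, "malo": -1}


def chunk_polarity(lemmas, is_verbal_group):
    """Sum lexicon polarities, inverting them if a NEG word is in a verbal group."""
    score = sum(LEXICON.get(lemma, 0) for lemma in lemmas)
    has_neg = any(lemma in NEG_WORDS for lemma in lemmas)
    # Double negation is not considered: the sign flips once at most,
    # however many NEG words the chunk contains.
    if is_verbal_group and has_neg:
        score = -score
    return score


def tweet_polarity(chunks):
    """chunks: list of (lemmas, is_verbal_group) pairs."""
    return sum(chunk_polarity(lemmas, is_vg) for lemmas, is_vg in chunks)
```

For example, `chunk_polarity(["no", "gustar"], True)` returns -1, while the same lemmas outside a verbal group keep their polarity of 1.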
15. Linguistic Considerations
Preprocessing:
• Laugh patterns
• URLs
• Slang expressions:
  • Q, k, qu, ke, qe
  • d, tb, lol
  • Xq, pq, porq
• Repeated letters
• Suppression of numbers
• Emoticons
• Stopword list:
  • Manual
  • TF-IDF
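Some of the preprocessing steps can be sketched in Python with regular expressions. This is a rough illustration, not the authors' implementation: the placeholder tokens `URL` and `LAUGH`, the expansions chosen for each slang form, and the order of the steps are all assumptions. Emoticons and stopwords are left out of the sketch.

```python
import re

# Assumed expansions for the slang forms listed on the slide.
SLANG = {
    "q": "que", "k": "que", "qu": "que", "ke": "que", "qe": "que",
    "d": "de", "tb": "también",
    "xq": "porque", "pq": "porque", "porq": "porque",
}

URL_RE = re.compile(r"https?://\S+")
NUMBER_RE = re.compile(r"\d+")
REPEAT_RE = re.compile(r"(\w)\1{2,}")                    # 3+ repeats of one character
LAUGH_RE = re.compile(r"\b(?:ja|je|ji|jo|ha){2,}j?\b")   # jajaja, jeje, haha...


def preprocess(tweet):
    """Return a list of normalised tokens for one tweet."""
    text = tweet.lower()
    text = URL_RE.sub(" URL ", text)      # URLs -> placeholder token
    text = NUMBER_RE.sub(" ", text)       # suppress numbers
    text = REPEAT_RE.sub(r"\1", text)     # "buenooo" -> "bueno"
    text = LAUGH_RE.sub(" LAUGH ", text)  # laugh patterns -> placeholder token
    tokens = re.findall(r"\w+", text)
    return [SLANG.get(token, token) for token in tokens]
```

Repeated letters are collapsed only from three occurrences upwards, so that legitimate Spanish double letters such as "ll" or "rr" are left alone.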
19. Our systems
laOEG
• 2nd labeling strategy.
• MNB (slightly better than SMO), tokens.
• Parameters: 0.3 for MNB (0.01-0.5); 0.10-0.15.
• PRO: it is versatile and fast.
• CONS: it is the simplest and shallowest approach; it does not even consider negation.
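\[
F(x) =
\begin{cases}
\mathrm{NEU} & \text{if } |P - N| < 0.10 \\
\mathrm{NEU} & \text{if } 0 < 0.10 < P, N < 0.15 \\
\mathrm{NONE} & \text{if } 0 \le P, N \le 0.10
\end{cases}
\]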
[Diagram: tokens → P, N → F(x)]
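With the thresholds of this slide (0.10 and 0.15), F(x) could be sketched in Python as below. Two points are assumptions rather than something the slide states: the NONE rule is checked first because the cases overlap, and tweets that match no rule get whichever of P and N has the higher score.

```python
def f(p, n, t_d=0.10, t_n_min=0.10, t_n_max=0.15):
    """Label a tweet from its P and N scores, following F(x)."""
    # Assumed rule order: NONE first, since two very low scores would
    # otherwise almost always be caught by the |P - N| rule.
    if p <= t_n_min and n <= t_n_min:
        return "NONE"
    if t_n_min < p < t_n_max and t_n_min < n < t_n_max:
        return "NEU"
    if abs(p - n) < t_d:
        return "NEU"
    # Assumed fallback: the slide only lists the NEU and NONE cases.
    return "P" if p > n else "N"
```

For example, `f(0.05, 0.02)` yields `"NONE"`, `f(0.5, 0.45)` yields `"NEU"` and `f(0.8, 0.1)` yields `"P"`.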
20. Our systems
victor0
• Same as above (2nd labeling strategy, MNB), but:
• Using lemmas.
• Handling negation in the verbal group at different constituent levels, with new features. E.g., ‘don’t like’ is now considered a single feature.
• PRO: it is versatile and detects negation.
• CONS: much slower than laOEG.
[Diagram: lemmas → P, N → F(x)]
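The idea of treating ‘don’t like’ as a single feature can be illustrated with a short Python sketch. It is a simplification under an explicit assumption: the negation word is fused with the lemma that immediately follows it, whereas the system described here works on verbal groups at different constituent levels. The function name and the `_` separator are hypothetical.

```python
NEG_WORDS = {"no", "nunca"}


def negation_features(lemmas):
    """Fuse each negation word with the next lemma into one feature."""
    features = []
    i = 0
    while i < len(lemmas):
        if lemmas[i] in NEG_WORDS and i + 1 < len(lemmas):
            # "no" + "gustar" becomes the single feature "no_gustar".
            features.append(f"{lemmas[i]}_{lemmas[i + 1]}")
            i += 2
        else:
            features.append(lemmas[i])
            i += 1
    return features
```

With this, `negation_features(["no", "gustar", "nada"])` gives `["no_gustar", "nada"]`, so the classifier can learn a weight for the negated verb that is independent of the plain one.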
21. Our systems
victor2
• Same as above, but:
• With stopwords.
• Using a special dataset to better distinguish NEU and NONE.
[Diagram: lemmas, Hinojosa et al. 2016 → P, N → F(x)]
victor3b
• victor2 combined with the IBM Watson NLU module when confidence is good (0.75).
[Diagram: victor2 and Watson NLU, with branches for confidence > 0.75 and < 0.75]
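A confidence-gated combination of two systems might look like the Python sketch below. It assumes that the external module's answer is used when its own confidence exceeds 0.75 and that victor2 decides otherwise; the slide does not spell out the direction of the rule. `external` and `fallback` are hypothetical callables, and no real Watson API call is made.

```python
from typing import Callable, Tuple


def combine(tweet: str,
            external: Callable[[str], Tuple[str, float]],
            fallback: Callable[[str], str],
            threshold: float = 0.75) -> str:
    """Use the external label when its confidence is above the threshold."""
    label, confidence = external(tweet)
    if confidence > threshold:
        return label
    # Assumed behaviour: otherwise fall back to the in-house system.
    return fallback(tweet)
```

Passing the systems as callables keeps the rule testable without network access: for instance, `combine("hola", lambda t: ("P", 0.5), lambda t: "NEU")` returns `"NEU"`.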
25. Conclusions and future lines
Conclusions on this first participation:
• There is room for improvement, but…
• … we had stable results and versatile systems.
• … out-of-the-box external software (Watson) did not work any better.
Future lines:
• Fine-grained emotions are not the right way.
• More resources.
• Use of concepts instead of simple words.
26. Bibliography
IXA pipes: Rodrigo Agerri, Josu Bermudez and German Rigau (2014):
"IXA pipeline: Efficient and Ready to Use Multilingual NLP tools", in:
Proceedings of the 9th Language Resources and Evaluation Conference
(LREC2014), 26-31 May, 2014, Reykjavik, Iceland.
WEKA: Eibe Frank, Mark A. Hall, and Ian H. Witten (2016). The WEKA
Workbench. Online Appendix for "Data Mining: Practical Machine
Learning Tools and Techniques", Morgan Kaufmann, Fourth Edition,
2016.
(Hinojosa et al. 2016) dataset: Hinojosa, J. A., N. Martínez-García, C.
Villalba-García, U. Fernández-Folgueiras, A. Sánchez-Carmona, M. A.
Pozo, and P. Montoro. 2016. Affective norms of 875 Spanish words for
five discrete emotional categories and two emotional dimensions.
Behavior Research Methods, 48(1):272-284.
IBM Watson NLU:
https://www.ibm.com/watson/developercloud/doc/natural-language-understanding/