ISCRAM 2013: A Fine-Grained Sentiment Analysis Approach for Detecting Crisis Related Microposts
A Fine-Grained Sentiment Analysis Approach for
Detecting Crisis Related Microposts
Axel Schulz, Tung Dang Thanh,
Dr. Heiko Paulheim, Dr. Immanuel Schweizer
May 14, 2013
This work is partly funded by a grant from the German Federal Ministry of Education and Research
Telecooperation Lab
Technische Universität Darmstadt
Motivation
Problem: Fragmented Situational Picture
Decision making based on information from:
• Onsite rescue squads
• Traditional data sources
Bystanders report additional information about the current situation
Decision makers must be aware of all relevant information in their environment
Valuable information from user-generated content is not usable for decision makers because of:
• Heterogeneous and unstructured nature of the data
• Lack of time to analyze flood of data
Vision
Vision: Enhancing the situational picture by making user-generated content usable for decision makers
Sentiment analysis can help to differentiate important information from unimportant information
Current approaches focus on a three-class problem (Negative, Positive, Neutral)
A more fine-grained differentiation could help detect relevant tweets
7 classes [Ekman]: Anger, Disgust, Fear, Sadness, Surprise, Happiness, Neutral
Enhanced Situational Picture
Decision making based on information from:
• Onsite rescue squads
• Traditional data sources
Bystanders report additional information about the current situation
Approach: Reference Pipeline
Tweets

Preprocessing
• Removal of irrelevant words, links, and user mentions
• Handling Negations
• Abbreviation Resolution
• Category Extraction

Feature Extraction
• Unigram Extraction
• Extraction of Part-of-speech Features
• Character Trigram/Fourgram
• Syntactic Features
• Sentiment Features

Classification
• Naïve Bayes Binary Model
• Naïve Bayes Multinomial Model
• Support Vector Machine

(a minimal code sketch of this pipeline follows below)
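As a hedged illustration only, the snippet below sketches such a pipeline with scikit-learn; it is not the authors' implementation. It strips links and user mentions, extracts unigram features, and trains a multinomial Naïve Bayes classifier. The toy tweets, labels, and the `preprocess` helper are made up for this sketch.

```python
# Minimal sketch of the reference pipeline (illustrative, not the original code):
# light preprocessing, unigram features, and a multinomial Naive Bayes classifier.
import re

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline


def preprocess(tweet: str) -> str:
    """Remove links and user mentions; the full pipeline would also handle
    negations, abbreviations, and category extraction as listed above."""
    tweet = re.sub(r"https?://\S+", " ", tweet)  # drop links
    tweet = re.sub(r"@\w+", " ", tweet)          # drop user mentions
    return tweet.lower()


# Toy, hand-labeled examples standing in for the annotated tweet sets.
tweets = [
    "No power and no food after the storm http://t.co/abc",
    "So happy the lights are finally back on @friend",
    "Traffic is moving normally downtown",
]
labels = ["Fear", "Happiness", "None"]

pipeline = Pipeline([
    ("unigrams", CountVectorizer(preprocessor=preprocess, ngram_range=(1, 1))),
    ("classifier", MultinomialNB()),
])
pipeline.fit(tweets, labels)
print(pipeline.predict(["Still no shelter and the water keeps rising"]))
```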
Evaluation: Approach

Testing all combinations of machine learning methods and features (see the cross-validation sketch below)
Metrics for performance evaluation: Accuracy, Precision, Recall
Results calculated using stratified 10-fold cross-validation
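As a hedged sketch of this protocol, stratified 10-fold cross-validation over classifier/feature combinations could be reproduced with scikit-learn roughly as follows; the feature extractors and classifiers shown are illustrative stand-ins, not the exact combinations from the paper.

```python
# Sketch of the evaluation protocol: stratified 10-fold cross-validation over
# combinations of classifiers and feature extractors (illustrative stand-ins).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.naive_bayes import BernoulliNB, MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

FEATURE_SETS = {
    "unigrams": CountVectorizer(ngram_range=(1, 1)),
    "char_trigrams": CountVectorizer(analyzer="char", ngram_range=(3, 3)),
}
CLASSIFIERS = {"NBB": BernoulliNB(), "NBM": MultinomialNB(), "SVM": LinearSVC()}


def evaluate_all(tweets, labels):
    """Report accuracy, macro-averaged precision, and macro-averaged recall
    for every feature/classifier combination."""
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
    for feat_name, extractor in FEATURE_SETS.items():
        for clf_name, clf in CLASSIFIERS.items():
            pipe = Pipeline([("features", extractor), ("clf", clf)])
            scores = cross_validate(
                pipe, tweets, labels, cv=cv,
                scoring=["accuracy", "precision_macro", "recall_macro"],
            )
            print(feat_name, clf_name,
                  round(scores["test_accuracy"].mean(), 3),
                  round(scores["test_precision_macro"].mean(), 3),
                  round(scores["test_recall_macro"].mean(), 3))
```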
Evaluation: Datasets

7 classes: 6 basic emotions + neutral
SET1: 114 English tweets (Seattle)
• Labeled by at least eight persons with more than 50% agreement
SET2: 1951 English tweets (Seattle)
• Surprise split into positive surprise and negative surprise
• Each tweet labeled by one person using MTurk

3 classes: Negative, Positive, Neutral
SET2_GP: grouping of SET2 (a sketch of this grouping follows below)
• "Disgust", "Fear", "Sadness", and "Surprise with negative meaning" into the negative class
• "Happiness" and "Surprise with positive meaning" into the positive class
• "None" into the neutral class
• Result: 872 positive, 598 negative, and 481 neutral tweets
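The grouping into SET2_GP can be expressed as a simple label mapping. The sketch below assumes illustrative label strings rather than the actual annotation format, and the slide does not state how "Anger" was grouped, so it is deliberately left out.

```python
# Sketch of the SET2 -> SET2_GP grouping described above.
# Label strings are assumed for illustration; "Anger" is not listed in the
# grouping on the slide, so it is intentionally omitted from the mapping.
SET2_GP_GROUPING = {
    "Disgust": "Negative",
    "Fear": "Negative",
    "Sadness": "Negative",
    "Surprise (negative)": "Negative",
    "Happiness": "Positive",
    "Surprise (positive)": "Positive",
    "None": "Neutral",
}


def to_three_classes(fine_label: str) -> str:
    """Map a fine-grained emotion label to Negative/Positive/Neutral."""
    return SET2_GP_GROUPING[fine_label]
```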
Evaluation: Results
                 7-classes                          3-classes
                 SET1                 SET2          SET2_GP
Class. Method    NBB    NBM    SVM    NBM    NBB    SVM    NBM    NBB    SVM
Accuracy         0.658  0.605  0.657  0.564  0.503  0.535  0.641  0.566  0.626
Avg. Precision   0.615  0.519  0.597  0.482  0.45   0.489  0.645  0.565  0.625
Avg. Recall      0.658  0.605  0.658  0.564  0.504  0.535  0.641  0.566  0.625
F-Measure        0.61   0.525  0.598  0.492  0.394  0.505  0.64   0.564  0.624

Features used in the best-performing configurations: unigrams (8 of 9 configurations), POS tagging (5), sentiment features (4), syntactic features (3), character tri-grams (1)
Evaluation: Crisis Related Results

Tweets collected during Hurricane Sandy in October 2012:
60 situational awareness tweets, 140 random, non-contributing tweets

Result: selecting tweets classified as "Fear" outperforms the "Negative" class and a random baseline (30% accuracy)
This effect appears with the 7-class scheme, but not with the 3-class scheme

Example: "Day 3. No power. Limited Food. Limited shelter. Must survive. #Sandy" [7-classes: Fear, 3-classes: Neutral]
            Class            Detected   Contributing to SA   Accuracy   Recall
7-classes   Fear                 96            38              0.395     0.633
            Disgust              41            10              0.243     0.166
            Fear & Disgust      137            48              0.35      0.80
3-classes   Negative             41            12              0.292     0.20
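A quick arithmetic check of the table, under the assumption (not stated on the slide) that "Accuracy" here is the share of detected tweets that actually contribute to situational awareness and recall is measured against the 60 SA tweets; the slide's figures appear to be truncated rather than rounded to three decimals.

```python
# Reproducing the table's figures, assuming Accuracy = contributing / detected
# and Recall = contributing / 60 (the number of situational-awareness tweets).
SA_TWEETS = 60
rows = {
    "Fear": (96, 38),
    "Disgust": (41, 10),
    "Fear & Disgust": (137, 48),
    "Negative (3-classes)": (41, 12),
}
for name, (detected, contributing) in rows.items():
    accuracy = contributing / detected    # e.g. 38 / 96 = 0.3958...
    recall = contributing / SA_TWEETS     # e.g. 38 / 60 = 0.6333...
    print(f"{name}: accuracy={accuracy:.3f}, recall={recall:.3f}")
```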
Example Application and Outlook
• Detecting tweets that contribute to situational awareness
• Combination with geolocalization approaches, filtering, etc.
Conclusion & Outlook

Contribution
• Novel sentiment analysis approach for detecting seven sentiment classes
• Preliminary evaluation shows promising results towards detecting crisis related tweets

Future Work
• A larger training set is needed
• Combining different techniques (e.g., filtering, geolocalization) is necessary for a more valuable pipeline
THANK YOU!
Questions?
Can also be addressed to:
aschulz@tk.informatik.tu-darmstadt.de
Bibliography
Agarwal, A., Xie, B., Vovsha, I., Rambow, O., & Passonneau, R. (2011). Sentiment analysis of Twitter data.
Proceedings of the Workshop on Languages in Social Media. Portland, Oregon.
Barbosa, L., & Feng, J. (2010). Robust sentiment detection on Twitter from biased and noisy data.
Proceedings of the 23rd International Conference on Computational Linguistics. Beijing, China.
Ekman, P. (1992) An argument for basic emotions. Cognition & Emotion, 6, 3-4, 169-200.
Esuli, A., & Sebastiani, F. (2006). SentiWordNet: A Publicly Available Lexical Resource for Opinion Mining.
Proceedings of the 5th Conference on Language Resources and Evaluation. Genova, IT.
Jiang, L., Yu, M., Zhou, M., Liu, X., & Zhao, T. (2011). Target-dependent Twitter Sentiment Classification.
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics.
Portland, Oregon.
Nagy, A., & Stamberger, J. (2012). Crowd sentiment detection during disasters and crises.
Proceedings of the 9th International ISCRAM Conference, Vancouver, Canada.
Nielsen, F. Å. (2011). A new ANEW: Evaluation of a word list for sentiment analysis in microblogs.
Proceedings of the ESWC2011 Workshop on 'Making Sense of Microposts', 93-98.
Pang, B., & Lee, L. (2006). Opinion Mining and Sentiment Analysis. Foundations and Trends in Information
Retrieval, 91-231.
Vieweg, S., Hughes, A. L., Starbird, K., & Palen, L. (2010). Microblogging during two natural hazards events: What Twitter may contribute to situational awareness.
Proceedings of the 28th International Conference on Human Factors in Computing Systems, Atlanta, GA.
Witten, I. H., & Frank, E. (2005). Data Mining: Practical Machine Learning Tools and Techniques, 2nd
Edition, San Francisco, Morgan Kaufmann Publishers.