Sentiment Analysis of Social Media Content: A multi-tool for listening to your audience and developing sentimental content strategies.
1. Machine Learning for Big Data
Prof. Dr. Eirini Ntoutsi
Leibniz University Hannover & L3S Research Center
Sentiment Analysis of Social Media Content
A multi-tool for listening to your audience and
developing sentimental content strategies
EUMade4All Workshop, Hannover, 29.9.2017
2. Outline
A world of opinions
Analyzing opinions for sentiment
Using sentimental content
Prof. Dr. Eirini Ntoutsi: Sentiment Analysis of Social Media Content
3. A Web/World of opinions
With the advent of Web 2.0 and its social character, a lot of opinion-rich
resources have arisen
7. Why do we care?
Opinions are produced on a constant basis and are (most of
the time) freely available
Free feedback from our customers/users
Valuable source of information for companies, politicians [1],
decision makers
Companies turn to social media monitoring in order to
optimize and strengthen their products and brands
An opportunity for marketers to pay attention to
consumers’ feelings towards their brand
People have the power to influence each other in their
decisions
Product design could be driven by user requests
[1] https://motherboard.vice.com/en_us/article/mg9vvn/how-our-likes-helped-trump-win
8. Sentiment analysis
Opinions on Vodafone
What are we interested in?
(Automatically) identifying the negative tweets (and reacting … customer care)
9. Aspect-oriented sentiment analysis
It's not ALL good or bad
Reviews from TripAdvisor on Vienna Marriott Hotel
2/5/2014: Great hotel, very nice rooms, perfect location, very nice staff except for a mid-aged female receptionist who tried to
charge me extra for wifi fees when checking out. It was waived at the desk when I checked-in. And she started treating me with
an attitude after she found out that I got a great deal through priceline.com. ….
26/1/2014: Spent a long weekend here. Rooms clean and functional without being spectacular and a nice pool etc. Staff in pool
weren't Good and I found them actually quite rood. Executive lounge was ok and not busy but selection of wine and beer wasn't
great. The reception has many shops and a bar at the end which kind of males it feel like a shopping centre. Overall great for
business travel but not sure id come again for leisure.
7/5/2013: The Vienna Marriott has all you expect; no frills, but solid service and they get all the basic stuff done right.
It's in a fine location, maybe 10 minute walk from the major city attractions while being in a quiet area. Breakfast buffet
exceptional and good fitness center. Very helpful and happy staff.
Lobby lounge just okay. Not a good wine selection and the Sinatra-like singer adds nothing.
Maybe just a little more expensive than it should be, too.
What are we interested in?
What people are talking about (items and item aspects)
The attitude of people towards these items and aspects
10. (Sentiment- & aspect-based) opinion summarization
11. (Sentiment- & aspect-based) opinion summarization
12. Sentiment analysis: an umbrella term
The Sentiment Analysis task
Is a given text positive, negative, or neutral?
Text = a sentence, a tweet, a customer review, a document …
The Emotion Analysis task
What emotion is being expressed in a given piece of text?
Basic emotions: joy, sadness, fear, anger,…
Other emotions: guilt, pride, optimism, frustration,…
The Aspect-oriented Sentiment Analysis task
What are the product/entity aspects discussed in a text?
What is the sentiment of those aspects?
The Summarization task
What are the key aspects in users’ opinions? What is the predominant
sentiment?
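The aspect-oriented task above can be illustrated with a minimal sketch. It is not the method used in the talk: the aspect vocabulary and the polarity word lists are hypothetical toy data, and the rule "a clause's polarity applies to every aspect it mentions" is an assumption made for brevity. Negations such as "wasn't great" are not handled here.

```python
import re
from collections import defaultdict

# Hypothetical toy vocabularies; real systems learn or curate these.
ASPECTS = {"staff": "staff", "receptionist": "staff",
           "room": "rooms", "rooms": "rooms",
           "location": "location", "wine": "drinks", "beer": "drinks"}
POSITIVE = {"great", "nice", "perfect", "clean", "helpful", "happy"}
NEGATIVE = {"rude", "bad", "expensive"}


def aspect_sentiment(review):
    """Return {aspect: score}; a score above 0 means mostly positive mentions."""
    scores = defaultdict(int)
    # Treat each punctuation-delimited clause as one opinion unit.
    for clause in re.split(r"[.,;!?]", review.lower()):
        tokens = re.findall(r"[a-z']+", clause)
        polarity = (sum(t in POSITIVE for t in tokens)
                    - sum(t in NEGATIVE for t in tokens))
        for token in tokens:
            if token in ASPECTS:
                scores[ASPECTS[token]] += polarity
    return dict(scores)


summary = aspect_sentiment(
    "Great hotel, very nice rooms, perfect location, very nice staff")
```

Summing these per-aspect scores over many reviews gives a crude version of the summarization task: the key aspects and the predominant sentiment for each.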
13. Outline
A world of opinions
Analyzing opinions for sentiment
Using sentimental content
14. Building a sentiment classifier
Building a sentiment classifier requires data and algorithms
[Figure: Algorithm → Model f(x)]
15. Challenges of sentiment analysis in social media
Language-related & medium-related challenges
Informal
Short, 140 characters for tweets
Abbreviations and shortenings
Wide array of topics and large vocabulary
Spelling mistakes and creative spellings
Special strings like hashtags, emoticons, conjoined words
Data properties
Large amounts of opinions (Volume)
Continuous flow of opinions (Velocity)
16. Challenges of sentiment analysis in social media
Sentiment-related challenges
The unambiguous identification of sentiment
Sarcasm
Bipolarity
Dealing with colloquial language
tweets containing colloquial slang
17. Building a sentiment classifier
Building a sentiment classifier requires data and algorithms
Two challenging parts
Learning: How to build a classifier?
Labeling: How to create a (class-labeled) training set?
[Figure: Algorithm → Model f(x)]
18. How to build a classifier
Preprocessing part
Negations
Colloquial language
Superfluous words
Emoticons
Learning part/ Classifiers
Naïve Bayes
SVMs
Ensembles
Deep Neural Networks
KNNs
…
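As an illustration of the learning part, here is a minimal sketch of one of the listed classifiers, Naïve Bayes, written in plain Python. The class name, the whitespace tokenization and the four training tweets are assumptions made up for this example; they do not come from the talk.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesSentiment:
    """Multinomial Naive Bayes over whitespace tokens with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # number of documents per class
        self.word_counts = defaultdict(Counter)  # word frequencies per class
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior of the class
            for word in text.lower().split():
                if word in self.vocab:             # words never seen are ignored
                    score += math.log((counts[word] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


# Made-up training tweets, for illustration only.
train_texts = ["great service love it", "very happy with the network",
               "terrible coverage hate it", "no signal again very angry"]
train_labels = ["positive", "positive", "negative", "negative"]
model = NaiveBayesSentiment().fit(train_texts, train_labels)
```

With this toy data, `model.predict("love the network")` returns `"positive"` and `model.predict("hate the coverage")` returns `"negative"`. In practice the preprocessing steps listed above would run before tokenization.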
19. Preprocessing - Negations
Tagging negations with verbs
27,222,287 verb negations found (0.4%)
I do not like → I NOT_like
It didn't fit → It NOT_fit
Tagging negations with adjectives
2-part adjective co-occurrences
3-part adjective co-occurrences
4,832,573 adjective negations found (0.1%)
not pretty → ugly
not bad → good
not very young → old
Verbs negation list: www.vocabulix.com
Adverbs negation list: www.scribd.com
Share of found negations: 85% negation verbs, 15% negation adjectives
Iosifidis & Ntoutsi, “Large scale sentiment learning with limited labels”, KDD 2017
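The two negation rules on this slide can be sketched with regular expressions. This is only an illustration under assumptions: the antonym table holds just the three slide examples, the list of negating verb forms is a hypothetical short list, and the cited paper's actual implementation may differ.

```python
import re

# Hypothetical antonym table containing only the slide's examples.
ANTONYMS = {"pretty": "ugly", "bad": "good", "young": "old"}
# Hypothetical short list of negating verb forms.
NEGATORS = r"(?:do not|does not|did not|don't|doesn't|didn't)"


def tag_negations(text):
    """Replace negated adjectives by antonyms, then tag negated verbs."""
    def swap(match):
        # Leave the phrase unchanged if the word has no known antonym.
        return ANTONYMS.get(match.group(1).lower(), match.group(0))

    # Adjective negations: "not [very] <adjective>" -> antonym.
    text = re.sub(r"\bnot (?:very )?(\w+)", swap, text, flags=re.IGNORECASE)
    # Verb negations: "<negator> <verb>" -> "NOT_<verb>".
    text = re.sub(r"\b" + NEGATORS + r" (\w+)", r"NOT_\1", text,
                  flags=re.IGNORECASE)
    return text
```

For example, `tag_negations("I do not like")` gives `"I NOT_like"` and `tag_negations("not very young")` gives `"old"`.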
20. Preprocessing effect – Overall view (distinct words)
[Bar chart: number of distinct words (axis from 0 to 300,000,000) after each preprocessing step: original, slang, links & mentions, negations, emoticons, stopwords]
Iosifidis & Ntoutsi, “Large scale sentiment learning with limited labels”, KDD 2017
21. (back to) Building a sentiment classifier
Building a sentiment classifier requires data and algorithms
Two challenging parts
Learning: How to build a classifier?
Labeling: How to create a (class-labeled) training set?
[Figure: Algorithm → Model f(x)]
22. How to create a (class-labeled) training set
Big Data but few labels
Human labelling at this scale is impossible
What other (machine-based) resources can we exploit to label (part of)
our data?
At the data level
Labels through emoticons
Labels through sentiment dictionaries (like SentiWordNet)
At the machine learning model level
Use both labeled and unlabeled data for learning: semi-supervised learning
23. Labels through emoticons
Implicit labels, through emoticons
We assembled lists of positive and negative emoticons
72 positive emoticons, e.g. :-) :) :o) =) ;) (: (; (= <3 :D :-D :oD =D ;D
70 negative emoticons, e.g. :( :-( :o( =( ;( ;-( ): ); )=
We classified tweets based on their emoticons
Positive: only positive emoticons (10%)
Negative: only negative emoticons (2%)
Mixed: both positive and negative emoticons (1%)
No emoticon (88%)
In total, 57,340,286 tweets (12%) are pure-labeled, i.e. positive-only or negative-only.
[Pie chart: emoticons_positive 10%, no_emoticons 88%, emoticons_negative 2%, emoticons_mixed 0%]
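The emoticon-based labeling rule can be sketched as follows. The two sets contain only the emoticons shown on this slide, not the full lists, and splitting tweets on whitespace is an assumption of this example.

```python
# Only the emoticons visible on the slide; the full lists are longer.
POSITIVE_EMOTICONS = {":-)", ":)", ":o)", "=)", ";)", "(:", "(;", "(=",
                      "<3", ":D", ":-D", ":oD", "=D", ";D"}
NEGATIVE_EMOTICONS = {":(", ":-(", ":o(", "=(", ";(", ";-(", "):", ");", ")="}


def emoticon_label(tweet):
    """Return 'positive', 'negative', 'mixed', or None if no emoticon occurs."""
    tokens = set(tweet.split())  # assumption: emoticons are whitespace-separated
    has_positive = bool(tokens & POSITIVE_EMOTICONS)
    has_negative = bool(tokens & NEGATIVE_EMOTICONS)
    if has_positive and has_negative:
        return "mixed"
    if has_positive:
        return "positive"
    if has_negative:
        return "negative"
    return None
```

Only tweets labeled `"positive"` or `"negative"` count as pure-labeled; `"mixed"` tweets and tweets without emoticons stay unlabeled.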
24. Labels through SentiWordNet
SentiWordNet: a lexical resource for supporting sentiment classification
Tweet sentiment as an aggregation of the sentiment of its member words
SentiWordNet labeling results
Positive: only positive words
Negative: only negative words
Neutral: only neutral words
Zero-sum: mix of positive and negative
No decision: words do not exist in the lexicon
e.g., #Iloveobama, #refugeecrisis etc
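The five outcomes above can be sketched with a toy lexicon. The scores below are hypothetical and are not actual SentiWordNet values, and the function reads "only positive words" as "positive words plus, possibly, neutral ones", which is an assumption.

```python
# Hypothetical word polarities; NOT taken from SentiWordNet.
TOY_LEXICON = {"love": 0.6, "great": 0.5, "hate": -0.7, "bad": -0.6,
               "phone": 0.0, "the": 0.0, "i": 0.0}


def lexicon_label(tweet, lexicon=TOY_LEXICON):
    """Aggregate word polarities into one of the five slide categories."""
    scores = [lexicon[w] for w in tweet.lower().split() if w in lexicon]
    if not scores:
        return "no decision"  # none of the words exist in the lexicon
    has_positive = any(s > 0 for s in scores)
    has_negative = any(s < 0 for s in scores)
    if has_positive and has_negative:
        return "zero-sum"     # mix of positive and negative words
    if has_positive:
        return "positive"
    if has_negative:
        return "negative"
    return "neutral"
```

A hashtag such as `#Iloveobama` is a single out-of-lexicon token, so `lexicon_label("#Iloveobama")` returns `"no decision"`.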
25. Emoticons vs SentiWordNet
For the intersection (57,340,286 tweets = 12%, those with pure sentiment-based labels),
we checked agreement in the labels
Causes of disagreement
Emoticons-based labeling
Prone to errors: existence of positive emoticons does not imply positive words
SentiWordNet-based labeling
SentiWordNet is a static dictionary
Twitter is very dynamic
Words change polarity (also based on context)
New words are created (e.g. hashtags) which are not part of the dictionary
Emoticon-based labeling (rows) vs. SentiWordNet-based labeling (columns)
            Positive           Negative           Neutral          Zero-sum         No decision
Positive    28,104,677 (49%)   10,756,225 (19%)   4,908,237 (9%)   23,297 (0.04%)   3,140,978 (5%)
Negative    4,929,947 (9%)     3,885,983 (7%)     930,075 (2%)     7,527 (0.01%)    653,340 (1%)
• We need a hybrid approach:
Campero et al, “Tracking Ephemeral Sentiment
Entities in Social Streams”, submitted 2017
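The overall agreement between the two labeling schemes follows directly from the counts reported in the table on this slide. The short calculation below is a worked example added for illustration; only the variable names are new.

```python
# Counts copied from the slide: rows = emoticon label, columns = SentiWordNet label.
counts = {
    "positive": {"positive": 28_104_677, "negative": 10_756_225,
                 "neutral": 4_908_237, "zero-sum": 23_297,
                 "no decision": 3_140_978},
    "negative": {"positive": 4_929_947, "negative": 3_885_983,
                 "neutral": 930_075, "zero-sum": 7_527,
                 "no decision": 653_340},
}

total = sum(sum(row.values()) for row in counts.values())
# Both schemes agree on the diagonal: positive/positive and negative/negative.
agreeing = counts["positive"]["positive"] + counts["negative"]["negative"]
agreement_rate = agreeing / total

print(f"{agreeing:,} of {total:,} tweets agree ({agreement_rate:.1%})")
```

The cells sum to the 57,340,286 pure-labeled tweets, and the two schemes assign the same polarity to roughly 56% of them (49% + 7% in the table).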
26. Challenges and opportunities
Multilinguality
Only 486,627,464 of the 1,882,387,310 tweets in total are English, so we utilize
only 26% of the dataset.
Add multilingual content
Transfer learning
Exploit the content similarity
Not everyone uses emoticons
If tweets are similar, “inherit” the sentiment from the “neighboring” tweets
Exploit the hashtags
Start with a seed of positive, negative hashtags
Data augmentation
Iosifidis & Ntoutsi, “Data Augmentation for Polarized Textual Data for Dealing with Class
Imbalance”, Submitted 2017
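The idea of "inheriting" sentiment from neighboring tweets can be sketched with a nearest-neighbor lookup. Jaccard similarity over word sets and the threshold of 0.3 are assumptions chosen for this illustration; the slides do not specify a similarity measure.

```python
def jaccard(a, b):
    """Jaccard similarity between the word sets of two texts."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


def inherit_label(tweet, labeled_tweets, min_similarity=0.3):
    """Return the label of the most similar labeled tweet, or None.

    labeled_tweets is a list of (text, label) pairs, e.g. tweets labeled
    through emoticons. min_similarity is a hypothetical cut-off below which
    no label is inherited.
    """
    best_label, best_similarity = None, min_similarity
    for text, label in labeled_tweets:
        similarity = jaccard(tweet, text)
        if similarity >= best_similarity:
            best_label, best_similarity = label, similarity
    return best_label


# Made-up labeled tweets, for illustration only.
labeled = [("love my new phone :)", "positive"),
           ("worst network ever :(", "negative")]
```

Here `inherit_label("love my new phone", labeled)` returns `"positive"`, while an unrelated tweet such as `"train is late"` stays unlabeled.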
27. Challenges and opportunities
Dealing with class imbalance
Most of the opinions/reviews are positive (or rated 5*, respectively). How can we build
models that learn all classes well (not just the majority)?
Dealing with changes
How does sentiment change over time? How can we build classifiers that react to
change (concept drift)?
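One common way to address the class imbalance point above is to weight each class inversely to its frequency during training. The sketch below uses the heuristic n / (k * n_c), with n samples, k classes and n_c samples in class c. It is shown as one possible option and is not the data augmentation approach of the cited paper.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n / (k * n_c), so rare classes count more."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {label: n_samples / (n_classes * count)
            for label, count in counts.items()}


# Made-up label distribution: 8 positive opinions, 2 negative ones.
weights = balanced_class_weights(["positive"] * 8 + ["negative"] * 2)
```

With this distribution the minority class gets weight 2.5 and the majority class 0.625, so a misclassified negative example costs four times as much as a positive one.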
28. Reacting to change
Part of our ongoing work on the OSCAR project
DFG project OSCAR: “Opinion Stream
Classification with Ensembles and Active
leaRners”
29. Outline
A world of opinions
Analyzing opinions for sentiment
Using sentimental content
30. Changing perspectives: Serving emotional content
“At the constitutional level where we work, 90% of any decision is emotional.
The rational part of us supplies the reasons for supporting our predilections.”
- Justice William O. Douglas
32. Emotional appeals
You will be happier, smarter or better looking if you have this item.
33. The cultural challenge
A case study of FIAT
FIAT released an ad in Italy in which actor
Richard Gere drives a Lancia Delta from
Hollywood to Tibet.
Gere is hated in China for being an
outspoken supporter of the Dalai Lama
There was a huge online uproar on
Chinese message boards commenting that
they would never buy a FIAT car.
34. The ephemeral sentiment challenge
Sentiment trajectory for refugees topic
Source: Multilingual Sentiment Analysis on Data of the Refugee Crisis in Europe, Shalunts and Backfried, Data Analytics 2016
35. To summarize
Opinions convey more than just information
They comprise a great (and, most of the time, free) resource for getting to
know your audience
You can use opinionated words/emotions to connect to your audience
Many tools for sentiment analysis exist out there (some for free, but also
professional ones)
From an ML point of view
A challenging problem due to language, lack of labeled data, noisy data,
change and context
36. Thank you! Questions/ Thoughts?
37. Contact
Prof. Dr. Eirini Ntoutsi
FG Intelligent Systems
Faculty of Electrical Engineering and Computer Science
Leibniz University Hannover & L3S Research Center
http://www.kbs.uni-hannover.de/~ntoutsi/
ntoutsi@l3s.de