4. Intro & motivation
What is aspect-based sentiment analysis (ABSA)?
[Definition from SemEval 2015]: mining and summarizing opinions from text about specific entities and their aspects.
[Figure: the sentence "I like the food but the waiters were rude" analyzed two ways: plain sentiment yields a single score, +1 (+0.65), while ABSA yields one score per aspect (Food, Service).]
5. Intro & motivation
Why do we need it?
- Higher resolution analysis
https://www.researchgate.net/publication/301408174_Twitter_sentiment_analysis
http://159.89.224.205/wp-content/uploads/2016/05/tumblr_inline_o72ropcTfR1u37g00_540.png
6. Intro & motivation
Why do we need it?
- Higher resolution analysis
- It is a more principled way to perform sentiment analysis ... what would you do as a human? (Especially since most reviews carry more than one sentiment.)
7. Intro & motivation
What do these aspects look like?
They come in two flavors:
1- Dictionaries of aspect terms
https://www.researchgate.net/figure/Most-likely-words-from-4-topics-in-LDA-from-the-AP-corpus-the-topic-titles-in-quotes-are_fig1_220766449
8. Intro & motivation
What do these aspects look like?
They come in two flavors:
1- Dictionaries of aspect terms
2- Highlighting in the text
I like the [food] but the [waiters] were rude (aspect terms highlighted in place)
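To make the two flavors concrete, a toy sketch (the data structures are illustrative, not from the talk):

```python
# Flavor 1: a dictionary of aspect terms per aspect
aspect_terms = {
    "Food":    ["food", "yogurt", "salad", "vinaigrette"],
    "Service": ["waiter", "staff", "lady at the door"],
}

# Flavor 2: highlighted spans inside the text itself
review = "I like the food but the waiters were rude"
spans = [(11, 15, "Food"), (24, 31, "Service")]  # (start, end, aspect)
for start, end, aspect in spans:
    print(review[start:end], "->", aspect)  # food -> Food, waiters -> Service
```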
9. Intro & motivation
What are some of the issues we need to take into consideration?
- Ambiguity
Topic words like:
- time/experience: can be used without meaning to express an opinion
- kids: family-friendliness or noise?
- kind, noisy: sentiment 'and' topic at the same time
10. Intro & motivation
What are some of the issues we need to take into consideration?
- Ambiguity
- Multi-word terms
greek yogurt with cucumber dill and garlic taste
crab salad with passion fruit vinaigrette
store events
the lady at the door (staff)
11. Intro & motivation
What are some of the issues we need to take into consideration?
- Ambiguity
- Multi-word terms
- Granularity
17. LDA
graphical model
- α and η are the parameters of the (Dirichlet) priors over θ and β
- θd is the distribution over topics for document d (a real vector of length K, the number of topics)
- βk is the distribution over words for topic k (a real vector of length V, the vocabulary size)
- zd,n is the topic of the nth word of the dth document
- wd,n is the nth word of the dth document
18. LDA
graphical model
- α and η are the parameters of the (Dirichlet) priors over θ and β
- θd is the distribution over topics for document d (a real vector of length K, the number of topics)
- βk is the distribution over words for topic k (a real vector of length V, the vocabulary size)
- zd,n is the topic of the nth word of the dth document
- wd,n is the nth word of the dth document
Only the words wd,n are observed (the shaded node); everything else is latent.
https://www.utdallas.edu/~nrr150130/cs6347/2015sp/lects/Lecture_17_LDA.pdf
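In equations, the generative story these variables encode is the standard LDA one:

```latex
\theta_d \sim \operatorname{Dirichlet}(\alpha), \qquad
\beta_k \sim \operatorname{Dirichlet}(\eta), \qquad
z_{d,n} \sim \operatorname{Categorical}(\theta_d), \qquad
w_{d,n} \sim \operatorname{Categorical}(\beta_{z_{d,n}})
```

Each document draws a topic mixture θd, each topic draws a word distribution βk, and each word position first draws a topic zd,n and then a word wd,n from that topic.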
19. LDA
training
How do we train this?
Gibbs sampling ... what does that mean?
https://www.utdallas.edu/~nrr150130/cs6347/2015sp/lects/Lecture_17_LDA.pdf
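To make "Gibbs sampling" concrete, here is a toy collapsed Gibbs sampler for LDA, a sketch under the notation above (not from the slides):

```python
import numpy as np

def gibbs_lda(docs, K, V, alpha=0.1, eta=0.01, iters=200, seed=0):
    """Toy collapsed Gibbs sampler for LDA.
    docs: list of lists of word ids in [0, V).
    Returns (topic-word counts, doc-topic counts)."""
    rng = np.random.default_rng(seed)
    D = len(docs)
    ndk = np.zeros((D, K))                        # doc-topic counts
    nkw = np.zeros((K, V))                        # topic-word counts
    nk = np.zeros(K)                              # total words per topic
    z = [rng.integers(K, size=len(doc)) for doc in docs]  # random init
    for d, doc in enumerate(docs):
        for n, w in enumerate(doc):
            k = z[d][n]
            ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                k = z[d][n]
                ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1  # forget this word
                # P(z=k | rest) ∝ (ndk + α) · (nkw + η) / (nk + V·η)
                p = (ndk[d] + alpha) * (nkw[:, w] + eta) / (nk + V * eta)
                k = rng.choice(K, p=p / p.sum())            # resample topic
                z[d][n] = k
                ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
    return nkw, ndk
```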
20. LDA
training
Why does it work at all?
Topic words tend to co-occur across documents ... let's try to remember that.
https://www.utdallas.edu/~nrr150130/cs6347/2015sp/lects/Lecture_17_LDA.pdf
21. LDA
outcome
This is what we keep at the end: the topic-word distributions β (and, if needed, the document-topic mixtures θ).
What does it look like?
https://www.utdallas.edu/~nrr150130/cs6347/2015sp/lects/Lecture_17_LDA.pdf
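Continuing the toy sampler sketched above, "what we keep" is the topic-word matrix; the familiar top-words view of each topic falls out of it:

```python
import numpy as np

def top_words(nkw, vocab, topn=10, eta=0.01):
    """nkw: topic-word counts from the sampler; vocab: id -> word."""
    beta = (nkw + eta) / (nkw + eta).sum(axis=1, keepdims=True)  # smoothed beta
    return [[vocab[i] for i in np.argsort(-beta[k])[:topn]]
            for k in range(beta.shape[0])]
```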
24. LDA
recap
https://www.utdallas.edu/~nrr150130/cs6347/2015sp/lects/Lecture_17_LDA.pdf
Pros:
- Strong mathematical basis -> consistent results
- Many existing off-the-shelf tools (see the sketch after this list)
- No domain dependence
- Unsupervised-ish ... why not say fully unsupervised? Let's look at the cons
Cons:
- Requires a list of stop words and preprocessed text
- Why is that an issue?
- Unigrams (can be worked around in a limited way)
- Outcome: a list of words out of context
- Doesn't work well for short text
- Why?
- What do people do to use LDA for short text?
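As an illustration of the off-the-shelf point above (and of the preprocessing con), a minimal gensim run on toy, already-tokenized text:

```python
from gensim.corpora import Dictionary
from gensim.models import LdaModel

# Stop-word removal and tokenization must happen before LDA sees the text.
texts = [["food", "great", "service", "slow"],
         ["waiters", "rude", "food", "tasty"],
         ["beer", "selection", "amazing", "beer"]]
dictionary = Dictionary(texts)
corpus = [dictionary.doc2bow(t) for t in texts]
lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=2, passes=20)
for topic_id, words in lda.print_topics():
    print(topic_id, words)  # the outcome: lists of words out of context
```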
38. ABAE
experimental setup
z: an aspect
Sz: the set of top words of z
D1(w): document frequency of w
D2(w1,w2): co-document frequency of w1 and w2
Higher is more semantically coherent.
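The coherence score built from these quantities (ABAE reports the Mimno et al. 2011 metric; reconstructed here from that definition, so treat the exact form as an assumption):

```latex
C(z; S_z) = \sum_{n=2}^{N} \sum_{l=1}^{n-1}
            \log \frac{D_2(w^{z}_{n}, w^{z}_{l}) + 1}{D_1(w^{z}_{l})},
\qquad S_z = \{ w^{z}_{1}, \dots, w^{z}_{N} \}
```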
Data:
- Citysearch: 50k+ restaurant reviews, of which 3,400 are manually labeled (6 aspects)
- BeerAdvocate: 1.5M reviews, of which 1k are manually labeled (5 aspects)
Performance measures:
(Precision, Recall, F1) & coherence score
44. ABAE
second look
The authors initialized this with k-means, but it could be anything you want. For instance ...
https://aclweb.org/anthology/D18-1403
Summarizing Opinions: Aspect Extraction Meets Sentiment Prediction and They Are Both Weakly Supervised.
Angelidis and Lapata
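A minimal sketch of that k-means initialization (scikit-learn, assuming pre-trained word embeddings in a numpy array; variable names are mine):

```python
import numpy as np
from sklearn.cluster import KMeans

def init_aspect_matrix(word_vectors, n_aspects=14, seed=0):
    """Initialize the aspect matrix T with k-means centroids over the
    L2-normalized word embedding space (ABAE-style; names illustrative)."""
    vecs = word_vectors / np.linalg.norm(word_vectors, axis=1, keepdims=True)
    km = KMeans(n_clusters=n_aspects, random_state=seed).fit(vecs)
    T = km.cluster_centers_
    return T / np.linalg.norm(T, axis=1, keepdims=True)  # one row per aspect
```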
46. ABAE
second look
pt is used for a linear combination of T's entries.
The only non-linearity in town.
47. ABAE
second look
The only non-linearity in town.
pt is used for a linear combination of T's entries.
The topic words are obtained through nearest neighbors (NN) over T's entries.
What the algorithm really does is search for 'n' points in the embedding space that are representative of the topics, plus provide a mechanism to mask stop words in the text.
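Putting the two "second look" slides together, a minimal sketch of ABAE's forward pass in PyTorch (my own naming and shapes, not the authors' code; the max-margin reconstruction training loss is omitted):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ABAECore(nn.Module):
    """Sketch: attention over word embeddings -> sentence embedding z_s ->
    softmax over aspects p_t -> reconstruction r_s from the aspect matrix T."""
    def __init__(self, emb_dim, n_aspects, aspect_init=None):
        super().__init__()
        self.M = nn.Parameter(torch.eye(emb_dim))      # attention bilinear map
        self.W = nn.Linear(emb_dim, n_aspects)         # z_s -> aspect logits
        T = aspect_init if aspect_init is not None else torch.randn(n_aspects, emb_dim)
        self.T = nn.Parameter(T)                       # aspect embedding matrix

    def forward(self, E):                 # E: (seq_len, emb_dim) word embeddings
        y = E.mean(dim=0)                 # average word vector of the sentence
        a = F.softmax(E @ self.M @ y, dim=0)   # attention masks e.g. stop words
        z_s = a @ E                       # attended sentence embedding
        p_t = F.softmax(self.W(z_s), dim=0)    # "the only non-linearity in town"
        r_s = p_t @ self.T                # linear combination of T's rows
        return r_s, z_s, p_t
```

Topic words per aspect are then the nearest-neighbor words to each row of T in the embedding space.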
So how can we best use this for topic detection?
49. In conclusion
- Topic detection is difficult because it is domain- and use-case-specific. We need it, however, for a proper inference of a "brand" profile.
- Existing approaches fail to consider both the inference of dictionaries and their use in a specific context.
- LDA provides a strong approach to language-independent(-ish) and unsupervised(-ish) topic modelling.
- ... However, extensions and variations of ABAE are more likely to take over in the future, given the rich mechanisms they offer.