Iulia Pasov is a senior Data Scientist at Sixt SE, a PhD student in Artificial Intelligence and Psychology, and a WiDS Ambassador. As a Data Scientist, Iulia focuses on building AI-based services that optimize car rental processes, as well as pipelines for automatically training and deploying machine learning models. In her research, she searches for ways to improve learning in online knowledge-building communities with the help of artificial intelligence.
Speech Overview:
Sentiment analysis is one of the best-known sub-domains of Natural Language Processing (NLP), used especially for classifying feedback messages. This talk condenses over 15 years of research on different approaches to sentiment analysis, as they evolved over time. The audience will be guided through the advantages and disadvantages of each method, in order to understand how to approach the topic given their needs.
Iulia Pasov, Sixt. Trends in sentiment analysis. The entire history from rule-based systems to transformers
1. Trends in Sentiment Analysis and Opinion Mining
Iulia Pasov
Data Scientist
Munich-Lviv, October 2020
2. Motivation
• Machines process opinionated text to extract an opinion on a particular topic
• Over 80% of the available data are unstructured
• People prefer to express their thoughts in text (written or spoken)
f(text) = sentiment
4. Machine Reading
“Machine Reading – the autonomous understanding of text […] By ‘understanding text’ we mean the formation of a coherent set of beliefs based on a textual corpus and a background theory. Because the text and the background theory may be inconsistent, it is natural to express the resultant beliefs, and the underlying reasoning process, in probabilistic terms.”
Oren Etzioni, Michele Banko, and Michael J. Cafarella. Machine Reading. In Proceedings of the 21st National Conference on Artificial Intelligence, 2006.
5. Machine Reading
“In Predictably Irrational: The Hidden Forces That Shape Our Decisions, Ariely has an impressive resume, and he isn’t shy about mining it for anecdotes to support his argument. Readers are treated to many stories from his extensive back catalog of research experiments. The accounts aren’t just limited to his professional life, either. In addition to innumerable colleagues, readers are introduced to wife Sumi and daughter Amit, discovering intimate details such as how Sumi came to the decision to use an epidural during childbirth.”
from https://medium.com/west-stringfellow/predictably-irrational-summary-and-review-6c3f5eeee346
Opinion? Possible output formats:
• Positive | Neutral | Negative
• a continuous score in [-1, 1]
• a rating in [1, 2, 3, 4, 5]
6. Philosophy of Artificial Intelligence
1950: Alan Turing publishes the paper Computing Machinery and Intelligence
• Can machines think? (Turing’s test)
• The Imitation Game: two anonymized players, A and B, communicate with a judge C through a terminal. From the conversation, C must decide which of A and B is a woman and which is a man
• Variation: one of the subjects is a machine. If the judge cannot tell which player is the machine, the computer wins
• Speech: long considered a human-exclusive ability
• Hacking the main AI test (humans pretending to be machines)
• Difficult to build
• Difficult to integrate
7. Trends over time
2006: Unsupervised approaches
• Lexicon-based approaches (e.g. SentiWordNet)
• Each word is associated with a sentiment score
• E.g. love (positive, 1), hate (negative, -1), pineapple (neutral, 0)
• Some generalisation needed
• f(document) = f(words in document)
• Most commonly, an average of the word scores (see the sketch after this list)
• Pros
• Lexicons are publicly available on the Internet
• Do not require training or domain knowledge
• Can be computed very fast
• Good performance on short input
• Cons
• Order of words is not used
• ‘I like cats, not dogs’
• Context is not used
• Word sentiment is not simple (e.g. terrible, goofy)
• Difficult to evaluate
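A minimal sketch of the lexicon-based idea in Python. The tiny LEXICON dictionary and the plain averaging rule are illustrative assumptions; a real system would load a resource such as SentiWordNet and handle negation, n-grams and lemmatisation.

```python
# Lexicon-based sentiment, sketched: score a document as the average
# sentiment score of its known words. LEXICON is a toy stand-in for a
# real resource such as SentiWordNet.
LEXICON = {"love": 1.0, "like": 0.5, "hate": -1.0,
           "terrible": -0.8, "pineapple": 0.0}

def sentiment(text: str) -> float:
    words = text.lower().split()
    scores = [LEXICON[w] for w in words if w in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0  # 0.0 = neutral

# "love" (+1.0) and "hate" (-1.0) cancel out; "pineapple" is neutral.
print(sentiment("I love pineapple but hate mornings"))  # 0.0
```

Note how the order-blindness criticised above shows up immediately: “I like cats, not dogs” and “I like dogs, not cats” receive identical scores.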
8. Trends over time
2006: old trends
“I really like my new phone because it’s fast and the battery lasts long
but I find it too big”
really → neutral
like → positive
new → neutral
phone → neutral
fast → neutral
battery → neutral
last → neutral
long → neutral
find → neutral
big → neutral
Overall: Positive
• Improvements with n-grams (e.g. “too big” is negative)
• Additional rule-based assumptions required
• Performs worse on long texts (paragraph and document level)
• Named entity recognition required (e.g. “Shaun of the Dead”, “Mean Girls”)
• Difficult to extract understanding:
• Phone battery – positive
• Phone speed – positive
• Phone size – negative
• No understanding of context or relations
9. Trends over time
“A nice guy is an informal term, commonly used with either a literal or a sarcastic meaning, for a man (often a young adult).
• In the literal sense, the term describes a man who is agreeable, gentle, compassionate, sensitive and vulnerable […] In the context of a relationship, it may also refer to traits of honesty, loyalty, romanticism, courtesy, and respect. When used negatively, a nice guy implies a male who is unassertive or otherwise non-masculine. The opposite of a genuine “nice guy” is commonly described as a “jerk”, a term for a mean, selfish and uncaring person.
• However, the term is also often used sarcastically, particularly in the context of dating, to describe someone who believes himself to possess genuine “nice guy” characteristics, even though he actually may not, and who uses acts of friendship and basic social etiquette with the unstated aim of progressing to a romantic or sexual relationship”
• Source: https://en.wikipedia.org/wiki/Nice_guy
10. Trends over time
2006: old trends
• Interesting words
• Terrific
• Pos: My trip to Paris was terrific (great)
• Neg: I woke up due to terrific noise (related to terror)
• Nice
• Pos: My colleague spent 2hrs to explain the project. He’s such a nice guy…
• Neg: He befriended all the women in the office and pulled a nice guy act.
• Killer
• Pos: I just downloaded this killer app.
• Neg: ########################## (only good vibes in this presentation)
• Sick
• Pos: I’d love to do that, it sounds sick
• Neg: I’d love to do that, but I sound sick
11. Trends over time
2006: Supervised approaches (based on BoW)
• Machine Learning based
• Pros
• Higher accuracy
• Customisable for different contexts
• Can be evaluated
• Cons
• Requires labelled data
• Very slow (in 2006)
• Does not respect the order of words
Typical pipeline: Tokenization → Stop-word removal → POS tagging → Syntactic parsing → Semantic analysis → Relation extraction → Classifier (SVM, Bayes, linear, random forest, etc.) → Positive, negative or a score (a sketch follows below)
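A minimal sketch of such a pipeline with scikit-learn; the four training documents, the label set and the choice of Naive Bayes are illustrative assumptions, not part of the original slide.

```python
# 2006-style supervised sentiment: bag-of-words features fed to a
# classical classifier. The tiny training set is invented for the demo.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_texts = ["I love this phone", "great battery and fast",
               "terrible screen", "I hate the size"]
train_labels = ["positive", "positive", "negative", "negative"]

model = make_pipeline(
    CountVectorizer(stop_words="english"),  # tokenization + stop-word removal
    MultinomialNB(),                        # any classical classifier fits here
)
model.fit(train_texts, train_labels)
print(model.predict(["fast phone with a great battery"]))  # expected: ['positive']
```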
12. Trends over time
2006: old trends
“I really like my new phone because it’s fast and the battery lasts long
but I find it too big”
cat: 0
dog: 0
battery: 1
phone: 1
science: 0
like: 1
dislike: 0
hate: 0
find: 1
big: 1
→ ???
• Improvements with n-grams (e.g. “too big” vs “big too”)
• Difficult to extract understanding:
• Phone battery – positive
• Phone speed – positive
• Phone size – negative
• No understanding of relations or context
• Order of words is ignored
• “I like cats, not dogs” and “I like dogs, not cats” end up the same in BoW
13. Problems with old approaches
• Lexicons are preferred when there are no training data, but:
• Difficult to compute mathematically
• 13 years of collecting data
• Humans never think of words independently
• Language is composed in time, and order plays an important role
• Humans never think in tokens, lemmas or POS tags when identifying sentiments, and stop words give more meaning
• Linguistics and psychology are not that simple
Document → Paragraphs → Sentences → Clauses → Phrases → Words → Characters
14. Trends over time
2013: Supervised approaches – Deep Learning
• Find f such that f(input text) = sentiment
• Importance of embeddings:
• Similar meaning of words implies similar representations
• Neurally computed embeddings
• More interest in similarities -> Word2Vec
• Pretrained and can be used as is
• Good results with LSTMs (or any Seq2Seq model) or even CNNs
Architecture: Text (X) → Embeddings → Deep Neural Network (CNN, RNN, LSTM) → Dense → Output: Sentiment (y)
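A minimal sketch of that architecture in Keras; the vocabulary size, sequence length and layer sizes are placeholder assumptions.

```python
# 2013-style deep sentiment model: embeddings -> LSTM -> dense output.
import tensorflow as tf

VOCAB_SIZE, EMBED_DIM, MAX_LEN = 10_000, 100, 50  # placeholders

model = tf.keras.Sequential([
    tf.keras.Input(shape=(MAX_LEN,)),                 # padded token-id sequences
    tf.keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM), # word -> dense vector
    tf.keras.layers.LSTM(64),                         # sequence model over vectors
    tf.keras.layers.Dense(1, activation="sigmoid"),   # sentiment score in [0, 1]
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```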
15. Trends over time
2013: Supervised approaches – Deep Learning
• Neural word embeddings became an option that encapsulates semantics (a training sketch follows after this list)
• Fast retrieval and a small memory footprint
• Which composition functions to use for complex language? (tree, sequence, other)
• Long-range dependencies are difficult to capture
• Fit for both long and short text
• Focus on architectures that infer meaning:
• RNN – each word associated with a vector and its context
• CNN – all words associated with all contexts over a limited history
• Self-attention – all words associated with all contexts
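A minimal sketch of training such embeddings with gensim's Word2Vec (4.x API); the three-sentence corpus is invented, and real embeddings need far more data or a pretrained model.

```python
# Word2Vec sketch: words used in similar contexts get similar vectors.
from gensim.models import Word2Vec

corpus = [["i", "love", "this", "phone"],   # toy corpus, invented
          ["i", "like", "this", "phone"],
          ["i", "hate", "this", "screen"]]

model = Word2Vec(corpus, vector_size=50, window=2, min_count=1, epochs=50)
print(model.wv.most_similar("love", topn=2))  # nearest words in vector space
```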
17. Current trends
• Now (2017+): Transformers
• Word representation should rely on context
• Self-attention layer: decides, for each part of the sequence, which other parts of the sequence are important (see the sketch after the comparison below)
• Similar to humans?
• Word embeddings -> contextualised word embeddings
RNN (LSTM)
• Pros:
• Unlimited context
• Recency bias
• Cons:
• Slow
• Strong recency bias
• Long-range dependencies are hard to capture
CNN
• Pros:
• Fast
• Computes local n-grams
• Cons:
• Limited context
• Strong local bias
• Long-range dependencies are hard to capture
Self-Attention
• Pros:
• Fast
• Captures long-range dependencies
• Cons:
• Difficult to train
• Difficult hyperparameter optimization
• Memory intensive
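A minimal sketch of scaled dot-product self-attention in NumPy; the dimensions and random weights are placeholder assumptions, and real transformers add multiple heads, masking and learned projections.

```python
# Scaled dot-product self-attention: every token attends to every token.
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """X: (seq_len, d_model); Wq/Wk/Wv: (d_model, d_k) projections."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])        # (seq_len, seq_len)
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
    return weights @ V                             # contextualised vectors

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                   # 4 tokens, d_model = 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)    # (4, 8)
```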
18. Current trends
• Now (2017+): Transformers
• Capture references & syntactic dependencies
• Vaswani et al., NIPS’17
Figure from “Attention Is All You Need” by Vaswani et al. Coreference visualisation from: https://ai.googleblog.com/2017/08/transformer-novel-neural-network.html
Famous architectures
• ELMo (2018)
• BERT (2018)
• XLNet (2019)
• T5 (2020)
19. Current trends
• How do transformers learn? (BERT)
• Randomly mask words – predict the original value
• Masking too much removes context; masking too little makes training expensive
• Solution: replace the chosen token with [MASK] (80% of the time), a random word (10%), or the real word (10%)
• Pairs of consecutive sentences – next sentence prediction
• Data: Wikipedia, BookCorpus, scientific publications (billions of words)
• Initial model: 4 days of TPU training, 2.5B words, 1M steps
Masked language modelling: “I am giving a talk about sentiment analysis” → “I am giving a [MASK] about sentiment [MASK]”
A: I am giving a talk.
B: It is about sentiment analysis.
Label: IsNextSentence
A: I am giving a talk.
B: My dog is adorable.
Label: NotNextSentence
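A minimal sketch of the masked-word objective in action, using the Hugging Face transformers pipeline with a pretrained bert-base-uncased checkpoint (an assumed but standard choice):

```python
# Ask a pretrained BERT to fill in a masked word, as in the example above.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")
for pred in fill("I am giving a [MASK] about sentiment analysis."):
    print(pred["token_str"], round(pred["score"], 3))  # candidates + probabilities
```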
21. Future
• Q1: What are the costs & gains of using complex architectures for sentiment analysis?
• GPUs or infinite time needed – training is not feasible on personal devices
• As with Word2Vec, not everyone needs to train such models (see the sketch below)
• Q2: Where do we stop?
• Better performance than humans on multiple tasks
• Research becomes difficult in small centres (big companies have an advantage)
• Q3: What is the next big thing?
• More on context?
• Reducing size and computation
• More experiments on streams of attention
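A minimal sketch of that “use, don’t train” point: a pretrained transformer fine-tuned for sentiment, applied off the shelf via the Hugging Face pipeline API (which downloads a default English sentiment model).

```python
# Sentiment analysis with a pretrained transformer, no training required.
from transformers import pipeline

classify = pipeline("sentiment-analysis")
print(classify("I really like my new phone, but I find it too big."))
# returns e.g. [{'label': 'NEGATIVE', 'score': ...}], depending on the default model
```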