Trends in Sentiment Analysis
and Opinion Mining
Iulia Pasov
Data Scientist
Munich-Lviv, October 2020
Motivation
• Machines process opinionated text to extract an opinion on a particular topic
• Over 80% of the available data are unstructured
• People prefer to express their thoughts in text (written or spoken)
f(text) = sentiment
Usage
• Analyze product feedback
• Obtain trends in public opinion / Brand monitoring
• Competitor watch
• Tracking influencers
• Prioritize replies
• Predict churn
• Predict stock trend
Machine Reading
“Machine Reading - the autonomous understanding of text […]
By ‘understanding text’ we mean the formation of a coherent
set of beliefs based on a textual corpus and a background
theory. Because the text and the background theory may be
inconsistent, it is natural to express the resultant beliefs, and
the underlying reasoning process, in probabilistic terms.”
Oren Etzioni, Michele Banko, and Michael J. Cafarella. Machine
reading. In Proceedings of the 21st National Conference on
Artificial Intelligence, 2006.
Machine Reading
“In Predictably Irrational: The Hidden Forces That Shape Our Decisions, Ariely
has an impressive resume, and he isn’t shy about mining it for anecdotes to
support his argument. Readers are treated to many stories from his extensive
back catalog of research experiments. The accounts aren’t just limited to his
professional life, either. In addition to innumerable colleagues, readers are
introduced to wife Sumi and daughter Amit, discovering intimate details
such as how Sumi came to the decision to use an epidural during childbirth.”
from https://medium.com/west-stringfellow/predictably-irrational-summary-and-review-6c3f5eeee346
Opinion?
Positive | Neutral | Negative
[-1, 1]
[1, 2, 3, 4, 5]
Philosophy of Artificial Intelligence
1950: Alan Turing publishes the paper Computing Machinery and
Intelligence
• Can machines think? (the Turing test)
• The Imitation Game: two anonymized players, A and B, communicate with a
judge C through a terminal. From the conversation, C must decide which of A
and B is a woman and which is a man
• Variation: one of the players is a machine. If the judge cannot tell which
player is the machine, the computer wins
• Speech - human exclusive
• Hacking the main AI test (humans pretending to be machines)
• Difficult to build
• Difficult to integrate
Trends over time
2006: Unsupervised approaches
• Lexicon-based approaches (e.g. SentiWordNet)
• Each word is associated with a sentiment score
• E.g. love (positive, 1), hate (negative, -1), pineapple (neutral, 0)
• Some generalisation needed
• f(document) = f(words in document)
• Most common: the average of the word scores, or a representation over time
• Pros
• Lexicons are publicly available on the Internet
• Do not require training or domain knowledge
• Can be computed very fast
• Good performance on short input
• Cons
• Order of words is not used
• ‘I like cats, not dogs’
• Context not used
• Words are not simple (e.g. terrible, goofy)
• Difficult to evaluate
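The lexicon idea above, f(document) = f(words in document), can be sketched in a few lines. This is a minimal illustration, not SentiWordNet itself: the tiny `LEXICON` dictionary, the `tokenize` helper and the choice of a plain average are assumptions made for the example.

```python
import re

# Hypothetical toy lexicon; real lexicons such as SentiWordNet are far larger.
LEXICON = {"love": 1.0, "like": 1.0, "hate": -1.0, "pineapple": 0.0}


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def lexicon_score(text, lexicon=LEXICON):
    """Average the lexicon scores of all tokens; unknown words count as 0."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return sum(lexicon.get(tok, 0.0) for tok in tokens) / len(tokens)


def label(score):
    """Map a numeric score to one of the three classes."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for sentence in ("I love pineapple", "I like cats, not dogs", "I like dogs, not cats"):
        print(sentence, "->", label(lexicon_score(sentence)))
```

Because only the set of words matters, the two "cats/dogs" sentences receive exactly the same score, which is the word-order weakness listed under Cons.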
Trends over time
2006: old trends
“I really like my new phone because it’s fast and the battery lasts long
but I find it too big”
1. Word-level lexicon scores:
• really – neutral
• like – positive
• new – neutral
• phone – neutral
• fast – neutral
• battery – neutral
• last – neutral
• long – neutral
• find – neutral
• big – neutral
• Overall: Positive
• Improvements with n-grams (e.g. “too big” is
negative)
• Additional rule-based assumptions required
• Performs worse on long texts (paragraph and
document level)
• Named entity recognition required (e.g. “Shaun
of the Dead”, “Mean Girls”)
• Difficult to extract understanding:
• Phone battery – positive
• Phone speed – positive
• Phone size – negative
• No understanding about context or relations
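The n-gram improvement can be sketched as a lexicon lookup that tries a two-word entry before falling back to single words. The `UNIGRAMS` and `BIGRAMS` tables below are hypothetical and contain only what the phone example needs.

```python
import re

# Hypothetical entries, chosen only to mirror the phone example.
UNIGRAMS = {"like": 1.0}
BIGRAMS = {("too", "big"): -1.0}


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def matched_scores(text):
    """Return (phrase, score) pairs, preferring bigram entries over unigrams."""
    tokens = tokenize(text)
    hits = []
    i = 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if pair in BIGRAMS:
            hits.append((" ".join(pair), BIGRAMS[pair]))
            i += 2  # both words of the bigram are consumed
        else:
            if tokens[i] in UNIGRAMS:
                hits.append((tokens[i], UNIGRAMS[tokens[i]]))
            i += 1
    return hits


if __name__ == "__main__":
    sentence = ("I really like my new phone because it's fast and "
                "the battery lasts long but I find it too big")
    print(matched_scores(sentence))
```

With the bigram entry the sentence is no longer scored as purely positive, but the output still does not say which aspect (battery, speed, size) each score belongs to.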
Trends over time
“A nice guy is an informal term, commonly used with either a literal or a sarcastic meaning,
for a man (often a young adult).
• In the literal sense, the term describes a man who is agreeable, gentle, compassionate,
sensitive and vulnerable […] In the context of a relationship, it may also refer to traits of
honesty, loyalty, romanticism, courtesy, and respect. When used negatively, a nice guy
implies a male who is unassertive or otherwise non-masculine. The opposite of a
genuine "nice guy" is commonly described as a "jerk", a term for a mean, selfish and
uncaring person.
• However, the term is also often used sarcastically, particularly in the context of dating, to
describe someone who believes himself to possess genuine "nice guy" characteristics,
even though he actually may not, and who uses acts of friendship and basic social
etiquette with the unstated aim of progressing to a romantic or sexual relationship”
• Source: https://en.wikipedia.org/wiki/Nice_guy
Trends over time
2006: old trends
• Interesting words
• Terrific
• Pos: My trip to Paris was terrific (great)
• Neg: I woke up due to terrific noise (related to terror)
• Nice
• Pos: My colleague spent 2hrs to explain the project. He’s such a nice guy…
• Neg: He befriended all the women in the office and pulled a nice guy act.
• Killer
• Pos: I just downloaded this killer app.
• Neg: ########################## (only good vibes in this presentation)
• Sick
• Pos: I’d love to do that, it sounds sick
• Neg: I’d love to do that, but I sound sick
Trends over time
2006: Supervised approaches (based on BoW)
• Machine Learning based
• Pros
• Higher accuracy
• Customisable for different contexts
• Can be evaluated
• Cons
• Requires labelled data
• Very slow (in 2006)
• Does not respect the order of words
Pipeline:
• Remove Stop Words
• Tokenization
• POS Tagging
• Syntactic Parsing
• Semantic Analysis
• Relation Extraction
-> Classifier (SVM, Bayes, Linear, Random Forest, etc.)
-> Positive, Negative or score
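A supervised bag-of-words classifier can be sketched with a small multinomial Naive Bayes, one of the classifier families named above. This is a minimal sketch: the four training sentences are made up for the example, and the tokenizer skips the stop-word, POS and parsing steps of the full pipeline.

```python
import math
import re
from collections import Counter, defaultdict


def bag_of_words(text):
    """Count lowercase alphabetic tokens; word order is discarded."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, lab in zip(texts, labels):
            self.word_counts[lab].update(bag_of_words(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        features = bag_of_words(text)
        best_label, best_logp = None, -math.inf
        for lab, n_docs in self.class_counts.items():
            logp = math.log(n_docs / total_docs)  # class prior
            total_words = sum(self.word_counts[lab].values())
            for word, count in features.items():
                if word not in self.vocab:
                    continue  # words never seen in training are ignored
                p = (self.word_counts[lab][word] + 1) / (total_words + len(self.vocab))
                logp += count * math.log(p)
            if logp > best_logp:
                best_label, best_logp = lab, logp
        return best_label


# Hypothetical labelled data; a real model needs far more examples.
TRAIN = [
    ("I love this phone", "positive"),
    ("great battery and fast", "positive"),
    ("I hate this phone", "negative"),
    ("terrible battery too slow", "negative"),
]

model = NaiveBayes().fit([t for t, _ in TRAIN], [l for _, l in TRAIN])

if __name__ == "__main__":
    print(model.predict("I love the fast battery"))
    print(model.predict("I hate slow phones"))
```

Unlike the lexicon approach, the labelled set gives a way to evaluate the model and to retrain it for a different domain, at the price of needing that labelled data.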
Trends over time
2006: old trends
“I really like my new phone because it’s fast and the battery lasts long
but I find it too big”
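2. Bag-of-words vector (1 if the vocabulary word occurs in the sentence):
• cat – 0
• dog – 0
• battery – 1
• phone – 1
• science – 0
• like – 1
• dislike – 0
• hate – 0
• find – 1
• big – 1
• Sentiment: ???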
• Improvements with n-grams (e.g. “too
big” vs “big too”)
• Difficult to extract understanding:
• Phone battery – positive
• Phone speed – positive
• Phone size – negative
• No understanding about relations or
context
• Order of words is ignored
• “I like cats, not dogs” and “I like
dogs, not cats” end up the same in
BoW
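The word-order point can be checked directly. In this small sketch the helper names are assumptions; it compares unigram and bigram counts for the two sentences.

```python
import re
from collections import Counter


def tokens(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def bow(text):
    """Unigram bag of words: counts only, no order."""
    return Counter(tokens(text))


def bigram_bow(text):
    """Bag of adjacent word pairs, which keeps some local order."""
    toks = tokens(text)
    return Counter(zip(toks, toks[1:]))


a = "I like cats, not dogs"
b = "I like dogs, not cats"

if __name__ == "__main__":
    print(bow(a) == bow(b))                # same unigram counts
    print(bigram_bow(a) == bigram_bow(b))  # bigram counts differ
```

Bigrams separate the two sentences here, but only because the swapped words are adjacent to different neighbours; longer-range order is still lost.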
Problems with old approaches
• Lexicons preferred when there are no training data but:
• Difficult to compute mathematically
• 13 years of collecting data
• Humans never think of words independently
• Language is composed over time, and order plays an important role
• Humans never think in tokens, lemmas or POS tags when identifying sentiment,
and stop words add meaning
• Linguistics and psychology are not that simple
Document -> Paragraphs -> Sentences -> Clauses -> Phrases -> Words -> Characters
Trends over time
2013: Supervised approaches – Deep Learning
• Find f such that f(Input text) = Sentiment
• Emphasis on embeddings:
• Similar meaning of words implies similar representations
• Neural computed embeddings
• More interest in similarities -> Word2Vec
• Pretrained and can be used as is
• Good results with LSTMs (or any Seq2Seq) or even CNNs
Text (X) -> Embeddings -> Deep Neural Network (CNN, RNN, LSTM) -> Dense -> Output: Sentiment (y)
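The claim that similar meaning implies similar representations is usually measured with cosine similarity between word vectors. The three-dimensional vectors below are hypothetical values picked for illustration; they are not taken from a trained Word2Vec model.

```python
import math

# Hypothetical 3-dimensional embeddings; real models use hundreds of dimensions.
EMBEDDINGS = {
    "phone": [0.9, 0.1, 0.0],
    "smartphone": [0.8, 0.2, 0.1],
    "pineapple": [0.0, 0.2, 0.9],
}


def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    print(cosine(EMBEDDINGS["phone"], EMBEDDINGS["smartphone"]))
    print(cosine(EMBEDDINGS["phone"], EMBEDDINGS["pineapple"]))
```

In the toy data "phone" and "smartphone" point in nearly the same direction while "pineapple" does not; with pretrained vectors the same function can be applied unchanged.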
Trends over time
2013: Supervised approaches – Deep Learning
• Neural word embeddings became an option that encapsulates semantics
• Fast retrieval and a small memory footprint
• Which composition functions to use for complex language? (tree,
sequence, other)
• Long range dependencies are difficult to capture
• Fit for both long and short text
• Focus on architectures that infer meaning
• RNN – word associated with vector and context
• CNN – all words associated with all context on limited history
• Self-attention – all words associated with all contexts
Trends over time: Transformers
Current trends
• Now (2017+): Transformers
• Word representation should rely on context
• Self attention layer: decides for each part of the sequence which other parts of the
sequence are important
• Similar to humans?
• Word embeddings -> contextualised word embeddings
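How a self-attention layer decides which other parts of the sequence matter can be sketched with scaled dot-product attention. As a simplifying assumption, queries, keys and values are all the raw input vectors; a real Transformer layer applies learned projections and several heads.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(X):
    """Scaled dot-product self-attention with Q = K = V = X.

    Returns (weights, outputs): weights[i][j] is how much position i attends
    to position j, and outputs[i] is the weighted mix of all input vectors.
    """
    d = len(X[0])
    weights = []
    for q in X:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in X]
        weights.append(softmax(scores))
    outputs = [
        [sum(w * v[j] for w, v in zip(row, X)) for j in range(d)]
        for row in weights
    ]
    return weights, outputs


if __name__ == "__main__":
    # Three toy token vectors; the first and third are identical.
    X = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
    weights, outputs = self_attention(X)
    for row in weights:
        print([round(w, 3) for w in row])
```

Each output vector mixes information from every position, regardless of distance; the resulting vectors are the contextualised word embeddings mentioned above.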
RNN (LSTM)
• Pros:
• Unlimited context
• Recency bias
• Cons:
• Slow
• Strong recency bias
• Long-range dependencies are hard to capture
CNN
• Pros:
• Fast
• Computes local ngrams
• Cons:
• Limited context
• Strong local bias
• Long-range dependencies are hard to capture
Self-Attention
• Pros:
• Fast
• Captures long-range dependencies
• Cons:
• Difficult to train
• Difficult hyperparameter optimization
• Memory intensive
Current trends
• Now (2017+): Transformers
• Captures references & syntactic dependency
• Vaswani et al., NIPS’17
Figure from “Attention is All You Need” by Vaswani et al. Coreference visualisation from:
https://ai.googleblog.com/2017/08/transformer-novel-neural-network.html
Famous architectures
• ELMo (2018)
• BERT (2018)
• XLNet (2019)
• T5 (2020)
Current trends
• How do transformers learn? (BERT)
• Randomly mask words – predict original value
• Too much masking (too little context left) vs too little masking (expensive training)
• Solution: replace a selected word with [mask], with a random word, or keep the real word
• Pairs of consecutive sentences – next sentence prediction
• Data: Wikipedia, Book corpus, Scientific publications (billions of words)
• Initial: TPU training 4 days, 2.5B words, 1M steps
Original: I am giving a talk about sentiment analysis
Masked: I am giving a [mask] about sentiment [mask]
A: I am giving a talk.
B: It is about sentiment analysis.
Label: IsNextSentence
A: I am giving a talk.
B: My dog is adorable.
Label: NotNextSentence
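The two pre-training tasks can be sketched as data-preparation functions. This is an illustration under stated assumptions: the 15% selection rate and the 80/10/10 split between [mask], a random word and the real word follow the published BERT recipe, while the function names, the toy vocabulary and drawing negatives from the same small sentence list are choices made for the example.

```python
import random

MASK = "[mask]"


def mask_tokens(tokens, vocab, rng, select_prob=0.15):
    """Return (corrupted tokens, {position: original token}) for masked-word prediction."""
    corrupted, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < select_prob:
            targets[i] = tok  # the model must predict the original word here
            r = rng.random()
            if r < 0.8:
                corrupted.append(MASK)               # mask
            elif r < 0.9:
                corrupted.append(rng.choice(vocab))  # random word
            else:
                corrupted.append(tok)                # real word kept
        else:
            corrupted.append(tok)
    return corrupted, targets


def next_sentence_pairs(sentences, rng):
    """Build (A, B, label) triples: B is the true next sentence about half the time."""
    pairs = []
    for i in range(len(sentences) - 1):
        others = [s for j, s in enumerate(sentences) if j not in (i, i + 1)]
        if others and rng.random() < 0.5:
            pairs.append((sentences[i], rng.choice(others), "NotNextSentence"))
        else:
            pairs.append((sentences[i], sentences[i + 1], "IsNextSentence"))
    return pairs


if __name__ == "__main__":
    rng = random.Random(0)
    words = "I am giving a talk about sentiment analysis".split()
    print(mask_tokens(words, vocab=["dog", "phone", "talk"], rng=rng))
    docs = ["I am giving a talk.", "It is about sentiment analysis.", "My dog is adorable."]
    print(next_sentence_pairs(docs, rng))
```

Keeping some selected words unchanged or swapping in a random word means the model cannot rely on [mask] appearing at every position it has to predict.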
Current trends
Source: https://gluebenchmark.com/leaderboard Date: 09.10.2020
Future
• Q1: What are the cost & gain for using complex architectures on
sentiment analysis?
• GPU or infinite time - training on personal devices
• As with Word2Vec, not everyone needs to train such models
• Q2: Where do we stop?
• Better performance compared to humans for multiple tasks
• Research becomes difficult in small centres (big companies have an
advantage)
• Q3: What is the next big thing?
• More on context?
• Reducing size and computation
• More experiments on streams of attention
Thank you!

Iulia Pasov, Sixt. Trends in sentiment analysis. The entire history from rule-based systems to transformers

  • 1. Trends in Sentiment Analysis and Opinion Mining Iulia Pasov Data Scientist Munich-Lviv, October 2020
  • 2. Motivation • Machines process opinionated text to extract an opinion on a particular topic • Over 80% of the available data are unstructured • People prefer to express their thoughts in text (written or spoken) 𝑓(𝑡𝑒𝑥𝑡) = 𝑠𝑒𝑛𝑡𝑖𝑚𝑒𝑛𝑡
  • 3. Analyze product feedback Obtain trends in public opinion / Brand monitoring Competitor watch Tracking influencers Prioritize replies Predict churn Predict stock trend Usage
  • 4. Machine Reading “Machine Reading-the autonomous understanding of text […] By ‘understanding text‘ we mean the formation of a coherent set of beliefs based on a textual corpus and a background theory. Because the text and the background theory may be inconsistent, it is natural to express the resultant beliefs, and the underlying reasoning process, in probabilistic terms.“ Oren Etzioni, Michele Banko, and Michael J. Cafarella. Machine reading. In Proceedings of the 21st National Conference on Artificial Intelligence, 2006.
  • 5. Machine Reading „In Predictably Irrational: The Hidden Forces That Shape Our Decisions, Ariely has an impressive resume, and he isn’t shy about mining it for anecdotes to support his argument. Readers are treated to many stories from his extensive back catalog of research experiments. The accounts aren’t just limited to his professional life, either. In addition to innumerable colleagues, readers are introduced to wife Sumi and daughter Amit, discovering intimidate details such as how Suri came to the decision to use an epidural during childbirth.“ from https://medium.com/west-stringfellow/predictably-irrational-summary- and-review-6c3f5eeee346 Opinion? Positive | Neutral | Negative [-1, 1] [1, 2, 3, 4, 5]
  • 6. Philosophy of Artificial Intelligence 1950: Alan Turing publishes the paper Computing Machinery and Intelligence • Can machines think? (Turing’s test) • The Imitation Game: 2 anonymized players, A and B can communicate with judge C through a terminal. From the discussion, C must decide which of A and B is a woman and which is a man • Variation: one of the subjects is a machine. If the judge cannot tell which player is a machine, the computer wins • Speech - human exclusive • Hacking the main AI test (humans pretending to be machines) • Difficult to build • Difficult to integrate
  • 7. Trends over time 2006: Unsupervised approaches • Lexicon-based approaches (e.g. SentiWordNet) • Each word is associated with a sentiment score • E.g. love (positive, 1), hate (negative, -1), pineapple (neutral, 0) • Some generalisation needed • f(document) = f(words in document) • Most common average or time representation • Pros • Lexicons are publicly available on the Internet • Do not require training or domain knowledge • Can be computed very fast • Good performance on short input • Cons • Order of words is not used • ‘I like cats, not dogs’ • Context not used • Words are not simple (e.g. terrible, goofy) • Difficult to evaluate Not the save icon
  • 8. Trends over time 2006: old trends “I really like my new phone because it’s fast and the battery lasts long but I find it too big” 1. Per-word scores: really (neutral), like (positive), new (neutral), phone (neutral), fast (neutral), battery (neutral), last (neutral), long (neutral), find (neutral), big (neutral); overall result: Positive • Improvements with n-grams (e.g. “too big” is negative) • Additional rule-based assumptions required • Performs worse on long texts (paragraph and document level) • Named entity recognition required (e.g. “Shaun of the Dead”, “Mean Girls”) • Difficult to extract understanding: • Phone battery – positive • Phone speed – positive • Phone size – negative • No understanding about context or relations
  • 9. Trends over time “A nice guy is an informal term, commonly used with either a literal or a sarcastic meaning, for a man (often a young adult). • In the literal sense, the term describes a man who is agreeable, gentle, compassionate, sensitive and vulnerable […] In the context of a relationship, it may also refer to traits of honesty, loyalty, romanticism, courtesy, and respect. When used negatively, a nice guy implies a male who is unassertive or otherwise non-masculine. The opposite of a genuine "nice guy" is commonly described as a "jerk", a term for a mean, selfish and uncaring person. • However, the term is also often used sarcastically, particularly in the context of dating, to describe someone who believes himself to possess genuine "nice guy" characteristics, even though he actually may not, and who uses acts of friendship and basic social etiquette with the unstated aim of progressing to a romantic or sexual relationship” • Source: https://en.wikipedia.org/wiki/Nice_guy
  • 10. Trends over time 2006: old trends • Interesting words • Terrific • Pos: My trip to Paris was terrific (great) • Neg: I woke up due to terrific noise (related to terror) • Nice • Pos: My colleague spent 2hrs to explain the project. He’s such a nice guy… • Neg: He befriended all the women in the office and pulled a nice guy act. • Killer • Pos: I just downloaded this killer app. • Neg: ########################## (only good vibes in this presentation) • Sick • Pos: I’d love to do that, it sounds sick • Neg: I’d love to do that, but I sound sick
  • 11. Trends over time 2006: Supervised approaches (based on BoW) • Machine Learning based • Pros • Higher accuracy • Customisable for different contexts • Can be evaluated • Cons • Requires labelled data • Very slow (in 2006) • Does not respect the order of words • Pipeline: Remove Stop Words -> Tokenization -> POS Tagging -> Syntactic Parsing -> Semantic Analysis -> Relation Extraction -> Classifier (SVM, Bayes, Linear, Random Forest, etc.) -> Positive, Negative or score
  • 12. Trends over time 2006: old trends “I really like my new phone because it’s fast and the battery lasts long but I find it too big” 2. Bag-of-words vector: cat = 0, dog = 0, battery = 1, phone = 1, science = 0, like = 1, dislike = 0, hate = 0, find = 1, big = 1; overall result: ??? • Improvements with n-grams (e.g. “too big” vs “big too”) • Difficult to extract understanding: • Phone battery – positive • Phone speed – positive • Phone size – negative • No understanding about relations or context • Order of words is ignored • “I like cats, not dogs” and “I like dogs, not cats” end up the same in BoW
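A minimal sketch of the bag-of-words encoding from this slide, using only the Python standard library. The vocabulary is the one shown on the slide; the tokenizer and the helper names (`tokenize`, `bow_vector`, `bigrams`) are assumptions made for illustration.

```python
import re
from collections import Counter

# Vocabulary as listed on the slide.
VOCAB = ["cat", "dog", "battery", "phone", "science",
         "like", "dislike", "hate", "find", "big"]


def tokenize(text):
    """Lowercase and split on anything that is not a letter or apostrophe."""
    return re.findall(r"[a-z']+", text.lower())


def bow_vector(text, vocab=VOCAB):
    """Count how often each vocabulary word occurs; word order is discarded."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocab]


def bigrams(text):
    """Pairs of adjacent tokens, which keep a little local word order."""
    tokens = tokenize(text)
    return set(zip(tokens, tokens[1:]))


phone_vector = bow_vector(
    "I really like my new phone because it's fast and the "
    "battery lasts long but I find it too big"
)
print(phone_vector)

# Same words, different meaning: the unigram counts are identical,
# while the bigram sets differ.
same_counts = (Counter(tokenize("I like cats, not dogs"))
               == Counter(tokenize("I like dogs, not cats")))
same_bigrams = bigrams("I like cats, not dogs") == bigrams("I like dogs, not cats")
print(same_counts, same_bigrams)
```

The bigram comparison shows why n-grams are listed as an improvement: they recover some ordering information that plain counts lose.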
  • 13. Problems with old approaches • Lexicons are preferred when there is no training data, but: • Difficult to compute mathematically • 13 years of collecting data • Humans never think of words independently • Language is composed over time, and order plays an important role • Humans never think in tokens, lemmas or POS when identifying sentiments, and stop words give more meaning • Linguistics and psychology are not that simple • Levels of text: Document > Paragraphs > Sentences > Clauses > Phrases > Words > Characters
  • 14. Trends over time 2013: Supervised approaches – Deep Learning • Find f such that f(Input text) = Sentiment • Emphasis on embeddings: • Similar meaning of words implies similar representations • Neural computed embeddings • More interest in similarities -> Word2Vec • Pretrained and can be used as is • Good results with LSTMs (or any Seq2Seq) or even CNNs • Pipeline: Text (X) -> Embeddings -> Deep Neural Network (CNN, RNN, LSTM) -> Dense -> Output: Sentiment (y)
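The idea that "similar meaning of words implies similar representations" is usually measured with cosine similarity. The sketch below uses hand-written two-dimensional vectors purely for illustration; they are not real Word2Vec embeddings, which are learned from data and typically have hundreds of dimensions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Made-up toy embeddings (an assumption for this example only).
toy_embeddings = {
    "good": [0.9, 0.1],
    "great": [0.8, 0.2],
    "terrible": [-0.7, 0.3],
}

sim_close = cosine_similarity(toy_embeddings["good"], toy_embeddings["great"])
sim_far = cosine_similarity(toy_embeddings["good"], toy_embeddings["terrible"])
print(sim_close, sim_far)
```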
  • 15. Trends over time 2013: Supervised approaches – Deep Learning • Neural word embeddings became an option that encapsulates semantics • Fast retrieval and small memory footprint • Which composition functions to use for complex language? (tree, sequence, other) • Long-range dependencies are difficult to capture • Fit for both long and short text • Focus on architectures that infer meaning • RNN – word associated with vector and context • CNN – all words associated with all context on limited history • Self-attention – all words associated with all contexts
  • 16. Trends over time: Transformers
  • 17. Current trends • Now (2017+): Transformers • Word representation should rely on context • Self-attention layer: decides, for each part of the sequence, which other parts of the sequence are important • Similar to humans? • Word embeddings -> contextualised word embeddings • RNN (LSTM) • Pros: unlimited context; recency bias • Cons: slow; strong recency bias; long-range dependencies are hard to capture • CNN • Pros: fast; computes local n-grams • Cons: limited context; strong local bias; long-range dependencies are hard to capture • Self-Attention • Pros: fast; captures long-range dependencies • Cons: difficult to train; difficult hyperparameter optimization; memory intensive
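To make the self-attention bullet concrete, here is a simplified sketch of scaled dot-product self-attention in plain Python. It is an assumption-laden toy: it skips the learned query, key and value projections and the multiple heads of a real Transformer, and uses the input vectors directly in all three roles.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Each position attends to every position, regardless of distance.

    Simplification: queries, keys and values are the input vectors themselves.
    """
    dim = len(vectors[0])
    outputs, weights = [], []
    for query in vectors:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        row = softmax(scores)
        weights.append(row)
        # Output is the weighted average of all value vectors.
        outputs.append(
            [sum(w * value[i] for w, value in zip(row, vectors)) for i in range(dim)]
        )
    return outputs, weights


# Toy sequence: positions 0 and 2 hold the same vector, position 1 differs.
toy_sequence = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
attended, attention_weights = self_attention(toy_sequence)
print(attention_weights[0])
```

Position 0 weights position 2 exactly as much as itself even though they are not adjacent, which illustrates why distance is not an obstacle for self-attention. The weight matrix has one entry per pair of positions, which is one reason the approach is memory intensive.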
  • 18. Current trends • Now (2017+): Transformers • Captures references & syntactic dependency • Vaswani et al., NIPS’17 • Figure from “Attention is All You Need” by Vaswani et al. • Coreference visualisation from: https://ai.googleblog.com/2017/08/transformer-novel-neural-network.html • Famous architectures • ELMo (2018) • BERT (2018) • XLNet (2019) • T5 (2020)
  • 19. Current trends • How do transformers learn? (BERT) • Randomly mask words and predict the original value • Too much masking (too little context) vs too little masking (expensive to train) • Solution: replace each selected token with [mask], a random token, or the real token • Pairs of consecutive sentences: next-sentence prediction • Data: Wikipedia, Book corpus, Scientific publications (billions of words) • Initial: TPU training 4 days, 2.5B words, 1M steps • Masking example: “I am giving a talk about sentiment analysis” -> “I am giving a [mask] about sentiment [mask]” • Next-sentence examples: A: I am giving a talk. B: It is about sentiment analysis. Label: IsNextSentence • A: I am giving a talk. B: My dog is adorable. Label: NotNextSentence
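The "mask, random or real" rule can be sketched as below. The proportions follow the BERT paper (about 15% of tokens are selected; of those, 80% become the mask token, 10% a random token, 10% stay unchanged). The function name `mask_tokens`, its parameters and the tiny vocabulary are assumptions for this example, and real implementations operate on subword token ids rather than whole words.

```python
import random


def mask_tokens(tokens, vocab, select_prob=0.15, rng=None):
    """Return (corrupted tokens, labels); labels are None where nothing is predicted."""
    if rng is None:
        rng = random.Random()
    corrupted, labels = [], []
    for token in tokens:
        if rng.random() >= select_prob:
            # Not selected: keep the token, nothing to predict here.
            corrupted.append(token)
            labels.append(None)
            continue
        labels.append(token)  # the model must recover the original token
        draw = rng.random()
        if draw < 0.8:
            corrupted.append("[MASK]")           # 80%: mask
        elif draw < 0.9:
            corrupted.append(rng.choice(vocab))  # 10%: random token
        else:
            corrupted.append(token)              # 10%: real token
    return corrupted, labels


sentence = "I am giving a talk about sentiment analysis".split()
toy_vocab = ["dog", "talk", "phone", "analysis"]  # hypothetical vocabulary
corrupted, labels = mask_tokens(sentence, toy_vocab, rng=random.Random(0))
print(corrupted)
print(labels)
```

Keeping some selected tokens unchanged or randomised means the model cannot rely on the mask token always marking the positions it has to predict.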
  • 21. Future • Q1: What are the costs and gains of using complex architectures for sentiment analysis? • Training on personal devices needs a GPU or unlimited time • As with Word2Vec, not everyone needs to train such models • Q2: Where do we stop? • Better performance compared to humans for multiple tasks • Research becomes difficult in small centres (big companies have an advantage) • Q3: What is the next big thing? • More on context? • Reducing size and computation • More experiments on streams of attention