Can Deep Learning solve the Sentiment Analysis Problem?
Mark Cieliebak, Zurich University of Applied Sciences
Annual Meeting of SGAICO – Swiss Group for Artificial Intelligence and Cognitive Science
18.11.2014

Outline
1. What is sentiment analysis?
2. How good are "classical" approaches?
3. Does deep learning solve the problem?

About Me
Mark Cieliebak
Institute of Applied Information Technology (InIT)
ZHAW, Winterthur
Email: ciel@zhaw.ch, Website: www.zhaw.ch/~ciel
Research Interests: Text Analytics, Open Data, Automated Test Generation, Software Engineering

What is Sentiment Analysis?
"… WiFi Analytics is a free Android app that I find very handy when it comes to troubleshooting and monitoring a home network." [1]

Sample Application: Social Media Monitoring
Text Analytics Components:
• Find relevant documents
• Hot topic analysis
• Sentiment analysis
[7]

Flavours of Sentiment Analysis
• Document-Based
• Sentence-Based
• Target-Specific
• Rating Prediction

Classic Approaches to Sentiment Analysis
Rule-Based
Corpus-Based
Predicted Label
[3]
[4]

Simple Sentiment Analysis
Idea: Count the number of positive and negative words
"This camera is great[+1]." → +1 (pos)
"I find it beautiful[+1] and good[+1]." → +2 (pos)
"It looks terrible[-1]." → -1 (neg)
"This car has a blue color." → 0 (neu)
Use a sentiment dictionary:
POSITIVE: great, love, nice, ...
NEUTRAL: hello, see, I, …
NEGATIVE: bad, hate, ugly, ...
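The word-counting idea on this slide can be sketched in a few lines of Python. This is only an illustration: the tiny lexicon below combines the dictionary entries and the example words from the slide, and the function names `score` and `label` are made up for this sketch.

```python
import re

# Tiny illustrative lexicon built from the slide's examples; a real
# sentiment dictionary would contain thousands of entries.
POSITIVE = {"great", "love", "nice", "beautiful", "good"}
NEGATIVE = {"bad", "hate", "ugly", "terrible"}


def score(text):
    """Return (#positive words) - (#negative words) for the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum((t in POSITIVE) - (t in NEGATIVE) for t in tokens)


def label(text):
    """Map the raw score to pos / neg / neu."""
    s = score(text)
    return "pos" if s > 0 else "neg" if s < 0 else "neu"


print(score("I find it beautiful and good."), label("I find it beautiful and good."))
print(score("This car has a blue color."), label("This car has a blue color."))
```

Words that are missing from both lists contribute nothing, so a sentence without any dictionary word ends up as neutral.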

Sample Rules
• Detect booster words: "The car is really very expensive[-1 -1 -2]."
• New category "Mixed": "This car has an appealing[+1] design and comfortable[+1] seats, but it is expensive[-1]."
• Negation: Invert only the score of words occurring after the negation: "The car is appealing[+3] and I do not[*-1] find it expensive[-2]"
• Counterexample: "I do not find the car expensive and it is appealing."
Need to “understand” the sentence
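The negation rule and the sentence that breaks it can be reproduced with a short sketch. The word scores are the ones shown on the slide; the negation word list and the function name `naive_negation_score` are assumptions made for this example.

```python
# Word scores as used on the slide; the negation list is an assumption.
SCORES = {"appealing": 3, "expensive": -2}
NEGATIONS = {"not", "never", "no"}


def naive_negation_score(sentence):
    """Invert the score of every scored word that occurs after a negation."""
    total, negated = 0, False
    for token in sentence.lower().rstrip(".").split():
        if token in NEGATIONS:
            negated = True
        elif token in SCORES:
            total += -SCORES[token] if negated else SCORES[token]
    return total


# The rule works when the negated word comes last ...
print(naive_negation_score("The car is appealing and I do not find it expensive"))
# ... but also flips "appealing" when the negation comes first.
print(naive_negation_score("I do not find the car expensive and it is appealing."))
```

The first sentence scores +5, the second -1, although both express the same opinion. A purely positional rule cannot tell how far the negation reaches.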

Linguistic Analysis
-> RULE: Invert scores of words that are in the same phrase as the negation.
“I do not find the car expensive[+2] and it is appealing[+3].” → +5 (pos)
[Figure: parse tree of "I do not find the car expensive and it is appealing". The sentence splits into two sentences joined by the conjunction "and"; node labels include Noun Phrase, Verb Phrase, Verb, Adverb, Adj., Det., Noun, and Participle.]
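A rough sketch of the phrase-scoped rule follows. A real system would take the phrase boundaries from a syntactic parser; here, splitting at the conjunctions "and" and "but" is used as a crude stand-in, and all names (`split_clauses`, `scoped_negation_score`) are hypothetical.

```python
# Word scores as used on the slide; negation and conjunction lists are assumptions.
SCORES = {"appealing": 3, "expensive": -2}
NEGATIONS = {"not", "never", "no"}
CONJUNCTIONS = {"and", "but"}


def split_clauses(tokens):
    """Split a token list at conjunctions (crude stand-in for a parse tree)."""
    clauses, current = [], []
    for token in tokens:
        if token in CONJUNCTIONS:
            clauses.append(current)
            current = []
        else:
            current.append(token)
    clauses.append(current)
    return clauses


def scoped_negation_score(sentence):
    """Invert scores only for words that share a clause with a negation."""
    tokens = sentence.lower().rstrip(".").split()
    total = 0
    for clause in split_clauses(tokens):
        sign = -1 if any(t in NEGATIONS for t in clause) else 1
        total += sign * sum(SCORES.get(t, 0) for t in clause)
    return total


print(scoped_negation_score("I do not find the car expensive and it is appealing."))
```

With the negation limited to its own clause, "expensive" is inverted to +2 and "appealing" keeps its +3, which gives the +5 shown on the slide.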

Rule-Based Sentiment Analysis
Most Important Issues:
- Requires good hand-crafted rules
- Hard to transfer to new tasks or languages
- Does not work well for texts with bad grammar (Twitter)
[5]

Corpus-Based Sentiment Analysis
Predicted Label
[4]

Annotated Corpus
Sentence | Polarity
This analysis is good. | Pos
It looks awful. | Neg
This car has a blue color. | Neu
This car has an appealing design, comfortable seats, but it is expensive. | Mix
This car has a very appealing design, comfortable seats, but it is really expensive. | Mix
This analysis is not good. | Neg
This car has an appealing design, comfortable seats and it is not expensive. | Mix
This movie was like a horror event. | Neg
This car is appealing and is not expensive. | Mix
... | ...
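To illustrate how a label can be predicted from such an annotated corpus, here is a minimal sketch of a bag-of-words classifier. The slides do not name a specific learning algorithm; multinomial Naive Bayes with add-one smoothing is chosen here purely for illustration, and the training set is a handful of rows from the table.

```python
import math
import re
from collections import Counter, defaultdict

# A few rows of the annotated corpus from the slide.
CORPUS = [
    ("This analysis is good.", "Pos"),
    ("It looks awful.", "Neg"),
    ("This car has a blue color.", "Neu"),
    ("This movie was like a horror event.", "Neg"),
    ("This analysis is not good.", "Neg"),
]


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def train(corpus):
    """Count documents and words per polarity label."""
    doc_counts = Counter()
    word_counts = defaultdict(Counter)
    for sentence, polarity in corpus:
        doc_counts[polarity] += 1
        word_counts[polarity].update(tokenize(sentence))
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return doc_counts, word_counts, vocabulary


def predict(model, text):
    """Return the label with the highest smoothed log-probability."""
    doc_counts, word_counts, vocabulary = model
    n_docs = sum(doc_counts.values())
    best_label, best_logp = None, -math.inf
    for polarity, n in doc_counts.items():
        total = sum(word_counts[polarity].values())
        logp = math.log(n / n_docs)
        for token in tokenize(text):
            if token not in vocabulary:
                continue  # ignore words never seen in training
            count = word_counts[polarity][token]
            logp += math.log((count + 1) / (total + len(vocabulary)))
        if logp > best_logp:
            best_label, best_logp = polarity, logp
    return best_label


model = train(CORPUS)
print(predict(model, "It looks awful."))
```

With only five training sentences the model mostly memorises its input. This is why corpus-based approaches depend on large annotated corpora.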

Sample Features for Tweets
• Word ngrams: presence or absence of contiguous sequences of 1, 2, 3, and 4 tokens; noncontiguous ngrams
• POS: the number of occurrences of each part-of-speech tag
• Sentiment lexica: each word annotated with tonality score (-1..0..+1)
• Negation: the number of negated contexts
• Punctuation: the number of contiguous sequences of exclamation marks, question marks, and both exclamation and question marks
• Emoticons: presence or absence, last token is a positive or negative emoticon
• Hashtags: the number of hashtags
• Elongated words: the number of words with one character repeated (e.g. ‘soooo’)
from: Mohammad et al., SemEval 2013
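A few of these features can be computed with plain string handling, as in the sketch below. It is not the feature extractor of Mohammad et al. The thresholds (runs of two or more punctuation marks, three or more repeated characters), the emoticon list, and the feature names are assumptions for this example, and only unigrams and bigrams are generated.

```python
import re

# Small illustrative emoticon list (assumption, not from the slides).
POSITIVE_EMOTICONS = {":)", ":-)", ":D", ";)"}


def tweet_features(tweet):
    """Extract a handful of simple surface features from a tweet."""
    tokens = tweet.split()
    lowered = [t.lower() for t in tokens]
    features = {}
    # Word n-grams (unigrams and bigrams only): presence flags.
    for n in (1, 2):
        for i in range(len(lowered) - n + 1):
            features["ngram=" + " ".join(lowered[i:i + n])] = 1
    # Hashtags: number of tokens starting with '#'.
    features["n_hashtags"] = sum(t.startswith("#") for t in tokens)
    # Elongated words: a character repeated three or more times in a row.
    features["n_elongated"] = sum(
        bool(re.search(r"(\w)\1{2,}", t)) for t in tokens
    )
    # Punctuation: contiguous runs of two or more '!' or '?'.
    features["n_punct_runs"] = len(re.findall(r"[!?]{2,}", tweet))
    # Emoticons: is the last token a positive emoticon?
    features["last_is_pos_emoticon"] = int(
        bool(tokens) and tokens[-1] in POSITIVE_EMOTICONS
    )
    return features


print(tweet_features("I looove this phone!!! #happy :)"))
```

The resulting dictionary can be passed to any classifier that accepts sparse feature vectors.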

Most Important Issues:
- Requires large annotated corpora
- Depends on good features
[6]

How good are Sentiment Analysis Tools?

Quick Poll
"How good are state-of-the-art sentiment analysis tools?"
• Short texts: 1-2 sentences from Twitter, news, reviews etc.
• Three-class classification: positive, negative, other
• Accuracy = #correct docs / #docs
Answer options (accuracy): <50%, 50-60%, 60-70%, 70-80%, 80-90%, >90%
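The accuracy measure used in the poll is the share of documents that receive the correct label. A minimal sketch, with made-up example labels:

```python
def accuracy(predicted, gold):
    """Fraction of documents whose predicted label equals the gold label."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("need two non-empty label lists of equal length")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


# Hypothetical labels for five short texts (three-class setting).
gold = ["positive", "negative", "other", "positive", "other"]
predicted = ["positive", "other", "other", "negative", "other"]
print(accuracy(predicted, gold))  # 3 of 5 correct -> 0.6
```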

Tool Accuracy
[Figure: accuracy (axis from 0.2 to 0.8) of the best and the worst tool per corpus. Averages: best tool per corpus 61%, worst tool per corpus 40%.]
[14]

Tool Accuracy
[Figure: accuracy (axis from 0.2 to 0.8) of the best tool per corpus, the worst tool per corpus, and the overall best tool. Averages: best tool per corpus 61%, worst tool per corpus 40%, overall best tool 59%.]

Take-Home Lesson
Accuracy of the best commercial tool on arbitrary short texts is 59%

Approaches to Sentiment Analysis
Rule-Based
Corpus-Based
Predicted Label
[9]
Deep Learning
[8]

Deep Learning on Text
It's all about Word Vectors!

Word2Vec
• Huge set of text samples (billions of words)
• Extract dictionary
• Word matrix: k-dimensional vector for each word (k typically 50-500)
• Word vectors initialized randomly
• Train word vectors to predict next words, given a sequence of words from sample text
Major contributions by Bengio et al. 2003, Collobert & Weston 2008, Socher et al. 2011, Mikolov et al. 2013
[9]

The Magic of Word Vectors
King - Man + Woman ≈ Queen
Live demo on 100b words from Google News dataset: http://radimrehurek.com/2014/02/word2vec-tutorial/
[10]
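The vector arithmetic behind "King - Man + Woman ≈ Queen" can be shown with hand-made toy vectors. The two-dimensional vectors below are not learned by Word2Vec; they are constructed for this sketch (first axis roughly "royalty", second axis roughly "maleness") so that the arithmetic is easy to follow. Real word vectors have 50-500 dimensions and are learned from text.

```python
import math

# Hand-made toy vectors (assumption): [royalty, maleness].
VECTORS = {
    "king": [0.9, 0.9],
    "queen": [0.9, 0.1],
    "man": [0.1, 0.9],
    "woman": [0.1, 0.1],
    "prince": [0.8, 0.8],
    "apple": [0.0, 0.5],
}


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def analogy(a, b, c):
    """Return the word whose vector is closest to vec(a) - vec(b) + vec(c)."""
    target = [x - y + z for x, y, z in zip(VECTORS[a], VECTORS[b], VECTORS[c])]
    candidates = (w for w in VECTORS if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(VECTORS[w], target))


print(analogy("king", "man", "woman"))
```

As is common for such analogy queries, the three input words are excluded from the candidates. Otherwise "king" itself would often be among the nearest neighbours.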

Relations Learned by Word2Vec
[11]

Using Word Vectors in NLP
Collobert et al., 2011:
• SENNA: Generic NLP system based on word vectors
• No manual feature engineering
• Solves many NLP tasks as well as benchmark systems
[12]

Deep Learning and Sentiment
Maas et al., 2011
• Enrich word vectors with sentiment context
• Capture semantics of words (unsupervised) and sentiment (supervised) in parallel, using multiple learning tasks
[Figure: example words wonderful, amazing, terrible, awful]

Socher et al. 2013:
• Word vectors do not help for sentiment analysis
• Recursive Neural Tensor Networks
• Representing sentence structures as trees while adding sentiment annotations at the same time
• Restricted to single, well-structured sentences
[13]

Le and Mikolov, 2014:
• "Paragraph Vectors"
• Add context (sentence, paragraph, document) to word vectors during training
• Improves many existing approaches
[9]

Does Deep Learning solve the Sentiment Analysis Problem?

Conclusion: Deep Learning for Sentiment
• Small improvements, not a revolution
• Very recent research, not yet the "end of the story"
• SemEval 2015 will be the benchmark

Talk in Short!
1. Classic approaches are rule-based or corpus-based
2. State-of-the-art tools classify 4 out of 10 docs wrong
3. Deep Learning does not need hand-crafted features
4. Deep Learning improves existing benchmarks

Thank You!
Mark Cieliebak
Zurich University of Applied Sciences (ZHAW)
Winterthur, Switzerland
Email: ciel@zhaw.ch, Website: www.zhaw.ch/~ciel
[15]
