Sentiment analysis appears to be one of the easier tasks in text analytics: given a text such as a tweet or product review, decide whether it expresses a positive or a negative opinion. This task is almost trivial for humans, but it turns out to be a real challenge for automated systems. In fact, state-of-the-art sentiment analysis tools are wrong on approximately 4 out of 10 documents.
Current sentiment analysis tools are rule-based, feature-based, or combinations of both. However, recent research uses deep learning on very large sets of documents.
In this talk, we will explain the intrinsic difficulties of automated sentiment analysis; present existing solution approaches and their performance; describe an architecture for a deep learning system; and explore whether deep learning can improve sentiment analysis accuracy.
1. Can Deep Learning solve the Sentiment Analysis Problem?
Mark Cieliebak, Zurich University of Applied Sciences
Annual Meeting of SGAICO – Swiss Group for Artificial Intelligence and Cognitive Science
18.11.2014
2. Outline
1. What is sentiment analysis?
2. How good are "classical" approaches?
3. Does deep learning solve the problem?
3. About Me
Mark Cieliebak
Institute of Applied Information Technology (InIT)
ZHAW, Winterthur
Email: ciel@zhaw.ch, Website: www.zhaw.ch/~ciel
Research Interests: Text Analytics, Open Data, Automated Test Generation, Software Engineering
4. What is Sentiment Analysis
"… WiFi Analytics is a free Android app that I find very handy when it comes to troubleshooting and monitoring a home network." [1]
5. Sample Application: Social Media Monitoring
Text Analytics Components:
• Find relevant documents
• Hot topic analysis
• Sentiment analysis
[7]
8. Simple Sentiment Analysis
Idea: Count the number of positive and negative words
Use a sentiment dictionary:
POSITIVE: great, love, nice, ...
NEUTRAL: hello, see, I, …
NEGATIVE: bad, hate, ugly, ...
Examples:
"This camera is great[+1]." → +1 (pos)
"I find it beautiful[+1] and good[+1]." → +2 (pos)
"It looks terrible[-1]." → -1 (neg)
"This car has a blue color." → 0 (neu)
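The counting idea on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the speaker's implementation: `LEXICON` is a hypothetical toy dictionary that extends the slide's word lists with the words used in the example sentences, and the function name `score` is an assumption.

```python
import re

# Toy sentiment dictionary (an assumption for illustration): word -> score.
# Words that are not listed count as neutral (0).
LEXICON = {
    "great": 1, "love": 1, "nice": 1, "beautiful": 1, "good": 1,
    "bad": -1, "hate": -1, "ugly": -1, "terrible": -1,
}

def score(text):
    """Sum the scores of all words and map the total to a polarity label."""
    words = re.findall(r"[a-z']+", text.lower())
    total = sum(LEXICON.get(word, 0) for word in words)
    label = "pos" if total > 0 else "neg" if total < 0 else "neu"
    return total, label

for sentence in [
    "This camera is great.",
    "I find it beautiful and good.",
    "It looks terrible.",
    "This car has a blue color.",
]:
    print(sentence, score(sentence))
```

With this toy dictionary the four example sentences receive the totals shown on the slide: +1, +2, -1 and 0.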
9. Sample Rules
• Detect booster words: "The car is really very expensive[-1 -1 -2]."
• New category "Mixed": "This car has an appealing[+1] design and comfortable[+1] seats, but it is expensive[-1]."
• Negation: Invert only the scores of words occurring after the negation: "The car is appealing[+3] and I do not[*-1] find it expensive[-2]."
• "I do not find the car expensive and it is appealing."
Need to "understand" the sentence
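The negation rule, and why it breaks on the reordered sentence, can be made concrete with a short sketch. The word scores are the ones annotated on the slide; the `NEGATIONS` word list and the function name are assumptions for illustration.

```python
import re

# Word scores taken from the slide's examples; NEGATIONS is an assumed word list.
SCORES = {"appealing": 3, "expensive": -2}
NEGATIONS = {"not", "never", "no"}

def score_with_negation(text):
    """Invert the score of every word that occurs after a negation word."""
    total = 0
    sign = 1
    for word in re.findall(r"[a-z']+", text.lower()):
        if word in NEGATIONS:
            sign = -1  # stays flipped until the end of the sentence
        else:
            total += sign * SCORES.get(word, 0)
    return total

print(score_with_negation("The car is appealing and I do not find it expensive"))
print(score_with_negation("I do not find the car expensive and it is appealing"))
```

Under these assumptions the first sentence scores +5, while the reordered sentence scores -1 although it expresses the same opinion. The rule has no notion of how far the negation reaches, which is the sense in which the sentence needs to be "understood".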
10. Linguistic Analysis
-> RULE: Invert scores of words that are in the same phrase as the negation.
"I do not find the car expensive[+2] and it is appealing[+3]." → +5 (pos)
[Figure: parse tree of "I do not find the car expensive and it is appealing". The root Sentence splits into two Sentences joined by a Conj.; node labels include Noun Phrase, Verb Phrase, Verb, Adverb, Adj., Det., Noun and Participle.]
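A rough way to approximate the phrase-scoped rule without a real parser is to split the sentence at conjunctions and invert scores only inside the part that contains the negation. The sketch below does exactly that. Splitting on a fixed `CONJUNCTIONS` list is an assumption and a crude stand-in for the linguistic analysis shown on the slide, and all names are hypothetical.

```python
import re

# Scores from the slide; the two word lists are assumptions for illustration.
SCORES = {"appealing": 3, "expensive": -2}
NEGATIONS = {"not", "never", "no"}
CONJUNCTIONS = {"and", "but", "or"}

def split_clauses(words):
    """Start a new clause at every conjunction (a crude stand-in for parsing)."""
    clauses, current = [], []
    for word in words:
        if word in CONJUNCTIONS:
            clauses.append(current)
            current = []
        else:
            current.append(word)
    clauses.append(current)
    return clauses

def score_by_clause(text):
    """Invert scores only inside clauses that contain a negation word."""
    words = re.findall(r"[a-z']+", text.lower())
    total = 0
    for clause in split_clauses(words):
        sign = -1 if any(word in NEGATIONS for word in clause) else 1
        total += sign * sum(SCORES.get(word, 0) for word in clause)
    return total

print(score_by_clause("I do not find the car expensive and it is appealing"))
print(score_by_clause("The car is appealing and I do not find it expensive"))
```

Both word orders now score +5. The shortcut still fails when a conjunction joins words rather than clauses (for example "not cheap and cheerful"), which is where a proper parse tree is needed.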
11. Rule-Based Sentiment Analysis
Most Important Issues:
- Requires good hand-crafted rules
- Hard to transfer to new tasks or languages
- Does not work well for texts with bad grammar (Twitter)
[5]
14. Corpus-Based Sentiment Analysis
Annotated Corpus
Sentence | Polarity
This analysis is good. | Pos
It looks awful. | Neg
This car has a blue color. | Neu
This car has an appealing design, comfortable seats, but it is expensive. | Mix
This car has a very appealing design, comfortable seats, but it is really expensive. | Mix
This analysis is not good. | Neg
This car has an appealing design, comfortable seats and it is not expensive. | Mix
This movie was like a horror event. | Neg
This car is appealing and is not expensive. | Mix
... | ...
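To illustrate how an annotated corpus like this one can drive a classifier, here is a minimal sketch that trains a bag-of-words naive Bayes model on the nine sentences listed on this slide. The slides do not name a specific learning algorithm, so naive Bayes is an assumption chosen for brevity, and all function names are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

# Annotated corpus copied from the slide: (sentence, polarity).
CORPUS = [
    ("This analysis is good.", "Pos"),
    ("It looks awful.", "Neg"),
    ("This car has a blue color.", "Neu"),
    ("This car has an appealing design, comfortable seats, but it is expensive.", "Mix"),
    ("This car has a very appealing design, comfortable seats, but it is really expensive.", "Mix"),
    ("This analysis is not good.", "Neg"),
    ("This car has an appealing design, comfortable seats and it is not expensive.", "Mix"),
    ("This movie was like a horror event.", "Neg"),
    ("This car is appealing and is not expensive.", "Mix"),
]

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

def train(corpus):
    """Count documents per label and word occurrences per label."""
    doc_counts = Counter()
    word_counts = defaultdict(Counter)
    for sentence, label in corpus:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(sentence))
    vocabulary = {word for counts in word_counts.values() for word in counts}
    return doc_counts, word_counts, vocabulary

def predict(text, model):
    """Return the label with the highest log-probability under naive Bayes."""
    doc_counts, word_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in doc_counts.items():
        n_words = sum(word_counts[label].values())
        log_prob = math.log(n_docs / total_docs)
        for word in tokenize(text):
            # Add-one smoothing so unseen words do not zero out a label.
            log_prob += math.log((word_counts[label][word] + 1) / (n_words + len(vocabulary)))
        if log_prob > best_score:
            best_label, best_score = label, log_prob
    return best_label

model = train(CORPUS)
print(predict("It looks awful.", model))
```

With only nine sentences the classifier is far too weak for real use; the sketch only shows the train and predict flow that a corpus-based system follows.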
15. Sample Features for Tweets
• Word ngrams: presence or absence of contiguous sequences of 1, 2, 3, and 4 tokens; noncontiguous ngrams
• POS: the number of occurrences of each part-of-speech tag
• Sentiment lexica: each word annotated with a tonality score (-1..0..+1)
• Negation: the number of negated contexts
• Punctuation: the number of contiguous sequences of exclamation marks, question marks, and both exclamation and question marks
• Emoticons: presence or absence; whether the last token is a positive or negative emoticon
• Hashtags: the number of hashtags
• Elongated words: the number of words with one character repeated (e.g. 'soooo')
from: Mohammad et al., SemEval 2013
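A few of these features are easy to compute directly from the raw tweet. The sketch below is a simplified illustration, not the feature extractor of Mohammad et al. The emoticon lists are incomplete assumptions, "elongated" is taken to mean three or more repeats of a character, punctuation runs are counted without distinguishing their type, and only unigram and bigram presence features are produced.

```python
import re

# Assumed, incomplete emoticon lists for illustration.
POSITIVE_EMOTICONS = {":)", ":-)", ":D", ";)"}
NEGATIVE_EMOTICONS = {":(", ":-(", ":'("}
ALL_EMOTICONS = POSITIVE_EMOTICONS | NEGATIVE_EMOTICONS

def word_ngrams(tokens, n):
    """All contiguous sequences of n tokens."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def tweet_features(tweet):
    """Map a tweet to a dictionary of feature name -> value."""
    tokens = tweet.split()
    lowered = [token.lower() for token in tokens]
    features = {
        "hashtags": sum(1 for t in tokens if t.startswith("#") and len(t) > 1),
        # Words with the same character repeated three or more times, e.g. "soooo".
        "elongated": sum(1 for t in tokens if re.search(r"(\w)\1{2,}", t)),
        # Runs of two or more exclamation and/or question marks.
        "punct_runs": len(re.findall(r"[!?]{2,}", tweet)),
        "has_emoticon": any(t in ALL_EMOTICONS for t in tokens),
        "last_is_pos_emoticon": bool(tokens) and tokens[-1] in POSITIVE_EMOTICONS,
        "last_is_neg_emoticon": bool(tokens) and tokens[-1] in NEGATIVE_EMOTICONS,
    }
    # Presence features for word unigrams and bigrams.
    for n in (1, 2):
        for gram in word_ngrams(lowered, n):
            features["ngram=" + gram] = True
    return features

features = tweet_features("This camera is soooo great!!! #happy #photo :)")
print(features["hashtags"], features["elongated"], features["punct_runs"])
```

Each tweet becomes one such dictionary, which a classifier can consume once the keys are mapped to vector positions.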
16. Corpus-Based Sentiment Analysis
Most Important Issues:
- Requires large annotated corpora
- Depends on good features
[6]
19. Tool Accuracy
[Chart: accuracy (axis from 0.2 to 0.8) of the best tool and the worst tool per corpus. Average: best tool per corpus 61%, worst tool per corpus 40%.]
[14]
20. Tool Accuracy
[Chart: accuracy (axis from 0.2 to 0.8) of the best tool per corpus, the worst tool per corpus, and the overall best tool. Average: best tool per corpus 61%, worst tool per corpus 40%, overall best tool 59%.]
24. Word2Vec
• Huge set of text samples (billions of words)
• Extract dictionary
• Word matrix: k-dimensional vector for each word (k typically 50-500)
• Word vectors initialized randomly
• Train word vectors to predict next words, given a sequence of words from the sample text
Major contributions by Bengio et al. 2003, Collobert & Weston 2008, Socher et al. 2011, Mikolov et al. 2013
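The steps on this slide can be sketched with a toy next-word predictor in plain Python. This is not the Word2Vec algorithm itself. It is a deliberately simplified model that predicts the next word from a single preceding word, trained with a softmax and stochastic gradient descent. The corpus, the dimension `k = 5`, the learning rate and all names are assumptions for illustration.

```python
import math
import random

random.seed(0)

text = "the car is great . the car is expensive . the camera is great ."
tokens = text.split()

# Extract the dictionary: one index per distinct word.
vocab = sorted(set(tokens))
index = {word: i for i, word in enumerate(vocab)}
V, k = len(vocab), 5  # k is tiny here; the slide quotes 50-500

def random_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

word_vectors = random_matrix(V, k)    # word matrix, randomly initialized
output_weights = random_matrix(V, k)  # scores each candidate next word

# Training pairs: (current word, next word).
pairs = [(index[a], index[b]) for a, b in zip(tokens, tokens[1:])]

def next_word_probs(w):
    """Softmax over the scores of all candidate next words."""
    scores = [sum(o * x for o, x in zip(row, word_vectors[w])) for row in output_weights]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def average_loss():
    """Mean negative log-probability of the true next word."""
    return -sum(math.log(next_word_probs(w)[n]) for w, n in pairs) / len(pairs)

def train(epochs=200, lr=0.1):
    for _ in range(epochs):
        for w, n in pairs:
            probs = next_word_probs(w)
            vec = word_vectors[w][:]  # copy so both updates use the old vector
            grad_vec = [0.0] * k
            for j in range(V):
                err = probs[j] - (1.0 if j == n else 0.0)
                for d in range(k):
                    grad_vec[d] += err * output_weights[j][d]
                    output_weights[j][d] -= lr * err * vec[d]
            for d in range(k):
                word_vectors[w][d] -= lr * grad_vec[d]

loss_before = average_loss()
train()
loss_after = average_loss()
print(loss_before, loss_after)
```

The word vectors start as random numbers and are adjusted only because doing so improves next-word prediction; the loss after training is lower than before.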
[9]
25. The Magic of Word Vectors
King - Man + Woman ≈ Queen
Live demo on 100b words from the Google News dataset: http://radimrehurek.com/2014/02/word2vec-tutorial/
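The arithmetic behind this analogy can be shown with a self-contained sketch. The vectors below are made up by hand so that the analogy works; they are not trained vectors and are not taken from the Google News model. The nearest word is found by cosine similarity, excluding the three input words.

```python
import math

# Made-up 3-dimensional vectors, chosen by hand for illustration only.
VECTORS = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, -0.8, 0.1],
    "man":   [0.1, 0.8, 0.0],
    "woman": [0.1, -0.8, 0.0],
    "apple": [-0.7, 0.0, 0.9],
}

def cosine(a, b):
    """Cosine similarity of two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def analogy(a, b, c):
    """Return the word whose vector is closest to a - b + c, excluding the inputs."""
    target = [x - y + z for x, y, z in zip(VECTORS[a], VECTORS[b], VECTORS[c])]
    candidates = [word for word in VECTORS if word not in (a, b, c)]
    return max(candidates, key=lambda word: cosine(VECTORS[word], target))

print(analogy("king", "man", "woman"))
```

With trained vectors the result is only approximately equal to the vector for "queen", which is why the nearest neighbour is searched for rather than an exact match.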
[10]
27. Using Word Vectors in NLP
Collobert et al., 2011:
• SENNA: Generic NLP system based on word vectors
• No manual feature engineering
• Solves many NLP tasks as well as benchmark systems
[12]
28. Deep Learning and Sentiment
Maas et al., 2011
• Enrich word vectors with sentiment context
• Capture semantics of words (unsupervised) and sentiment (supervised) in parallel, using multiple learning tasks
[Figure: example words "wonderful", "amazing", "terrible", "awful"]
29. Deep Learning and Sentiment
Socher et al. 2013:
• Word vectors do not help for sentiment analysis
• Recursive Neural Tensor Networks
• Representing sentence structures as trees while adding sentiment annotations at the same time
• Restricted to single, well-structured sentences
[13]
32. Conclusion: Deep Learning for Sentiment
• Small improvements, not a revolution
• Very recent research, not yet the "end of the story"
• SemEval 2015 will be the benchmark
33. Talk in Short!
1. Classic approaches are rule-based or corpus-based
2. State-of-the-art tools classify 4 out of 10 docs wrong
3. Deep learning does not need hand-crafted features
4. Deep learning improves existing benchmarks
34. Thank You!
Mark Cieliebak
Zurich University of Applied Sciences (ZHAW)
Winterthur, Switzerland
Email: ciel@zhaw.ch, Website: www.zhaw.ch/~ciel
[15]