SlideShare ist ein Scribd-Unternehmen logo
1 von 30
Downloaden Sie, um offline zu lesen
Information Studies




Sentiment strength detection
for the social web:
From YouTube arguments to
Twitter praise
Mike Thelwall
University of Wolverhampton, UK
Sentiment Strength Detection
with SentiStrength
1. Detect positive and negative sentiment
   strength in short informal text
     1.     Develop workarounds for lack of standard
            grammar and spelling
     2.     Harness emotion expression forms unique to
            MySpace or CMC (e.g., :-) or haaappppyyy!!!)
     3.     Classify simultaneously as positive 1-5 AND
            negative 1-5 sentiment
2. Apply to MySpace comments and social
       issues
             Thelwall, M., Buckley, K., Paltoglou, G., Cai, D., & Kappas, A. (2010).
                     Sentiment strength detection in short informal text.
Journal of the American Society for Information Science and Technology, 61(12), 2544-2558.
SentiStrength 1 Algorithm - Core
  List of 890 positive and negative
  sentiment terms and strengths (1 to 5),
  e.g.
     ache = -2, dislike = -3, hate=-4,
      excruciating -5
     encourage = 2, coolest = 3, lover = 4
   Sentiment strength is highest in
  sentence; or highest sentence if multiple
  sentences
Examples                    positive, negative
            -2
                                     1, -2
 My legs ache.
                 3
                                     3, -1
 You are the coolest.

 I hate Paul but encourage him.     2, -4

   -4                   2
Term Strength Optimisation
  Term strengths (e.g., ache = -2)
  initially fixed by a human coder
  Term strengths optimised on training
  set with 10-fold cross-validation
     Adjust term strengths to give best training
      set results then evaluate on test set
     E.g., training set: “My legs ache”: coder
      sentiment = 1,-3 => adjust sentiment of
      “ache” from -2 to -3.
Summary of sentiment
methods
  sentiment word strength list             terrify=-4
     “miss” = +2, -2
  spelling corrected                   nicce -> nice
  booster words alter strength          very happy
  negating words flip emotions             not nice
  repeated letters boost sentiment/+ve        niiiice
  emoticon list                              :) =+2
  exclamation marks count as +2 unless –ve       hi!
  repeated punctuation boosts sentiment     good!!!
  negative emotion ignored in questions u h8 me?
Experiments
  Development data = 2600 MySpace
  comments coded by 1 coder
  Test data = 1041 MySpace comments
  coded by 3 independent coders
  Comparison against a range of standard
  machine learning algorithms
Test data: Inter-coder agreement
                                   Comparison +ve    -ve
                                     for 1041 agree- agree-
Krippendorff’s inter-coder           MySpace ment ment
                                     texts
weighted alpha = 0.5743
for positive and 0.5634            Coder 1 vs. 2   51.0% 67.3%
for negative sentiment

Only moderate agreement            Coder 1 vs. 3   55.7% 76.3%

between coders
but it is a hard 5-category task   Coder 2 vs. 3   61.4% 68.2%
SentiStrength vs. 693 other algorithms/variations

Results:+ve sentiment strength
Algorithm                    Optimal     Accuracy   Accuracy   Correlation
                             #features              +/- 1
                                                    class
SentiStrength                -           60.6%      96.9%      .599
Simple logistic regression 700           58.5%      96.1%      .557
SVM (SMO)                    800         57.6%      95.4%      .538
J48 classification tree      700         55.2%      95.9%      .548
JRip rule-based classifier   700         54.3%      96.4%      .476
SVM regression (SMO)         100         54.1%      97.3%      .469
AdaBoost                     100         53.3%      97.5%      .464
Decision table               200         53.3%      96.7%      .431
Multilayer Perceptron        100         50.0%      94.1%      .422
Naïve Bayes                  100         49.1%      91.4%      .567
Baseline                     -           47.3%      94.0%      -
Random                       -           19.8%      56.9%      .016
SentiStrength vs. 693 other algorithms/variations


Results:-ve sentiment strength
Algorithm                    Optimal Accuracy   Accuracy   Correlation
                             #features          +/- 1
                                                class
SVM (SMO)                    100     73.5%      92.7%      .421
SVM regression (SMO)         300     73.2%      91.9%      .363
Simple logistic regression 800       72.9%      92.2%      .364
SentiStrength                -       72.8%      95.1%      .564
Decision table               100     72.7%      92.1%      .346
JRip rule-based classifier   500     72.2%      91.5%      .309
J48 classification tree      400     71.1%      91.6%      .235
Multilayer Perceptron        100     70.1%      92.5%      .346
AdaBoost                     100     69.9%      90.6%      -
Baseline                     -       69.9%      90.6%      -
Naïve Bayes                  200     68.0%      89.8%      .311
Random                       -       20.5%      46.0%      .010
Example differences/errors
  THINK 4 THE ADD
     Computer (1,-1), Human (2,-1)


  0MG 0MG 0MG 0MG 0MG 0MG 0MG
  0MG!!!!!!!!!!!!!!!!!!!!N33N3R!!!!!!!!!!!!!!!!
     Computer (2,-1), Human (5,-1)
Application - Evidence of emotion
homophily in MySpace
  Automatic analysis of sentiment in 2
  million comments exchanged between
  MySpace friends
  Correlation of 0.227 for +ve emotion
  strength and 0.254 for –ve
  People tend to use similar but not
  identical levels of emotion to their
  friends in messages
SentiStrength 2
  Sentiment analysis programs are typically
  domain-dependant
  SentiStrength is designed to be quite generic
    Does not pick up domain-specific non-
     sentiment terms, e.g., G3
  SentiStrength 2.0 has extended negative
  sentiment dictionary
    In response to weakness for negative

     sentiment
             Thelwall, M., Buckley, K., Paltoglou, G. (submitted).
      High Face Validity Sentiment Strength Detection for the Social Web
6 Social web data sets




                   To test on a wide range
                 of different Social Web text
SentiStrength 2
(unsupervised) tests
          Social web sentiment analysis is
      Tested against human coder results
       less domain dependant than reviews
                       Positive    Negative
       Data set        Correlation Correlation
       YouTube               0.589        0.521
       MySpace               0.647        0.599
       Twitter               0.541        0.499
       Sports forum          0.567        0.541
       Digg.com news         0.352        0.552
       BBC forums            0.296        0.591
       All 6                 0.556        0.565
Why the bad results for BBC?
  Long texts, mainly negative, expressive
  language used, e.g.,
     David Cameron must be very happy that I
      have lost my job.
     It is really interesting that David Cameron
      and most of his ministers are millionaires.
     Your argument is a joke.
SentiStrength vs. Machine
learning for Social Web texts

 • Machine learning performs a bit better
   overall (7 out of 12 data sets/+ve or
   negative)
   • Logistic Regression with trigrams, including
     punctuation and emoticons; 200 features
 • But has “domain transfer” and “face
   validity” problems for some tasks
SentiStrength software
  Versions: Windows, Java, live online
  (sentistrength.wlv.ac.uk)
  German version (Hannes Pirker)
  Variants: Binary (positive/negative),
  trinary (positive/neutral/negative) and
  scale (-4 to +4)
  Sold commercially - & purchasers
  converting to French, Spanish,
  Portuguese, & ?
Sentistrength




                                                       Collective Emotions
                                                       in Cyberspace



CYBEREMOTIONS = data gathering + complex systems methods + ICT outputs
Application – sentiment in Twitter
events
      Analysis of a corpus of 1 month of English
      Twitter posts
      Automatic detection of spikes (events)
      Sentiment strength classification of all posts
      Assessment of whether sentiment strength
      increases during important events



      Thelwall, M., Buckley, K., & Paltoglou, G. (2011). Sentiment in Twitter events.
Journal of the American Society for Information Science and Technology, 62(2), 406-418.
Automatically-identified Twitter
      spikes


                        Proportion of tweets
                        mentioning keyword




                                               9 Mar 2010
9 Feb 2010
Proportion of tweets
                            Chile
matching posts


                                                                   mentioning Chile




    9 Feb 2010                                   Date and time                   9 Mar 2010

                                                                     Av. +ve sentiment
Subj. Sentiment strength




                                                                          Just subj.
                           Increase in –ve sentiment strength        Av. -ve sentiment
                                                                          Just subj.




    9 Feb 2010                                 Date and time                     9 Mar 2010
#oscars
% matching posts
                                                                  Proportion of tweets
                                                                 mentioning the Oscars




     9 Feb 2010                                  Date and time                           9 Mar 2010

                                                                                 Av. +ve sentime
                                                                                      Just subj.
Subj. Sentiment strength




                           Increase in –ve sentiment strength                    Av. -ve sentime
                                                                                      Just subj.




     9 Feb 2010                                Date and time                             9 Mar 2010
Sentiment and spikes
  Analysis of top 30 spiking events
     Strong evidence (p=0.001) that higher
      volume hours have stronger negative
      sentiment than lower volume hours
     Insufficient evidence (p=0.014) that higher
      volume hours have different positive
      sentiment strength than lower volume
      hours
=> Spikes are typified by increases in
negativity
Proportion of tweets
             Bieber
          mentioning Bieber




9 Feb 2010                       Date and time                9 Mar 2010

                                                             Av. +ve sen
                                                                  Just su
                         But there is plenty of positivity   Av. -ve sen
                         if you know where to look!               Just su




                                                               9 Mar 2010
9 Feb 2010                         Date and time
YouTube Video comments
               Short text messages left for a video by
               viewers
               Up to 1000 per video accessible via the
               YouTube API
               A good source of social web text data




         Mike Thelwall Pardeep Sud Farida Vis (submitted)
Commenting on YouTube Videos: From Guatemalan Rock to El Big Bang
Sentiment in YouTube
comments




 Predominantly positive comments
Trends in YouTube comment
sentiment
  +ve and –ve sentiment strengths negatively
  correlated for videos (Spearman’s rho -0.213)
  # comments on a video correlates with –ve
  sentiment strength (Spearman’s rho 0.242,
  p=0.000) and negatively correlates with +ve
  sentiment strength (Spearman’s rho -0.113) –
  negativity drives commenting even though
  it is rare!
  Qualitative: Big debates over religion
     No discussion about aging rock stars!
Conclusion
  Automatic classification of sentiment
  strength is possible for the social web –
  even unsupervised!
     …not good for longer, political messages?
  Hard to get accuracy much over 60%?
  Can identify trends through automatic
  analysis of sentiment in millions of
  social web messages
Bibliography
  Thelwall, M., Buckley, K., Paltoglou, G., Cai, D., &
  Kappas, A. (2010). Sentiment strength detection in
  short informal text. Journal of the American Society
  for Information Science and Technology, 61(12),
  2544–2558.
  Thelwall, M., Buckley, K., & Paltoglou, G. (2011).
  Sentiment in Twitter events. Journal of the American
  Society for Information Science and Technology,
  61(12), 2544–2558.

Weitere ähnliche Inhalte

Ähnlich wie Mike Thelwall - Sentiment strength detection for the social web: From YouTube arguments to Twitter praise

Computational Biology, Part 4 Protein Coding Regions
Computational Biology, Part 4 Protein Coding RegionsComputational Biology, Part 4 Protein Coding Regions
Computational Biology, Part 4 Protein Coding Regionsbutest
 
TextMiningTwitters
TextMiningTwittersTextMiningTwitters
TextMiningTwittersLiu Chang
 
GradTrack: Getting Started with Statistics September 20, 2018
GradTrack: Getting Started with Statistics September 20, 2018GradTrack: Getting Started with Statistics September 20, 2018
GradTrack: Getting Started with Statistics September 20, 2018Nancy Garmer
 
Sentiment analysis: Incremental learning to build domain-models
Sentiment analysis: Incremental learning to build domain-modelsSentiment analysis: Incremental learning to build domain-models
Sentiment analysis: Incremental learning to build domain-modelsRaimon Bosch
 
Cs221 lecture5-fall11
Cs221 lecture5-fall11Cs221 lecture5-fall11
Cs221 lecture5-fall11darwinrlo
 
Machine Learning in a Flash (Extended Edition): An Introduction to Natural La...
Machine Learning in a Flash (Extended Edition): An Introduction to Natural La...Machine Learning in a Flash (Extended Edition): An Introduction to Natural La...
Machine Learning in a Flash (Extended Edition): An Introduction to Natural La...Kory Becker
 
April 10th of 2018 budapest presentation
April 10th of 2018 budapest presentationApril 10th of 2018 budapest presentation
April 10th of 2018 budapest presentationAhmet Bulut
 
Ashford 2 - Week 1 - Instructor GuidanceWeek OverviewThe f.docx
Ashford 2 - Week 1 - Instructor GuidanceWeek OverviewThe f.docxAshford 2 - Week 1 - Instructor GuidanceWeek OverviewThe f.docx
Ashford 2 - Week 1 - Instructor GuidanceWeek OverviewThe f.docxfredharris32
 
Who will RT this?: Automatically Identifying and Engaging Strangers on Twitte...
Who will RT this?: Automatically Identifying and Engaging Strangers on Twitte...Who will RT this?: Automatically Identifying and Engaging Strangers on Twitte...
Who will RT this?: Automatically Identifying and Engaging Strangers on Twitte...Jeffrey Nichols
 
Grammarly Meetup: Paraphrase Detection in NLP (PART 2) - Andriy Gryshchuk
Grammarly Meetup: Paraphrase Detection in NLP (PART 2) - Andriy GryshchukGrammarly Meetup: Paraphrase Detection in NLP (PART 2) - Andriy Gryshchuk
Grammarly Meetup: Paraphrase Detection in NLP (PART 2) - Andriy GryshchukGrammarly
 
Emotion Classification In Software Engineering Texts: A Comparative Analysis ...
Emotion Classification In Software Engineering Texts: A Comparative Analysis ...Emotion Classification In Software Engineering Texts: A Comparative Analysis ...
Emotion Classification In Software Engineering Texts: A Comparative Analysis ...Mia Mohammad Imran
 
RCOMM 2011 - Sentiment Classification
RCOMM 2011 - Sentiment ClassificationRCOMM 2011 - Sentiment Classification
RCOMM 2011 - Sentiment Classificationbohanairl
 
RCOMM 2011 - Sentiment Classification with RapidMiner
RCOMM 2011 - Sentiment Classification with RapidMinerRCOMM 2011 - Sentiment Classification with RapidMiner
RCOMM 2011 - Sentiment Classification with RapidMinerbohanairl
 
OPTIMIZATION OF CROSS DOMAIN SENTIMENT ANALYSIS USING SENTIWORDNET
OPTIMIZATION OF CROSS DOMAIN SENTIMENT ANALYSIS USING SENTIWORDNETOPTIMIZATION OF CROSS DOMAIN SENTIMENT ANALYSIS USING SENTIWORDNET
OPTIMIZATION OF CROSS DOMAIN SENTIMENT ANALYSIS USING SENTIWORDNETijfcstjournal
 

Ähnlich wie Mike Thelwall - Sentiment strength detection for the social web: From YouTube arguments to Twitter praise (20)

Alleviating Data Sparsity for Twitter Sentiment Analysis
Alleviating Data Sparsity for Twitter Sentiment AnalysisAlleviating Data Sparsity for Twitter Sentiment Analysis
Alleviating Data Sparsity for Twitter Sentiment Analysis
 
Media 330057 smxx
Media 330057 smxxMedia 330057 smxx
Media 330057 smxx
 
Computational Biology, Part 4 Protein Coding Regions
Computational Biology, Part 4 Protein Coding RegionsComputational Biology, Part 4 Protein Coding Regions
Computational Biology, Part 4 Protein Coding Regions
 
TextMiningTwitters
TextMiningTwittersTextMiningTwitters
TextMiningTwitters
 
Overfitting and-tbl
Overfitting and-tblOverfitting and-tbl
Overfitting and-tbl
 
GradTrack: Getting Started with Statistics September 20, 2018
GradTrack: Getting Started with Statistics September 20, 2018GradTrack: Getting Started with Statistics September 20, 2018
GradTrack: Getting Started with Statistics September 20, 2018
 
GradTrack: Getting Started with Statistics September 20, 2018
GradTrack: Getting Started with Statistics September 20, 2018GradTrack: Getting Started with Statistics September 20, 2018
GradTrack: Getting Started with Statistics September 20, 2018
 
Sentiment analysis: Incremental learning to build domain-models
Sentiment analysis: Incremental learning to build domain-modelsSentiment analysis: Incremental learning to build domain-models
Sentiment analysis: Incremental learning to build domain-models
 
Cs221 lecture5-fall11
Cs221 lecture5-fall11Cs221 lecture5-fall11
Cs221 lecture5-fall11
 
Analyse de sentiment et classification par approche neuronale en Python et Weka
Analyse de sentiment et classification par approche neuronale en Python et WekaAnalyse de sentiment et classification par approche neuronale en Python et Weka
Analyse de sentiment et classification par approche neuronale en Python et Weka
 
Machine Learning in a Flash (Extended Edition): An Introduction to Natural La...
Machine Learning in a Flash (Extended Edition): An Introduction to Natural La...Machine Learning in a Flash (Extended Edition): An Introduction to Natural La...
Machine Learning in a Flash (Extended Edition): An Introduction to Natural La...
 
April 10th of 2018 budapest presentation
April 10th of 2018 budapest presentationApril 10th of 2018 budapest presentation
April 10th of 2018 budapest presentation
 
Ashford 2 - Week 1 - Instructor GuidanceWeek OverviewThe f.docx
Ashford 2 - Week 1 - Instructor GuidanceWeek OverviewThe f.docxAshford 2 - Week 1 - Instructor GuidanceWeek OverviewThe f.docx
Ashford 2 - Week 1 - Instructor GuidanceWeek OverviewThe f.docx
 
Who will RT this?: Automatically Identifying and Engaging Strangers on Twitte...
Who will RT this?: Automatically Identifying and Engaging Strangers on Twitte...Who will RT this?: Automatically Identifying and Engaging Strangers on Twitte...
Who will RT this?: Automatically Identifying and Engaging Strangers on Twitte...
 
Grammarly Meetup: Paraphrase Detection in NLP (PART 2) - Andriy Gryshchuk
Grammarly Meetup: Paraphrase Detection in NLP (PART 2) - Andriy GryshchukGrammarly Meetup: Paraphrase Detection in NLP (PART 2) - Andriy Gryshchuk
Grammarly Meetup: Paraphrase Detection in NLP (PART 2) - Andriy Gryshchuk
 
DL.pptx
DL.pptxDL.pptx
DL.pptx
 
Emotion Classification In Software Engineering Texts: A Comparative Analysis ...
Emotion Classification In Software Engineering Texts: A Comparative Analysis ...Emotion Classification In Software Engineering Texts: A Comparative Analysis ...
Emotion Classification In Software Engineering Texts: A Comparative Analysis ...
 
RCOMM 2011 - Sentiment Classification
RCOMM 2011 - Sentiment ClassificationRCOMM 2011 - Sentiment Classification
RCOMM 2011 - Sentiment Classification
 
RCOMM 2011 - Sentiment Classification with RapidMiner
RCOMM 2011 - Sentiment Classification with RapidMinerRCOMM 2011 - Sentiment Classification with RapidMiner
RCOMM 2011 - Sentiment Classification with RapidMiner
 
OPTIMIZATION OF CROSS DOMAIN SENTIMENT ANALYSIS USING SENTIWORDNET
OPTIMIZATION OF CROSS DOMAIN SENTIMENT ANALYSIS USING SENTIWORDNETOPTIMIZATION OF CROSS DOMAIN SENTIMENT ANALYSIS USING SENTIWORDNET
OPTIMIZATION OF CROSS DOMAIN SENTIMENT ANALYSIS USING SENTIWORDNET
 

Mehr von yaevents

Как научить роботов тестировать веб-интерфейсы. Артем Ерошенко, Илья Кацев, Я...
Как научить роботов тестировать веб-интерфейсы. Артем Ерошенко, Илья Кацев, Я...Как научить роботов тестировать веб-интерфейсы. Артем Ерошенко, Илья Кацев, Я...
Как научить роботов тестировать веб-интерфейсы. Артем Ерошенко, Илья Кацев, Я...yaevents
 
Тема для WordPress в БЭМ. Владимир Гриненко, Яндекс
Тема для WordPress в БЭМ. Владимир Гриненко, ЯндексТема для WordPress в БЭМ. Владимир Гриненко, Яндекс
Тема для WordPress в БЭМ. Владимир Гриненко, Яндексyaevents
 
Построение сложносоставных блоков в шаблонизаторе bemhtml. Сергей Бережной, Я...
Построение сложносоставных блоков в шаблонизаторе bemhtml. Сергей Бережной, Я...Построение сложносоставных блоков в шаблонизаторе bemhtml. Сергей Бережной, Я...
Построение сложносоставных блоков в шаблонизаторе bemhtml. Сергей Бережной, Я...yaevents
 
i-bem.js: JavaScript в БЭМ-терминах. Елена Глухова, Варвара Степанова, Яндекс
i-bem.js: JavaScript в БЭМ-терминах. Елена Глухова, Варвара Степанова, Яндексi-bem.js: JavaScript в БЭМ-терминах. Елена Глухова, Варвара Степанова, Яндекс
i-bem.js: JavaScript в БЭМ-терминах. Елена Глухова, Варвара Степанова, Яндексyaevents
 
Дом из готовых кирпичей. Библиотека блоков, тюнинг, инструменты. Елена Глухов...
Дом из готовых кирпичей. Библиотека блоков, тюнинг, инструменты. Елена Глухов...Дом из готовых кирпичей. Библиотека блоков, тюнинг, инструменты. Елена Глухов...
Дом из готовых кирпичей. Библиотека блоков, тюнинг, инструменты. Елена Глухов...yaevents
 
Модели в профессиональной инженерии и тестировании программ. Александр Петрен...
Модели в профессиональной инженерии и тестировании программ. Александр Петрен...Модели в профессиональной инженерии и тестировании программ. Александр Петрен...
Модели в профессиональной инженерии и тестировании программ. Александр Петрен...yaevents
 
Администрирование небольших сервисов или один за всех и 100 на одного. Роман ...
Администрирование небольших сервисов или один за всех и 100 на одного. Роман ...Администрирование небольших сервисов или один за всех и 100 на одного. Роман ...
Администрирование небольших сервисов или один за всех и 100 на одного. Роман ...yaevents
 
Мониторинг со всех сторон. Алексей Симаков, Яндекс
Мониторинг со всех сторон. Алексей Симаков, ЯндексМониторинг со всех сторон. Алексей Симаков, Яндекс
Мониторинг со всех сторон. Алексей Симаков, Яндексyaevents
 
Истории про разработку сайтов. Сергей Бережной, Яндекс
Истории про разработку сайтов. Сергей Бережной, ЯндексИстории про разработку сайтов. Сергей Бережной, Яндекс
Истории про разработку сайтов. Сергей Бережной, Яндексyaevents
 
Разработка приложений для Android на С++. Юрий Береза, Shturmann
Разработка приложений для Android на С++. Юрий Береза, ShturmannРазработка приложений для Android на С++. Юрий Береза, Shturmann
Разработка приложений для Android на С++. Юрий Береза, Shturmannyaevents
 
Кросс-платформенная разработка под мобильные устройства. Дмитрий Жестилевский...
Кросс-платформенная разработка под мобильные устройства. Дмитрий Жестилевский...Кросс-платформенная разработка под мобильные устройства. Дмитрий Жестилевский...
Кросс-платформенная разработка под мобильные устройства. Дмитрий Жестилевский...yaevents
 
Сложнейшие техники, применяемые буткитами и полиморфными вирусами. Вячеслав З...
Сложнейшие техники, применяемые буткитами и полиморфными вирусами. Вячеслав З...Сложнейшие техники, применяемые буткитами и полиморфными вирусами. Вячеслав З...
Сложнейшие техники, применяемые буткитами и полиморфными вирусами. Вячеслав З...yaevents
 
Сканирование уязвимостей со вкусом Яндекса. Тарас Иващенко, Яндекс
Сканирование уязвимостей со вкусом Яндекса. Тарас Иващенко, ЯндексСканирование уязвимостей со вкусом Яндекса. Тарас Иващенко, Яндекс
Сканирование уязвимостей со вкусом Яндекса. Тарас Иващенко, Яндексyaevents
 
Масштабируемость Hadoop в Facebook. Дмитрий Мольков, Facebook
Масштабируемость Hadoop в Facebook. Дмитрий Мольков, FacebookМасштабируемость Hadoop в Facebook. Дмитрий Мольков, Facebook
Масштабируемость Hadoop в Facebook. Дмитрий Мольков, Facebookyaevents
 
Контроль зверей: инструменты для управления и мониторинга распределенных сист...
Контроль зверей: инструменты для управления и мониторинга распределенных сист...Контроль зверей: инструменты для управления и мониторинга распределенных сист...
Контроль зверей: инструменты для управления и мониторинга распределенных сист...yaevents
 
Юнит-тестирование и Google Mock. Влад Лосев, Google
Юнит-тестирование и Google Mock. Влад Лосев, GoogleЮнит-тестирование и Google Mock. Влад Лосев, Google
Юнит-тестирование и Google Mock. Влад Лосев, Googleyaevents
 
C++11 (formerly known as C++0x) is the new C++ language standard. Dave Abraha...
C++11 (formerly known as C++0x) is the new C++ language standard. Dave Abraha...C++11 (formerly known as C++0x) is the new C++ language standard. Dave Abraha...
C++11 (formerly known as C++0x) is the new C++ language standard. Dave Abraha...yaevents
 
Зачем обычному программисту знать языки, на которых почти никто не пишет. Але...
Зачем обычному программисту знать языки, на которых почти никто не пишет. Але...Зачем обычному программисту знать языки, на которых почти никто не пишет. Але...
Зачем обычному программисту знать языки, на которых почти никто не пишет. Але...yaevents
 
В поисках математики. Михаил Денисенко, Нигма
В поисках математики. Михаил Денисенко, НигмаВ поисках математики. Михаил Денисенко, Нигма
В поисках математики. Михаил Денисенко, Нигмаyaevents
 
Using classifiers to compute similarities between face images. Prof. Lior Wol...
Using classifiers to compute similarities between face images. Prof. Lior Wol...Using classifiers to compute similarities between face images. Prof. Lior Wol...
Using classifiers to compute similarities between face images. Prof. Lior Wol...yaevents
 

Mehr von yaevents (20)

Как научить роботов тестировать веб-интерфейсы. Артем Ерошенко, Илья Кацев, Я...
Как научить роботов тестировать веб-интерфейсы. Артем Ерошенко, Илья Кацев, Я...Как научить роботов тестировать веб-интерфейсы. Артем Ерошенко, Илья Кацев, Я...
Как научить роботов тестировать веб-интерфейсы. Артем Ерошенко, Илья Кацев, Я...
 
Тема для WordPress в БЭМ. Владимир Гриненко, Яндекс
Тема для WordPress в БЭМ. Владимир Гриненко, ЯндексТема для WordPress в БЭМ. Владимир Гриненко, Яндекс
Тема для WordPress в БЭМ. Владимир Гриненко, Яндекс
 
Построение сложносоставных блоков в шаблонизаторе bemhtml. Сергей Бережной, Я...
Построение сложносоставных блоков в шаблонизаторе bemhtml. Сергей Бережной, Я...Построение сложносоставных блоков в шаблонизаторе bemhtml. Сергей Бережной, Я...
Построение сложносоставных блоков в шаблонизаторе bemhtml. Сергей Бережной, Я...
 
i-bem.js: JavaScript в БЭМ-терминах. Елена Глухова, Варвара Степанова, Яндекс
i-bem.js: JavaScript в БЭМ-терминах. Елена Глухова, Варвара Степанова, Яндексi-bem.js: JavaScript в БЭМ-терминах. Елена Глухова, Варвара Степанова, Яндекс
i-bem.js: JavaScript в БЭМ-терминах. Елена Глухова, Варвара Степанова, Яндекс
 
Дом из готовых кирпичей. Библиотека блоков, тюнинг, инструменты. Елена Глухов...
Дом из готовых кирпичей. Библиотека блоков, тюнинг, инструменты. Елена Глухов...Дом из готовых кирпичей. Библиотека блоков, тюнинг, инструменты. Елена Глухов...
Дом из готовых кирпичей. Библиотека блоков, тюнинг, инструменты. Елена Глухов...
 
Модели в профессиональной инженерии и тестировании программ. Александр Петрен...
Модели в профессиональной инженерии и тестировании программ. Александр Петрен...Модели в профессиональной инженерии и тестировании программ. Александр Петрен...
Модели в профессиональной инженерии и тестировании программ. Александр Петрен...
 
Администрирование небольших сервисов или один за всех и 100 на одного. Роман ...
Администрирование небольших сервисов или один за всех и 100 на одного. Роман ...Администрирование небольших сервисов или один за всех и 100 на одного. Роман ...
Администрирование небольших сервисов или один за всех и 100 на одного. Роман ...
 
Мониторинг со всех сторон. Алексей Симаков, Яндекс
Мониторинг со всех сторон. Алексей Симаков, ЯндексМониторинг со всех сторон. Алексей Симаков, Яндекс
Мониторинг со всех сторон. Алексей Симаков, Яндекс
 
Истории про разработку сайтов. Сергей Бережной, Яндекс
Истории про разработку сайтов. Сергей Бережной, ЯндексИстории про разработку сайтов. Сергей Бережной, Яндекс
Истории про разработку сайтов. Сергей Бережной, Яндекс
 
Разработка приложений для Android на С++. Юрий Береза, Shturmann
Разработка приложений для Android на С++. Юрий Береза, ShturmannРазработка приложений для Android на С++. Юрий Береза, Shturmann
Разработка приложений для Android на С++. Юрий Береза, Shturmann
 
Кросс-платформенная разработка под мобильные устройства. Дмитрий Жестилевский...
Кросс-платформенная разработка под мобильные устройства. Дмитрий Жестилевский...Кросс-платформенная разработка под мобильные устройства. Дмитрий Жестилевский...
Кросс-платформенная разработка под мобильные устройства. Дмитрий Жестилевский...
 
Сложнейшие техники, применяемые буткитами и полиморфными вирусами. Вячеслав З...
Сложнейшие техники, применяемые буткитами и полиморфными вирусами. Вячеслав З...Сложнейшие техники, применяемые буткитами и полиморфными вирусами. Вячеслав З...
Сложнейшие техники, применяемые буткитами и полиморфными вирусами. Вячеслав З...
 
Сканирование уязвимостей со вкусом Яндекса. Тарас Иващенко, Яндекс
Сканирование уязвимостей со вкусом Яндекса. Тарас Иващенко, ЯндексСканирование уязвимостей со вкусом Яндекса. Тарас Иващенко, Яндекс
Сканирование уязвимостей со вкусом Яндекса. Тарас Иващенко, Яндекс
 
Масштабируемость Hadoop в Facebook. Дмитрий Мольков, Facebook
Масштабируемость Hadoop в Facebook. Дмитрий Мольков, FacebookМасштабируемость Hadoop в Facebook. Дмитрий Мольков, Facebook
Масштабируемость Hadoop в Facebook. Дмитрий Мольков, Facebook
 
Контроль зверей: инструменты для управления и мониторинга распределенных сист...
Контроль зверей: инструменты для управления и мониторинга распределенных сист...Контроль зверей: инструменты для управления и мониторинга распределенных сист...
Контроль зверей: инструменты для управления и мониторинга распределенных сист...
 
Юнит-тестирование и Google Mock. Влад Лосев, Google
Юнит-тестирование и Google Mock. Влад Лосев, GoogleЮнит-тестирование и Google Mock. Влад Лосев, Google
Юнит-тестирование и Google Mock. Влад Лосев, Google
 
C++11 (formerly known as C++0x) is the new C++ language standard. Dave Abraha...
C++11 (formerly known as C++0x) is the new C++ language standard. Dave Abraha...C++11 (formerly known as C++0x) is the new C++ language standard. Dave Abraha...
C++11 (formerly known as C++0x) is the new C++ language standard. Dave Abraha...
 
Зачем обычному программисту знать языки, на которых почти никто не пишет. Але...
Зачем обычному программисту знать языки, на которых почти никто не пишет. Але...Зачем обычному программисту знать языки, на которых почти никто не пишет. Але...
Зачем обычному программисту знать языки, на которых почти никто не пишет. Але...
 
В поисках математики. Михаил Денисенко, Нигма
В поисках математики. Михаил Денисенко, НигмаВ поисках математики. Михаил Денисенко, Нигма
В поисках математики. Михаил Денисенко, Нигма
 
Using classifiers to compute similarities between face images. Prof. Lior Wol...
Using classifiers to compute similarities between face images. Prof. Lior Wol...Using classifiers to compute similarities between face images. Prof. Lior Wol...
Using classifiers to compute similarities between face images. Prof. Lior Wol...
 

Kürzlich hochgeladen

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 

Kürzlich hochgeladen (20)

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 

Mike Thelwall - Sentiment strength detection for the social web: From YouTube arguments to Twitter praise

  • 1. Information Studies Sentiment strength detection for the social web: From YouTube arguments to Twitter praise Mike Thelwall University of Wolverhampton, UK
  • 2. Sentiment Strength Detection with SentiStrength 1. Detect positive and negative sentiment strength in short informal text 1. Develop workarounds for lack of standard grammar and spelling 2. Harness emotion expression forms unique to MySpace or CMC (e.g., :-) or haaappppyyy!!!) 3. Classify simultaneously as positive 1-5 AND negative 1-5 sentiment 2. Apply to MySpace comments and social issues Thelwall, M., Buckley, K., Paltoglou, G., Cai, D., & Kappas, A. (2010). Sentiment strength detection in short informal text. Journal of the American Society for Information Science and Technology, 61(12), 2544-2558.
  • 3. SentiStrength 1 Algorithm - Core List of 890 positive and negative sentiment terms and strengths (1 to 5), e.g.  ache = -2, dislike = -3, hate=-4, excruciating -5  encourage = 2, coolest = 3, lover = 4 Sentiment strength is highest in sentence; or highest sentence if multiple sentences
  • 4. Examples positive, negative -2 1, -2 My legs ache. 3 3, -1 You are the coolest. I hate Paul but encourage him. 2, -4 -4 2
  • 5. Term Strength Optimisation Term strengths (e.g., ache = -2) initially fixed by a human coder Term strengths optimised on training set with 10-fold cross-validation  Adjust term strengths to give best training set results then evaluate on test set  E.g., training set: “My legs ache”: coder sentiment = 1,-3 => adjust sentiment of “ache” from -2 to -3.
  • 6. Summary of sentiment methods sentiment word strength list terrify=-4  “miss” = +2, -2 spelling corrected nicce -> nice booster words alter strength very happy negating words flip emotions not nice repeated letters boost sentiment/+ve niiiice emoticon list :) =+2 exclamation marks count as +2 unless –ve hi! repeated punctuation boosts sentiment good!!! negative emotion ignored in questions u h8 me?
  • 7. Experiments Development data = 2600 MySpace comments coded by 1 coder Test data = 1041 MySpace comments coded by 3 independent coders Comparison against a range of standard machine learning algorithms
  • 8. Test data: Inter-coder agreement Comparison +ve -ve for 1041 agree- agree- Krippendorff’s inter-coder MySpace ment ment texts weighted alpha = 0.5743 for positive and 0.5634 Coder 1 vs. 2 51.0% 67.3% for negative sentiment Only moderate agreement Coder 1 vs. 3 55.7% 76.3% between coders but it is a hard 5-category task Coder 2 vs. 3 61.4% 68.2%
  • 9. SentiStrength vs. 693 other algorithms/variations Results:+ve sentiment strength Algorithm Optimal Accuracy Accuracy Correlation #features +/- 1 class SentiStrength - 60.6% 96.9% .599 Simple logistic regression 700 58.5% 96.1% .557 SVM (SMO) 800 57.6% 95.4% .538 J48 classification tree 700 55.2% 95.9% .548 JRip rule-based classifier 700 54.3% 96.4% .476 SVM regression (SMO) 100 54.1% 97.3% .469 AdaBoost 100 53.3% 97.5% .464 Decision table 200 53.3% 96.7% .431 Multilayer Perceptron 100 50.0% 94.1% .422 Naïve Bayes 100 49.1% 91.4% .567 Baseline - 47.3% 94.0% - Random - 19.8% 56.9% .016
  • 10. SentiStrength vs. 693 other algorithms/variations Results:-ve sentiment strength Algorithm Optimal Accuracy Accuracy Correlation #features +/- 1 class SVM (SMO) 100 73.5% 92.7% .421 SVM regression (SMO) 300 73.2% 91.9% .363 Simple logistic regression 800 72.9% 92.2% .364 SentiStrength - 72.8% 95.1% .564 Decision table 100 72.7% 92.1% .346 JRip rule-based classifier 500 72.2% 91.5% .309 J48 classification tree 400 71.1% 91.6% .235 Multilayer Perceptron 100 70.1% 92.5% .346 AdaBoost 100 69.9% 90.6% - Baseline - 69.9% 90.6% - Naïve Bayes 200 68.0% 89.8% .311 Random - 20.5% 46.0% .010
  • 11. Example differences/errors THINK 4 THE ADD  Computer (1,-1), Human (2,-1) 0MG 0MG 0MG 0MG 0MG 0MG 0MG 0MG!!!!!!!!!!!!!!!!!!!!N33N3R!!!!!!!!!!!!!!!!  Computer (2,-1), Human (5,-1)
  • 12. Application - Evidence of emotion homophily in MySpace Automatic analysis of sentiment in 2 million comments exchanged between MySpace friends Correlation of 0.227 for +ve emotion strength and 0.254 for –ve People tend to use similar but not identical levels of emotion to their friends in messages
  • 13. SentiStrength 2 Sentiment analysis programs are typically domain-dependant SentiStrength is designed to be quite generic  Does not pick up domain-specific non- sentiment terms, e.g., G3 SentiStrength 2.0 has extended negative sentiment dictionary  In response to weakness for negative sentiment Thelwall, M., Buckley, K., Paltoglou, G. (submitted). High Face Validity Sentiment Strength Detection for the Social Web
  • 14. 6 Social web data sets To test on a wide range of different Social Web text
  • 15. SentiStrength 2 (unsupervised) tests Social web sentiment analysis is Tested against human coder results less domain dependant than reviews Positive Negative Data set Correlation Correlation YouTube 0.589 0.521 MySpace 0.647 0.599 Twitter 0.541 0.499 Sports forum 0.567 0.541 Digg.com news 0.352 0.552 BBC forums 0.296 0.591 All 6 0.556 0.565
  • 16. Why the bad results for BBC? Long texts, mainly negative, expressive language used, e.g.,  David Cameron must be very happy that I have lost my job.  It is really interesting that David Cameron and most of his ministers are millionaires.  Your argument is a joke.
  • 17. SentiStrength vs. Machine learning for Social Web texts • Machine learning performs a bit better overall (7 out of 12 data sets/+ve or negative) • Logistic Regression with trigrams, including punctuation and emoticons; 200 features • But has “domain transfer” and “face validity” problems for some tasks
  • 18. SentiStrength software Versions: Windows, Java, live online (sentistrength.wlv.ac.uk) German version (Hannes Pirker) Variants: Binary (positive/negative), trinary (positive/neutral/negative) and scale (-4 to +4) Sold commercially - & purchasers converting to French, Spanish, Portuguese, & ?
  • 19. Sentistrength Collective Emotions in Cyberspace CYBEREMOTIONS = data gathering + complex systems methods + ICT outputs
  • 20. Application – sentiment in Twitter events Analysis of a corpus of 1 month of English Twitter posts Automatic detection of spikes (events) Sentiment strength classification of all posts Assessment of whether sentiment strength increases during important events Thelwall, M., Buckley, K., & Paltoglou, G. (2011). Sentiment in Twitter events. Journal of the American Society for Information Science and Technology, 62(2), 406-418.
  • 21. Automatically-identified Twitter spikes Proportion of tweets mentioning keyword 9 Mar 2010 9 Feb 2010
  • 22. Proportion of tweets Chile matching posts mentioning Chile 9 Feb 2010 Date and time 9 Mar 2010 Av. +ve sentiment Subj. Sentiment strength Just subj. Increase in –ve sentiment strength Av. -ve sentiment Just subj. 9 Feb 2010 Date and time 9 Mar 2010
  • 23. #oscars % matching posts Proportion of tweets mentioning the Oscars 9 Feb 2010 Date and time 9 Mar 2010 Av. +ve sentime Just subj. Subj. Sentiment strength Increase in –ve sentiment strength Av. -ve sentime Just subj. 9 Feb 2010 Date and time 9 Mar 2010
  • 24. Sentiment and spikes Analysis of top 30 spiking events  Strong evidence (p=0.001) that higher volume hours have stronger negative sentiment than lower volume hours  Insufficient evidence (p=0.014) that higher volume hours have different positive sentiment strength than lower volume hours => Spikes are typified by increases in negativity
  • 25. Proportion of tweets Bieber mentioning Bieber 9 Feb 2010 Date and time 9 Mar 2010 Av. +ve sen Just su But there is plenty of positivity Av. -ve sen if you know where to look! Just su 9 Mar 2010 9 Feb 2010 Date and time
  • 26. YouTube Video comments Short text messages left for a video by viewers Up to 1000 per video accessible via the YouTube API A good source of social web text data Mike Thelwall Pardeep Sud Farida Vis (submitted) Commenting on YouTube Videos: From Guatemalan Rock to El Big Bang
  • 27. Sentiment in YouTube comments Predominantly positive comments
  • 28. Trends in YouTube comment sentiment +ve and –ve sentiment strengths negatively correlated for videos (Spearman’s rho -0.213) # comments on a video correlates with –ve sentiment strength (Spearman’s rho 0.242, p=0.000) and negatively correlates with +ve sentiment strength (Spearman’s rho -0.113) – negativity drives commenting even though it is rare! Qualitative: Big debates over religion  No discussion about aging rock stars!
  • 29. Conclusion Automatic classification of sentiment strength is possible for the social web – even unsupervised!  …not good for longer, political messages? Hard to get accuracy much over 60%? Can identify trends through automatic analysis of sentiment in millions of social web messages
  • 30. Bibliography Thelwall, M., Buckley, K., Paltoglou, G., Cai, D., & Kappas, A. (2010). Sentiment strength detection in short informal text. Journal of the American Society for Information Science and Technology, 61(12), 2544–2558. Thelwall, M., Buckley, K., & Paltoglou, G. (2011). Sentiment in Twitter events. Journal of the American Society for Information Science and Technology, 61(12), 2544–2558.