Alleviating Data Sparsity For
Twitter Sentiment Analysis

Hassan Saif, Yulan He & Harith Alani
Knowledge Media Institute, The Open University, Milton Keynes, United Kingdom

Making Sense of Microblogs – WWW2012 Conference
Lyon, France
Outline
•   Hello World
•   Motivation
•   Related Work
•   Semantic Features
•   Topic-Sentiment Features
•   Evaluation
•   Demos
•   The Future
Sentiment Analysis
“Sentiment analysis is the task of identifying
positive and negative opinions, emotions and
evaluations in text”


•   “The main dish was delicious” (Opinion)
•   “It is a Syrian dish” (Fact)
•   “The main dish was salty and horrible” (Opinion)

Microblogging
• Service which allows subscribers to post short
  updates online and broadcast them

• Answers the question: What are you doing
  now?

• Twitter, Plurk, sfeed, Yammer, BlueTwi, etc.


Together!
Sense?
Sense?
Agarwal et al.; Barbosa and Feng; Bifet and Frank; Diakopoulos and Shamma; Go et al.; He & Saif; Pak & Paroubek; and many others.

[Figure: bar chart of the UK General Elections Corpus, showing 1,143,562 objective tweets and 713,178 subjective tweets.]
Why


                         Private Sectors


Because It is Critical
                                           Keep In Touch


                         Public Sectors
Related Work
Sentiment Analysis

  • Machine Learning Approach
      – Text Classification Problem
      – Right Features

  • Lexical Based Approach
      – Word Polarity
      – Building Better Dictionary
Twitter Sentiment Analysis

Challenges

    – The short length of status update

    – Language Variations

    – Open Social Environment
Twitter Sentiment Analysis

Related Work

  • Distant Supervision
      – Supervised classifiers trained from noisy labels

      – Tweet messages are labeled using emoticons

      – Data filtering process

  Go et al. (2009) – Barbosa and Feng (2010) – Pak and Paroubek (2010)
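
The emoticon-based labeling and filtering steps listed above can be sketched roughly as follows. This is a minimal illustration, not the procedure of any cited paper: the emoticon lists and the function name `label_by_emoticon` are assumptions, and the filtering rule (drop tweets with no emoticon or with mixed emoticons) is one plausible choice.

```python
# Hypothetical emoticon lists; the slides do not say which emoticons were used.
POSITIVE = (":)", ":-)", ":D", "=)")
NEGATIVE = (":(", ":-(", "=(")


def _contains_any(text, emoticons):
    """True if any of the given emoticons occurs in the text."""
    return any(e in text for e in emoticons)


def _strip_emoticons(text, emoticons):
    """Remove the emoticons so the classifier cannot simply memorize them."""
    for e in emoticons:
        text = text.replace(e, " ")
    return " ".join(text.split())


def label_by_emoticon(tweet):
    """Return (clean_text, noisy_label), or None if the tweet is filtered out."""
    has_pos = _contains_any(tweet, POSITIVE)
    has_neg = _contains_any(tweet, NEGATIVE)
    if has_pos == has_neg:
        # Filtering step: no emoticon at all, or conflicting emoticons.
        return None
    label = "positive" if has_pos else "negative"
    return _strip_emoticons(tweet, POSITIVE + NEGATIVE), label
```

The labels produced this way are noisy by construction, which is why the slide describes the resulting classifiers as trained from noisy labels.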
Twitter Sentiment Analysis

Related Work

  • Followers Graph & Label Propagation
       – Twitter follower graph (users, tweets, unigrams
         and hashtags)
       – Start with small number of labeled tweets
       – Applied label propagation method throughout the
         graph.

  Speriosu et al. (2011)
Twitter Sentiment Analysis

Related Work

    • Feature Engineering
         – Unigrams, bigrams, POS
         – Microblogging features
              • Hashtags
              • Emoticons
              • Abbreviations & Intensifiers


  Agarwal et al. (2011) – Kouloumpis et al. (2011)
So?
What Does sparsity mean?




  Training data contains many infrequent terms
What Does sparsity mean?




       Word frequency statistics
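
One simple way to quantify sparsity from word frequency statistics is the share of distinct terms that occur very rarely. The sketch below assumes whitespace tokenization and a hypothetical helper `rare_term_ratio`; the threshold `max_count` is an illustrative parameter, not a value from the slides.

```python
from collections import Counter


def rare_term_ratio(tweets, max_count=1):
    """Fraction of distinct terms that occur at most `max_count` times."""
    counts = Counter(
        token for tweet in tweets for token in tweet.lower().split()
    )
    if not counts:
        return 0.0
    rare = sum(1 for c in counts.values() if c <= max_count)
    return rare / len(counts)


# Toy corpus for illustration only.
tweets = [
    "rugby in an hour",
    "sushi time in an hour",
    "rugby time",
]
ratio = rare_term_ratio(tweets)  # only "sushi" occurs once among 6 distinct terms
```

A ratio close to 1 means most of the vocabulary is seen only a handful of times, which is the situation the slide calls sparsity.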
How!


Sentiment-Topic Features
Extract latent topics and the associated topic sentiment from the tweets data which are subsequently added into the original feature space for supervised classifier training

Semantic Features
Extracts semantically hidden concepts from tweets data and then incorporates them into supervised classifier training by interpolation
Semantic Features
Shallow Semantic Method


•   @Stace_meister Ya, I have Rugby in an hour (concept: Sport)
•   Sushi time for fabulous Jesse's last day on dragons den (concept: Person)
•   Dear eBay, if I win I owe you a total 580.63 bye paycheck (concept: Company)
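
The examples above map an entity in a tweet to a semantic concept. A minimal sketch of how such concepts could enter the feature space by replacement or by augmentation follows; the dictionary `ENTITY_CONCEPTS` is a hypothetical stand-in for an entity extractor, and both function names are illustrative.

```python
# Hypothetical entity-to-concept map; a real system would obtain these
# concepts from an entity extraction step rather than a fixed dictionary.
ENTITY_CONCEPTS = {"rugby": "SPORT", "jesse": "PERSON", "ebay": "COMPANY"}


def _normalize(token):
    """Lowercase a token and strip surrounding punctuation."""
    return token.lower().strip(".,!?'\"")


def semantic_replacement(tokens, concepts=ENTITY_CONCEPTS):
    """Replace each recognized entity with its semantic concept."""
    return [concepts.get(_normalize(t), t) for t in tokens]


def semantic_augmentation(tokens, concepts=ENTITY_CONCEPTS):
    """Keep the original tokens and append the concepts of recognized entities."""
    extra = [concepts[_normalize(t)] for t in tokens if _normalize(t) in concepts]
    return list(tokens) + extra


tokens = "Ya, I have Rugby in an hour".split()
replaced = semantic_replacement(tokens)
augmented = semantic_augmentation(tokens)
```

Replacement shrinks the vocabulary but discards the original word, while augmentation keeps the word and enlarges the feature space.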
Semantic Features
Interpolation Method
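
One way to write an interpolation of this kind, assuming a naive Bayes classifier whose word-given-class model is smoothed with semantic concepts. The symbols are assumptions for illustration and are not taken from the slides: $P_u$ is the original unigram model, $s_j$ is the $j$-th semantic concept, $c$ is the sentiment class, and $\alpha \in [0, 1]$ is the interpolation coefficient.

```latex
P_f(w \mid c) = (1 - \alpha)\, P_u(w \mid c) + \alpha \sum_{j} P(w \mid s_j)\, P(s_j \mid c)
```

With $\alpha = 0$ this reduces to the plain unigram model; larger values give more weight to the evidence carried by the semantic concepts.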
Topic-Sentiment Features
Joint Sentiment Topic Model
JST is a four-layer generative model
which allows the detection of both
sentiment and topic simultaneously
from text.

The only supervision is word prior
polarity information which can be
obtained from MPQA subjectivity
lexicon.




Lin & He (2009)
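
The generative story of a model of this kind can be sketched as below. This is a minimal sketch, assuming a document, sentiment label, topic, word layering; the function names and the hyperparameters `gamma`, `alpha`, and `beta` are illustrative assumptions, and the lexicon-based word polarity priors are not modelled here.

```python
import random


def dirichlet(concentration, size, rng):
    """Sample from a symmetric Dirichlet via normalized gamma draws."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(n_words, n_sentiments, n_topics, vocab, rng,
                      gamma=1.0, alpha=1.0, beta=0.5):
    """Generate (sentiment, topic, word) triples for one document."""
    # In a full model the word distributions are shared across the corpus;
    # they are drawn here only to keep the sketch self-contained.
    phi = [[dirichlet(beta, len(vocab), rng) for _ in range(n_topics)]
           for _ in range(n_sentiments)]
    # Per-document distribution over sentiment labels.
    pi = dirichlet(gamma, n_sentiments, rng)
    # Per-document, per-sentiment distribution over topics.
    theta = [dirichlet(alpha, n_topics, rng) for _ in range(n_sentiments)]
    words = []
    for _ in range(n_words):
        s = rng.choices(range(n_sentiments), weights=pi)[0]
        z = rng.choices(range(n_topics), weights=theta[s])[0]
        w = rng.choices(vocab, weights=phi[s][z])[0]
        words.append((s, z, w))
    return words


rng = random.Random(0)
vocab = ["good", "bad", "dish"]
doc = generate_document(5, 2, 3, vocab, rng)
```

Because every word is drawn under both a sentiment label and a topic, inference over such a model yields sentiment and topic assignments at the same time.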
Twitter Sentiment Corpus
                             Stanford University



Collected: between the 6th of April and the 25th of June 2009


Training Set: 1.6 million tweets (Balanced)

Testing Set: 177 negative tweets & 182 positive tweets




http://twittersentiment.appspot.com/
Our Sentiment Corpus
• Training Set: 60K tweets
• Testing Set: 1000 tweets




• Annotated an additional 640 tweets using
  Tweenator
Evaluation
Extended Test Set

Method                          Accuracy
Unigrams                        80.7%
Semantic replacement            76.3%
Semantic interpolation          84.0%
Sentiment-topic features        82.3%

Sentiment classification results on the 1000-tweet test set
Evaluation
Stanford Dataset

Method                          Accuracy
Unigrams                        81.0%
Semantic replacement            77.3%
Semantic augmentation           80.45%
Semantic interpolation          84.1%
Sentiment-topic features        86.3%

(Go et al., 2009)               83%
(Speriosu et al., 2011)         84.7%

Sentiment classification results on the original Stanford Twitter Sentiment test set
Evaluation

Semantic Features vs. Sentiment-Topic Features
Tweenator




http://tweenator.com
Conclusion
• Twitter sentiment analysis faces a data sparsity problem due to
  some special characteristics of Twitter

• Semantic & topic-sentiment features reduce the sparsity
  problem and increase the performance significantly

• Sentiment-topic features should be preferred over semantic
  features for the sentiment classification task since they give
  much better results with far fewer features.



The Future
Future Work

•   Sentiment-Topic Model
•   Enhance Entity Extraction Methods
•   Attaching weight to extracted features
•   Semantic Smoothing Model
•   Statistical replacement
References
[1] AGARWAL, A., XIE, B., VOVSHA, I., RAMBOW, O., AND PASSONNEAU, R. Sentiment analysis of twitter data. In Proceedings of the ACL 2011 Workshop on Languages in Social Media (2011), pp. 30–38.

[2] BARBOSA, L., AND FENG, J. Robust sentiment detection on twitter from biased and noisy data. In Proceedings of COLING (2010), pp. 36–44.

[3] BIFET, A., AND FRANK, E. Sentiment knowledge discovery in twitter streaming data. In Discovery Science (2010), Springer, pp. 1–15.

[4] GO, A., BHAYANI, R., AND HUANG, L. Twitter sentiment classification using distant supervision. CS224N Project Report, Stanford (2009).

[5] KOULOUMPIS, E., WILSON, T., AND MOORE, J. Twitter sentiment analysis: The good the bad and the omg! In Proceedings of the ICWSM (2011).

[6] LIN, C., AND HE, Y. Joint sentiment/topic model for sentiment analysis. In Proceedings of the 18th ACM conference on Information and knowledge management (2009), ACM, pp. 375–384.

[7] PAK, A., AND PAROUBEK, P. Twitter as a corpus for sentiment analysis and opinion mining. In Proceedings of LREC 2010 (2010).

[8] SAIF, H., HE, Y., AND ALANI, H. Semantic Smoothing for Twitter Sentiment Analysis. In Proceedings of the 10th International Semantic Web Conference (ISWC) (2011).
Thank You

 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 


Alleviating Data Sparsity for Twitter Sentiment Analysis

  • 1. Alleviating Data Sparsity For Twitter Sentiment Analysis Hassan Saif, Yulan He & Harith Alani Knowledge Media Institute, The Open University, Milton Keynes, United Kingdom Making Sense of Microblogs – WWW2012 Conference Lyon - France
  • 2. Outline • Hello World • Motivation • Related Work • Semantic Features • Topic-Sentiment Features • Evaluation • Demos • The Future
  • 3. Sentiment Analysis “Sentiment analysis is the task of identifying positive and negative opinions, emotions and evaluations in text” Examples: “The main dish was delicious” (Opinion); “It is a Syrian dish” (Fact); “The main dish was salty and horrible” (Opinion)
  • 4. Microblogging • Service which allows subscribers to post short updates online and broadcast them • Answers the question: What are you doing now? • Twitter, Plurk, sfeed, Yammer, BlueTwi, etc.
  • 7. Sense? Agarwal et al.; Barbosa and Feng; Bifet and Frank; Diakopoulos and Shamma; Go et al.; He & Saif; Pak & Paroubek; and many others. [Bar chart: UK General Elections Corpus, Objective Tweets vs. Subjective Tweets; bar labels 1143562 and 713178]
  • 8. Why? Because it is critical. Keep in touch. Private sectors; public sectors.
  • 10. Sentiment Analysis • Lexical-Based Approach: word polarity; building a better dictionary • Machine Learning Approach: text classification problem; finding the right features
  • 11. Twitter Sentiment Analysis Challenges – The short length of status update – Language Variations – Open Social Environment
  • 12. Twitter Sentiment Analysis Related Work • Distant Supervision – Supervised classifiers trained from noisy labels – Tweet messages are labeled using emoticons – Data filtering process. Go et al. (2009); Barbosa and Feng (2010); Pak and Paroubek (2010)
  • 13. Twitter Sentiment Analysis Related Work • Followers Graph & Label Propagation – Twitter follower graph (users, tweets, unigrams and hashtags) – Start with a small number of labeled tweets – Apply a label propagation method throughout the graph. Speriosu et al. (2011)
  • 14. Twitter Sentiment Analysis Related Work • Feature Engineering – Unigrams, bigrams, POS – Microblogging features • Hashtags • Emoticons • Abbreviations & Intensifiers. Agarwal et al. (2011); Kouloumpis et al. (2011)
  • 15. So?
  • 16. What Does sparsity mean? Training data contains many infrequent terms
  • 17. What Does sparsity mean? Word frequency statistics
  • 18. How! • Semantic Features: extract semantically hidden concepts from the tweets data and then incorporate them into supervised classifier training by interpolation • Sentiment-Topic Features: extract latent topics and the associated topic sentiment from the tweets data, which are subsequently added into the original feature space for supervised classifier training
  • 19. Semantic Features Shallow Semantic Method. Examples: “@Stace_meister Ya, I have Rugby in an hour” (Sport); “Sushi time for fabulous Jesse's last day on dragons den” (Person); “Dear eBay, if I win I owe you a total 580.63 bye paycheck” (Company)
  • 21. Topic-Sentiment Features Joint Sentiment Topic Model. JST is a four-layer generative model which allows the detection of both sentiment and topic simultaneously from text. The only supervision is word prior polarity information, which can be obtained from the MPQA subjectivity lexicon. Lin & He (2009)
  • 22. Twitter Sentiment Corpus Stanford University. Collected: between the 6th of April and the 25th of June 2009. Training Set: 1.6 million tweets (balanced). Testing Set: 177 negative tweets & 182 positive tweets. http://twittersentiment.appspot.com/
  • 23. Our Sentiment Corpus • Training Set: 60K tweets • Testing Set: 1000 tweets • Annotated an additional 640 tweets using Tweenator
  • 24. Evaluation: Extended Test Set. Accuracy by method: Unigrams 80.7%; Semantic replacement 76.3%; Semantic interpolation 84.0%; Sentiment-topic features 82.3%. Sentiment classification results on the 1000-tweet test set
  • 25. Evaluation: Stanford Dataset. Accuracy by method: Unigrams 81.0%; Semantic replacement 77.3%; Semantic augmentation 80.45%; Semantic interpolation 84.1%; Sentiment-topic features 86.3%; Go et al. (2009) 83%; Speriosu et al. (2011) 84.7%. Sentiment classification results on the original Stanford Twitter Sentiment test set
  • 26. Evaluation: Semantic Features vs. Sentiment-Topic Features
  • 28. Conclusion • Twitter sentiment analysis faces a data sparsity problem due to some special characteristics of Twitter • Semantic & topic-sentiment features reduce the sparsity problem and increase performance significantly • Sentiment-topic features should be preferred over semantic features for the sentiment classification task since they give much better results with far fewer features.
  • 30. Future Work • Sentiment-Topic Model • Enhance Entity Extraction Methods • Attaching weight to extracted features • Semantic Smoothing Model • Statistical replacement
  • 31. References [1] AGARWAL, A., XIE, B., VOVSHA, I., RAMBOW, O., AND PASSONNEAU, R. Sentiment analysis of twitter data. In Proceedings of the ACL 2011 Workshop on Languages in Social Media (2011), pp. 30–38. [2] BARBOSA, L., AND FENG, J. Robust sentiment detection on twitter from biased and noisy data. In Proceedings of COLING (2010), pp. 36–44. [3] BIFET, A., AND FRANK, E. Sentiment knowledge discovery in twitter streaming data. In Discovery Science (2010), Springer, pp. 1–15. [4] GO, A., BHAYANI, R., AND HUANG, L. Twitter sentiment classification using distant supervision. CS224N Project Report, Stanford (2009). [5] KOULOUMPIS, E., WILSON, T., AND MOORE, J. Twitter sentiment analysis: The good the bad and the omg! In Proceedings of the ICWSM (2011). [6] LIN, C., AND HE, Y. Joint sentiment/topic model for sentiment analysis. In Proceedings of the 18th ACM Conference on Information and Knowledge Management (2009), ACM, pp. 375–384. [7] PAK, A., AND PAROUBEK, P. Twitter as a corpus for sentiment analysis and opinion mining. In Proceedings of LREC 2010 (2010). [8] SAIF, H., HE, Y., AND ALANI, H. Semantic Smoothing for Twitter Sentiment Analysis. In Proceedings of the 10th International Semantic Web Conference (ISWC) (2011).
  • 32. Thank You
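
The semantic replacement and semantic augmentation strategies compared on the evaluation slides can be sketched roughly as follows. This is only an illustrative sketch: the slides do not say which entity extraction tool produced the concepts, so the `ENTITY_CONCEPTS` table, the `tokenize` helper, and both function names are assumptions made for this example. Semantic interpolation, which works inside the classifier's language model, is not shown.

```python
# Hypothetical entity-to-concept lookup. The concept labels follow the
# examples on slide 19; the table itself is an assumption for illustration.
ENTITY_CONCEPTS = {
    "rugby": "SPORT",
    "jesse": "PERSON",
    "ebay": "COMPANY",
}


def tokenize(tweet):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (tok.strip(".,!?:;\"") for tok in tweet.lower().split())
    return [tok for tok in tokens if tok]


def semantic_replacement(tokens):
    """Replace each known entity with its semantic concept."""
    return [ENTITY_CONCEPTS.get(tok, tok) for tok in tokens]


def semantic_augmentation(tokens):
    """Keep the original tokens and append the concepts of known entities."""
    concepts = [ENTITY_CONCEPTS[tok] for tok in tokens if tok in ENTITY_CONCEPTS]
    return tokens + concepts


if __name__ == "__main__":
    tokens = tokenize("@Stace_meister Ya, I have Rugby in an hour")
    print(semantic_replacement(tokens))
    print(semantic_augmentation(tokens))
```

Replacement shrinks the vocabulary but discards the entity itself, while augmentation keeps the entity and grows the feature space; either output could then be fed to a unigram classifier.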

Editor's Notes

  1. Can we extract sentiment from microblogs? Provide a list of people who said that users express more opinions on microblogs; customers are less in control of their emotions when interacting in a social environment. So yes, we can do sentiment analysis over microblogs.
  2. Now, why do we need to do it? Millions of status updates and tweets are shared every day among users. For the public sector: social data can be used to predict public political opinion, which makes it easier for policy makers to set political strategies based on it. For the private sector: companies can track public opinion about their products and services.
  3. Two main approaches. Lexical-based approach: building a better dictionary; focuses on words and phrases as bearers of semantic orientation. Machine learning approach: finding the right features; yet another form of text classification.
  4. To overcome the noisy label data
  5. To overcome the noisy label data
  6. To overcome the noisy label data
  7. Previous work tried to overcome the sparsity problem partially and indirectly, either by applying some data filtering or by depending on a different set of microblog features. However, none of them studied the effect of data sparsity on classification performance.
  8. Why: many words and terms in the testing corpus do not appear in the training corpus.
  9. Compares the word frequency statistics of the tweets data we used in our experiments with those of the movie review data. The x-axis shows the word frequency interval, e.g., words occurring up to 10 times (1-10), more than 10 times but up to 20 times (10-20), etc. The y-axis shows the percentage of words that fall within each word frequency interval. It can be observed that the tweets data are sparser than the movie review data since the former contain more infrequent words, with 93% of the words in the tweets data occurring less than 10 times (cf. 78% in the movie review data).
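
The two sparsity measurements described in notes 8 and 9 can be sketched roughly as follows: the share of vocabulary words falling in each word frequency interval, and the share of test-corpus words never seen in training. This is a minimal sketch under stated assumptions: whitespace tokenization, the function names, and the default bin settings are choices made for this example and are not taken from the slides.

```python
from collections import Counter


def frequency_interval_shares(documents, bin_width=10, num_bins=3):
    """Percentage of distinct words per frequency interval.

    Slot 0 covers words occurring 1..bin_width times, slot 1 covers
    bin_width+1..2*bin_width, and so on; the final slot collects every
    word that is more frequent than the last explicit interval.
    """
    counts = Counter(tok for doc in documents for tok in doc.lower().split())
    shares = [0.0] * (num_bins + 1)
    if not counts:
        return shares
    for freq in counts.values():
        shares[min((freq - 1) // bin_width, num_bins)] += 1
    return [100.0 * share / len(counts) for share in shares]


def unseen_word_rate(train_docs, test_docs):
    """Fraction of distinct test words that never occur in the training data."""
    train_vocab = {tok for doc in train_docs for tok in doc.lower().split()}
    test_vocab = {tok for doc in test_docs for tok in doc.lower().split()}
    if not test_vocab:
        return 0.0
    return len(test_vocab - train_vocab) / len(test_vocab)


if __name__ == "__main__":
    tweets = ["good good good", "bad"]
    print(frequency_interval_shares(tweets, bin_width=2, num_bins=1))
    print(unseen_word_rate(["a b"], ["b c"]))
```

A corpus where most of the mass sits in the first slot, as reported for the tweets data, is the sparse case the talk is concerned with.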