Evaluation Datasets for Twitter Sentiment Analysis
A survey and a new dataset, the STS-Gold

Hassan Saif, Miriam Fernandez, Yulan He and Harith Alani
Knowledge Media Institute, The Open University,
Milton Keynes, United Kingdom

1st Workshop on Emotion and Sentiment in Social and Expressive Media: Approaches and perspectives from AI
Outline
• Definition & Background
• Evaluation Datasets for Twitter Sentiment Analysis
• STS-Gold
• Comparative Study
• Conclusion
Sentiment Analysis – Definition
“Sentiment analysis is the task of identifying positive and negative opinions, emotions and evaluations in text”

• “The main dish was delicious”: Positive
• “It is a Syrian dish”: Neutral
• “The main dish was salty and horrible”: Negative
Twitter Sentiment Analysis (Background)
• Sentiment Approaches: Supervised, Unsupervised, Hybrid
• Sentiment Levels: Tweet-level, Phrase-level, Entity-level
• Sentiment Tasks: Subjectivity, Polarity, Sentiment Strength, Emotion/Mood
Evaluation Datasets for Twitter Sentiment Analysis
Dataset:
• SA Level
• SA Task
• No. of Tweets
• Construction & Annotation
• Vocabulary Size
• Class Distribution
• Sparsity
Evaluation Datasets – Overview

Dataset                                         SA Level           SA Task                   Annotation/Agreement
Stanford Twitter Corpus (STS)                   Tweet              Subjectivity              Manual/UD
Health Care Reform (HCR)                        Tweet/Target       Subjectivity              Manual/UD
Obama-McCain Debate (OMD)                       Tweet              Polarity*                 Manual/α=0.655
Sentiment Strength Twitter Dataset (SS-Tweet)   Tweet              Strength/Subjectivity**   Manual/α≈0.56
Sanders Twitter Dataset                         Tweet              Subjectivity              Manual/UD
Dialogue Earth Twitter Corpus (WAB, GASP)       Tweet/Target       Subjectivity              Manual/UD
SemEval-2013 Dataset                            Tweet/Expression   Subjectivity              Manual/UD
What is Missing?
• Details about the annotation methodology (STS, HCR, Sanders)
• Entity-level Sentiment Evaluation:
– Most works focus on assessing the performance of sentiment classifiers at the tweet level (STS, OMD, SS-Tweet, Sanders)
– Datasets that allow for sentiment evaluation at the entity level assign similar sentiment labels to the tweet and the entities within it (HCR, WAB, GASP)
• Enables the evaluation at both the entity and tweet levels
• Tweets and entities are annotated independently
• Contains 58 Entities & 3000 Tweets
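A minimal sketch of how independently annotated tweets and entities could be represented. The class name `AnnotatedTweet`, its fields, and the example tweet are illustrative assumptions, not the actual STS-Gold file format.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class AnnotatedTweet:
    """A tweet whose own label is stored separately from its entities' labels.

    Hypothetical structure for illustration only.
    """
    text: str
    tweet_label: str
    entity_labels: Dict[str, str] = field(default_factory=dict)

    def disagreeing_entities(self) -> Dict[str, str]:
        """Return the entities whose label differs from the tweet-level label."""
        return {
            entity: label
            for entity, label in self.entity_labels.items()
            if label != self.tweet_label
        }


# Made-up example: the tweet is negative overall, but one entity is positive.
example = AnnotatedTweet(
    text="Stuck in bed with the flu, at least my iPhone keeps me entertained",
    tweet_label="negative",
    entity_labels={"flu": "negative", "iPhone": "positive"},
)
print(example.disagreeing_entities())
```

Keeping the two label sets apart is what lets a classifier be scored at the tweet level and at the entity level on the same data.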
Data Collection
STS Corpus (180K Tweets)
→ Entity-Extraction (Alchemy API)
→ Identify Frequent Concepts
→ Select 28 Entities (Top & Mid Frequent Entities)
→ Select 100 Tweet/Entity: 2800 Tweets
→ +200 tweets: 3000 Tweets
→ Entity-Extraction: 147 Entities
→ STS-Gold
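The "select 100 tweets per entity" step can be sketched as below. The slide does not say how the tweets were chosen or how mentions were matched, so uniform random sampling and case-insensitive substring matching are assumptions here, and the function name `sample_tweets_per_entity` is hypothetical.

```python
import random
from typing import Dict, List


def sample_tweets_per_entity(
    tweets: List[str], entities: List[str], per_entity: int, seed: int = 0
) -> Dict[str, List[str]]:
    """Pick up to `per_entity` tweets mentioning each entity.

    Assumptions: substring matching stands in for real entity extraction,
    and the sample is drawn uniformly at random.
    """
    rng = random.Random(seed)
    selected: Dict[str, List[str]] = {}
    for entity in entities:
        mentions = [t for t in tweets if entity.lower() in t.lower()]
        k = min(per_entity, len(mentions))
        selected[entity] = rng.sample(mentions, k)
    return selected


# Made-up toy corpus.
corpus = [
    "Obama speaks tonight",
    "Watching Obama on TV",
    "New iPhone is out",
    "my iphone died again",
    "Nothing to see here",
]
picked = sample_tweets_per_entity(corpus, ["Obama", "iPhone"], per_entity=1)
total = sum(len(v) for v in picked.values())
print(total)
```

With 28 entities and `per_entity=100`, the same procedure would yield the 2800 tweets shown in the flow, provided each entity has at least 100 mentions.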
STS-Gold
Example entities by type:
• Person: Obama, Taylor Swift, LeBron, Oprah
• City: Vegas, London, Seattle, Sydney
• Company: YouTube, Facebook, McDonalds, Starbucks
• Technology: iPod, iPhone, Xbox, PSP
• Organization: Lakers, Cavas, NASA, UN
• Country: England, US, Brazil
• Health Condition: Headache, Flu, Cancer, Fever
Data Annotation (Tweenator.com)
• Input: 3000 Tweets, 147 Entities
• Sentiment Classes: Positive, Negative, Neutral, Mixed, Other
• Inter-annotator Agreement: Tweet α=0.765; Entity α1=0.416, α2=0.964
• Filtering: 2205 Tweets, 58 Entities (STS-Gold)
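Agreement scores like the α values above can be computed as in the following sketch. It assumes α denotes Krippendorff's alpha for nominal labels (the slides only write "α"); the function name and the toy annotations are illustrative.

```python
from collections import Counter
from itertools import permutations
from typing import Hashable, Sequence


def krippendorff_alpha_nominal(units: Sequence[Sequence[Hashable]]) -> float:
    """Krippendorff's alpha for nominal data.

    `units` holds, for each annotated item, the labels given by its annotators.
    Items with fewer than two labels are ignored.
    """
    # Coincidence matrix: each ordered pair of labels from different
    # annotators of the same item contributes 1 / (m - 1).
    coincidences: Counter = Counter()
    for ratings in units:
        m = len(ratings)
        if m < 2:
            continue
        for a, b in permutations(ratings, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    totals: Counter = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())
    if n == 0:
        raise ValueError("need at least one item with two or more labels")

    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)
    if expected == 0:
        return float("nan")  # only one label was ever used; alpha is undefined
    return 1.0 - observed / expected


# Made-up toy annotations: two annotators, four tweets, one disagreement.
toy = [["pos", "pos"], ["neg", "neg"], ["pos", "neg"], ["neg", "neg"]]
alpha = krippendorff_alpha_nominal(toy)
print(round(alpha, 3))
```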
Comparative Study

• Vocabulary Size
• Number of Tweets
• Data Sparsity
• Classification Performance
– Polarity Classification
– Naïve Bayes & Maximum Entropy
Comparative Study.1
Vocabulary Size vs. No. of Tweets
- There is a high correlation between the vocabulary size and the number of tweets (ρ = 0.95)
- However, increasing the number of tweets does not always increase the vocabulary size. (OMD)
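A sketch of how a correlation between dataset size and vocabulary size might be measured. The slides do not say which coefficient ρ refers to, so Pearson's r is an assumption, as are whitespace tokenization and the toy datasets below.

```python
import math
from typing import Iterable, Sequence


def vocabulary_size(tweets: Iterable[str]) -> int:
    """Number of distinct lowercase whitespace-separated tokens (assumed tokenization)."""
    return len({token for tweet in tweets for token in tweet.lower().split()})


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


# Made-up toy datasets of growing size.
toy_datasets = [
    ["good day", "bad day"],
    ["good day", "bad day", "great game tonight"],
    ["good day", "bad day", "great game tonight", "good game", "so tired today"],
]
sizes = [len(d) for d in toy_datasets]
vocabs = [vocabulary_size(d) for d in toy_datasets]
r = pearson(sizes, vocabs)
print(sizes, vocabs, round(r, 3))
```

The fourth tweet of the largest toy dataset ("good game") adds no new words, which mirrors the point that more tweets do not always mean a larger vocabulary.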
Comparative Study.2
Data Sparsity
- Dataset sparsity is an important factor that affects the overall performance of machine learning classifiers [17].
- Twitter datasets are generally very sparse.
- Increasing either the number of tweets or the vocabulary size increases the sparsity degree of the dataset:
  ρno_of_tweets = 0.71
  ρvocabulary_size = 0.77
- The sparsity degree of a given dataset is calculated as [13]:
  S_d = 1 - \frac{\sum_{i}^{n} N_i}{n \times |V|}
  where N_i is the number of distinct words in tweet i, n is the number of tweets in the dataset and |V| the vocabulary size.
- The TweetNLP tokenizer can be downloaded from the TweetNLP website.
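The sparsity degree defined on this slide (one minus the summed number of distinct words per tweet, divided by the number of tweets times the vocabulary size) can be computed as in this sketch. Lowercased whitespace tokenization is an assumption made to keep the example self-contained; the slide mentions the TweetNLP tokenizer instead. The toy tweets are made up.

```python
from typing import List


def sparsity_degree(tweets: List[str]) -> float:
    """S_d = 1 - (sum_i N_i) / (n * |V|).

    N_i is the number of distinct words in tweet i, n the number of tweets
    and |V| the vocabulary size. Tokenization here is plain whitespace
    splitting, which is an assumption.
    """
    tokenized = [set(tweet.lower().split()) for tweet in tweets]
    vocabulary = set().union(*tokenized)
    n = len(tokenized)
    if n == 0 or not vocabulary:
        raise ValueError("need at least one non-empty tweet")
    distinct_total = sum(len(tokens) for tokens in tokenized)
    return 1.0 - distinct_total / (n * len(vocabulary))


toy_tweets = ["good day", "bad day", "great game tonight"]
s_d = sparsity_degree(toy_tweets)
print(round(s_d, 4))
```

For the toy input, N = (2, 2, 3), n = 3 and |V| = 6, so S_d = 1 - 7/18. A value close to 1 means most tweet-word cells in the tweet-by-vocabulary matrix are empty.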
Comparative Study.3
Classification Performance vs. Dataset Sparsity (1)

According to Makrehchi et al. (2008) and Saif et al. (2012), within a given dataset the classification performance and the sparsity degree are negatively correlated, i.e., increasing the dataset sparsity hinders the classification performance.

[Figure, from Makrehchi and Kamel, Fig. 2: Classifier performance as a function of sparsity: (a) Rocchio, and (b) SVM. x-axis: Average Sparsity; y-axis: Average Classifier Performance; datasets: Industry Sectors, 20 newsgroups, Reuters.]
Comparative Study.3
Classification Performance vs. Dataset Sparsity (2)
- No correlation between the classification performance and the sparsity degree
across the datasets. (ρacc = −0.06, ρf1 = 0.23)
- The sparsity-performance correlation is intrinsic, meaning that it might exist within
the dataset itself, but not necessarily across the datasets.
Conclusion!
• Current datasets to evaluate Twitter sentiment classifiers:
– Focus on the tweet level.
– Assign similar sentiment labels to the tweets and the entities within them.
• STS-Gold allows for sentiment evaluation at both the tweet and the entity levels.
• A correlation between the vocabulary size and the number of tweets does not always exist.
• The sparsity-performance correlation is intrinsic, i.e., it only exists within the dataset itself, but not across the different datasets.
Thank You
Email: hassan.saif@open.ac.uk
Twitter: hrsaif
Website: tweenator.com
