1. Introduction
Framework
Sentiment analysis
Case studies
Conclusions
A Descriptive Analysis of Twitter Activity Around
Boston Terror Attacks
Álvaro Cuesta, David F. Barrero, María D. R-Moreno
Computer Engineering Department
Universidad de Alcalá, Spain
ICCCI 2013
Craiova, Romania
September 11, 2013
ICCCI 2013, Craiova, Romania A Descriptive Analysis of Twitter Activity 1 / 25
Summary
1 Introduction
Motivation
Objectives
Case studies
2 Framework
Framework overview
Framework messaging
Framework components
3 Sentiment analysis
Overview
Classifier
4 Case studies
Boston Terror Attack
Political analysis
5 Conclusions and future work
Introduction
Motivation
Rapid growth of social networks in recent years
One of the most successful is Twitter
Microblogging platform
Short messages known as tweets
Open nature
Twitter offers great research opportunities
Open nature
Distributed human sensor network
Easy data extraction, difficult data processing
Twitter + sentiment analysis
Lack of tools for sentiment analysis in Spanish
Introduction
Objectives
Twitter offers an excellent API ... however, some infrastructure is still
needed (mainly storage and reporting)
Objectives
1 Develop a framework for Twitter data extraction and analysis
2 Provide reporting tools
3 Foundation for sentiment analysis in Spanish
Framework architecture
Overview
Requirements
Easy to use, extensible, massive data processing
Design decisions
Modular design: Collection of independent scripts
Focus on open data formats
Built around the database: MongoDB
Set of independent scripts exchanging data
Framework architecture
Framework components: Miner
Miner
Extracts and stores tweets
Stream API
Several filters
Written in Python
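As a rough illustration of the miner's filtering step, the sketch below applies a string filter to tweets that arrive as JSON lines and collects the matches. It does not call the Twitter Stream API; the function names `matches_filter` and `mine`, the sample tweets, and the plain list standing in for the database are all assumptions made for this example, not the framework's actual code.

```python
import json
from typing import Iterable, Iterator


def matches_filter(tweet: dict, phrases: list[str]) -> bool:
    """Return True if the tweet text contains any tracked phrase (case-insensitive)."""
    text = tweet.get("text", "").casefold()
    return any(phrase.casefold() in text for phrase in phrases)


def mine(lines: Iterable[str], phrases: list[str]) -> Iterator[dict]:
    """Parse one JSON tweet per line and yield those that pass the filter."""
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines between messages
            continue
        tweet = json.loads(line)
        if matches_filter(tweet, phrases):
            yield tweet


# Hypothetical input: two made-up tweets serialized as JSON lines.
sample_stream = [
    json.dumps({"id": 1, "text": "Explosiones en la Maratón de Boston"}),
    "",
    json.dumps({"id": 2, "text": "Buenos días"}),
]
stored = list(mine(sample_stream, ["maratón de boston"]))
print([t["id"] for t in stored])
```

Keeping the filter as a pure function over dictionaries makes it easy to test without a live stream.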
Framework architecture
Framework components: Database
Database
Storage for further processing
MongoDB
NoSQL database
High performance
Framework architecture
Framework components: Reporting
Reporting
CSV export for further processing
R processing
Extensibility
Powerful libraries
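A minimal sketch of the CSV export step, assuming tweets are available as Python dictionaries. The `export_csv` helper and the field names are hypothetical; the slides do not show the framework's actual export script.

```python
import csv
import io


def export_csv(tweets, fields, out):
    """Write the selected fields of each tweet dict as one CSV row."""
    writer = csv.DictWriter(out, fieldnames=fields)
    writer.writeheader()
    for tweet in tweets:
        # Missing fields become empty cells; extra fields are dropped.
        writer.writerow({f: tweet.get(f, "") for f in fields})


# Made-up sample data for illustration only.
tweets = [
    {"id": 1, "user": "alice", "text": "hola, mundo", "lang": "es"},
    {"id": 2, "user": "bob", "text": "RT @alice: hola, mundo"},
]
buffer = io.StringIO()
export_csv(tweets, ["id", "user", "text"], buffer)
print(buffer.getvalue())
```

A file written this way can be loaded in R with `read.csv`, which fits the reporting flow described above.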
Framework architecture
Framework components: Sentiment analysis
Sentiment analysis
Supervised learning
Requires labeled data
Tools for labeling
Classifier building
Classifier testing
Sentiment analysis
Overview
Supervised learning with Natural Language Toolkit (NLTK)
Three classes: "positive", "negative" and "neutral"
Needs a labeled corpus
Several exist in English ...
... none in Spanish
Requires thousands of manually classified tweets
Collaborative labeling
Web application to label tweets
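To make the supervised approach concrete, here is a small standard-library sketch of a multinomial Naïve Bayes classifier with add-one smoothing. It is a stand-in written for illustration, not NLTK's implementation and not the authors' code; the `train` and `classify` helpers and the five training sentences are invented for this example.

```python
import math
from collections import Counter, defaultdict


def train(labeled):
    """labeled: list of (tokens, label). Returns the counts needed for prediction."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in labeled:
        class_docs[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_docs, word_counts, vocab


def classify(model, tokens):
    """Return the label with the highest log-posterior (add-one smoothing)."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_docs.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)  # class prior
        for tok in tokens:
            if tok in vocab:  # ignore words never seen in training
                count = word_counts[label][tok]
                score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy, made-up training data (not taken from the labeled corpus).
training = [
    ("me encanta esta propuesta".split(), "positive"),
    ("gran trabajo del equipo".split(), "positive"),
    ("qué vergüenza de gestión".split(), "negative"),
    ("odio esta reforma".split(), "negative"),
    ("rueda de prensa a las diez".split(), "neutral"),
]
model = train(training)
print(classify(model, "me encanta el equipo".split()))
```

With thousands of labeled tweets instead of five sentences, the same counts-and-priors structure applies; only the feature extraction and evaluation become more involved.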
Case study
Boston Terror Attack
Main objective
Evaluate the platform
Secondary objective
Describe activity around an event
Stream by string filter
The event
Terror attack on 15 Apr 2013 14:49 (GMT-4) in Boston
Internet witch-hunt motivated by the release of some photos
Shooting and manhunt
Data acquisition
Begin: Tue, 16 Apr 2013 00:43 (GMT)
End: Tue, 23 Apr 2013 00:43 (GMT)
Filter: "Maratón de Boston" ("Boston Marathon" in Spanish)
Case study
Boston Terror Attack: Dataset description
Metric         Value       Relative   Average
Tweets         28,892                 1.16/user
Non-retweets   16,029      55.48 %
Retweets       12,863      44.52 %
Geolocated     255         0.88 %
Users          24,989
Mentions       18,937      65.54 %
Replies        849         2.94 %
Non-replies    18,088      62.61 %
Size           96.39 MB               3.38 KB/tweet
Index size     0.91 MB
Disk           132.99 MB
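The relative and average columns can be re-derived from the raw counts. The sketch below does so for the Boston dataset, assuming percentages are taken over the total number of tweets and that non-replies are mentions that are not replies (the counts are consistent with this: 18,937 - 849 = 18,088). The `summarize` helper is hypothetical.

```python
def summarize(tweets, retweets, users, geolocated, mentions, replies):
    """Derive the relative and average columns from raw counts."""

    def pct(part):
        # Percentage of all tweets, rounded to two decimals.
        return round(100 * part / tweets, 2)

    return {
        "tweets_per_user": round(tweets / users, 2),
        "non_retweets_pct": pct(tweets - retweets),
        "retweets_pct": pct(retweets),
        "geolocated_pct": pct(geolocated),
        "mentions_pct": pct(mentions),
        "replies_pct": pct(replies),
        "non_replies_pct": pct(mentions - replies),
    }


boston = summarize(tweets=28_892, retweets=12_863, users=24_989,
                   geolocated=255, mentions=18_937, replies=849)
for name, value in boston.items():
    print(f"{name}: {value}")
```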
Case study
Boston Terror Attack: Activity
[Figure: three time-series panels, x-axis "Time" (ticks Apr 17 to Apr 23): "Tweets", "Tweets (excluding RTs)" and "Retweets"]
Dashed line: Bombing
Dotted line: Photo release
Solid line: Shooting
Gray background: Manhunt
Case study
Boston Terror Attack: Activity
[Figure: three time-series panels, x-axis "Time" (ticks Thu 23:00 to Sat 00:00): "Tweets", "Tweets (excluding RTs)" and "Retweets"]
Dotted line: Photo release
Solid line: Shooting
Gray background: Manhunt
Case study
Political analysis: Overview
Main objective
Evaluate sentiment analysis
Secondary objective
Describe regular Twitter activity
Stream by user filter
Selection of Spanish political actors
Selected by activity and controversy
Account owner           Accounts
Political party         @PPopular, @PSOE, @iunida, @UPyD
Politician              @agarzon, @EduMadina, @ToniCanto1, @RevillaMiguelA, @ccifuentes, @_Rubalcaba_
Journalist              @jordievole, @iescolar
Activist organization   @LA_PAH
Data acquisition
Begin: Tue, 16 Apr 2013 00:00 (GMT)
End: Thu, 18 Apr 2013 04:00 (GMT)
Filter: Account name ("@account")
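One way to picture the account-name filter is a check for `@account` mentions in the tweet text. The sketch below is a simplification under that assumption; the regular expression, the `mentioned_accounts` helper and the sample sentence are illustrative and not taken from the framework.

```python
import re

# Simplified pattern: "@" followed by letters, digits or underscores.
MENTION = re.compile(r"@(\w+)")


def mentioned_accounts(text: str, tracked: set[str]) -> set[str]:
    """Return the tracked accounts mentioned in the text (case-insensitive)."""
    found = {name.casefold() for name in MENTION.findall(text)}
    return {acct for acct in tracked if acct.casefold() in found}


tracked = {"PSOE", "UPyD", "LA_PAH", "_Rubalcaba_"}
hits = mentioned_accounts("Hoy debate entre @psoe y @UPyD, sin @otro", tracked)
print(sorted(hits))
```

Comparison is case-insensitive because Twitter handles are not case-sensitive.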
Case study
Political analysis: Dataset description
Metric         Value       Relative   Average
Tweets         65,043                 1.9/user
Non-retweets   28,175      43.32 %
Retweets       36,868      56.68 %
Geolocated     528         0.81 %
Users          34,195
Mentions       56,713      87.19 %
Non-replies    46,981      72.23 %
Replies        9,732       14.96 %
Size           227.51 MB              3.58 KB/tweet
Index size     2.05 MB
Disk           237.95 MB
Case study
Political analysis: Activity
[Figure: three time-series panels, x-axis "Time" (ticks Tue to Thu): "Tweets", "Tweets (excluding RTs)" and "Retweets"]
Case study
Political analysis: Sentiment analysis
9,884 tweets were manually classified collaboratively
4,739 non-neutral tweets
1,062 positive, 3,677 negative
Unbalanced dataset
We tried several parameter settings for the Naïve Bayes classifier
N-grams: {1}, {2}, {3}, {1, 2}, {1, 3} and {2, 3}
Minimum score: 0, 1, 2, 3, 4, 5, 6 and 10
10-fold cross-validation
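Two of the parameters above can be sketched in a few lines: n-gram feature extraction for a given set of sizes, and the index splits used in k-fold cross-validation. The helper names are hypothetical, and the "minimum score" parameter is left out because the slides do not define it.

```python
def ngram_features(tokens, sizes):
    """Return the set of n-grams for the given sizes, e.g. sizes={1, 2}."""
    feats = set()
    for n in sizes:
        for i in range(len(tokens) - n + 1):
            feats.add(" ".join(tokens[i:i + n]))
    return feats


def kfold_indices(n_items, k=10):
    """Yield (train, test) index lists; each item is tested exactly once."""
    folds = [list(range(i, n_items, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


features = ngram_features("no me gusta nada".split(), {1, 2})
print(sorted(features))

# 4,739 is the number of non-neutral tweets reported above.
splits = list(kfold_indices(4739, k=10))
print(len(splits), [len(test) for _, test in splits])
```

In practice the items would be shuffled before splitting. As context for the unbalanced dataset: with 3,677 negatives among 4,739 non-neutral tweets, always predicting "negative" would already score about 0.776 on that subset; the slides do not say which subset the reported accuracies were measured on.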
Case study
Political analysis: Sentiment analysis
Accuracy
NaiveBayes-1_2-min3 0.8543
NaiveBayes-1-min3 0.8510
NaiveBayes-1_3-min3 0.8507
NaiveBayes-1-min4 0.8476
NaiveBayes-1_3-min5 0.8474
NaiveBayes-1_2-min4 0.8469
NaiveBayes-1_3-min4 0.8467
NaiveBayes-1_3-min1 0.8459
NaiveBayes-1-min6 0.8452
NaiveBayes-1-min1 0.8448
NaiveBayes-1_2-min5 0.8446
NaiveBayes-1_3-min6 0.8438
NaiveBayes-1_2-min6 0.8436
NaiveBayes-1-min5 0.8406
NaiveBayes-1_2-min1 0.8389
NaiveBayes-2_3-min6 0.8385
Conclusions and future work
We developed a framework that eases data extraction and analysis on Twitter
Ready for production
It will be released soon under a free licence
We briefly described two case studies
Event-driven activity: Boston terror attacks
Regular activity: Political activity
Sentiment analysis is intrinsically difficult
Future work
Lemmatization
Natural language processing
Time series analysis
Thanks for your attention!
David F. Barrero
david@aut.uah.es
@dfbarrero