Large scale Twitter collection of 2012 US election
1. Large scale Twitter collection of 2012 US election
Alexander Hanna
Department of Sociology
University of Wisconsin-Madison
ahanna@ssc.wisc.edu
@alexhanna
September 14, 2012
Outline: Research Agenda, Current Project, Case Study, Future Approaches
2. A Twitter-specific Research Agenda
• How different is the political Twitterverse from the rest
of the social graph?
• What are the different modes of engagement between
different types of elite users and their followers?
• How does information flow from elite users to others?
5. Current Study
• Structured to consider direct follow relationships
• Constructing the political Twitterverse
6. Political elites
• Candidates in national races
• Party leadership
• Media - Pundits, Reporters, Bloggers
• Satirists
• Celebrities
• Advocacy Groups
7. Sampling Strategy
Three levels - Elites and followers
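The slide's diagram did not survive, so the definition of the three levels is not stated here. As a minimal sketch only, assuming level 1 is the elite seed list, level 2 is a sample of their followers, and level 3 is a sample of those followers' followers, a level-by-level sample might look like the following. `get_followers`, `per_user`, and the account names are hypothetical; `get_followers` stands in for any follower lookup and is not a Twitter API call.

```python
import random

def sample_levels(elites, get_followers, per_user=2, seed=0):
    """Return {user: level}. Level 1 = elites (an assumption, see lead-in)."""
    rng = random.Random(seed)
    levels = {user: 1 for user in elites}
    frontier = list(elites)
    for level in (2, 3):
        next_frontier = []
        for user in frontier:
            # Only consider followers that have not been assigned a level yet.
            candidates = sorted({f for f in get_followers(user) if f not in levels})
            for follower in rng.sample(candidates, min(per_user, len(candidates))):
                levels[follower] = level
                next_frontier.append(follower)
        frontier = next_frontier
    return levels

# Toy follower graph with made-up account names.
toy_graph = {
    "candidate_a": ["fan1", "fan2", "fan3"],
    "pundit_b": ["fan2", "fan4"],
    "fan1": ["friend1"],
    "fan2": ["friend2", "friend3"],
}
levels = sample_levels(["candidate_a", "pundit_b"], lambda u: toy_graph.get(u, []))
```

A user reached through several paths keeps the lowest level at which it was first sampled, which is one possible design choice among several.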
8. Waves of collection
Sampling at three different points
• Pre-primary - Mid January
• Post-primary - June 26
• Post-convention and pre-election - September 7
9. Data Collection and Processing
• Twitter RESTful API for collecting follower lists
• Twitter Streaming API for collecting tweets
• Two streams - targeted sample stream and
“gardenhose” (10% sample of all of Twitter)
• Hadoop/MapReduce for analysis
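To illustrate the MapReduce style of analysis named above, here is a minimal in-process sketch that counts keyword mentions per day. It is not the author's pipeline. It assumes one JSON tweet per line in the classic Twitter layout with `created_at` and `text` fields; the function names and sample tweets are hypothetical.

```python
import json
from collections import defaultdict
from datetime import datetime

KEYWORDS = ("trayvon", "zimmerman")  # keywords taken from the case study slides

def map_tweet(line):
    """Map step: emit ((date, keyword), 1) for each keyword found in a tweet."""
    tweet = json.loads(line)
    # Assumes the "created_at" format "Wed Mar 14 12:00:00 +0000 2012".
    day = datetime.strptime(tweet["created_at"], "%a %b %d %H:%M:%S %z %Y").date()
    text = tweet["text"].lower()
    for keyword in KEYWORDS:
        if keyword in text:
            yield (day.isoformat(), keyword), 1

def reduce_counts(pairs):
    """Reduce step: sum the values for each key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

# Made-up tweets for illustration only.
sample_lines = [
    json.dumps({"created_at": "Wed Mar 14 12:00:00 +0000 2012",
                "text": "Justice for Trayvon"}),
    json.dumps({"created_at": "Wed Mar 14 18:45:00 +0000 2012",
                "text": "Trayvon Martin and George Zimmerman"}),
    json.dumps({"created_at": "Thu Mar 15 08:30:00 +0000 2012",
                "text": "Zimmerman case update"}),
]
pairs = [pair for line in sample_lines for pair in map_tweet(line)]
counts = reduce_counts(pairs)
```

Under Hadoop Streaming the same two steps would typically be split into a mapper and a reducer script that read standard input and write tab-separated key/value lines.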
10. Data size and storage
• Gardenhose
  • 2.7 TB
  • 20-40mil tweets/day
  • 15-16 GB/day
• Targeted sample:
  • 77,054 unique users
  • 103 GB
  • 500k-1mil tweets/day
  • Currently around 1 GB/day
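As a rough consistency check on the figures above, the sketch below derives the implied storage per tweet and the implied number of collection days. It assumes decimal units (1 GB = 10^9 bytes) and constant daily rates, neither of which the slide states; the derived numbers are back-of-the-envelope estimates, not results from the talk.

```python
GB = 10**9   # assumption: decimal gigabytes
TB = 10**12  # assumption: decimal terabytes

def bytes_per_tweet(gb_per_day, tweets_per_day):
    """Average stored bytes per tweet at a given daily volume."""
    return gb_per_day * GB / tweets_per_day

# Gardenhose: 15-16 GB/day over 20-40 million tweets/day.
gardenhose_low = bytes_per_tweet(15, 40_000_000)   # 375 bytes
gardenhose_high = bytes_per_tweet(16, 20_000_000)  # 800 bytes

# Targeted sample: around 1 GB/day over 500k-1 million tweets/day.
targeted_low = bytes_per_tweet(1, 1_000_000)  # 1000 bytes
targeted_high = bytes_per_tweet(1, 500_000)   # 2000 bytes

# Days of gardenhose collection implied by 2.7 TB at 15-16 GB/day.
days_low = 2.7 * TB / (16 * GB)   # about 169 days
days_high = 2.7 * TB / (15 * GB)  # about 180 days
```

The slide does not explain why the two streams imply different bytes per tweet; differences in storage format or compression are possible but unconfirmed.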
11. Case Study in Agenda Setting
Who establishes the media discourse? How do different
elements of media try to set the discourse?
12. Trayvon Martin
• February 26 - Martin killed
• March 8 - CBS News interview with Martin’s parents
• Week of March 12 - Media catches on, case more covered than presidential race
• April 11 - State Prosecutor files charges
• April 19 - Zimmerman released on bond
13. Twitter mentions
[Figure: Count (axis 0.2 to 1.0) by Date, 03/01 to 05/04, for keywords "trayvon" and "zimmerman"]
14. Twitter vs. Google
[Figure: Count (axis 0.0 to 1.0) by Date, 03/01 to 05/04, for keywords "trayvon", "zimmerman", "Gtrayvon", and "Gzimmerman"]
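Both figures plot Count on a 0 to 1 axis, which suggests each series was rescaled before plotting. The slides do not say how. One common way to put raw tweet counts and a search-interest index on a shared axis is to divide each series by its own peak, sketched below. This assumes that Gtrayvon and Gzimmerman are Google search-interest series and that peak scaling was used; the function name and all numbers are made up for illustration.

```python
def scale_to_peak(series):
    """Divide every value by the series maximum so the peak equals 1.0."""
    peak = max(series.values())
    if peak == 0:
        return {day: 0.0 for day in series}
    return {day: value / peak for day, value in series.items()}

# Made-up numbers for illustration only; not data from the talk.
twitter_trayvon = {"03/21": 1200, "03/22": 3000, "03/23": 2400}
google_trayvon = {"03/21": 40, "03/22": 80, "03/23": 100}

scaled_twitter = scale_to_peak(twitter_trayvon)
scaled_google = scale_to_peak(google_trayvon)
```

After scaling, the two series can be compared by the timing and shape of their peaks, but not by absolute volume.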
15. Setting the agenda
Mentions of Trayvon
[Figure: Ratio (axis 0.01 to 0.05) by Date, 03/01 to 05/04, one series per Level (1, 2, 3)]
16. Setting the agenda
Mentions of Zimmerman
[Figure: Ratio (axis 0.005 to 0.030) by Date, 03/01 to 05/04, one series per Level (1, 2, 3)]
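The Ratio axis in these two figures is not defined on the slides. A plausible reading, assumed here, is the share of each sampling level's tweets on a given day that mention the keyword. The sketch below computes that share; the function name and the toy tweets are hypothetical.

```python
from collections import Counter

def mention_ratio(tweets, keyword):
    """Return {(day, level): share of that level's tweets mentioning keyword}."""
    totals, hits = Counter(), Counter()
    for day, level, text in tweets:
        totals[(day, level)] += 1
        if keyword in text.lower():
            hits[(day, level)] += 1
    return {key: hits[key] / totals[key] for key in totals}

# Made-up tweets for illustration: (date, sampling level, text).
tweets = [
    ("03/21", 1, "Trayvon Martin rally today"),
    ("03/21", 1, "Primary results are in"),
    ("03/21", 2, "lunch time"),
    ("03/21", 2, "thinking about Trayvon"),
    ("03/21", 2, "go badgers"),
    ("03/21", 2, "new phone"),
]
ratios = mention_ratio(tweets, "trayvon")
```

Dividing by each level's daily total keeps levels with very different tweet volumes comparable on one axis.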
17. Setting the agenda
• No noticeable difference between mentions of Trayvon in elites vs. followers
• However, followers seem to catch on to Zimmerman more quickly
18. Future Work
• Incorporating network structure
  • Follower/friend networks
  • User mention networks
  • Retweet patterns
• Computer-aided content analysis
  • Machine learning (supervised and unsupervised)
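As one illustration of the user mention networks listed above, the sketch below builds a weighted, directed edge list from tweet text. It is a minimal example rather than a planned implementation: the regular expression is a simplification that would also match the part of an email address after the `@`, and the account names and tweets are made up.

```python
import re
from collections import Counter

MENTION = re.compile(r"@(\w+)")  # simplified pattern for @username

def mention_edges(tweets):
    """Count directed edges (author -> mentioned user) across tweets."""
    edges = Counter()
    for author, text in tweets:
        source = author.lower()
        for mentioned in MENTION.findall(text):
            target = mentioned.lower()
            if target != source:  # skip self-mentions
                edges[(source, target)] += 1
    return edges

# Made-up tweets for illustration: (author, text).
tweets = [
    ("alice", "Great point @Bob, cc @carol"),
    ("bob", "RT @alice: Great point"),
    ("alice", "@bob thanks"),
]
edges = mention_edges(tweets)
```

Because old-style retweets begin with "RT @user", this edge list mixes mentions and retweets; separating the two would be a natural next step for studying retweet patterns.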
19. Future Work
Thanks!
ahanna@ssc.wisc.edu
@alexhanna