Twitter analytics: some thoughts
on sampling, tools, data, ethics
and user requirements
Farida Vis, Information School
University of Sheffield
@flygirltwo
Keynote SRA Social Media in Social Research conference, London, 24 June 2013.
READING
THE RIOTS
ON TWITTER
Rob Procter (University of Manchester)
Farida Vis (University of Leicester)
Alexander Voss (University of St Andrews)
[Funded by JISC]
#readingtheriots
What role did social media play?
2.6 million riot tweets (donated by Twitter)
700,000 individual accounts
Initially:
o Role of Rumours
o Did incitement take place? [no - #riotcleanup]
o What is the role of different actors on Twitter?
Role of Rumours
Guardian Interactive Team (Alastair Dant)
http://www.guardian.co.uk/uk/interactive/2011/dec/07/london-riots-twitter
Data Journalism Award (sponsored by Google)
• Lots of questions about methods
• Lots of questions about our tools
• Lots of questions about donated data
• Lots of questions about ethics
Actively engaged on Twitter
Actor Types – top 1000 mentions
Typical long tail distribution
Twitter researchers tend to focus on the head
Actor Types
Mainstream Media
Only online media (news)
Non-(news) mainstream media
Journalists (mainstream media)
Journalists (online media)
Non-(news) media organisations
Bloggers
Activists
UK Twitterati
Political Actors
Police/emergency services
Riot accounts
Celebrities
Researchers
Members of the public
Bots
Unclear
Account closed down
Fake/spoof account
Other
http://researchingsocialmedia.org/2012/01/24/reading-the-riots-on-twitter-who-tweeted-the-riots/
Who tweeted the riots? - categories
mainstream media
journalists
riot accounts
You know you’re dealing with Twitter data when…
Number 13, 6697 mentions
Number 20, 5939 mentions
Number 23, 5527 mentions
Context
Context
Context
Individual accounts with > 3K mentions
30031 mentions, 441 tweets sent over 4 days: top UK listed journalist (2)
3484 mentions, 290 tweets sent over 4 days: top non UK listed journalist (34)
Image sharing practices during crises
400 million tweets/day (March 2013)
40 million Instagram images/day (January 2013)
Percentages posted to Twitter / Facebook
-> 59% posted to Twitter
-> 98% posted to Facebook
Where do images fit in the era of ‘Big Data’?
Big Data – text + number driven
Images: undervalued, underexplored
Not by the users
Deleted content
http://twitpic.com/62m6nx
#FakeSandy pics
250,000 tweets (4hrs)
1 weekend
http://istwitterwrong.tumblr.com/
Jean Burgess
Farida Vis
Axel Bruns
‘fakes’
http://www.guardian.co.uk/news/datablog/2012/nov/06/fake-sandy-pictures-social-media
Twitter handles
MPSBarkDag
MPSBarnet
MPSBexley
MPSBrent
MPSBromley
MPSCamden
metpoliceuk
MPSWestminster
MPSCroydon
EalingMPS
MPSEnfield
MPSGreenwich
MPSHackney
MPSHammFul
MPSHaringey
MPSHarrow
MPSHavering
MPSHillingdon
MPSHounslow
MPSIslington
MPSKenChel
MPSKingston
LambethMPS
MPSLewisham
MPSMerton
MPSNewham
MPSRedbridge
MPSRichmond
MPSSouthwark
MPSSutton
MPSTowerHam
MPSWForest
MPSWandsworth
Plus:
@MetPoliceEvents (Updates from the Met Police
regarding demonstrations & events in London)
@MPSOnTheStreet (An official MPS account giving an
officer on the ground's view of events, operations and
other policing activities in London)
@MPSDoI (Updates from the Metropolitan Police
Service, Directorate of Information)
Police tweets
Collecting the data
Scraper by Jacopo Ottaviani
URL for the scraper: https://scraperwiki.com/scrapers/police_and_the_olympics_2012/
ScraperWiki is a key DDJ site
Datajournalismhandbook.org
Reference point 1
Data challenges
• Collecting Twitter data in (real) time (APIs)
• Methods for building a reliable corpus
• Problems with language bias
• Problems with hashtag/keyword bias
• API bias
• Demographics of Twitter users – who are they?
• Problems with escalating volume
• Mapping explosion of new tools: are they any good?
• Off the shelf tools (growing divide in research capacity in
this area)
• Limitations of the tools
• Problems with data sharing / replicating studies + findings
Data challenge 1: Know your API
See: https://dev.twitter.com/start
1% random sample of the firehose
If not rate limited – all data may be collected
FIREHOSE
Data challenge 2: API bias?
We collect and analyse messages exchanged in Twitter using two of
the platform's publicly available APIs (the search and stream
specifications). We assess the differences between the two samples,
and compare the networks of communication reconstructed from them.
The empirical context is given by political protests taking place in May
2012: we track online communication around these protests for the
period of one month, and reconstruct the network of mentions and
re-tweets according to the two samples. We find that the search API
over-represents the more central users and does not offer an accurate
picture of peripheral activity; we also find that the bias is greater for the
network of mentions. We discuss the implications of this bias for the
study of diffusion dynamics and collective action in the digital era, and
advocate the need for more uniform sampling procedures in the study
of online communication.
(González-Bailón et al, 2012)
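The kind of comparison described in this abstract can be sketched in a few lines. This is a minimal illustration, not the authors' method: the `(author, text)` tuple layout, the function names and the toy tweets are all invented here, and a retweet's `RT @user` is simply counted as a mention, whereas the study treats mentions and re-tweets as separate networks.

```python
import re
from collections import Counter

MENTION = re.compile(r"@(\w+)")

def mention_edges(tweets):
    """Count (author, mentioned_user) pairs; tweets are (author, text) tuples."""
    edges = Counter()
    for author, text in tweets:
        for target in MENTION.findall(text):
            edges[(author.lower(), target.lower())] += 1
    return edges

def mentions_received(edges):
    """Sum edge weights per mentioned user (weighted in-degree)."""
    indegree = Counter()
    for (_, target), weight in edges.items():
        indegree[target] += weight
    return indegree

def coverage(reference_edges, other_edges):
    """Share of distinct edges in the reference sample also found in the other sample."""
    if not reference_edges:
        return 0.0
    shared = set(reference_edges) & set(other_edges)
    return len(shared) / len(reference_edges)

# Invented toy samples standing in for data from the stream and search APIs.
stream_sample = [
    ("ann", "RT @newsdesk: protest update"),
    ("ben", "@ann are you there?"),
    ("cat", "@ben @ann see you at the square"),
]
search_sample = [
    ("ann", "RT @newsdesk: protest update"),
    ("dan", "thanks @newsdesk"),
]

stream_edges = mention_edges(stream_sample)
search_edges = mention_edges(search_sample)
print(coverage(stream_edges, search_edges))
print(mentions_received(search_edges).most_common(3))
```

In this toy data the search sample retains only the edge pointing at the most-mentioned account, which is the pattern of over-represented central users and missing peripheral activity that the abstract reports.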
Data challenge 3: rate limiting + 1%
Random sampling with the streaming API: the 1%
‘If we estimate a daily tweet volume of 450 million tweets (Farber), this
would mean that, in terms of standard sampling theory, the 1%
endpoint would provide a representative and high resolution sample
with a maximum margin of error of 0.06 at a confidence level of 99%,
making the study of even relatively small subpopulations within that
sample a realistic option.’
(Gerlitz and Rieder, 2013)
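The quoted margin of error can be checked with the standard formula for a sample proportion. The sketch below makes three assumptions: the worst-case proportion p = 0.5, the 1% endpoint treated as a simple random sample (exactly what the following slides call into question), and the quoted 0.06 read as percentage points. The function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(sample_size, population_size, confidence=0.99, p=0.5):
    """Margin of error for a proportion, with finite population correction."""
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    standard_error = sqrt(p * (1 - p) / sample_size)
    # Correction for sampling a noticeable fraction of a finite population.
    fpc = sqrt((population_size - sample_size) / (population_size - 1))
    return z * standard_error * fpc

daily_tweets = 450_000_000        # estimate quoted on the slide
sample = daily_tweets // 100      # the 1% endpoint
moe = margin_of_error(sample, daily_tweets)
print(f"{moe * 100:.3f} percentage points")
```

Under these assumptions the result is roughly 0.06 percentage points, in line with the quoted figure. It says nothing about whether the 1% sample is in fact random.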
Data challenge 4: relation to firehose?
‘The essential drawback of the Twitter API is the lack of documentation
concerning what and how much data users get. This leads researchers
to question whether the sampled data is a valid representation of the
overall activity on Twitter. In this work we embark on answering this
question by comparing data collected using Twitter’s sampled API
service with data collected using the full, albeit costly, Firehose stream
that includes every single published tweet.’
(Morstatter et al, 2013)
Data challenge 5: relation to ‘general public’?
Data challenge 6: what data to collect?
For hashtag datasets: contributions made by specific users and
groups of users; overall patterns of activity over time;
combinations to examine contributions by specific users and
groups over time. (Bruns and Stieglitz, 2013)
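The three families of metrics named here could be computed along the following lines. This is a minimal sketch, not Bruns and Stieglitz's own implementation: the `(user, timestamp)` record layout, the function names and the sample records are invented for illustration.

```python
from collections import Counter
from datetime import datetime

# Hypothetical records for one hashtag dataset: (username, ISO timestamp).
tweets = [
    ("alice", "2013-06-24T09:05:00"),
    ("bob", "2013-06-24T09:40:00"),
    ("alice", "2013-06-24T10:15:00"),
    ("alice", "2013-06-25T08:00:00"),
    ("carol", "2013-06-25T08:30:00"),
]

def day_of(timestamp):
    """Reduce an ISO timestamp to its calendar day."""
    return datetime.fromisoformat(timestamp).date().isoformat()

def tweets_per_user(records):
    """Contributions made by specific users."""
    return Counter(user for user, _ in records)

def tweets_per_day(records):
    """Overall pattern of activity over time."""
    return Counter(day_of(ts) for _, ts in records)

def tweets_per_user_per_day(records):
    """Combination: contributions by specific users over time."""
    return Counter((user, day_of(ts)) for user, ts in records)

print(tweets_per_user(tweets).most_common())
print(sorted(tweets_per_day(tweets).items()))
```

Grouping users into categories (for example the actor types used for the riot tweets) would only need a lookup from username to group before counting.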
Data challenge 6: how to collect the data?
TWITTER TOOLS
Recent explosion in Twitter tools
• Twitonomy
• Scraperwiki
• TAGS
• DMI Twitter Capture and Analysis Toolset
• MozDeh (and Webometric Analyst)
• NViVO 10
• YourTwapperKeeper
Twitonomy (REST + search API)
Scraperwiki
#horsemeat still producing data in June!
Tweet mapping: geolocations
TAGS
Collects up to 8000 tweets based
on hashtags/keywords/users
DMI Twitter Capture and Analysis Toolset
DMI tools for extracting links (all the URLs)
URLs are mostly shortened, mainly using t.co (Twitter). Unpack them using:
Didn’t always work; manual unpacking and note taking were needed (plus you still
have the shortened URL in case you want to retrace it).
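For readers who want to script the unpacking step, one possible approach is to follow HTTP redirects hop by hop and keep the whole chain, so the shortened URL is still on record. This is a sketch using only the Python standard library, not the tool shown on the slide; the function names are illustrative, and some shorteners may not answer HEAD requests or may redirect by other means.

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = (301, 302, 303, 307, 308)

def fetch_redirect(url, timeout=10):
    """Send one HEAD request; return the redirect target, or None if there is none."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        response = conn.getresponse()
        location = response.getheader("Location")
        if response.status in REDIRECT_CODES and location:
            # Location may be relative, so resolve it against the current URL.
            return urljoin(url, location)
        return None
    finally:
        conn.close()

def unshorten(url, resolve_once=fetch_redirect, max_hops=10):
    """Follow redirects one hop at a time; return (final_url, chain of URLs seen)."""
    chain = [url]
    for _ in range(max_hops):
        try:
            target = resolve_once(chain[-1])
        except (OSError, http.client.HTTPException):
            break  # network failure: keep what was resolved so far
        if target is None or target in chain:
            break  # no further redirect, or a redirect loop
        chain.append(target)
    return chain[-1], chain
```

Calling `unshorten(short_url)` returns the last URL reached along with every intermediate hop, which replaces the manual note taking. When a lookup fails, the chain simply stops at the last URL that resolved.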
MOZDEH (and Webometric Analyst)
NViVO 10
YourTwapperKeeper
Data challenge 7: how to analyse the data?
What to do about all those bots?
For hashtag datasets: contributions made by specific users and
groups of users; overall patterns of activity over time;
combinations to examine contributions by specific users and
groups over time. (Bruns and Stieglitz, 2013)
Data collected + methods used
produce specific research object
Where do images fit in the era of ‘Big Data’?
Data challenge 8: representing your data?
Data visualisations: what are they and what do they want?
Data challenge 9: how to deal with ethics?
Data challenge 10: user requirements?
What do we want from these APIs, the data,
the tools, and Twitter researchers so that we
can develop more robust social scientific
research on Twitter?
@flygirltwo
References
• Bruns, A., and Stieglitz, S. 2013. Towards More Systematic Twitter Analysis: Metrics
for Tweeting Activities. International Journal of Social Research Methodology.
DOI:10.1080/13645579.2013.770300. Available from:
http://snurb.info/files/2013/Towards%20More%20Systematic%20Twitter%20Analysis%20(final).pdf
• Gerlitz, C., and Rieder, B. 2013. Mining One Percent of Twitter: Collections, Baselines,
Sampling. M/C Journal, Vol. 16, No. 2. Available from:
http://journal.media-culture.org.au/index.php/mcjournal/article/viewArticle/620
• González-Bailón, S., Ning, W., Rivero, A., Borge-Holthoefer, J., and Moreno, Y. 2012.
Assessing the Bias in Communication Networks Samples from Twitter. Available from:
http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2185134
• Morstatter, F., Pfeffer, J., Liu, H., and Carley, K.M. 2013. Is the Sample Good Enough?
Comparing Data from Twitter’s Streaming API with Twitter’s Firehose. Association for
the Advancement of Artificial Intelligence. Available from:
http://www.public.asu.edu/~fmorstat/paperpdfs/icwsm2013.pdf
• Vis, F. 2012. Twitter as a reporting tool for breaking news: journalists tweeting the
2011 UK riots, Digital Journalism 1(1). Available from:
http://www.tandfonline.com/doi/full/10.1080/21670811.2012.741316#.UcwBZ-CPDao
• Vis, F., Faulkner, S., Parry, K., Manyukhina, Y., and Evans, L. (in press), Twitpic-ing
the riots: analysing images shared on Twitter during the 2011 UK riots, in Twitter and
Society, Weller, K., Bruns, A., Burgess, J., Mahrt, M., and Puschmann, C. (eds.), New
York: Peter Lang.
Links to all mentioned tools
• Twitonomy - http://www.twitonomy.com/
• Scraperwiki - https://beta.scraperwiki.com/
• TAGS - http://mashe.hawksey.info/2013/02/twitter-archive-tagsv5/
• DMI Twitter Capture and Analysis Toolset - https://wiki.digitalmethods.net/Dmi/ToolDmiTcat
• MozDeh (and Webometric Analyst) - http://mozdeh.wlv.ac.uk/ + http://lexiurl.wlv.ac.uk/
• NViVO 10 - http://www.qsrinternational.com/products_nvivo.aspx
• YourTwapperKeeper - https://github.com/540co/yourTwapperKeeper
See also: http://mappingonlinepublics.net/tag/yourtwapperkeeper/

Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 

Kürzlich hochgeladen (20)

[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 

Twitter analytics: some thoughts on sampling, tools, data, ethics and user requirements

  • 1. Twitter analytics: some thoughts on sampling, tools, data, ethics and user requirements Farida Vis, Information School University of Sheffield @flygirltwo Keynote SRA Social Media in Social Research conference, London, 24 June 2013.
  • 2. READING THE RIOTS ON TWITTER Rob Procter (University of Manchester) Farida Vis (University of Leicester) Alexander Voss (University of St Andrews) [Funded by JISC] #readingtheriots
  • 3. What role did social media play? 2.6 million riot tweets (donated by Twitter) – 700,000 individual accounts Initially: o Role of Rumours o Did incitement take place? [no - #riotcleanup] o What is the role of different actors on Twitter?
  • 5. Guardian Interactive Team (Alastair Dant) http://www.guardian.co.uk/uk/interactive/2011/dec/07/london-riots-twitter Data Journalism Award (sponsored by Google)
  • 9. • Lots of questions about methods • Lots of questions about our tools • Lots of questions about donated data • Lots of questions about ethics
  • 11. Actor Types – top 1000 mentions Typical long tail distribution Twitter researchers tend to focus on the head
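A minimal sketch of how a long-tail mention distribution like this might be measured. The tweet texts, the `@(\w+)` pattern, and the function names are illustrative assumptions, not the project's actual pipeline.

```python
import re
from collections import Counter

# Assumed pattern: a mention is "@" followed by word characters.
MENTION = re.compile(r"@(\w+)")


def mention_counts(texts):
    """Count how often each account is mentioned (case-insensitive)."""
    counts = Counter()
    for text in texts:
        counts.update(name.lower() for name in MENTION.findall(text))
    return counts


def head_share(counts, top_n):
    """Share of all mentions received by the top_n most-mentioned accounts."""
    total = sum(counts.values())
    head = sum(count for _, count in counts.most_common(top_n))
    return head / total if total else 0.0


# Hypothetical example tweets, for illustration only.
texts = [
    "@BBCNews reports fires in Croydon",
    "RT @bbcnews: police statement",
    "thanks @localshop for staying open",
    "@BBCNews @metpoliceuk any update?",
]

counts = mention_counts(texts)
print(counts.most_common())
print(head_share(counts, top_n=1))
```

With real data, `head_share(counts, 1000)` would show how much of the mention volume the top 1000 accounts hold, and therefore how much of the tail is left out when only the head is studied.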
  • 12. Actor Types Mainstream Media Police/emergency services Only online media (news) Riot accounts Non-(news) mainstream media Celebrities Journalists (mainstream media) Researchers Journalists (online media) Members of the public Non-(news) media organisations Bots Bloggers Unclear Activists Account closed down UK Twitterati Fake/spoof account Political Actors Other http://researchingsocialmedia.org/2012/01/24/reading-the-riots-on-twitter-who-tweeted-the-riots/
  • 13. Who tweeted the riots? - categories mainstream media journalists riot accounts
  • 14. You know you’re dealing with Twitter data when… Number 13, 6697 mentions Number 20, 5939 mentions Number 23, 5527 mentions
  • 16. Individual accounts with > 3K mentions
  • 17. 30031 mentions, 441 tweets sent over 4 days: top UK listed journalist (2) 3484 mentions, 290 tweets sent over 4 days: top non UK listed journalist (34)
  • 18. Image sharing practices during crises
  • 19. 400 million tweets/day (March 2013) 40 million Instagram images/day (January 2013) Percentages posted to Twitter / Facebook -> 59% posted to Twitter -> 98% posted to Facebook
  • 20. Where do images fit in the era of ‘Big Data’?
  • 21. Big Data – text + number driven Images: undervalued, underexplored Not by the users
  • 26. #FakeSandy pics 250,000 tweets (4hrs) 1 weekend http://istwitterwrong.tumblr.com/ Jean Burgess Farida Vis Axel Bruns
  • 29. Police tweets. Twitter handles: MPSBarkDag, MPSBarnet, MPSBexley, MPSBrent, MPSBromley, MPSCamden, metpoliceuk, MPSWestminster, MPSCroydon, EalingMPS, MPSEnfield, MPSGreenwich, MPSHackney, MPSHammFul, MPSHaringey, MPSHarrow, MPSHavering, MPSHillingdon, MPSHounslow, MPSIslington, MPSKenChel, MPSKingston, LambethMPS, MPSLewisham, MPSMerton, MPSNewham, MPSRedbridge, MPSRichmond, MPSSouthwark, MPSSutton, MPSTowerHam, MPSWForest, MPSWandsworth. Plus: @MetPoliceEvents (Updates from the Met Police regarding demonstrations & events in London); @MPSOnTheStreet (An official MPS account giving an officer on the ground's view of events, operations and other policing activities in London); @MPSDoI (Updates from the Metropolitan Police Service, Directorate of Information)
  • 30. Collecting the data Scraper by Jacopo Ottaviani URL for the scraper: https://scraperwiki.com/scrapers/police_and_the_olympics_2012/ ScraperWiki is a key DDJ site
  • 32. Data challenges • Collecting Twitter data in (real) time (APIs) • Methods for building a reliable corpus • Problems with language bias • Problems with hashtag/keyword bias • API bias • Demographics of Twitter users – who are they? • Problems with escalating volume • Mapping explosion of new tools: are they any good? • Off the shelf tools (growing divide in research capacity in this area) • Limitations of the tools • Problems with data sharing / replicating studies + findings
  • 33. Data challenge 1: Know your API
  • 37. 1% random sample of the firehose If not rate limited – all data may be collected
  • 39. Data challenge 2: API bias?
  • 40. We collect and analyse messages exchanged in Twitter using two of the platform's publicly available APIs (the search and stream specifications). We assess the differences between the two samples, and compare the networks of communication reconstructed from them. The empirical context is given by political protests taking place in May 2012: we track online communication around these protests for the period of one month, and reconstruct the network of mentions and retweets according to the two samples. We find that the search API over-represents the more central users and does not offer an accurate picture of peripheral activity; we also find that the bias is greater for the network of mentions. We discuss the implications of this bias for the study of diffusion dynamics and collective action in the digital era, and advocate the need for more uniform sampling procedures in the study of online communication. (González-Bailón et al, 2012)
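One simple way to look for this kind of bias in your own data is sketched below. It is not the method used by González-Bailón et al.; it assumes you hold two mention edge lists for the same event (one treated as the reference, one as the sample under test) and asks what share of central and peripheral accounts from the reference also appear in the sample. The edge lists, the in-degree threshold, and the function names are hypothetical.

```python
from collections import Counter


def in_degree(edges):
    """Number of mentions received per account, from (sender, mentioned) pairs."""
    return Counter(target for _, target in edges)


def coverage(reference_edges, sample_edges, threshold):
    """Share of central and of peripheral accounts that the sample captures.

    An account counts as central if it receives at least `threshold`
    mentions in the reference network (an arbitrary cut-off for this sketch).
    """
    reference = in_degree(reference_edges)
    sampled_accounts = set(in_degree(sample_edges))
    central = [a for a, d in reference.items() if d >= threshold]
    peripheral = [a for a, d in reference.items() if d < threshold]

    def share(accounts):
        if not accounts:
            return 0.0
        return sum(a in sampled_accounts for a in accounts) / len(accounts)

    return share(central), share(peripheral)


# Hypothetical edge lists, for illustration only.
stream_edges = [("a", "hub"), ("b", "hub"), ("c", "hub"), ("a", "x"), ("b", "y")]
search_edges = [("a", "hub"), ("b", "hub"), ("a", "x")]

central_share, peripheral_share = coverage(stream_edges, search_edges, threshold=2)
print(central_share, peripheral_share)
```

A large gap between the two shares would suggest the sample under test captures central accounts better than peripheral ones.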
  • 41. Data challenge 3: rate limiting + 1%
  • 42. Random sampling with the streaming API: the 1% ‘If we estimate a daily tweet volume of 450 million tweets (Farber), this would mean that, in terms of standard sampling theory, the 1% endpoint would provide a representative and high resolution sample with a maximum margin of error of 0.06 at a confidence level of 99%, making the study of even relatively small subpopulations within that sample a realistic option.’ (Gerlitz and Rieder, 2013)
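The margin-of-error figure in this quote can be checked with the standard formula for a proportion under simple random sampling. The sketch below assumes the worst case p = 0.5, z = 2.576 for a 99% confidence level, and a finite population correction; these choices are assumptions, since the quote does not spell out its calculation.

```python
import math


def margin_of_error(sample_size, population_size, z=2.576, p=0.5):
    """Maximum margin of error for a proportion, with finite population correction."""
    standard_error = math.sqrt(p * (1 - p) / sample_size)
    correction = math.sqrt((population_size - sample_size) / (population_size - 1))
    return z * standard_error * correction


daily_tweets = 450_000_000          # daily volume estimated in the quote
sample_size = daily_tweets // 100   # the 1% endpoint

moe = margin_of_error(sample_size, daily_tweets)
print(f"{moe * 100:.3f} percentage points")
```

Under these assumptions the result is roughly 0.06 percentage points, which matches the quoted 0.06 if that figure is read as a percentage. The calculation only holds if the 1% endpoint really is a uniform random sample, which is the question the later challenges take up.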
  • 43. Data challenge 4: relation to firehose?
  • 44. ‘The essential drawback of the Twitter API is the lack of documentation concerning what and how much data users get. This leads researchers to question whether the sampled data is a valid representation of the overall activity on Twitter. In this work we embark on answering this question by comparing data collected using Twitter’s sampled API service with data collected using the full, albeit costly, Firehose stream that includes every single published tweet.’ (Morstatter et al, 2013)
  • 45. Data challenge 5: relation to ‘general public’?
  • 46. Data challenge 6: what data to collect?
  • 47. For hashtag datasets: contributions made by specific users and groups of users; overall patterns of activity over time; combinations to examine contributions by specific users and groups over time. (Bruns and Stieglitz, 2013)
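A minimal sketch of the three kinds of metric listed here: per-user contributions, activity over time, and the two combined. It assumes tweets are available as (username, ISO timestamp) pairs; that record layout and the function names are illustrative, not taken from Bruns and Stieglitz.

```python
from collections import Counter
from datetime import datetime

# Hypothetical hashtag dataset: (username, ISO 8601 timestamp).
tweets = [
    ("alice", "2013-06-01T10:05:00"),
    ("bob", "2013-06-01T10:40:00"),
    ("alice", "2013-06-01T11:15:00"),
    ("alice", "2013-06-01T11:50:00"),
    ("carol", "2013-06-01T11:59:00"),
]


def hour_of(timestamp):
    """Truncate an ISO timestamp to its hour, e.g. '2013-06-01 10:00'."""
    return datetime.fromisoformat(timestamp).strftime("%Y-%m-%d %H:00")


def tweets_per_user(records):
    """Contributions made by specific users."""
    return Counter(user for user, _ in records)


def tweets_per_hour(records):
    """Overall pattern of activity over time."""
    return Counter(hour_of(ts) for _, ts in records)


def user_activity_per_hour(records):
    """Contributions by specific users over time."""
    return Counter((user, hour_of(ts)) for user, ts in records)


print(tweets_per_user(tweets).most_common())
print(sorted(tweets_per_hour(tweets).items()))
print(sorted(user_activity_per_hour(tweets).items()))
```

Grouping users (for example by actor type) before counting would give the group-level versions of the same metrics.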
  • 48. Data challenge 6: how to collect the data?
  • 50. Recent explosion in Twitter tools • Twitonomy • ScraperWiki • TAGS • DMI Twitter Capture and Analysis Toolset • MozDeh (and Webometric Analyst) • NVivo 10 • YourTwapperKeeper
  • 51. Twitonomy (REST + search API)
  • 73. #horsemeat still producing data in June!
  • 75. TAGS
  • 77. Collects up to 8000 tweets based on hashtags/keywords/users
  • 83. DMI Twitter Capture and Analysis Toolset
  • 84. DMI tools for extracting links (all the URLs). Most URLs are shortened, mainly using t.co (Twitter), and need unpacking. Automated unpacking didn’t always work: manual unpacking and note taking were needed (plus you still have the shortened URL in case you want to retrace it).
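Unpacking can also be scripted. The sketch below is not the DMI tool; it is an assumed approach that follows HTTP redirects with Python's standard library and keeps the shortened URL next to the result, recording `None` where unpacking fails so those links can be handled manually. The example URLs and `fake_fetch` are hypothetical and only keep the demonstration offline.

```python
import urllib.error
import urllib.request


def http_final_url(url, timeout=10):
    """Follow redirects and return the final URL (makes a network request)."""
    # HEAD avoids downloading the page body; some servers reject it.
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.geturl()


def unpack_urls(short_urls, fetch=http_final_url):
    """Return (short_url, unpacked_url or None) pairs, keeping the short URL."""
    results = []
    for short in short_urls:
        try:
            results.append((short, fetch(short)))
        except (urllib.error.URLError, ValueError, OSError):
            # Unpacking failed: record None so the link can be checked by hand.
            results.append((short, None))
    return results


def fake_fetch(url):
    """Offline stand-in for http_final_url, using made-up URLs."""
    table = {"http://t.co/abc": "http://example.org/story"}
    if url not in table:
        raise OSError("could not resolve")
    return table[url]


result = unpack_urls(["http://t.co/abc", "http://t.co/zzz"], fetch=fake_fetch)
print(result)
```

Calling `unpack_urls(urls)` without the `fetch` argument would use real HTTP requests; links behind dead or blocked shorteners would come back as `None`.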
  • 93. Data challenge 7: how to analyse the data?
  • 94. What to do about all those bots?
  • 95. For hashtag datasets: contributions made by specific users and groups of users; overall patterns of activity over time; combinations to examine contributions by specific users and groups over time. (Bruns and Stieglitz, 2013)
  • 96. Data collected + methods used produce specific research object
  • 97. Where do images fit in the era of ‘Big Data’?
  • 98. Data challenge 8: representing your data?
  • 99. Data visualisations: what are they and what do they want?
  • 100. Data challenge 9: how to deal with ethics?
  • 101. Data challenge 10: user requirements?
  • 102. What do we want from these APIs, the data, the tools, and Twitter researchers so that we can develop more robust social scientific research on Twitter?
  • 104. References
      • Bruns, A., and Stieglitz, S. 2013. Towards More Systematic Twitter Analysis: Metrics for Tweeting Activities. International Journal of Social Research Methodology. DOI:10.1080/13645579.2013.770300. Available from: http://snurb.info/files/2013/Towards%20More%20Systematic%20Twitter%20Analysis%20(final).pdf
      • Gerlitz, C., and Rieder, B. 2013. Mining One Percent of Twitter: Collections, Baselines, Sampling. M/C Journal, Vol. 16, No 2. Available from: http://journal.media-culture.org.au/index.php/mcjournal/article/viewArticle/620
      • González-Bailón, S., Ning, W., Rivero, A., Borge-Holthoefer, J., and Moreno, Y. 2012. Assessing the Bias in Communication Networks Samples from Twitter. Available from: http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2185134
      • Morstatter, F., Pfeffer, J., Liu, H., and Carley, K.M. 2013. Is the Sample Good Enough? Comparing Data from Twitter’s Streaming API with Twitter’s Firehose. Association for the Advancement of Artificial Intelligence. Available from: http://www.public.asu.edu/~fmorstat/paperpdfs/icwsm2013.pdf
      • Vis, F. 2012. Twitter as a reporting tool for breaking news: journalists tweeting the 2011 UK riots, Digital Journalism 1(1). Available from: http://www.tandfonline.com/doi/full/10.1080/21670811.2012.741316#.UcwBZ-CPDao
      • Vis, F., Faulkner, S., Parry, K., Manyukhina, Y., and Evans, L. (in press), Twitpic-ing the riots: analysing images shared on Twitter during the 2011 UK riots, in Twitter and Society, Weller, K., Bruns, A., Burgess, J., Mahrt, M., and Puschmann, C. (eds.), New York: Peter Lang.
  • 105. Links to all mentioned tools
      • Twitonomy - http://www.twitonomy.com/
      • ScraperWiki - https://beta.scraperwiki.com/
      • TAGS - http://mashe.hawksey.info/2013/02/twitter-archive-tagsv5/
      • DMI Twitter Capture and Analysis Toolset - https://wiki.digitalmethods.net/Dmi/ToolDmiTcat
      • MozDeh (and Webometric Analyst) - http://mozdeh.wlv.ac.uk/ + http://lexiurl.wlv.ac.uk/
      • NVivo 10 - http://www.qsrinternational.com/products_nvivo.aspx
      • YourTwapperKeeper - https://github.com/540co/yourTwapperKeeper See also: http://mappingonlinepublics.net/tag/yourtwapperkeeper/