Introduction for the second workshop "#FAIL! Things that didn't work out in social media research - and what we can learn from them". Workshop at #ir16 conference, Phoenix, October 21st, 2015
See https://failworkshops.wordpress.com
1. #FAIL!
THINGS THAT DIDN’T WORK OUT IN SOCIAL
MEDIA RESEARCH
- AND WHAT WE CAN LEARN FROM THEM
Workshop at Internet Research 16, Phoenix, October 21st, 2015.
3. ABOUT #FAIL! WORKSHOPS
• Traveling on to different conferences. First
workshop was at WebSci15 (June 2015)
• Aim: collect various examples of things that
can go wrong and share them with different
communities
– Learn from experiences
– Connect different research communities
5. SOCIAL MEDIA RESEARCH
[Figure: number of publications per year (2001 to 2013, y-axis 0 to 600) that mention the respective social media platform’s name in their title. Platforms shown: Twitter, Facebook, YouTube, Blogs, Wikis, Foursquare, LinkedIn, MySpace. Source: Scopus title search.]
For details: http://kwelle.wordpress.com/2014/04/07/bibliometric-analysis-of-social-media-research/
6. SOCIAL MEDIA RESEARCH
2008-2013 papers on Twitter and elections: data sources
Data source                                                        Number of papers
No information                                                     11
Collected manually from Twitter website (Copy-Paste / Screenshot)  6
Twitter API (no further information)                               8
Twitter Search API                                                 3
Twitter Streaming API                                              1
Twitter REST API                                                   1
Twitter API user timeline                                          1
Own program for accessing Twitter APIs                             4
Twitter Gardenhose                                                 1
Official Reseller (Gnip, DataSift)                                 3
YourTwapperKeeper                                                  3
Other tools (e.g. Topsy)                                           6
Received from colleagues                                           1
Source: Weller, K. (2014). Twitter und Wahlen: Zwischen 140 Zeichen und Milliarden von Tweets. In: R. Reichert (Ed.), Big Data: Analysen zum digitalen Wandel von Wissen, Macht und Ökonomie (pp. 239-257). Bielefeld: transcript.
8. Challenge 1: users
• How to involve social media users in the
research process?
• Presentation by Elodie Crespel: “Extending
data collection with web browser extension”
– Participants may be creative in their use of
technology – flexibility is needed.
9. Challenge 2: methods
• Data analytics: which approaches should be
chosen?
• Taha Yasseri: “The double-edged sword of
statistical significance”
– Questions the p-value as a standard for data
analytics.
– “too much of attention and reliance on specific
measures or methods without being aware of the
logic behind them, can be misleading”
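One commonly cited reason for caution with p-values in very large social media datasets can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a reproduction of the presentation's argument: the effect size of 0.02 standard deviations, the sample sizes, and the helper name `two_sided_p_value` are all hypothetical, and a simple two-sample z-test with equal group sizes and unit variance is assumed.

```python
import math


def two_sided_p_value(effect_size: float, n_per_group: int) -> float:
    """Two-sided p-value of a two-sample z-test.

    Assumes (hypothetically) equal group sizes, unit variance in both
    groups, and an observed mean difference equal to `effect_size`
    standard deviations.
    """
    # Standard error of the difference in means is sqrt(2 / n),
    # so the z statistic is effect_size * sqrt(n / 2).
    z = effect_size * math.sqrt(n_per_group / 2)
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical, practically negligible effect: 0.02 standard deviations.
EFFECT_SIZE = 0.02

for n in (100, 10_000, 1_000_000):
    p = two_sided_p_value(EFFECT_SIZE, n)
    print(f"n per group = {n:>9,}: p = {p:.3g}")
```

The same negligible effect is far from significant with 100 observations per group and highly significant with a million, so the p-value here mostly reflects sample size rather than the practical importance of the difference.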
10. Challenge 3: tools
• Many researchers use third party tools for
data collection or analysis – which may not
always work as expected.
• Presentation by Michael Bossetta and
Anamaria Dutceac Segesten: “Tracing
Eurosceptic Party Networks via Hyperlink
Network Analysis and #FAIL!ng: Can Web
Crawlers Keep up with Web Design?”
– Example case: Issuecrawler.
11. Challenge 4: content
• Content analysis is heavily affected by the
dynamic nature of social media.
• Presentation by Marie Van Cranenbroeck:
“Managing and Using Unstable Data in a Social
Science Research about Museums and
Audiences on Social Media”
– Data collection and storage challenges
12. Specific details and additions
• Researchers and users may have different ideas
about the definition of social media / social
networks
• Lack of evaluation standards
• Availability of data (also: not enough data)
• Data may be corrupt (e.g. missing data)
• Social media as a moving target (Karpf, D. (2012).
Social science research methods in Internet time.
Information, Communication & Society,
15(5), 639-661.)
13. Meta discussion
• Social media research can have various forms.
Different disciplines involved.
• Best practices and pitfalls in social media
research are mainly discussed informally. Few
possibilities to share unsuccessful approaches.
14. WHAT WE’D LIKE TO LEARN TODAY
• Towards a categorization of challenges for
social media research: what can go wrong?
• Collection of more experiences
• Structuring them into different categories
15. WHAT WE’D LIKE TO LEARN TODAY
Today:
- 4 presentations
- Think about your own experiences!
- … in connection to each presentation
- … in general
16. PROGRAM
9:00 Introduction: “What we’ve learnt from the first workshop and what we’d like to learn today”
9:15 Shawn Walker: “Complexity of collecting social media data in ephemeral contexts”
9:40 Cornelius Puschmann: “Why LIWC sucks (or: saner options for social media content analysis)”
10:05 Break
10:20 Luca Rossi: “The fourth deadly sin of social media researchers (or: scientific research and unstable socio-technical platforms)”
10:45 Marco Toledo Bastos: “Individual Behavior from Aggregate Social Media Data”
11:10 Discussion & Conclusions
12:00 End
17. DISCUSSION
• Other experiences? Share your thoughts!
• Main categories of #fail cases?
• Top 3 take-away messages for next workshop?
18. WHERE TO GO FROM HERE?
• Next steps – lessons learnt for future
workshop organisation
• Which additional conferences?
• Publication? Guidebook?
19. • Archiving:
– URLs may vanish (Question: linear rate of decay?)
– Images missing
– Platforms changing (moving target!) – not just about the interface!
• Visualization of results
– Word cloud (compare histograms)
• Tools
– sentiment140, Internet Archive, GNIP
• Methods
– Content Analysis:
• replicability? Validation?
• Context for social media contents (e.g. surrounding tweets).
• LIWC, General Inquirer
– Predictions
– “Data Science”
• Lack of theory
• Data Quality:
– Can we still cite/use data and research published in 2007/2008?
– Baseline? (how to define for a moving target)
• Theory:
– Can we only do descriptive work for single platforms?
– Look for the theory instead of the data?
• Meta
– Systematic review of existing literature is needed
• Documentation
– Timeframe generalization
– Document time, cultures?
– How long will my results be valid?
– Have a general base for comparison
Editor's notes
Social media platforms and users – a moving target…
Best practices and pitfalls in social media research are mainly discussed informally. Few possibilities to share unsuccessful approaches.
Researchers from many different disciplinary backgrounds enter the field. Different fields of expertise, little interdisciplinary exchange of approaches.
Limited possibilities for data sharing / validation and reproduction of results.