Diese Präsentation wurde erfolgreich gemeldet.
Wir verwenden Ihre LinkedIn Profilangaben und Informationen zu Ihren Aktivitäten, um Anzeigen zu personalisieren und Ihnen relevantere Inhalte anzuzeigen. Sie können Ihre Anzeigeneinstellungen jederzeit ändern.

Chances and Challenges of Studying Social Media Data

1.143 Aufrufe

Veröffentlicht am

Presentation at University of Maryland, Maryland Institute for Technology in the Humanities, 30.04.2015.

Veröffentlicht in: Soziale Medien
  • Als Erste(r) kommentieren

  • Gehören Sie zu den Ersten, denen das gefällt!

Chances and Challenges of Studying Social Media Data

  1. 1. Chances and Challenges of Studying Social Media Data Dr. Katrin Weller GESIS – Leibniz-Institute for the Social Sciences Data Archive for the Social Sciences / Computational Social Science Cologne, Germany ● Digital Studies Fellow at John W. Kluge Center Library of Congress Washington D.C. E-Mail: katrin.weller@gesis.org ●Twitter: @kwelle ● Web: www.katrinweller.net
  2. 2. My Background • PhD in Information Science (until 2012 University of Düsseldorf) • Interests: Web Science, Social Media (focus on Twitter), Knoweledge representation + Semantic Web, informetrics + altmetrics, scholarly communication • Since 2013: GESIS, Social Web Data: New data types for social science research; research methods and data archiving. • Jan-May 2015: Digital Studies Fellowship at the Library of Congress 2
  3. 3. Recent and Current Work • Co-editor of „Twitter & Society“ (Peter Lang, 2014). • With Katharina Kinder-Kurlanda: „The hidden data of social media research“ • #FAIL! Things that didn‘t work out in social media research – and what we can learn from them (#fail2015a at Web Science Conference, Oxford, #fail2015b at Internet Research 16, Phoenix). https://failworkshops.wordpress.com • Pilotproject for archiving social media datasets in an election study. (http://arxiv.org/abs/1312.4476v2) 3
  4. 4. Social media research – some insights from bibliometrics
  5. 5. 5 What is social media research?
  6. 6. 457,000 4,991 19,440
  7. 7. Social media research 2000-2013 0 1000 2000 3000 4000 5000 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 No. of publications (Scopus) Scopus search, conducted March 2014: (TITLE-ABS-KEY("social media") OR TITLE-ABS-KEY("social web") OR TITLE-ABS-KEY("social software") OR TITLE-ABS-KEY("web 2.0")) AND PUBYEAR > 1999
  8. 8. Scopus: 2000-2013 by country 0 1000 2000 3000 4000 5000 6000 7000 United States United Kingdom Germany Australia China Spain Canada Italy France Taiwan Netherlands South Korea Finland Austria Japan Greece India Singapore Switzerland Hong Kong Ireland
  9. 9. Scopus: 2000-2013 by subject area 10650; 36% 5542; 19% 2384 2288 2151 1535 773 772 65 Computer Science Social Sciences Engineering Medicine Business, Management and Accounting Mathematics Arts and Humanities Decision Sciences Psychology Nursing Economics, Econometrics and Finance Biochemistry, Genetics and Molecular Biology Health Professions Environmental Science Earth and Planetary Sciences Agricultural and Biological Sciences Pharmacology, Toxicology and Pharmaceutics Physics and Astronomy Materials Science Multidisciplinary Neuroscience Immunology and Microbiology Chemical Engineering Veterinary Dentistry Chemistry Energy
  10. 10. Twitter research by discipline 10
  11. 11. Challenge • Interdisciplinarity! • „Social media research“ is not a coherent research field. • Influences from lots of different disciplines. • Some disciplines still isolated, not all equally advanced in technical tasks. • Challenge of keeping track of what is going on – across disciplines. 11
  12. 12. Example: Twitter research in social sciences 12 Weller, K. (2014). What do we get from Twitter – and What Not? A Close Look at Twitter Research in the Social Sciences. Knowledge Organization 41(3), 238-248.
  13. 13. Challenge vs. Chance • Lots of room for exploration and innovation • Few or no standards 13
  14. 14. Comparability • Data? • Collection Period? • Method? • Tools? 14
  15. 15. 15 Different methods even in social science Twitter research Weller, K. (2014). What do we get from Twitter – and what not? A close look at Twitter research in the social sciences. Knowledge Organization. 41(3), 238-248
  16. 16. Example 0 10 20 30 40 50 60 2008 2009 2010 2011 2012 2013 Publications on „Twitter and elections“ (Scopus and Web of Science) Weller, K. (2014). Twitter und Wahlen: Zwischen 140 Zeichen und Milliarden von Tweets. In: R. Reichert (Ed.), Big Data: Analysen zum digitalen Wandel von Wissen, Macht und Ökonomie (pp. 239-257). Bielefeld: transcript. 16
  17. 17. Year of election Name of election Country/region No. of papers (2013) Date of election 2008 40th Canadian General Election Canada 1 14.10.200 8 2009 European Parliament election, 2009 Europe 1 07.06.200 9 2009 German federal election, 2009 Germany 2 27.09.200 9 2010 2010 UK general election United Kingdom 4 06.05.201 0 2010 South Korean local elections, 2010 South Korea 1 02.06.201 0 2010 Dutch general election, 2010 Netherlands 2 09.06.201 0 2010 Australian federal election, 2010 Australia 1 21.08.201 0 2010 Swedish general election, 2010 Sweden 1 19.09.201 0 2010 Midterm elections / United States House of Representatives elections, 2010 USA 4 02.11.201 0 2010 Gubernational elections: Georgia USA 1 02.11.201 0 2010 Gubernational elections: Ohio USA 1 02.11.201 0 2010 Gubernational elections: Rhode Island USA 1 02.11.201 0 2010 Gubernational elections: Vermont USA 1 02.11.201 0 2010 2010 superintendent elections South Korea 1 17.12.201 0 2011 Baden-Württemberg state election, 2011 Germany 1 27.03.201 1 2011 Rhineland-Palatinate state election, 2011 Germany 1 27.03.201 1 2011 Scottish parliament election 2011 Scotland 1 05.05.201 1 2011 Singapore’s 16th parliamentary General Election Singapore 1 07.05.201 1 2011 Norwegian local elections, 2011 Norway 2 12.09.201 1 2011 2011 Danish parliamentary election Denmark 2 15.09.201 1
  18. 18. 2011 Scottish parliament election 2011 Scotland 1 05.05.201 1 2011 Singapore’s 16th parliamentary General Election Singapore 1 07.05.201 1 2011 Norwegian local elections, 2011 Norway 2 12.09.201 1 2011 2011 Danish parliamentary election Denmark 2 15.09.201 1 2011 Berlin state election, 2011 Germany 2 18.09.201 1 2011 Gubernational elections: West Virginia USA 1 04.10.201 1 2011 Gubernational elections: Louisiana USA 1 22.10.201 1 2011 Swiss federal election, 2011 Switzerland 1 23.10.201 1 2011 2011 Seoul mayoral elections South Korea 1 26.10.201 1 2011 Gubernational eletions: Kentucky USA 1 08.11.201 1 2011 Gubernational elections: Mississippi USA 1 08.11.201 1 2011 Spanish national election 2011 Spain 1 20.11.201 1 2012 Queensland State election Australia 1 24.03.201 2 2012 South Korean legislative election, 2012 South Korea 1 11.04.201 2 2012 French presidential election, 2012 France 2 22.04.201 2 2012 Mexican general election, 2012 Mexico 1 01.07.201 2 2012 United States presidential election, 2012 / United States House of Representatives elections, 2012 USA 17 06.11.201 2 2012 South Korean presidential election, 2012 South Korea 2 19.12.201 2 2013 Ecuadorian general election, 2013 Ecuador 1 17.02.201 3 2013 Venezuelan presidential election, 2013 Venezuela 1 14.04.201 3 2013 Paraguayan general election, 2013 Paraguay 1 21.04.201 3
  19. 19. Big DATA? 2013: twitter and election No. of Tweets No. Of publications (2013) 0-500 3 501-1.000 4 1.001-5.000 1 5.001-10.000 1 10.001-50.000 7 50.001-100.000 4 100.001-500.000 5 500.001-1.000.000. 3 1.000.001-5.000.000 3 More than 5.000.000 3 More than 100.000.000 1 More than 1.000.000.000 1 No/unsufficient information 13 Weller, K. (2014). Twitter und Wahlen: Zwischen 140 Zeichen und Milliarden von Tweets. In: R. Reichert (Ed.), Big Data: Analysen zum digitalen Wandel von Wissen, Macht und Ökonomie (pp. 239-257). Bielefeld: transcript. 19
  20. 20. Comparability twitter and election Data collection methods Weller, K. (2014). Twitter und Wahlen: Zwischen 140 Zeichen und Milliarden von Tweets. In: R. Reichert (Ed.), Big Data: Analysen zum digitalen Wandel von Wissen, Macht und Ökonomie (pp. 239-257). Bielefeld: transcript. 20 Data source number No information 11 Collected manually from Twitter website (Copy-Paste / Screenshot) 6 Twitter API (no further information) 8 Twitter Search API 3 Twitter Streaming API 1 Twitter Rest API 1 Twitter API user timeline 1 Own program for accessing Twitter APIs 4 Twitter Gardenhose 1 Official Reseller (Gnip, DataSift) 3 YourTwapperKeeper 3 Other tools (e.g. Topsy) 6 Received from colleagues 1
  21. 21. Comparability twitter and election Data collection periods Weller, K. (2014). Twitter und Wahlen: Zwischen 140 Zeichen und Milliarden von Tweets. In: R. Reichert (Ed.), Big Data: Analysen zum digitalen Wandel von Wissen, Macht und Ökonomie (pp. 239-257). Bielefeld: transcript. 21 period Number of publications (2013) 0-10 hours 1 1-2 days 6 3-7 days 3 8-14 days 5 2-4 weeks 7 1-2 months 13 2-6 months 5 7-12 months 3 More than 12 months 0 No/unsufficient information 6
  22. 22. What is being studied? 0 100 200 300 400 500 600 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 Twitter Facebook YouTube Blogs Wikis Foursquare LinkedIn MySpace Number of publications per year, which mention the respective social media platform‘s name in their title. Scopus Title Search. For details: http://kwelle.wordpress.com/2014/04/07/bibliometric-analysis-of-social-media-research/.22
  23. 23. Challenges • Quickly changing landscape of social media platforms • Twitter as a model organism of social media research? 23
  24. 24. Scopus: 2000-2013 popular keywords Social networks (897), Social network (657) User interfaces (1,007) Social networking (online) (2,291) Facebook (847) Knowledge management (860) Web services (869) Information systems (810) Twitter (667) Semantics(765), Semantic Web(669) Communication(650) Information technology (639) E-learning (623), Students(579) Education(520) Teaching (504) Scopus search, conducted March 2014: (TITLE-ABS-KEY("social media") OR TITLE-ABS-KEY("social web") OR TITLE-ABS-KEY("social software") OR TITLE-ABS-KEY("web 2.0")) AND PUBYEAR > 1999
  25. 25. Social Media Research: Topics • Political communication / elections • Activism • Popular culture, memes • Brand communication, marketing • Journalism (incl. agenda setting, citizen journalism, TV backchannel) • Crisis communication, disaster response • Scholarly communication • Language • And many more 25
  26. 26. pointless babble? 26
  27. 27. Social media research – some insights from expert interviews
  28. 28. • Weller, Katrin, and Katharina E. Kinder-Kurlanda. 2014. ““I love thinking about ethics!” Perspectives on ethics in social media research.” Internet Research (IR15), Deagu, South Korea, 22.-24.10.2014. Paper to be published in Selected Papers of Internet Research, view preprintHiddenDataEthics_Weller+Kinder-Kurlanda_IR15- preprint. • Kinder-Kurlanda, K. E.; Weller, K. (2014). “I always feel it must be great to be a Hacker!”: The role of interdisciplinary work in social media research. In Proceedings of the 2014 ACM Web Science Conference WebSci’14, June 23–26, 2014, Bloomington, IN, USA, pp. 91-98. doi:http://dx.doi.org/10.1145/2615569.2615685. View preprint versionhiddendata_websci14-preprint_Kinder-Kurlanda+Weller(2014). • Weller, K. & Kinder-Kurlanda, K. (in press). Uncovering the Challenges in Collection, Sharing and Documentation: The Hidden Data of Social Media Research? To appear in Workshop on Standards and Practices in Large-Scale Social Media Research. ICWSM, Oxford, May 2015. http://www.aaai.org/ocs/index.php/ICWSM/ICWSM15/paper/viewFile/10657/105 52. 28
  29. 29. CHANCES • Researchers value social media as a new type of data. • Previously „ephemeral data“ become visible • Immediate – quick reaction to events • Structured • „natural“ data 29 What I find really interesting is that structure becomes manifest in internet communication. So it’s the first time in history actually that we can, that social structures between people become manifest within a technology. (...) They become visible, they become crawlable, they become analyzable.
  30. 30. Some of the CHALLENGES Preliminary results, more detailed analysis to follow. - Interdisciplinarity - Ethics - Standards - Data access & infrastructure - … 30
  31. 31. Unregulated and developing field • Researchers showed a high awareness of the unregulated and developing character of social media research methods. But, I think that (…) in like a couple of years, maybe five – it depends a lot, because the subject of the research is changing every day, (…) but I think that we’re going to have, (…) more or less shared qualitative approaches with a lot of good practices.
  32. 32. Data Sharing 32 But you can’t make your data available for others to look at, which means both your study can’t really be replicated and it can’t be tested for review. But also it just means your data can’t be made available for other people to say, Ah you have done this with it, I’ll see what I can do with it, (…) There is no open data.
  33. 33. Data Sharing “I think probably a couple times we’ve asked around if anyone else happened to have a particular dataset. […] but not so much, because they probably have tracked in a different data format, and then merging the two together actually becomes quite difficult as well.” 33
  34. 34. Ethics / privacy 34 “I will not quote tweets.” “if somebody plays a really important role in a particular event then maybe they deserve the credit of being accredited as well.”
  35. 35. Social media research – some more insights from critical literature
  36. 36. Representativeness Blank, G. (2014). Who uses Twitter? Representativeness of Twitter Users. Presentation at General Online Research GOR 14. Retrieved from: http://conftool.gor.de/conftool14/index.php?page=downloadPaper&filename=Blank- Who_uses_Twitter_Representativeness-119.pptx&form_id=119&form_version=final 34 26 8 12 18 14 10 17 12 23 28 3330 35 0 20 40 60 80 100 InterestPolitical activities Interest in politics Send political message Contact MP online Re-post political news Political comment on SNS Find political facts Sign online petition OxIS current users: 2013 N=1,613 Figure 6: Political Activities of Twitter Users Twitter user Non-user
  37. 37. Data Quality • E.g. comparison of Twitter API and Reseller data. 37 Morstatter, F., Pfeffer, J., Liu, H., & Carley, K. M. (2013). Is the Sample Good Enough? Comparing Data from Twitter’s Streaming API with Twitter’s Firehose. Retrieved from http://arxiv.org/abs/1306.5204
  38. 38. Inequality in data access possibilities 38 boyd, d., and Crawford, K. 2012. Critical questions for Big Data: Provocations for a cultural, technological, and scholarly phenomenon. Information, Communication & Society 15(5):662–679. DOI: 10.1080/1369118X.2012.678878 • Data haves and data have nots – Financial reasons – Connections to companies – Different skills – …
  39. 39. Top 5 Challenges in Twitter research • Representativeness and validity • Cross-platform studies • Comparisons (e.g. different countries, points in time) • Multi-method approaches • Context and meaning Bruns, Axel, and Katrin Weller. 2014. "Twitter data analytics – or: the pleasures and perils of studying Twitter (guest editorial for special issue)". Aslib Journal of Information Management 66 (3): 246-249. 39
  40. 40. Summary Three sources of challenges in social media research: • The variety of user interactions that count as social media and their ever changing nature that makes social media a moving target. • The diversity of the research community, which challenges knowledge transfer and development of standards. • The dependency on commercial companies to open up access to their data. Researchers themselves only have limited means to change these sources of challenges. 40 Weller, K. (2015). Accepting the challenges of social media research. Online Information Review 39(3).
  41. 41. Summary Currently addressed challenges • research infrastructure, including data collection and sharing facilities, training in new methods and technologies. • The call for more thoughtfulness in research ethics. • Critical considerations on big data and data quality, including reflection of the power of algorithms and misrepresentations through big data approaches. Requests for broader scopes by facilitating multi-method and multi- platform studies, as well as longitudinal studies 41
  42. 42. Outlook • Long term preservation of social media, i.e. archiving of data and as well as of social media platforms’ look and feel. • Documentation of applied research methods which should enable comparative studies across the single use cases, quality control and verification of results. • Accessing social media users’ expectations on privacy in order to respond to them through ethical standards. 42
  43. 43. GREETINGS FROM COLOGNE AND DC!
  44. 44. QUESTIONS AND FEEDBACK katrin.weller@gesis.org @kwelle http://katrinweller.net

×