Social Media Data as a Basis for
Social Science Research:
Opportunities and Challenges
Kassel, 11 June 2014
Dr. Katrin Weller, katrin.weller@gesis.org, @kwelle
http://katrinweller.net
Dr. Katrin Weller
GESIS – Leibniz Institute for the Social Sciences
Data Archive for the Social Sciences
2006-2012: Heinrich-Heine-Universität Düsseldorf,
doctorate in information science
Weller, K., Bruns, A., Burgess, J., Mahrt, M., & Puschmann, C. (Eds.) (2014).
Twitter and Society. New York et al.: Peter Lang.
Social media research in (publication) numbers
[Chart: number of publications in Scopus; y-axis 0 to 5000]
Query: (TITLE-ABS-KEY("social media") OR TITLE-ABS-KEY("social web") OR TITLE-ABS-KEY("social software") OR TITLE-ABS-KEY("web 2.0")) AND PUBYEAR > 1999 [March 2014]
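The Scopus search string follows a regular pattern: one TITLE-ABS-KEY clause per phrase, joined with OR, plus a publication-year filter. A minimal Python sketch that assembles the same string is given here. The function name `build_scopus_query` is hypothetical; it only illustrates the structure of the query and does not contact Scopus.

```python
def build_scopus_query(phrases, after_year):
    """Assemble a Scopus advanced-search string.

    Illustrative helper only; it is not part of any Scopus client library.
    """
    # One TITLE-ABS-KEY clause per search phrase, combined with OR.
    clauses = ['TITLE-ABS-KEY("{}")'.format(phrase) for phrase in phrases]
    # Restrict to publications after the given year.
    return "({}) AND PUBYEAR > {}".format(" OR ".join(clauses), after_year)


query = build_scopus_query(
    ["social media", "social web", "social software", "web 2.0"], 1999
)
print(query)
```

Adding or removing a phrase in the list changes the query without retyping the field codes, which makes it easier to document exactly which search produced the publication counts.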
Publications by subject area (Scopus)
[Chart; data labels: 10650 (36%), 5542 (19%), 2384, 2288, 2151, 1535, 773, 772, 65. Legend:]
Computer Science
Social Sciences
Engineering
Medicine
Business, Management and Accounting
Mathematics
Arts and Humanities
Decision Sciences
Psychology
Nursing
Economics, Econometrics and Finance
Biochemistry, Genetics and Molecular Biology
Health Professions
Environmental Science
Earth and Planetary Sciences
Agricultural and Biological Sciences
Pharmacology, Toxicology and Pharmaceutics
Physics and Astronomy
Materials Science
Multidisciplinary
Neuroscience
Immunology and Microbiology
Chemical Engineering
Veterinary
Dentistry
Chemistry
Energy
2013: Twitter and elections
No. of tweets               No. of publications (2013)
0-500                       3
501-1,000                   4
1,001-5,000                 1
5,001-10,000                1
10,001-50,000               7
50,001-100,000              4
100,001-500,000             5
500,001-1,000,000           3
1,000,001-5,000,000         3
more than 5,000,000         3
more than 100,000,000       1
more than 1,000,000,000     1
not stated / imprecise      13
Weller, K. (in press). Twitter und Wahlen: Zwischen 140 Zeichen und Milliarden von Tweets. To appear in R. Reichert: Big
Data. http://www.transcript-verlag.de/978-3-8376-2592-9/big-data
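The distribution of dataset sizes can be summarised with a few lines of Python. This is a rough sketch that assumes the size classes in the table are disjoint, so that their counts add up to the number of publications examined; the variable names are illustrative.

```python
# Counts per dataset-size class, copied from the table.
# Assumption: the classes are disjoint, so the counts can simply be summed.
publications_by_size = [
    ("0-500", 3),
    ("501-1,000", 4),
    ("1,001-5,000", 1),
    ("5,001-10,000", 1),
    ("10,001-50,000", 7),
    ("50,001-100,000", 4),
    ("100,001-500,000", 5),
    ("500,001-1,000,000", 3),
    ("1,000,001-5,000,000", 3),
    ("more than 5,000,000", 3),
    ("more than 100,000,000", 1),
    ("more than 1,000,000,000", 1),
    ("not stated / imprecise", 13),
]

total = sum(count for _, count in publications_by_size)
unspecified = dict(publications_by_size)["not stated / imprecise"]
share_unspecified = unspecified / total

print("publications in total:", total)
print("share without a precise dataset size: {:.1%}".format(share_unspecified))
```

Under that assumption, the share of studies that do not report a precise dataset size can be read directly from `share_unspecified`.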
Methods (in social science Twitter research)
Weller, K. (2014). What do we get from Twitter – and what not? A close look at Twitter research in the social sciences. Knowledge
Organization 41(3), 238-248.
Data collection
• dapper: http://open.dapper.net/
• SocSciBot: "Web crawler and link analyser for the social sciences and
humanities", http://socscibot.wlv.ac.uk
• Chorus: http://www.chorusanalytics.co.uk/
• SODATO (Copenhagen Business School):
http://cssl.cbs.dk/software/sodato/
• NVivo 10 (actually a tool for qualitative content analysis), now with
import of tweets, Facebook posts, YouTube data, LinkedIn.
• YourTwapperkeeper: for Twitter
• Python packages, e.g. for Wikipedia
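As an illustration of collecting Wikipedia data with Python, the following sketch builds a request to the public MediaWiki API for the most recent revisions of an article, using only the standard library. The helper names `revisions_url` and `fetch_revisions` are hypothetical, the article title is only an example, and the network call is left commented out.

```python
import json
import urllib.parse
import urllib.request

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def revisions_url(title, limit=10):
    """Build a MediaWiki API URL asking for an article's latest revisions."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "timestamp|user",  # which revision fields to return
        "rvlimit": limit,            # number of revisions to return
        "format": "json",
    }
    return API_ENDPOINT + "?" + urllib.parse.urlencode(params)


def fetch_revisions(title, limit=10):
    """Download and flatten the revision list (requires network access)."""
    request = urllib.request.Request(
        revisions_url(title, limit),
        # A descriptive User-Agent; the value here is a made-up example.
        headers={"User-Agent": "research-sketch/0.1 (example)"},
    )
    with urllib.request.urlopen(request) as response:
        data = json.load(response)
    pages = data["query"]["pages"]
    return [rev for page in pages.values() for rev in page.get("revisions", [])]


url = revisions_url("Twitter", limit=5)
print(url)
# revisions = fetch_revisions("Twitter", limit=5)  # uncomment to query the API
```

Dedicated packages wrap the same API more conveniently; the sketch only shows what kind of request sits underneath.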
Social Network Analysis
• NodeXL (with data import function)
• ORA (http://www.casos.cs.cmu.edu/projects/ora), a
social network analysis (SNA) software package, for
basic manipulations and visualization of the network
data
• UCINET (http://www.analytictech.com/ucinet)
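To make concrete what such tools compute, here is a minimal Python sketch that builds a directed @mention network from a handful of invented tweets and ranks accounts by in-degree (how often they are mentioned). It uses only the standard library. The sample tweets and account names are made up for illustration, and the packages listed above offer far richer measures and visualisations.

```python
import re
from collections import Counter

MENTION = re.compile(r"@(\w+)")

# Invented sample data: (author, tweet text).
tweets = [
    ("alice", "Great panel with @bob and @carol"),
    ("bob", "@alice thanks! cc @carol"),
    ("dave", "RT @alice: Great panel with @bob and @carol"),
]


def mention_edges(tweets):
    """Return a Counter mapping (author, mentioned_user) to the edge weight."""
    edges = Counter()
    for author, text in tweets:
        for target in MENTION.findall(text):
            target = target.lower()
            if target != author:  # ignore self-mentions
                edges[(author, target)] += 1
    return edges


def in_degree(edges):
    """Weighted in-degree: how often each account is mentioned by others."""
    degree = Counter()
    for (_, target), weight in edges.items():
        degree[target] += weight
    return degree


edges = mention_edges(tweets)
ranking = in_degree(edges).most_common()
print(ranking)
```

The edge list produced by `mention_edges` is the kind of input that network analysis packages typically accept for further measures and visualisation.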
NodeXL
Network analysis based on Excel.
Built-in function for collecting data from, among others:
• Facebook
• Twitter
• YouTube
• Flickr
• Wikipedia
End of theory?
Social sciences
1. Problem
2. Research question/hypotheses
3. Theories
4. Methods
5. Data
6. Analysis
7. Presentation of results
Typical Big Data analysis
1. Methods
2. Data
3. Analysis
4. Presentation of results
5. Problem
Correlation vs. causality
Pfeffer, J. (2013). Big data, big research? Opportunities and constraints for computer supported social science.
Keynote at the "Digital methods" conference of the DGPuK section Computer-Mediated Communication, Vienna.
Retrieved from http://www.pfeffer.at/slides/DigitalMethods-BigData.pdf
Beginning of theory?
“The interesting point is that these limitations can (and have to)
be addressed by theory guided research that is typically
conducted by social scientists. Accordingly, opportunities emerge
for those social and behavioral scientists who are willing to
collaborate with the Big Data researchers in the natural,
engineering, and computer sciences.”
Snijders, C., Matzat, U., & Reips, U.-D. (2012). ‘Big Data’: Big gaps of knowledge in the field of Internet. International
Journal of Internet Science, 7, 1-5. Retrieved from http://www.ijis.net/ijis7_1/ijis7_1_editorial.html
#2
"data haves" vs. "data have-nots"
boyd, danah and Kate Crawford. (2012).
“Critical Questions for Big Data”
Representativeness
“The core challenge is that most big data that have
received popular attention are not the output of
instruments designed to produce valid and reliable
data amenable for scientific analysis.”
Lazer, D., Kennedy, R., King, G., & Vespignani, A. (2014). The parable of Google Flu: Traps
in big data analysis. Science, 343(6176), 1203-1205.
Blank, G. (2014). Who uses Twitter? Representativeness of Twitter Users. Presentation at General Online Research GOR 14.
Retrieved from: http://conftool.gor.de/conftool14/index.php?page=downloadPaper&filename=Blank-Who_uses_Twitter_Representativeness-119.pptx&form_id=119&form_version=final
[Figure 6: Political Activities of Twitter Users. Chart comparing Twitter users and non-users; OxIS current users: 2013, N=1,613; y-axis 0 to 100.
Categories: Interest in politics; Send political message; Contact MP online; Re-post political news; Political comment on SNS; Find political facts; Sign online petition.
Data labels: 34, 26, 8, 12, 18, 14, 10, 17, 12, 23, 28, 33, 30, 35.]
Representativeness problems on
multiple levels
“About a third of all UK Internet users have a twitter profile; a
subset of that group are the active tweeters who produce the
bulk of content; and then a tiny subset of that group (about
1%) geocode their tweets (essential information if you want to
know about where your information is coming from).”
Graham, M. (2012). Big data and the end of theory? The Guardian. Retrieved from:
http://www.theguardian.com/news/datablog/2012/mar/09/big-data-theory
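The nested subsets in this quote shrink quickly when multiplied out. A small worked example in Python follows. The share of profile holders who actively tweet is not given in the quote, so the value used for `active_share` is purely an assumed placeholder, and the function name is illustrative.

```python
def geocoding_share(profile_share, active_share, geocode_share):
    """Fraction of all Internet users who are active tweeters AND geocode tweets."""
    return profile_share * active_share * geocode_share


profile_share = 1 / 3  # "about a third" of UK Internet users (from the quote)
geocode_share = 0.01   # "about 1%" of active tweeters (from the quote)
active_share = 0.5     # ASSUMPTION: not stated in the quote, placeholder only

share = geocoding_share(profile_share, active_share, geocode_share)
print("{:.3%} of Internet users".format(share))
```

Whatever value is plugged in for the active share, the geocoded subset remains a small fraction of a percent of Internet users, which is the point of the quote.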
Dangers of lacking representativeness
• Discussion: people who are
not represented by
Big Data
http://streetbump.org
See also: http://www.wired.com/2014/03/potholes-big-data-crowdsourcing-way-better-government/
Literature
• Reading list on Big Data in the social sciences:
http://kwelle.wordpress.com/2014/04/12/big-data-links-and-literature/
Recommended reading
• Ackland, R. (2013). Web Social Science. Los Angeles et al: SAGE.
• Boyd, D., & Crawford, K. (2012). Critical questions for big data: Provocations for a cultural,
technological, and scholarly phenomenon. Information, Communication & Society, 15(5), 662-679.
• Bruns, A. (2013). Faster than the speed of print: Reconciling ‘big data’ social media analysis and
academic scholarship. First Monday 18(10). Available at
http://firstmonday.org/ojs/index.php/fm/article/view/4879
• Giglietto, F., Rossi, L., & Bennato, D. (2012). The Open Laboratory: Limits and Possibilities of Using
Facebook, Twitter, and YouTube as a Research Data Source. Journal of Technology in Human Services,
30(3-4), 145–159.
• Karpf, D. (2012). Social science research methods in internet time. Information, Communication &
Society, 15(5), 639-661.
• Weller, K., Bruns, A., Burgess, J., Mahrt, M., & Puschmann, C. (2014). Twitter and Society. New York et
al.: Peter Lang.
• Williams, S. A., Terras, M. M., & Warwick, C. (2013). What do people study when they study Twitter?
Classifying Twitter related academic papers. Journal of Documentation, 69(3), 384-410.