This document summarizes Anatoliy Gruzd's presentation on research with social media data and the data stewardship and ethical considerations it raises. It covers three stages of working with social big data: collection through APIs and data resellers; analysis through visualization, network analysis and geo-based analysis; and preservation by public archives, private companies and individual users. It also covers ethical considerations for researchers, industry and users, including transparency, privacy and expectations about how data will be used. The presentation emphasizes responsible data stewardship across the whole data lifecycle, from collection to analysis to preservation.
Research with Social Media Data: Stewardship & Ethical Considerations
1. Research with Social Media Data –
Data Stewardship & Ethical Considerations
Anatoliy Gruzd
@gruzd
gruzd@ryerson.ca
Associate Professor
Ted Rogers School of Management
Director, Social Media Lab
Ryerson University
KMDI Speaker Series
University of Toronto
Toronto, Canada
February 11, 2015
3. Defining Big Data
• Large data sets
• Structured & Unstructured
• Live data
• Machine-generated vs User-generated
4. Growth of Social Big Data
from Online Social Networks
• Facebook: 1B users
• Twitter: 500M users
Social Media sites have become
an integral part of our daily lives!
5. Social Media Data Stewardship
• Social Media Data Stewardship – processes related to all aspects of
managing social media data including collection, storage, analysis,
publishing, reuse and preservation of data
• Today’s focus: COLLECTION, ANALYSIS, PRESERVATION and Ethical Considerations
6. Increasing Access to Social Big Data
via API (Application Programming Interface)
Source: http://www.programmableweb.com
COLLECTION
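As a rough illustration of what collecting data through an API involves, the sketch below pages through results by following a cursor. Everything in it is an assumption made for illustration: `FAKE_PAGES`, `fake_fetch_page` and the `posts`/`next` field names are hypothetical stand-ins for a real platform API, which would be reached over HTTP and would require credentials and rate-limit handling.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical in-memory "API": three pages of posts keyed by cursor.
# A real platform API would be reached over HTTP and would need credentials.
FAKE_PAGES: Dict[Optional[str], dict] = {
    None: {"posts": [{"id": 1, "text": "hello"}, {"id": 2, "text": "world"}], "next": "p2"},
    "p2": {"posts": [{"id": 3, "text": "big data"}], "next": "p3"},
    "p3": {"posts": [{"id": 4, "text": "ethics"}], "next": None},
}


def fake_fetch_page(cursor: Optional[str]) -> dict:
    """Stand-in for one API request; returns a page and the next cursor."""
    return FAKE_PAGES[cursor]


def collect_posts(fetch_page: Callable[[Optional[str]], dict], max_pages: int = 10) -> List[dict]:
    """Follow the 'next' cursor until there are no more pages or max_pages is reached."""
    posts: List[dict] = []
    cursor: Optional[str] = None
    for _ in range(max_pages):
        page = fetch_page(cursor)
        posts.extend(page["posts"])
        cursor = page["next"]
        if cursor is None:
            break
    return posts


collected = collect_posts(fake_fetch_page)
print(len(collected), "posts collected")
```

The `max_pages` cap is a design choice for the sketch: it keeps a collection run bounded even if the service keeps returning new cursors.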
7. Increasing Access to Social Big Data
via Data Resellers
COLLECTION
11. Making Sense of Social Big Data
Social Big Data -> Decision Making
in domains such as Politics, Health Care and Education
ANALYSIS
12. Making Sense of Social Big Data
Social Big Data -> Visualizations -> Understanding
(Development, Application & Validation)
ANALYSIS
13. Making Sense of Social Big Data
Example: Geo-based Analysis
ANALYSIS
14. Making Sense of Social Big Data
Example: Geo-based Analysis
Geography of Twitter Networks
ANALYSIS
Source: https://blog.twitter.com/2013/the-geography-of-tweets
15. Making Sense of Social Big Data
Example: Geo-based + Content Analysis
Tracking Hate Speech on Twitter
ANALYSIS
Source: http://www.fenuxe.com/tag/geo-coded
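To make the combination of geo-based and content analysis concrete, here is a minimal sketch that snaps geotagged posts to a coarse grid and counts the posts in each cell that contain a flagged term. It is not the method behind the map referenced above; the sample posts, the `FLAGGED_TERMS` placeholder vocabulary and the one-degree cell size are all assumptions chosen for illustration.

```python
import math
from collections import Counter

# Made-up geotagged posts: (latitude, longitude, text). Not real data.
posts = [
    (43.65, -79.38, "flagged_term_a appears here"),
    (43.70, -79.40, "nothing of note"),
    (43.10, -79.05, "another FLAGGED_TERM_B post"),
    (45.50, -73.57, "flagged_term_a again"),
]

# Placeholder vocabulary; a real study would need a validated coding scheme.
FLAGGED_TERMS = {"flagged_term_a", "flagged_term_b"}


def cell(lat, lon, size=1.0):
    """Snap a coordinate to the south-west corner of a size-degree grid cell."""
    return (math.floor(lat / size) * size, math.floor(lon / size) * size)


def flagged_counts(posts, terms, size=1.0):
    """Count, per grid cell, the posts containing at least one flagged term."""
    counts = Counter()
    for lat, lon, text in posts:
        words = set(text.lower().split())
        if words & terms:
            counts[cell(lat, lon, size)] += 1
    return counts


counts = flagged_counts(posts, FLAGGED_TERMS)
for grid_cell, n in sorted(counts.items()):
    print(grid_cell, n)
```

Simple keyword matching like this ignores context, so in practice the content step would need human validation before any conclusions are drawn.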
16. Making Sense of Social Big Data
Example: Network Analysis
ANALYSIS
Social Network Analysis (SNA)
• Nodes = People
• Edges/Ties (lines) = Relations (“Who talks to whom”)
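The node and edge definitions above can be sketched in code. The example below treats each @mention as a directed tie from the author to the mentioned user, which is one common way (assumed here, not taken from the slides) to operationalize "who talks to whom". The sample posts are made up.

```python
import re
from collections import Counter

# Made-up sample posts: (author, text). Not real data.
posts = [
    ("alice", "@bob great talk today"),
    ("bob", "@alice thanks! cc @carol"),
    ("carol", "@alice agreed"),
    ("alice", "@bob see you next week"),
]

MENTION = re.compile(r"@(\w+)")


def build_edges(posts):
    """Count directed ties: (author -> mentioned user) = 'who talks to whom'."""
    edges = Counter()
    for author, text in posts:
        for target in MENTION.findall(text):
            if target != author:  # ignore self-mentions
                edges[(author, target)] += 1
    return edges


edges = build_edges(posts)
nodes = sorted({user for pair in edges for user in pair})
print(nodes)
print(dict(edges))
```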
17. Advantages of Social Network Analysis
• Reduces the large quantity of data into
a more concise representation
• Makes it much easier to understand
what is going on in user-driven data
Once the network is discovered, we can find out:
• How people interact with each other,
• Who the most/least active members of a group are,
• Who is influential in a group,
• Who is susceptible to being influenced, etc.
ANALYSIS
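Some of these questions can be approximated with simple degree counts once the network exists. The sketch below assumes a hypothetical weighted edge list and uses messages sent as a proxy for activity and messages received as a rough proxy for influence. These proxies are an assumption of this example; established SNA work typically uses richer centrality measures.

```python
from collections import defaultdict

# Hypothetical weighted ties: (sender, receiver) -> number of messages.
edges = {
    ("alice", "bob"): 2,
    ("bob", "alice"): 1,
    ("bob", "carol"): 1,
    ("carol", "alice"): 1,
    ("dave", "alice"): 3,
}


def degree_summary(edges):
    """Return messages sent (out-degree) and received (in-degree) per member."""
    out_deg = defaultdict(int)
    in_deg = defaultdict(int)
    for (sender, receiver), weight in edges.items():
        out_deg[sender] += weight
        in_deg[receiver] += weight
    members = set(out_deg) | set(in_deg)
    return {m: {"sent": out_deg[m], "received": in_deg[m]} for m in members}


summary = degree_summary(edges)
most_active = max(summary, key=lambda m: summary[m]["sent"])
most_addressed = max(summary, key=lambda m: summary[m]["received"])
print("most active:", most_active)
print("most addressed:", most_addressed)
```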
18. Making Sense of Social Big Data
Example: Network Analysis
Social Media Use during the 2011 Canadian Federal Election
ANALYSIS
• There are some pockets of political polarization on Twitter
• But Twitter has potential for supporting open cross-ideological discourse
Figure legend: Liberal, Conservative, Spam, Unknown & Undecided, NDP, Left, Green, Bloc, Other
Gruzd, A. and Roy, J. (2014). Political Polarization on Social Media: Do Birds of a Feather
Flock Together on Twitter? Policy & Internet.
19. Making Sense of Social Big Data
Example: Network Analysis
Communication of health-related information in blogs
ANALYSIS
Gruzd, A., Black, F.A., Le, Y., Amos, K. (2012). Investigating Biomedical Research Literature in the
Blogosphere: A Case Study of Diabetes and HbA1c. Journal of the Medical Library Association 100(1): 34-42.
21. Social Big Data Preservation Efforts:
Public/Non-Profit Initiatives
• Twitter Archive at the Library of
Congress
• “Archiving and preserving outlets such
as Twitter will enable future
researchers access to a fuller picture of
today’s cultural norms, dialogue,
trends and events to inform
scholarship, the legislative process,
new works of authorship, education
and other purposes.”
• As of December 1, 2012:
approximately 170 billion tweets
totaling 133.2 terabytes for two
compressed copies
http://www.loc.gov/today/pr/2013/files/twitter_report_2013jan.pdf
PRESERVATION
22. Social Big Data Preservation Efforts:
Public/Non-Profit Initiatives
• Internet Archive https://archive.org/
PRESERVATION
24. Social Big Data Preservation Efforts:
Private Initiatives – Data Resellers
PRESERVATION
25. Social Big Data Preservation Efforts:
Private Initiatives – Enterprise solutions
PRESERVATION
26. Social Big Data Preservation Efforts:
Personal Archiving – Facebook
PRESERVATION
27. Social Media Data Stewardship
• Social Media Data Stewardship – processes related to all aspects of
managing social media data including collection, storage, analysis,
publishing, reuse and preservation of data
• Today’s focus: COLLECTION, ANALYSIS, PRESERVATION and Ethical Considerations
• Ethical considerations for: INDUSTRY, RESEARCHERS, USERS
28. Ethical Considerations when working with Big Data
• 2014 Facebook news feed experiment
• Facebook Atlas ID - People-based marketing
http://america.aljazeera.com/articles/2014/10/7/facebook-atlas.html
Ethical Consideration
INDUSTRY
30. Social Media Data as Research Data
Data Collection Transparency
Ethical Consideration
(Driscoll & Walker, 2014)
RESEARCHERS
31. Social Media Data as Research Data
Data Collection Transparency:
Deleted Posts Dilemma?
Ethical Consideration
(Mason, 2015)
RESEARCHERS
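One possible response to the deleted posts dilemma is to re-check an archived dataset against what is still publicly available and to set aside posts that users have since removed. The sketch below shows only that bookkeeping step. The `archive` records and the `still_public_ids` set are hypothetical, and whether deleted posts should be dropped at all is a research design and ethics question that this code does not settle.

```python
def drop_deleted(archive, still_public_ids):
    """Split an archive into posts that are still public and IDs that were removed."""
    kept = [post for post in archive if post["id"] in still_public_ids]
    removed_ids = [post["id"] for post in archive if post["id"] not in still_public_ids]
    return kept, removed_ids


# Made-up archive and availability check result. Not real data.
archive = [
    {"id": 101, "text": "first post"},
    {"id": 102, "text": "later deleted by its author"},
    {"id": 103, "text": "third post"},
]
still_public_ids = {101, 103}

kept, removed_ids = drop_deleted(archive, still_public_ids)
print(len(kept), "kept;", removed_ids, "removed")
```

Reporting how many posts were removed, and when the check was run, is one way to keep the collection process transparent.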
32. Social Media Data as Research Data
Users’ Perspective
Views about researchers using social media fell into three
categories:
1) Scepticism: that ‘traditional’ research methods are more
valid and reliable than online methods,
2) Acceptance: online research is beneficial as it removes bias
caused by face-to-face research
3) Ambivalence: those who had no feelings, as they felt it
would happen regardless of their opinion.
Ethical Consideration
(Beninger et al., 2014)
33. Social Media Data as Research Data
Users’ Perspective
Factors that influence users’ views of research using social
media (Beninger et al., 2014):
• mode and content of social media posts,
• social media website being used,
• the expectations the user had when posting,
• the nature/purposes of the research and researcher’s
affiliation.
Ethical Consideration
34. Social Media Data as Research Data
Users’ Perspective
• Teen social media users do not express a high level of concern
about third-party access to their data; just 9% say they are
“very” concerned.
Ethical Consideration
(Madden et al., 2013)
36. Human Subject Research
• In Canada, research that involves human participants is
governed by the Tri-Council Policy Statement: Ethical Conduct
for Research Involving Humans (TCPS)
• 1st Ed. (2005) | 2nd Ed. (2010) | 2nd Ed. – REVISED (2014)
• http://pre.ethics.gc.ca/
Ethical Consideration
37. TCPS on Internet Research
• REB review is also not required where research uses exclusively
publicly available information that may contain identifiable
information, and for which there is no reasonable expectation
of privacy.
• Cyber-material such as documents, records, performances,
online archival materials or published third party interviews to
which the public is given uncontrolled access on the Internet
for which there is no expectation of privacy is considered to be
publicly available information.
(TCPS 2014 Ed., p. 16)
REB = Research Ethics Board
Ethical Consideration
38. TCPS on Internet Research (cont.)
• There are publicly accessible digital sites where there is a
reasonable expectation of privacy.
• When accessing identifiable information in publicly accessible
digital sites, such as Internet chat rooms, and self-help groups
with restricted membership, the privacy expectation of
contributors of these sites is much higher.
• Researchers shall submit their proposal for REB review
(see Article 10.3).
(TCPS 2014 Ed., p. 16)
Ethical Consideration
39. TCPS on Internet Research (cont.)
• Where data linkage of different sources of publicly
available information is involved, it could give rise to new
forms of identifiable information that would raise issues
of privacy and confidentiality when used in research, and
would therefore require REB review (see Article 5.7).
(TCPS 2014 Ed., p. 16)
Ethical Consideration
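A small sketch can show why data linkage raises new privacy issues: two sources that each look harmless become more identifying once joined on a shared field. The records, the `username` key and the `link_records` helper are all made up for illustration and are not taken from the TCPS.

```python
# Two made-up "public" sources that share a username field. Not real data.
forum_posts = [
    {"username": "runner42", "topic": "diabetes management"},
    {"username": "quietfox", "topic": "gardening"},
]
profiles = [
    {"username": "runner42", "city": "Toronto", "employer": "Example Corp"},
]


def link_records(left, right, key="username"):
    """Join two record lists on a shared key, keeping only matched rows."""
    index = {row[key]: row for row in right}
    linked = []
    for row in left:
        match = index.get(row[key])
        if match is not None:
            linked.append({**row, **match})
    return linked


linked = link_records(forum_posts, profiles)
print(linked)
```

Here the linked record ties a health-related topic to a city and an employer, a combination that neither source exposed on its own.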
40. Social Media Data Stewardship…
• Social Media Data Stewardship – processes related to all aspects of
managing social media data including collection, storage, analysis,
publishing, reuse and preservation of data
• Today’s focus: COLLECTION, ANALYSIS, PRESERVATION and Ethical Considerations
• Next steps: Develop a conceptual model of Social Media Data
Stewardship based on both industry & research practices as well as
social media users’ attitudes and perceptions.
42. References
• Beninger, K., Fry, A., Jago, N., Lepps, H., Nass, L., & Silvester, H. (2014). Research using Social Media: Users’ Views. NatCen Social Research. Retrieved from http://www.natcen.ac.uk/media/282288/p0639-research-using-social-media-report-final-190214.pdf
• Driscoll, K., & Walker, S. (2014). Big Data, Big Questions | Working Within a Black Box: Transparency in the Collection and Production of Big Twitter Data. International Journal of Communication, 8(0), 20.
• Madden, M., Lenhart, A., Cortesi, S., Gasser, U., … Beaton, M. (2013). Teens, Social Media, and Privacy. Retrieved from http://www.pewinternet.org/2013/05/21/teens-social-media-and-privacy/
• Mason, R. (2015). Social Media Research: Approaches, Findings, Challenges. HICSS-15. Retrieved from http://somelab.net/wp-content/uploads/2015/02/SoMe_Ames_final_presented.pdf
• Kitchin, H. (2007). Research Ethics and the Internet: Negotiating Canada’s Tri-Council Policy Statement. Fernwood Publishing.