Working with Social Media Data:
Ethics & Good Practice around
Collecting, Using and Storing Data
Nicola Osborne
Digital Education Manager, EDINA
Nicola.osborne@ed.ac.uk
@suchprettyeyes
Introductions: my social media work
• Digital Education Manager at EDINA, University of Edinburgh.
• Work on EDINA’s educational technology, innovation, digital and data projects
for audiences across Scotland, UK and further afield.
• Co-I on: PTAS-funded Managing Your Digital Footprints research strand (2014-
2015); Ongoing (2015-) Managing Your Digital Footprint research team; PTAS-
funded “A Live Pulse”: Yik Yak for understanding teaching, learning and
assessment at Edinburgh project.
• Co-tutor on ongoing Digital Footprint MOOC (2017-)
• Previously EDINA Social Media Officer (2009-2015), providing expertise and
advice on social media to colleagues across UoE for over 8 years.
http://edina.ac.uk/
Introduction: you and your work
1. Who are you?
2. What social media related research are you
working on or hoping to work on?
3. What do you hope to get out of today’s
session?
Overview
• Introduction & Design Considerations
– Approach
– Data accuracy
• Ethical Considerations
– Recommended ethical guidance
– Terms & Conditions – and impact on Data
– Consent and trust
• Practical Considerations
– Existing data sets
– Available data tools
– APIs
– Options for analysis and visualisation
• Storing and handling Data
– Compliance with legal requirements
– Sources of support
• Recommended researchers, groups, and resources.
• Q&A/Discussion – but questions welcome throughout!
Where to start…
• What is your research question(s)?
• Are social media or social media communities the
subject, or core to the subject?
• Or, is it the space for recruitment or reaching an
audience?
• Or, is it just a convenient space for data collection?
The Elephant (Blue Bird) in the Room
Image ©Twitter.com 2012
Research Design Considerations
• Research approach to be taken
• Appropriate data types to support your research
– Streaming/live data OR
– Archived / capture of data over time with asynchronous analysis
• Ethical considerations
• Consent process of subjects and their network
• Etiquette considerations
• Platform(s) to be used
– Fit with target subjects
– Terms & Conditions
• Practical access limitations e.g.
– Do tools for data capture exist?
– Does an API exist?
– What are the API limitations?
– Costs of access
• Your (researcher) or RAs’ expertise.
• Long term research vision – do you have rights to use
and reuse data in the ways you hope to?
Possible Methods &
Questions to Think About
• Computational (See also Batrinca and Treleaven 2015):
– Data access through APIs, screen scraping, established methods (e.g. DMI tools)?
– Text and data mining and/or Natural Language Processing (NLP)?
– Social network analysis and/or Actor Network Theory (ANT) analysis using nodes and edges in the network?
– Sentiment analysis based on text mining/NLP or based on presence/absence of emojis and/or visual content?
– Visual analysis and/or video or audio analysis for multimedia content?
• Quantitative (See also OII 2013a, b & c):
– Medium or large scale data?
– Automated or survey/volunteered data collection?
– Data cleansing process – how will you ensure that you have a good quality data set?
– What kind of statistical analysis do you want to undertake? Tools might include SPSS, NVivo, Gephi, Tableau, etc.
– Will you be comparing to existing data sets and/or undertaking trend analysis over time?
– What standard tools in your field – for digital or non digital data – can you use to collect or interpret your data?
• Qualitative:
– Manual collection?
– Ethnographic approaches and/or participant observation
– Focus groups or similar?
– Critical/reflexive reading and coding of texts/content
Batrinca, B. and Treleaven, P.C., 2015. Social Media Analytics: a survey of techniques, tools and platforms. In AI & Society, 30 (1), pp. 89-116. https://doi.org/10.1007/s00146-014-0549-4
Oxford Internet Institute, 2013a. Quantitative Methods in Social Media Research: Big Data. In OII YouTube Channel, 15 March 2013. See: https://www.youtube.com/watch?v=6hmjj7_1sSY
Oxford Internet Institute, 2013b. Quantitative Methods in Social Media Research: Populations and Sampling. In OII YouTube Channel, 15 March 2013. See: https://www.youtube.com/watch?v=6hmjj7_1sSY
Oxford Internet Institute, 2013c. Space-Time as a Sampling Condition for New Media Research. In OII YouTube Channel, 15 March 2013. See: https://www.youtube.com/watch?v=HNxn0PqOc8k
Is Social Media Data Representative?
• Not all people use social media (and some of the least privileged groups in society are not
online at all).
• Most social media data collection methods favour English language data in mainstream
US/Global sites. It is unusual to see multilingual research or research that acknowledges use
of content including non-English text by primarily English speakers.
• Privacy settings and publicness tend to reflect status and privilege. Accessing at-risk,
vulnerable, heavily trolled, and/or niche interest groups is more difficult than obtaining public
posts from middle class white male social media users. BAME communities, women’s groups,
LGBTQ+ communities, etc. tend to make higher use of private groups, group moderation, and
protective measures that require more qualitative and overt consent-based approaches.
• Not all social media users are active. There is an “activity and agency bias” (Lutz and Hoffmann
2017) in much of the current research. Obtaining data on passive reads and engagement with
content is extremely difficult through quantitative methods. It may be easier with participant
observation.
Lutz, C. and Hoffmann, C. P. 2017. The dark side of online participation: exploring non-, passive and negative participation. In Information,
Communication & Society: AoIR Special Issue, 20 (6), pp. 876-897. http://dx.doi.org/10.1080/1369118X.2017.1293129
Question/Discussion
Which platform(s) are you intending to/are you
working with?
How did you select these social media spaces?
Ethical Considerations
• Visibility vs expectations of privacy:
– Being “in public” is not consent to being researched; a user’s imagined audience may be quite different.
(see AoIR guidance, Marwick and boyd 2011)
– Are you engaging with private or “public” figures – expectations over visibility will vary significantly.
• How possible is it to obtain informed consent for work undertaken with your chosen social
media platform? How can consent be withdrawn?
• How will your data be collected and used? (Attributed vs Pseudonyms vs Anonymous).
• What personal data is being used? Does it put anyone at risk?
• What is the risk of accidental exposure or re-identification? Text snippets, quotes and images
may all be easily searchable.
• Public – or previously public – data can change in sensitivities over time.
• How will you handle/remove/retain subsequently deleted content?
Marwick, A. and boyd, d., 2011. I tweet honestly, I tweet passionately: Twitter users, context
collapse, and the imagined audience. In New Media & Society, 13 (1), pp. 114-133. DOI:
10.1177/1461444810365313.
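One way to approach the “Pseudonyms” option above is to replace account handles with keyed hashes before analysis. The following is a minimal Python sketch under stated assumptions, not a complete anonymisation method: the function name pseudonymise, the key and the sample handles are all illustrative, and post text or network structure may still allow re-identification.

```python
import hashlib
import hmac


def pseudonymise(handle: str, secret_key: bytes) -> str:
    """Return a stable pseudonym for a handle using a keyed hash (HMAC-SHA256)."""
    # Normalise so "@ExampleUser" and "exampleuser" map to the same pseudonym.
    normalised = handle.strip().lstrip("@").lower()
    digest = hmac.new(secret_key, normalised.encode("utf-8"), hashlib.sha256).hexdigest()
    return "user_" + digest[:12]


# Hypothetical project key: keep it secret and store it separately from the data.
key = b"replace-with-a-project-secret"

# Invented sample posts, for illustration only.
posts = [
    {"author": "@ExampleUser", "text": "first post"},
    {"author": "exampleuser", "text": "second post"},
    {"author": "@another_user", "text": "third post"},
]

pseudonymised = [
    {"author": pseudonymise(p["author"], key), "text": p["text"]} for p in posts
]
```

Using a keyed hash rather than a plain hash means someone without the key cannot simply hash a list of known handles to reverse the mapping. Destroying the key at the end of a project removes your own ability to link pseudonyms back to accounts.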
Recommended: AoIR Ethics Guidance
• AoIR Ethics Guidance (2012):
https://aoir.org/reports/ethics2.pdf
• AoIR Ethics Chart – a quick guide to
key issues:
https://aoir.org/aoir_ethics_graphic_2016/
• AoIR Ethics Guidance (2002):
https://aoir.org/reports/ethics.pdf
• Annette Markham (co-author of AoIR
guidance) on Impact Models for
ethical decision making in data
research and design:
https://annettemarkham.com/2017/07/impact-model-ethics/
Recommended: Social Media
Research: A Guide to Ethics
• Excellent concise research ethics
guidance from the ESRC-
funded “Social Media, Privacy and
Risk: Towards More Ethical Research
Methodologies” project at
University of Aberdeen.
• Includes pointers to further social
media ethics resources.
• Townsend, L. and Wallace, C. 2016.
Social Media Research: A Guide to
Ethics. Aberdeen: University of
Aberdeen/ESRC Social Media
Enhancement project. Available from:
http://www.dotrural.ac.uk/social-media-research-ethics/
“But the data is already public”
In 2008 researchers released profile data (The T3 Data Set) from Facebook accounts of students at a US University,
inadvertently making identifiable data public, as reported in Zimmer (2010).
In this case the researchers:
• Had employed RAs who were part of the Network being examined and had (various levels of) access to more
information than a non-logged-in user of Facebook/user beyond the Network.
• Had funding that mandated open publishing and sharing of results.
• Had the university’s consent, but not individuals’ consent, for data collection.
• Combined Facebook data with university housing data in their data sets.
• Obscured the identity of the university where students were based, but described its key characteristics.
• Attempted to make all data anonymous by removing identifying information (name, student id, etc.) but left
network and behavioural information intact.
• Asked other researchers using the data not to attempt to reidentify subjects.
• Stated that “hackers” and “extreme effort” would be the only way to “crack” the data.
The university was identified swiftly based purely on the codebook and other writings about the data, without
requiring direct access to the data itself. Once the university was identified, other specific identifying data (nationality, race,
home state, etc.), sometimes with only 1 individual in these groups, made re-identification of (some) students simple.
After public scrutiny and identification of the university, the data set was swiftly withdrawn by the researchers.
Zimmer, M. 2010. “But the data is already public”: on the ethics of research in Facebook. In Ethics and Information
Technology, 12 (4), December 2010, pp. 313-325. https://link.springer.com/article/10.1007%2Fs10676-010-9227-5
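The re-identification risk described above (groups containing only one individual) can be checked before any release by counting how many records share each combination of indirectly identifying attributes. This is a minimal Python sketch in the spirit of a k-anonymity check. The function name small_groups, the field names and the records are invented for illustration and are not drawn from the T3 data set.

```python
from collections import Counter


def small_groups(records, quasi_identifiers, k=2):
    """Return combinations of quasi-identifier values shared by fewer than k records."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return {combo: n for combo, n in counts.items() if n < k}


# Invented placeholder records; real data sets will have many more attributes.
records = [
    {"nationality": "A", "home_state": "X"},
    {"nationality": "A", "home_state": "X"},
    {"nationality": "A", "home_state": "Y"},
    {"nationality": "B", "home_state": "X"},
]

# Any combination listed here describes fewer than k people and may identify them.
risky = small_groups(records, ["nationality", "home_state"], k=2)
```

A check like this only covers the attributes you list. Network and behavioural information left in a data set can still single people out, as the case above shows.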
Terms & Conditions
• Before undertaking any social media research understand the T&Cs and
Developer T&Cs for the platform(s) you are looking at.
• Understand how your research aligns with the T&Cs, and any possible
issues of privacy, etiquette, or practical access.
• If your work is in conflict with T&Cs either re-design your research
(strongly recommended) or look carefully at risks and impacts.
• You should not ignore any T&Cs for technical reasons. If there is a valid
reason to ignore T&Cs for specific research reasons (such as research on
deleted tweets), be prepared to justify that to ethics boards and peer
reviewers. And understand that you may risk losing access to the platform
and your research data if you are found to be in breach of T&Cs.
Twitter Developer T&Cs of note (1)
Section VII (Other Important Terms), A: User Protection:
"Twitter Content, and information derived from Twitter Content, may not be
used by, or knowingly displayed, distributed, or otherwise made available
to:"…
"any entity for the purposes of conducting or providing surveillance, analyses
or research that isolates a group of individuals or any single individual for any
unlawful or discriminatory purpose or in a manner that would be inconsistent
with our users' reasonable expectations of privacy;"
https://developer.twitter.com/en/developer-terms/agreement-and-policy
Twitter Developer T&Cs of note (2)
Section VII (Other Important Terms), C: Respect Users' Control and Privacy:
"3. If Content is deleted, gains protected status, or is otherwise suspended,
withheld, modified, or removed from the Twitter Service (including
removal of location information), you will make all reasonable efforts to
delete or modify such Content (as applicable) as soon as reasonably
possible, and in any case within 24 hours after a request to do so by
Twitter or by a Twitter user with regard to their Content."
https://developer.twitter.com/en/developer-terms/agreement-and-policy
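In practice, respecting this clause means a stored data set needs a routine for removing content once it is known to have been deleted or protected. The Python sketch below shows only the local clean-up step. How you learn which IDs have been deleted is platform-specific and not shown, and the function name apply_deletions and the sample IDs are hypothetical.

```python
def apply_deletions(stored_posts, deleted_ids):
    """Return the posts to keep, plus a sorted list of the IDs actually removed."""
    deleted = set(deleted_ids)
    kept = {pid: post for pid, post in stored_posts.items() if pid not in deleted}
    removed = sorted(set(stored_posts) - set(kept))
    return kept, removed


# Invented local store keyed by post ID.
stored = {"101": {"text": "a"}, "102": {"text": "b"}, "103": {"text": "c"}}

# "999" was never collected, so it is ignored rather than treated as an error.
kept, removed = apply_deletions(stored, ["102", "999"])
```

Keeping a log of removed IDs (without the content) can help document compliance for ethics boards, although whether retaining even the IDs is acceptable is a judgement to make against the current T&Cs.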
Facebook Statement of Rights &
Responsibilities
Section 5: Protecting Other People's Rights
"We respect other people's rights, and expect you to do the same.
1. You will not post content or take any action on Facebook that infringes or violates someone else's rights or
otherwise violates the law.
2. We can remove any content or information you post on Facebook if we believe that it violates this Statement or
our policies.
3. We provide you with tools to help you protect your intellectual property rights. To learn more, visit our How to
Report Claims of Intellectual Property Infringement page.
4. If we remove your content for infringing someone else's copyright, and you believe we removed it by mistake, we
will provide you with an opportunity to appeal.
5. If you repeatedly infringe other people's intellectual property rights, we will disable your account when
appropriate.
6. You will not use our copyrights or Trademarks or any confusingly similar marks, except as expressly permitted by
our Brand Usage Guidelines or with our prior written permission.
7. If you collect information from users, you will: obtain their consent, make it clear you (and not
Facebook) are the one collecting their information, and post a privacy policy explaining what
information you collect and how you will use it.
8. You will not post anyone's identification documents or sensitive financial information on Facebook.
9. You will not tag users or send email invitations to non-users without their consent. Facebook offers social
reporting tools to enable users to provide feedback about tagging."
https://www.facebook.com/terms.php
Trust in Social Networks
vs Trust in Research
Research Ethics – Randall Munroe/xkcd (https://xkcd.com/1390/) Licensed under CC-BY-NC 2.5
Trust in social media networks is mixed, with
users increasingly savvy about data use…
However…
• Social Media users can find observation by
academic researchers more disconcerting
than by the companies who own the
platforms.
• Research, depending on the topic, can feel
like a judgement on behaviours making
consent hugely important.
• The burden on researchers to be clear about
motives, funders, process, etc. is higher than
on commercial companies.
• There are parallels here to how individuals
feel about e.g. Tesco Clubcard or Credit Card
data capture vs. surveys and censuses.
Question/Discussion
What are the ethical concerns and
considerations for your current (or previous)
social media research?
Obtaining Consent
• Consent may be implicitly included for API data access in some terms and
conditions BUT, when did you last read the terms and conditions? What about
your research participants?
So:
• Obtain explicit consent wherever possible.
• Be transparent if you are engaging in research in a space – with a pinned post, link
to your participant information sheet, etc.
• Consent can be tricky in anonymous and less traditional social media spaces (see
e.g. Osborne 2017 for approaches used with Yik Yak).
• Apply particular caution to gaining consent for screen shots, attributed posts,
reproducing exact images or text of posts etc.
Osborne, N. 2017. Addressing ethics of research in anonymous online spaces. In “A Live Pulse”: Yik Yak for understanding
teaching, learning and assessment at Edinburgh [blog], 13th July 2017.
http://yikyakresearch.blogs.edina.ac.uk/2017/07/13/addressing-ethics-of-research-in-anonymous-online-spaces/
Some Common Ethics Pitfalls
• Researcher assumes public data can be used in any way desired, without
considering the subject(s) intent when originally sharing their profile/post etc.
• Researcher explores conveniently available “public” data without realising that
privacy settings may make more information available to them than is truly
“public”.
• Researcher is using “big” data under belief that individuals will not be identifiable
(as in the “But the data is already public” case).
• Research subject(s) has shared data on a public site but is not aware of their own
settings, or has not checked them lately, making implicit consent and the public
nature of the data problematic. Discovering that they have been included in
published research may be upsetting and problematic.
• Research Ethics Committees and/or Journal Editorial Boards are unaware or do not
properly consider that social media data includes real names, pseudonyms,
locations, highly disclosive data and do not ask the right questions around the
consent process, collection, aggregation, storage and retention of data.
• Researcher uses full text of a post as an “anonymous” example but this is then
Googled which identifies the original post/tweet/content and individual.
Data Considerations
• What kind of research approach are you taking?
• Who or what is the subject of your research – what is the right social media space to capture
appropriate data?
• What scale of data are you looking to collect/harvest? (If working with big data see boyd &
Crawford 2012)
• Will you be sampling or looking to collect all data over a specific time period?
• How sensitive is the topic?
• What level and type of consent can you obtain from participants?
• What kind of content?
– Profiles – for network analysis, image analysis, qualitative review of content through profile components/data?
– Posts – through API/data feed/harvesting or observation? Textual, visual, multimedia? Manual coding or text/data
mining?
– Comments/discussion – contents or threads of discussion?
– Metadata – tags, likes, engagements?
• Time bounds – how long do you expect to collect data for?
• What use will you make of the data after capture?
boyd, d. and Crawford, K., 2012. Critical questions for big data. In Information, Communication & Society special issue: A decade in
internet time: the dynamics of the internet and society, 15 (5). http://dx.doi.org/10.1080/1369118X.2012.678878
Sources of baseline data on usage,
access, trends, literacies etc.
• Oxford Internet Surveys: biennial data on UK public use and attitudes to the
internet, including social media: http://oxis.oii.ox.ac.uk/research/dataset-request/
• Ofcom research and data: Regular reporting on UK public use and attitudes to
media, including internet and social media: https://www.ofcom.org.uk/research-and-data/search. Includes:
– Annual adult media use and attitudes, and children’s media literacy reporting:
https://www.ofcom.org.uk/research-and-data/media-literacy-research;
– Communications Market Report: annual overview of consumer use of communications of all types:
https://www.ofcom.org.uk/research-and-data/multi-sector-research/cmr
– Further regular and one-off data via the statistical release calendar:
https://www.ofcom.org.uk/research-and-data/data/statistics
• Pew Internet & American Life datasets: data on US public use, knowledge and
understanding of the web, digital literacy, social media, etc:
http://www.pewinternet.org/datasets/. For example:
– Social Media Update 2016: http://www.pewinternet.org/2016/11/11/social-media-update-2016/
Sources of Official Social Media
Usage Data, Trends, Financials, etc.
Best sources are quarterly earnings reports and presentations, typically including: monthly active
users, usage trends, earnings, monetization strategies, financials, future plans:
• Facebook & Instagram & WhatsApp: https://investor.fb.com/home/default.aspx
• Twitter: https://investor.twitterinc.com/results.cfm
• Snapchat: https://investor.snap.com/events-and-presentations/events
• YouTube/Google via Alphabet: https://abc.xyz/investor/
• Flickr:
– Currently owned by Oath, should be via Verizon once deal closes:
http://www.verizon.com/about/investors
– Historical up to 2017, via Yahoo captures in the Internet Archive:
https://web.archive.org/web/*/https://investor.yahoo.net/index.cfm
• LinkedIn:
– Current via Microsoft: https://www.microsoft.com/en-us/investor/
– Historical up to 2016: https://news.linkedin.com/topic/earnings
• Weibo: http://ir.weibo.com/phoenix.zhtml?c=253076&p=irol-irhome
Privately Held Social Media
• Crunchbase (https://www.crunchbase.com/) is a good source of
information on shareholders/owners, acquisitions, finances, etc.
• Alexa web rankings (owned by Amazon) give an overview of usage levels
and trends based on ranking relative to other sites in the US, and globally.
• Social Media sites’ “business” and “press” sites, official blogs and news
releases are best for user data.
• Some social media provide advertising APIs – which may be usable for
research depending on T&Cs and data content - but not developer or open
APIs, e.g. Snapchat: https://www.snap.com/en-GB/news/post/third-party-applications-and-the-snapchat-api/
e.g:
– Pinterest:
• data on usage from Pinterest: https://business.pinterest.com/en
• Alexa data on usage: https://www.alexa.com/siteinfo/pinterest.com
• investor data:
https://www.crunchbase.com/organization/pinterest/investors/investors_list
Data Quality & Reliability
• Data sources and APIs can change regularly, and what is available may change over time (e.g.
Twitter moved from all to “Top” tweets some years ago for its API; Facebook have changed data
structures multiple times).
• Errors in automated data collection can be hard to spot until analysis is undertaken – sampling, trial
data collection, and review of code by colleagues can all be useful.
• Gaps in data may occur because there are genuine gaps in data creation/posting etc; because there
are technical issues with the social media service; because of an error in your code; or because you
are over your API rate limit for the minute/hour/day.
• Data may change over time – Facebook and Instagram allow posts to be edited so a request will
capture one moment in time not necessarily the original or final versions.
• Data may disappear over time. Notable example: the Twitter terms and conditions on deletions mean
that deleted tweets will not appear in a later API call.
– Research tools obeying the T&Cs will also update and remove deleted tweets.
– Research tools retaining deleted tweets are technically in breach of the T&Cs.
• Acquisitions, mergers, and shutdowns of social media sites can lead to changed terms and
conditions, changes to data availability and use, changes or removals of APIs and data access
routes, and changes to user presence and acceptable norms within a space (important for
qualitative work particularly).
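Gaps of the kind listed above are easier to investigate if they are flagged soon after collection. As a rough Python sketch, the function below reports any stretch where consecutive collected posts are further apart than a chosen threshold. The function name find_gaps, the one-hour threshold and the timestamps are assumptions for illustration, and a flagged gap may still be a genuine quiet period rather than a fault.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, max_gap):
    """Return (start, end) pairs where consecutive posts are more than max_gap apart."""
    ordered = sorted(timestamps)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > max_gap]


# Invented collection times for four posts.
collected = [
    datetime(2018, 1, 1, 9, 0),
    datetime(2018, 1, 1, 9, 20),
    datetime(2018, 1, 1, 13, 0),
    datetime(2018, 1, 1, 13, 5),
]

# Flag any silence longer than one hour for manual review.
gaps = find_gaps(collected, max_gap=timedelta(hours=1))
```

Comparing flagged gaps against your own collection logs and any rate limit errors helps separate the four causes: no posting, service problems, a bug in your code, or hitting the API limit.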
Hidden pre-filtering and sampling
• Not all social media posts are equally likely to be included in standard API
endpoints
– e.g. a Twitter user with few posts and few followers is unlikely to appear on a
popular hashtag.
– The standard "Streaming" and "Search" APIs include 1% of Tweets and vary
in accuracy depending on activity/time etc. (See Morstatter et al 2013).
• Privacy settings will reduce the accuracy of any data sampled from
Facebook or other more complex privacy networks but it is hard to see
what is being excluded.
Morstatter, F., Pfeffer, J., Liu, H. and Carley, K.M., 2013. Is the Sample good enough?
Comparing data from Twitter's streaming API with Twitter's Firehose. In ICWSM 2013
and eprint arXiv:1306.5204. Available from: https://arxiv.org/abs/1306.5204
Question/Discussion
Have you already tried obtaining data for the
social media space you are using in your
research?
Have you faced any challenges or obstacles?
Existing Data Sets
• “The Zuckerberg files”: digital archive of all public comments by Mark Zuckerberg
including social media and mainstream media content for research use:
https://www.zuckerbergfiles.org/
• FiveThirtyEight Data: archive of data associated with FiveThirtyEight articles,
including social media data sets: https://github.com/fivethirtyeight/data
• Lumen database – tracking legal notices and complaints for removal of online
materials (including social media content): https://www.lumendatabase.org/
• CSIRO (Australia’s national science agency) We Feel – emotions in Tweets – API:
http://wefeel.csiro.au/#/api (see:
http://datadrivenjournalism.net/resources/we_feel)
• Stanford Large Network Dataset Collection - includes social network data
sets: https://snap.stanford.edu/data/
• Network Repository – network datasets, including social media, Facebook and
Twitter networks: http://networkrepository.com/
• DocNow – social justice social network archives: http://www.docnow.io/
Cross-site data tools
• North Carolina State University (NCSU) Libraries Social Media Archives
Toolkit: https://www.lib.ncsu.edu/social-media-archives-toolkit; see
also: https://github.com/NCSU-Libraries/Social-Media-Combine
• Social Mention (search engine for social media) API:
http://www.socialmention.com/api/
• Scrapebox (premium tool) YouTube Downloader:
http://www.scrapebox.com/youtube-downloader and Social Account
Scraper: http://www.scrapebox.com/social-account-scraper
• ESRC COSMOS Open Data Tools (available but not updated since
2014): http://socialdatalab.net/software
• Overview of Twitter data tools (Ahmed
2015): http://blogs.lse.ac.uk/impactofsocialsciences/2015/07/10/social-media-research-tools-overview/
Recommended: DMI Tools
The Digital Methods Initiative add new (documented) tools all the time, including:
• Censorship Explorer – determine censorship in various regions through URLs & proxies.
• Disqus Comment Scraper – obtain data from the Disqus comment plugin.
• Expand Tiny URLs – automatically expand large collections of Tiny URLs (e.g. from tweets).
• Geo IP – translate URLs or IP addresses into geographic locations (e.g. for a blog).
• Instagram Hashtag Explorer – retrieve Instagram media via specific hashtags.
• Issue Crawler – uses URLs to analyse relationships and connections through links between
URLs.
• Netvizz (Facebook) – extracts data from Facebook around groups, pages, search.
• Pinterest Scraper – scrapes Pinterest URLs and captures metadata of pins.
• Tumblr – data capture based on Tumblr tags, retrieving metadata and co-occurring tags.
• Twitter Capture and Analysis Toolset (DMI-TCAT) – robust and reproducible tool for data
capture and analysis of Twitter data. Source code available for local use.
• YouTube Data Tools – extract data on YouTube channels and videos, e.g. channel networks.
Access documentation and DMI tools at: https://wiki.digitalmethods.net/Dmi/ToolDatabase
See also, DMI Protocols: https://wiki.digitalmethods.net/Dmi/DmiProtocols
https://github.com/digitalmethodsinitiative/dmi-tcat/wiki
Internet Archive & WayBackMachine
• Global archive capturing websites (to various levels of detail/depth) based on IA
targets and user-submitted requests (since 2001).
• You can request a site for archiving, or a group of sites.
• Searchable resource OR can use exact URL to retrieve previous archived pages
(WayBackMachine).
• Collections exist for various social media topics, e.g:
– 2016 US Presidential Election Social Media: https://archive.org/details/2016electiontwitter
– Arab America on Social Media: https://archive.org/details/ArchiveIt-Collection-2797
– Gif Cities (Gifs from GeoCities): https://gifcities.org/
• Great for social media website changes, blogs, terms and conditions versions, etc.
• Sites available in a range of archive formats (IA), or as viewable pages
(WayBackMachine).
• See:
– https://archive.org/
– https://archive.org/web/
https://gifcities.org/?q=star+wars
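Retrieving an archived page by exact URL, for example an earlier version of a platform's terms and conditions, can be scripted by building a Wayback Machine address from the original URL and a date. The Python sketch below assumes the commonly used address pattern web.archive.org/web/<timestamp>/<url>, where the service redirects to the capture nearest the requested time. The function name wayback_url and the chosen date are illustrative, and no capture is guaranteed to exist for any given date.

```python
from datetime import datetime


def wayback_url(original_url: str, when: datetime) -> str:
    """Build a Wayback Machine address for the capture closest to `when`."""
    timestamp = when.strftime("%Y%m%d%H%M%S")  # 14-digit YYYYMMDDhhmmss form
    return "https://web.archive.org/web/{}/{}".format(timestamp, original_url)


# Example: look for the Facebook terms page as captured around 30 January 2015.
url = wayback_url("https://www.facebook.com/terms.php", datetime(2015, 1, 30))
```

Recording the archived address alongside the date you consulted it gives readers a stable reference for the version of the terms your research design relied on.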
UK Web Archive
• Run by the British Library (since 2004).
• Indexes (UK/related) sites to a greater depth than the Internet Archive.
• Smaller archive.
• You can request a site for archiving.
• Special Collections include:
– UK Blogs:
https://www.webarchive.org.uk/ukwa/collection/100698/page/1/source/collection
– London Terror Attacks, 2005 (mainstream and social media commentary):
https://www.webarchive.org.uk/ukwa/collection/100757/page/1/source/collection
– Olympic & Paralympic Games 2012 (mainstream and social media):
https://www.webarchive.org.uk/ukwa/collection/4325386/page/1/source/collection
• See: https://www.webarchive.org.uk/ukwa/.
Other Web Archive Resources
• Rhizome: archiving for internet art, including interactive works
engaging with/critiquing social media: http://rhizome.org/
• Note: EDINA are currently working on an archiving tool for
researchers, ask me for more info on Site2Cite.
Using APIs to obtain Data
• APIs (Application Programming Interfaces) exist for most
social media sites and allow direct requests for data.
• Some unofficial APIs exist for sites without official/open
APIs. Use only with caution as these frequently have
privacy, security or legal issues.
• Consider working with text and data mining colleagues, or
developers, to seek additional ways to capture data such
as:
– Screen scraping (automated capture of pages from a user
perspective).
– Mobile data collection or data capture approaches to social
media.
– Internet archiving approaches using standard tools or code
libraries
Glossary: Data Request Terms
• API: Application Programming Interface – a way to request data from a web service.
• REST or RESTful API: REST stands for “Representational State Transfer” and means an API that uses
HTTP (the protocol for accessing websites) requests (or “calls”) to:
– GET – read access to content such as posts, users, etc. This is the main request you would use to retrieve
data.
– PUT – update or replace data.
– POST – create new data (such as a post to a blog, a wiki page, etc.).
– DELETE – Delete content.
• An API Endpoint – is essentially the way to address and structure what kind of request you are
making. E.g. home_timeline vs user_timeline. Each endpoint provides a different entry to the data
behind a web service.
• In a REST GET request you may have:
– Fields – the various fields of data you want to retrieve, e.g. link, message, post, etc. These are usually shown
in the Developer Documentation.
– Modifiers or Parameters - these act like filters, limiting the request in a specific way, e.g. only retrieving
posts with a location attached.
– Operators – are the various standard terms/labels for content and content types that you can use in your
GET request to shape and customise it, for instance this might include “retweets_of” or “bio” or “has:links”
etc.
• Other types of APIs and M2M (Machine-to-Machine) interfaces exist including “SOAP” and “RPC”.
• SDK stands for Software Development Kit and is used increasingly often as a way to package various requests
for developers to use in web or mobile apps (SDK has been used as a term for the coding tools for
smartphone platforms iOS and Android for years).
Locating or Requesting
Social Media Data
ProgrammableWeb (https://www.programmableweb.com/) is a great source for API
information for social media sites:
• Instagram Developer: https://www.instagram.com/developer/
– API Endpoints: https://www.instagram.com/developer/endpoints/
• Twitter Developer: https://developer.twitter.com/
– APIs: https://developer.twitter.com/en/docs
– GNIP: http://support.gnip.com/apis/ - premium "Firehose" access. See also Twitter
Enterprise: https://developer.twitter.com/en/enterprise
– Free APIs cover 7 days of tweets; Premium APIs exist for 30-day search and full archive search.
• Facebook for Developers: https://developers.facebook.com/
– API (Graph API): https://developers.facebook.com/docs/graph-api/
• YouTube Developers: https://developers.google.com/youtube/
– APIs (Comments and Comment Threads particularly useful):
https://developers.google.com/youtube/v3/docs/
• Weibo API: http://open.weibo.com/wiki/API%E6%96%87%E6%A1%A3/en
How do you make an API call?
• For open RESTful APIs you can enter an HTTP request in any browser
window, e.g. http://services.groupkt.com/state/get/USA/all
• Most social media APIs now require you to register your app, request a key
from them and for you to include the access tokens in your request.
• In general API calls are made from within a small programme – this might
be running on your machine or from a browser based coding tool.
• Lots of existing tools based on social media APIs exist – see later slide for a
sample of these.
• Try it out:
– Codecademy Twitter API tutorial:
https://www.codecademy.com/en/tracks/twitter
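To make the “register, get a key, include the token” step more concrete, here is a minimal Python sketch of what such a small programme might prepare. It assumes an API that accepts a bearer token in the Authorization header, which is common but not universal. The address api.example.org, the token value and the function name build_request are placeholders, and the request is built but deliberately not sent.

```python
import urllib.request


def build_request(url: str, access_token: str) -> urllib.request.Request:
    """Prepare (but do not send) a GET request carrying a bearer token."""
    return urllib.request.Request(
        url,
        headers={"Authorization": "Bearer " + access_token},
        method="GET",
    )


# Placeholder endpoint and token: substitute the values from the platform's
# developer documentation and your registered app.
request = build_request(
    "https://api.example.org/v1/posts?tag=research", "YOUR_ACCESS_TOKEN"
)

# Sending would be: urllib.request.urlopen(request). It is not run here.
```

Keep real tokens out of shared code and notebooks, for instance by reading them from an environment variable, since a leaked token lets others make requests in your app's name.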
An API Endpoint is a bit
like a vending machine…
Vending machine priced by grams of fat, Google, San
Jose, California.jpg by Flickr user Cory Doctorow.
You have to use the right machine to get hold of the item
you want, then you have to enter the right code and the
right price to get your candy.
• Each item has a name, and a standard way to access it
(in a vending machine this is the item code).
• Each item has a value (in a vending machine this is the
delicious edible contents of each item).
• Each item requires some sort of trust exchange before
you can access it (in a vending machine this is cash).
• In an API that “E12” item code is actually going to look
more like:
https://api.twitter.com/1.1/statuses/user_timeline.json?screen_name=twitterapi&count=2
• In an API the price is usually a key/access token
that is unique to you and your app – that
indicates a legitimate request and who it’s from.
Bonus: In APIs there is usually a huge range of data
(research candy) to ask for, and lots of filtering options.
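To make the vending machine analogy concrete, the sketch below assembles the example request URL from its parts using only the Python standard library. The endpoint and parameters are the ones quoted on the slide; the endpoint may since have changed, and the "price" (your key or access token) is deliberately left out here.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# The "machine" (endpoint) and the "item code" (query parameters) from the slide.
endpoint = "https://api.twitter.com/1.1/statuses/user_timeline.json"
params = {"screen_name": "twitterapi", "count": 2}

# urlencode turns the dictionary into "screen_name=twitterapi&count=2".
url = endpoint + "?" + urlencode(params)
print(url)

# Splitting the URL again shows the separate parts of the request.
parts = urlsplit(url)
print(parts.netloc)           # the host that runs the API
print(parts.path)             # the endpoint being asked for
print(parse_qs(parts.query))  # the filtering options
```

Building the query string from a dictionary, rather than typing it by hand, means that search terms containing spaces or symbols such as "#" are escaped correctly.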
What will you get back from
an API GET request?
Assuming it has worked correctly, something like this…
See the full example at: https://developer.twitter.com/en/docs/tweets/timelines/api-
reference/get-statuses-user_timeline.html
Each of these is a new field for a single
tweet and its value.
[] is an empty field (e.g. no hashtag on this
tweet).
This data can then be processed by your app, or simply
retrieved and stored in a database or spreadsheet…
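As a rough illustration of that last step, the sketch below parses a JSON response and flattens it into rows that could be saved as a spreadsheet (CSV). The payload is a made-up, heavily simplified record and not real API output; the field names are modelled on the Twitter response shown above and should be treated as assumptions to check against the documentation of the API you use.

```python
import csv
import io
import json

# Illustrative payload only: simplified, invented records.
raw = """[
  {"id_str": "1", "created_at": "Mon Jan 01 10:00:00 +0000 2018",
   "text": "Example post", "entities": {"hashtags": []}},
  {"id_str": "2", "created_at": "Mon Jan 01 10:05:00 +0000 2018",
   "text": "Another example #demo", "entities": {"hashtags": [{"text": "demo"}]}}
]"""

posts = json.loads(raw)


def to_row(post):
    """Flatten one nested record into a single spreadsheet row."""
    hashtags = [tag["text"] for tag in post["entities"]["hashtags"]]
    return {
        "id": post["id_str"],
        "created_at": post["created_at"],
        "text": post["text"],
        "hashtags": " ".join(hashtags),  # an empty list becomes an empty cell
    }


rows = [to_row(post) for post in posts]

# Write to an in-memory buffer; swap in open("posts.csv", "w", newline="")
# to save a file instead.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["id", "created_at", "text", "hashtags"])
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```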
Recommended Tool:
Martin Hawksey’s TAGS
• Uses Google Docs to capture tweets based on a hashtag, search term, user, etc.
• Can be automated to allow rolling capture.
• Useful for capturing a sample of long term community dialogues or public
discourse where Top Tweets/7 day limits will be acceptable.
• Includes spreadsheet; visualisation; searchable archive - latter two options are
only available if you make data (semi) public.
• Uses Twitter API – takes “Top” rather than “Latest” tweets so accuracy depends on
popularity of content/hashtags.
• Well documented and supported by Martin.
• A great way to dip your toe in the API water – you have to obtain a key the first
time you run TAGS, and can access and look at the code it runs. You can also make
more advanced use of the tool and automation connecting it to other
visualisations and analysis tools.
• See: https://tags.hawksey.info/
• Support: https://tags.hawksey.info/forums/
Question/Discussion
Do you have any experience or
recommendations for social media data
collection tools or approaches?
Have you attended one of the Digital Scholarship
sessions where CAHSS researchers can meet
with developers and data specialists?
[Recommended!]
Analysis & Visualisation
Further information, tutorials etc. online and/or running through Digital Scholarship and Schools Research Methods training.
• Nvivo (http://www.qsrinternational.com/nvivo/) – Premium qualitative data analysis software with social media and multimedia
support, collaborative working also supported. Feature rich. Training available. Available through UoE/CAHSS license:
https://www.ed.ac.uk/information-services/computing/desktop-personal/software/main-software-deals/nvivo.
• IBM SPSS (https://www.ibm.com/analytics/us/en/technology/spss/) – Premium data analysis tool for surveys and particularly for
quantitative data, widely used in social sciences. Available through UoE license: https://www.ed.ac.uk/information-
services/computing/desktop-personal/software/main-software-deals/spss.
• Dedoose (http://www.dedoose.com/) – Premium qualitative data analysis software with simple interface, tagging, annotation and
exploration options.
• Chorus (http://chorusanalytics.co.uk/) – Free software for data harvesting and analytics for social science research using Twitter data
• Gephi (https://gephi.org/) – Visualisation and exploration of multiple data types, particularly good for network analysis. Feature rich so
a bit of a learning curve. Free download.
• D3 visualisation libraries (see: https://github.com/d3/d3/wiki/gallery) – Free collection of Javascript libraries for use in data
visualisation and exploration of multiple data types.
• NodeXL (https://nodexl.codeplex.com/) – Network visualisation tool for Excel. Free.
• TAGS Explorer (https://tags.hawksey.info/) – Twitter only visualisations of networks (using NodeXL) and searchable timeline archive
explorations. Free.
• Textal (http://www.textal.org/) – Text analysis tools for mobile use with Twitter streams, websites (inc. blogs), and documents. Free.
• Tableau (https://www.tableau.com/) – Visualisation of multiple data sources and types. Free trial, otherwise monthly subscription.
A large quantity of open source tools and software are available. Search for these or look at the Journal of Open Research Software
(https://openresearchsoftware.metajnl.com/) or the Journal of Open Source Software (http://joss.theoj.org/) for well documented research-
driven examples. See also Tony Hirst’s OU Useful Blog (https://blog.ouseful.info/) for visualisation approaches. There are also many marketing
packages for social media analysis which could be used/adapted for research where their processes are well documented.
Appropriate Handling & Storage
• Data is usually returned with unique identifiers that can be easily
traced back to the original poster/subject.
• The unique identifiers connect conversations and posts so are hard
to strip away entirely – although you could try a one-way hash of
the data to mask the identifiable information but retain
connections.
• Short posts and tweets are highly identifiable. Try Googling or
searching Twitter for a recent tweet to see that in action.
• Images and videos can also be relatively easily compared/reverse
image searched and therefore identifiable.
• Think about which fields you actually need to retain for your
research question(s).
• Plan how long you will keep your data, and how you will keep it
secure - where and how you store your data really matters.
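The one-way hash idea mentioned above could be sketched as follows. This is one possible approach rather than a recommended standard: it uses a keyed hash (HMAC-SHA256) because a plain hash of a short identifier can be reversed by simply hashing every candidate value. The secret key, the record layout and the field names are hypothetical.

```python
import hashlib
import hmac

# Assumption: a project-specific secret, stored separately from the data set.
SECRET_KEY = b"replace-with-a-long-random-secret"


def pseudonymise(identifier):
    """Return a stable keyed one-way hash of a user or post identifier."""
    digest = hmac.new(SECRET_KEY, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


# Hypothetical records: a post and a reply to it.
posts = [
    {"post_id": "1001", "user_id": "alice01", "in_reply_to": None},
    {"post_id": "1002", "user_id": "bob02", "in_reply_to": "1001"},
]

masked = []
for post in posts:
    masked.append({
        "post_id": pseudonymise(post["post_id"]),
        "user_id": pseudonymise(post["user_id"]),
        "in_reply_to": pseudonymise(post["in_reply_to"]) if post["in_reply_to"] else None,
    })

# The reply still points at the masked version of the original post.
print(masked[1]["in_reply_to"] == masked[0]["post_id"])
```

Because the same input always produces the same output, conversation threads stay connected. Note that this masks identifiers only: the text of a post remains searchable, and anyone holding the secret key can re-link the data, so the result is pseudonymised rather than anonymous.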
Data Protection & GDPR
• Be aware of current Data Protection (Data Protection Act 1998) guidance
on the use, storage and retention of personal data.
• From 25th May 2018 the General Data Protection Regulation (GDPR)
comes into effect with:
– Increased rights for individuals, including the rights to understand how their data is used, to access,
rectify and erase it, to restrict processing, to data portability, and to object to the use of their data.
– Stronger legal sanctions for organisations that breach the GDPR.
• Ensure your Consent process, your Research Data Management plans, and
your use, access and disposal of data is compliant.
• By default social media APIs provide a lot of data:
– What is the minimum data you require?
– Removing unneeded data at the point of collection and/or data cleaning will
help reduce any risks of exposure or non-compliance with data protection
legislation.
See:
• Data Protection Act 1998: https://www.legislation.gov.uk/ukpga/1998/29/contents
• ICO guidance: https://ico.org.uk/for-organisations/data-protection-reform/overview-of-the-gdpr/
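One simple way to act on the "minimum data" question is to keep an explicit list of the fields your research question needs and discard everything else as each record arrives. The sketch below assumes a hypothetical study that needs only the timestamp, text and language; the record is invented and stands in for a much larger API response.

```python
# Assumption: the fields this hypothetical study actually needs.
FIELDS_TO_KEEP = {"created_at", "text", "lang"}


def minimise(record, keep=FIELDS_TO_KEEP):
    """Return a copy of the record containing only the listed fields."""
    return {key: value for key, value in record.items() if key in keep}


# Made-up record standing in for a much larger API response.
record = {
    "created_at": "Mon Jan 01 10:00:00 +0000 2018",
    "text": "Example post",
    "lang": "en",
    "user": {"screen_name": "example_user", "location": "Edinburgh"},
    "geo": None,
    "source": "web",
}

slim = minimise(record)
print(sorted(slim))
```

Listing what to keep, rather than what to remove, means that any new field a platform adds to its API is dropped by default instead of being stored unnoticed.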
Local Support
• Research Data Mantra – self-led course on Research Data Management,
including appropriate handling, storage and planning for onward
preservation, sharing or destruction: http://mantra.edina.ac.uk/
• Data Store – secure storage for active research data, available to all staff
and PGR students: https://www.ed.ac.uk/information-services/research-
support/research-data-service/working-with-data/data-storage
• Working with Sensitive Data – guidance and further resources on working
with sensitive and personal data: https://www.ed.ac.uk/information-
services/research-support/research-data-service/working-with-
data/sensitive-data
• Information Security Team – guidance on legal and technical approaches
to keeping data secure and appropriately encrypted and disposed of:
https://www.ed.ac.uk/infosec
Making Research Data Open
• If you have a consent process in place, ensure you request consent for any onward use you
expect to make of your data. And ensure there is a process to withdraw consent for onward use.
• Beware verbatim quoting in publications – it can be easy to search back to the original text.
– Public figures who would consider their social media content a publication and part of their profile
(e.g. politicians) are more appropriate to quote, where needed.
– Even if anonymous/not attributed it is safer to paraphrase short comments where possible to make
reverse searching more challenging.
• Screenshots of posts often reveal the subject name, image, location, and their contacts. Only
use these where appropriate, properly consented to, and where you are not placing your
subjects at risk.
• Consider the timelag between data collection and any publication. Is your consent from
participants still valid if a year has passed? What about 2 years? Or 5 years? A teen
participant may feel differently about data being exposed when they are, for instance, a
newly qualified lawyer or medic with very different reputational considerations.
See also: University of North Carolina at Chapel Hill and UoE Research Data Management and Sharing
(Coursera): https://www.coursera.org/learn/data-management
Courses and Information
• DMI Digital Methods online course:
https://wiki.digitalmethods.net/Digitalmethods/WebHome
• UCL Why We Post: the Anthropology of Social Media course (FutureLearn):
https://www.futurelearn.com/courses/anthropology-social-media
• QUT Social Media Analytics: Using Data to Understand Public
Conversations (FutureLearn):
https://www.futurelearn.com/courses/social-media-analytics
• Rutgers University Social Media Data Analytics (Coursera):
https://www.coursera.org/learn/social-media-data-analytics
• Doing Journalism with Data: First steps, skills and tools:
http://learno.net/courses/doing-journalism-with-data-first-steps-skills-
and-tools
• UoE Digital Footprint MOOC – understand some of the challenging
identity, privacy and ethical concerns around social media for you and
your research subjects: https://www.coursera.org/learn/digital-footprint/
Useful Niche Resources
• Utrecht Data School Data Ethics Decision Aid (DEDA):
https://dataschool.nl/research/deda/?lang=en
• Programming Historian Data Mining the Internet Archive lesson:
https://programminghistorian.org/lessons/data-mining-the-internet-archive
• Insight News Lab Social Network Analysis and Visualisation for #RDAPlenary 3
(using ScraperWiki and OpenRefine): http://hujo.deri.ie/rdaplenarysn/
• Tony Hirst First Baby Steps to Anonymising Data with Open Refine:
https://blog.ouseful.info/2015/01/23/anonymising-data-with-open-refine/
• Tony Hirst Social Interest Positioning – Visualising Facebook Friends’ Likes with
Data Grabbed Using Google Refine: https://blog.ouseful.info/2012/01/04/social-
interest-positioning-visualising-facebook-friends-likes/
• Tony Hirst Grabbing Twitter Search Results into Google Refine and Exporting
Conversations into Gephi (needs updating for new Twitter API):
https://blog.ouseful.info/2012/10/02/grabbing-twitter-search-results-into-google-
refine-and-exporting-conversations-into-gephi/
Local research and expertise
(a small sampling thereof!)
• Social media, Digital Ethnography, Sociological research methods – Kate Orton-Johnson (Sociology).
• Social Media, Digital Labour – Karen Gregory (Sociology).
• Communities on the Darknet, illicit markets and cultures – Angus Bancroft (Sociology).
• Social media in education; bots; anonymity in social media – Sian Bayne (Research in Digital
Education Centre, Moray House).
• Digital cultural heritage learning and engagement – Jen Ross (Research in Digital Education Centre,
Moray House); Claire Sowton (CAHSS); Melissa Terras (UCL/CAHSS).
• Text and data mining of social media content – Claire Grover (Informatics); Richard Tobin
(Informatics); Clare Llewellyn (Informatics; Neuropolitics Research, SPS).
• Sharing of photography, autobiographical memory and distributed cognition (inc. social media) –
Tim Fawns (Clinical Education, Centre for Medical Education, MVM).
• Big data (inc. social media) in healthcare – Mhairi Aitken (Usher Institute, MVM).
• Social media, Digital Footprint, blogging and Buddhism – Louise Connelly (Vet School).
• Mobility, mobile technology, formal and informal education communities around the world –
Michael Sean Gallagher (Research in Digital Education Centre, Moray House).
• Playful learning in informal digital environments – Clara O’Shea (Research in Digital Education
Centre, Moray House).
• Social media and politics – Neuropolitics Research group: Laura Cram (Politics, SPS); Robin Hill
(Informatics; SPS); Sujin Hong (SPS); Adam Moore (PPL).
• Visualisation of big data, including network analysis – Benjamin Bach (Design Informatics).
• Social Media and scholarly communities – Sara Shinton (IAD); James Stewart (SPS).
Recommended work
& groups researching in this area
• UoE Beyond Text Network – interdisciplinary network for social media and multimedia researchers:
https://www.wiki.ed.ac.uk/display/DIG/Beyond+Text
• UoE Informatics Language Technology Group – text mining expertise working on projects including topic modelling and
social media analysis: https://www.ltg.ed.ac.uk/
• Digital Methods Initiative (DMI) (European multi-organisation research group):
https://wiki.digitalmethods.net/Dmi/DmiAbout
• Microsoft Research Social Media Collective (US) – particularly danah boyd, Nancy Baym and Kate Crawford’s work:
https://www.microsoft.com/en-us/research/group/social-media-collective/
• #NSMNSS: New social media, new social science? - great blog reflecting on social science methods around social
media http://nsmnss.blogspot.co.uk/
• Oxford Internet Institute – particularly strong on relationships to mainstream media environment: https://www.oii.ox.ac.uk/
• Visual Social Media Lab (Sheffield) – led by Farida Vis: http://visualsocialmedialab.org/
• DocNow – social justice social media archiving: http://www.docnow.io/
• Data Driven Journalism (European Journalism Centre and Netherlands): http://datadrivenjournalism.net/
• Analysing Social Media Collaboration (UK cross-institution group, site now dormant) – responsible for the high profile
“Reading the Riots” Twitter analysis work in 2011: http://www.analysingsocialmedia.org/home
• Michael Zimmer – influential work on privacy, leading projects on privacy and Facebook: http://www.michaelzimmer.org/
• Electronic Frontier Foundation – advocates with expertise on privacy and tracking in social media: https://www.eff.org/
• Centre for Social Media Research (University of Westminster): https://www.westminster.ac.uk/social-media-research
• Digital Media and Society Research Group (Cardiff): https://www.cardiff.ac.uk/research/explore/research-units/digital-
media-and-society
• COSMOS (legacy page for Cardiff research group): http://www.cs.cf.ac.uk/cosmos/
Recommended Journals
• First Monday (University of Illinois at Chicago): http://firstmonday.org/index
• New Media & Society (Sage): http://journals.sagepub.com/home/nms
• Information, Communication & Society (Taylor & Francis):
http://tandfonline.com/toc/rics20/current
• Social Media + Society (Sage): http://journals.sagepub.com/home/sms
• Big Data & Society (Sage): http://journals.sagepub.com/home/bds
• Policy & Internet (Wiley):
http://onlinelibrary.wiley.com/journal/10.1002/(ISSN)1944-2866
• Journal of Computer-Mediated Communication (Wiley):
http://onlinelibrary.wiley.com/journal/10.1111/(ISSN)1083-6101
• Cyberpsychology, Behavior, and Social Networking (Mary Ann Liebert Inc.):
http://online.liebertpub.com/loi/CYBER
• Journal of Broadcasting & Electronic Media (Taylor & Francis):
http://www.tandfonline.com/toc/hbem20/current
Relevant Upcoming
Digital Scholarship Sessions
• Digital Research Clinics and Resources (26th October 2017)
• Cleaning Data with Open Refine (1st November 2017)
• Regex: Regular Expressions (23rd November 2017)
• Introduction to Sentiment Analysis: What it is and how to do it simply
(14th December 2017)
Look out for further sessions and/or contact the team with any specific
requests: http://www.digital.cahss.ed.ac.uk/
Questions & Discussion
Or follow up after today: nicola.osborne@ed.ac.uk

Weitere ähnliche Inhalte

Was ist angesagt?

Distributed Server
Distributed ServerDistributed Server
Distributed ServerRajan Kumar
 
Client server technology
Client server technologyClient server technology
Client server technologyAnwar Kamal
 
Building a Big Data Pipeline
Building a Big Data PipelineBuilding a Big Data Pipeline
Building a Big Data PipelineJesus Rodriguez
 
Iclaves: ¿Cuánto están dipuestos los usuarios a pagar por los contenidos?
Iclaves: ¿Cuánto están dipuestos los usuarios a pagar por los contenidos?Iclaves: ¿Cuánto están dipuestos los usuarios a pagar por los contenidos?
Iclaves: ¿Cuánto están dipuestos los usuarios a pagar por los contenidos?Iclaves SL
 
Slides: Knowledge Graphs vs. Property Graphs
Slides: Knowledge Graphs vs. Property GraphsSlides: Knowledge Graphs vs. Property Graphs
Slides: Knowledge Graphs vs. Property GraphsDATAVERSITY
 
Unified big data architecture
Unified big data architectureUnified big data architecture
Unified big data architectureDataWorks Summit
 
Introduction of Knowledge Graphs
Introduction of Knowledge GraphsIntroduction of Knowledge Graphs
Introduction of Knowledge GraphsJeff Z. Pan
 
VTU 7TH SEM CSE DATA WAREHOUSING AND DATA MINING SOLVED PAPERS OF DEC2013 JUN...
VTU 7TH SEM CSE DATA WAREHOUSING AND DATA MINING SOLVED PAPERS OF DEC2013 JUN...VTU 7TH SEM CSE DATA WAREHOUSING AND DATA MINING SOLVED PAPERS OF DEC2013 JUN...
VTU 7TH SEM CSE DATA WAREHOUSING AND DATA MINING SOLVED PAPERS OF DEC2013 JUN...vtunotesbysree
 
Artificial Intelligence for Data Quality
Artificial Intelligence for Data QualityArtificial Intelligence for Data Quality
Artificial Intelligence for Data QualityVera Ekimenko
 
DAMA Australia: How to Choose a Data Management Tool
DAMA Australia: How to Choose a Data Management ToolDAMA Australia: How to Choose a Data Management Tool
DAMA Australia: How to Choose a Data Management ToolPrecisely
 
Big data lecture notes
Big data lecture notesBig data lecture notes
Big data lecture notesMohit Saini
 
Cloud computing (IT-703) UNIT 1 & 2
Cloud computing (IT-703) UNIT 1 & 2Cloud computing (IT-703) UNIT 1 & 2
Cloud computing (IT-703) UNIT 1 & 2Jitendra s Rathore
 
Governance and Architecture in Data Integration
Governance and Architecture in Data IntegrationGovernance and Architecture in Data Integration
Governance and Architecture in Data IntegrationAnalytiX DS
 
Education data mining presentation
Education data mining presentationEducation data mining presentation
Education data mining presentationNishabhanot1
 
Get Started with the Most Advanced Edition Yet of Neo4j Graph Data Science
Get Started with the Most Advanced Edition Yet of Neo4j Graph Data ScienceGet Started with the Most Advanced Edition Yet of Neo4j Graph Data Science
Get Started with the Most Advanced Edition Yet of Neo4j Graph Data ScienceNeo4j
 
Ppt for Application of big data
Ppt for Application of big dataPpt for Application of big data
Ppt for Application of big dataPrashant Sharma
 

Was ist angesagt? (20)

Distributed Server
Distributed ServerDistributed Server
Distributed Server
 
The Nature of Data
The Nature of DataThe Nature of Data
The Nature of Data
 
Client server technology
Client server technologyClient server technology
Client server technology
 
Distributed Computing and Big Data
Distributed Computing and Big DataDistributed Computing and Big Data
Distributed Computing and Big Data
 
Building a Big Data Pipeline
Building a Big Data PipelineBuilding a Big Data Pipeline
Building a Big Data Pipeline
 
Iclaves: ¿Cuánto están dipuestos los usuarios a pagar por los contenidos?
Iclaves: ¿Cuánto están dipuestos los usuarios a pagar por los contenidos?Iclaves: ¿Cuánto están dipuestos los usuarios a pagar por los contenidos?
Iclaves: ¿Cuánto están dipuestos los usuarios a pagar por los contenidos?
 
Slides: Knowledge Graphs vs. Property Graphs
Slides: Knowledge Graphs vs. Property GraphsSlides: Knowledge Graphs vs. Property Graphs
Slides: Knowledge Graphs vs. Property Graphs
 
Unified big data architecture
Unified big data architectureUnified big data architecture
Unified big data architecture
 
Introduction of Knowledge Graphs
Introduction of Knowledge GraphsIntroduction of Knowledge Graphs
Introduction of Knowledge Graphs
 
VTU 7TH SEM CSE DATA WAREHOUSING AND DATA MINING SOLVED PAPERS OF DEC2013 JUN...
VTU 7TH SEM CSE DATA WAREHOUSING AND DATA MINING SOLVED PAPERS OF DEC2013 JUN...VTU 7TH SEM CSE DATA WAREHOUSING AND DATA MINING SOLVED PAPERS OF DEC2013 JUN...
VTU 7TH SEM CSE DATA WAREHOUSING AND DATA MINING SOLVED PAPERS OF DEC2013 JUN...
 
Artificial Intelligence for Data Quality
Artificial Intelligence for Data QualityArtificial Intelligence for Data Quality
Artificial Intelligence for Data Quality
 
DAMA Australia: How to Choose a Data Management Tool
DAMA Australia: How to Choose a Data Management ToolDAMA Australia: How to Choose a Data Management Tool
DAMA Australia: How to Choose a Data Management Tool
 
Big data lecture notes
Big data lecture notesBig data lecture notes
Big data lecture notes
 
Cloud computing (IT-703) UNIT 1 & 2
Cloud computing (IT-703) UNIT 1 & 2Cloud computing (IT-703) UNIT 1 & 2
Cloud computing (IT-703) UNIT 1 & 2
 
Cloud Computing Architecture
Cloud Computing ArchitectureCloud Computing Architecture
Cloud Computing Architecture
 
Governance and Architecture in Data Integration
Governance and Architecture in Data IntegrationGovernance and Architecture in Data Integration
Governance and Architecture in Data Integration
 
Education data mining presentation
Education data mining presentationEducation data mining presentation
Education data mining presentation
 
Social media with big data analytics
Social media with big data analyticsSocial media with big data analytics
Social media with big data analytics
 
Get Started with the Most Advanced Edition Yet of Neo4j Graph Data Science
Get Started with the Most Advanced Edition Yet of Neo4j Graph Data ScienceGet Started with the Most Advanced Edition Yet of Neo4j Graph Data Science
Get Started with the Most Advanced Edition Yet of Neo4j Graph Data Science
 
Ppt for Application of big data
Ppt for Application of big dataPpt for Application of big data
Ppt for Application of big data
 

Ähnlich wie Working with Social Media Data: Ethics & good practice around collecting, using and storing data

Researching Social Media – Big Data and Social Media Analysis
Researching Social Media – Big Data and Social Media AnalysisResearching Social Media – Big Data and Social Media Analysis
Researching Social Media – Big Data and Social Media AnalysisFarida Vis
 
Blurring the Boundaries? Ethical challenges in using social media for social...
Blurring the Boundaries? Ethical challenges in using social media for social...Blurring the Boundaries? Ethical challenges in using social media for social...
Blurring the Boundaries? Ethical challenges in using social media for social...Kandy Woodfield
 
Social Media, Social Science and Research Ethics
Social Media, Social Science and Research EthicsSocial Media, Social Science and Research Ethics
Social Media, Social Science and Research EthicsTheSRAOrg
 
Lecture series: Using trace data or subjective data, that is the question dur...
Lecture series: Using trace data or subjective data, that is the question dur...Lecture series: Using trace data or subjective data, that is the question dur...
Lecture series: Using trace data or subjective data, that is the question dur...Bart Rienties
 
Sdal air education workforce analytics workshop jan. 7 , 2014.pptx
Sdal air education workforce analytics workshop jan. 7 , 2014.pptxSdal air education workforce analytics workshop jan. 7 , 2014.pptx
Sdal air education workforce analytics workshop jan. 7 , 2014.pptxkimlyman
 
Networked Scholars, or, Why on earth do academics use social media and why ...
Networked Scholars, or, Why on earth do academics use social media and why ...Networked Scholars, or, Why on earth do academics use social media and why ...
Networked Scholars, or, Why on earth do academics use social media and why ...George Veletsianos
 
Social Networking, Online Communities & Research - WCHRI Rounds
Social Networking, Online Communities & Research - WCHRI RoundsSocial Networking, Online Communities & Research - WCHRI Rounds
Social Networking, Online Communities & Research - WCHRI RoundsColleen Young
 
Ethical Challenges of Using Social Media Data In Research
Ethical Challenges of Using Social Media Data In Research Ethical Challenges of Using Social Media Data In Research
Ethical Challenges of Using Social Media Data In Research Dr Wasim Ahmed
 
Social Networking, Online Communities and Clinical Research
Social Networking, Online Communities and Clinical ResearchSocial Networking, Online Communities and Clinical Research
Social Networking, Online Communities and Clinical ResearchColleen Young
 
Ethical challenges for learning analytics
Ethical challenges for learning analyticsEthical challenges for learning analytics
Ethical challenges for learning analyticsRebecca Ferguson
 
Access to and use of Web 2.0 and social media applications within the NHS in ...
Access to and use of Web 2.0 and social media applications within the NHS in ...Access to and use of Web 2.0 and social media applications within the NHS in ...
Access to and use of Web 2.0 and social media applications within the NHS in ...ifuturesconf
 
Social media applications within the NHS: role and impact of organisational c...
Social media applications within the NHS: role and impact of organisational c...Social media applications within the NHS: role and impact of organisational c...
Social media applications within the NHS: role and impact of organisational c...Catherine Ebenezer
 
Stepping stones to ‘big data’: supporting quantitative methods teaching with ...
Stepping stones to ‘big data’: supporting quantitative methods teaching with ...Stepping stones to ‘big data’: supporting quantitative methods teaching with ...
Stepping stones to ‘big data’: supporting quantitative methods teaching with ...The Higher Education Academy
 
Mapping Movements: Social movement research and big data: critiques and alter...
Mapping Movements: Social movement research and big data: critiques and alter...Mapping Movements: Social movement research and big data: critiques and alter...
Mapping Movements: Social movement research and big data: critiques and alter...Tim Highfield
 
Learning analytics at the intersections of student trust, disclosure and benefit
Learning analytics at the intersections of student trust, disclosure and benefitLearning analytics at the intersections of student trust, disclosure and benefit
Learning analytics at the intersections of student trust, disclosure and benefitUniversity of South Africa (Unisa)
 
The Hidden Data of Social Media Rearch_CSS-winter-symposium
The Hidden Data of Social Media Rearch_CSS-winter-symposiumThe Hidden Data of Social Media Rearch_CSS-winter-symposium
The Hidden Data of Social Media Rearch_CSS-winter-symposiumKatrin Weller
 
Internet Research Ethics CSSWS2015 Tutorial
Internet Research Ethics CSSWS2015 TutorialInternet Research Ethics CSSWS2015 Tutorial
Internet Research Ethics CSSWS2015 TutorialKa_Kinder
 
Griffiths lace workshop-eden-2016
Griffiths lace workshop-eden-2016Griffiths lace workshop-eden-2016
Griffiths lace workshop-eden-2016Dai Griffiths
 
Requirements for Learning Analytics
Requirements for Learning AnalyticsRequirements for Learning Analytics
Requirements for Learning AnalyticsTore Hoel
 

Ähnlich wie Working with Social Media Data: Ethics & good practice around collecting, using and storing data (20)

Researching Social Media – Big Data and Social Media Analysis
Researching Social Media – Big Data and Social Media AnalysisResearching Social Media – Big Data and Social Media Analysis
Researching Social Media – Big Data and Social Media Analysis
 
Blurring the Boundaries? Ethical challenges in using social media for social...
Blurring the Boundaries? Ethical challenges in using social media for social...Blurring the Boundaries? Ethical challenges in using social media for social...
Blurring the Boundaries? Ethical challenges in using social media for social...
 
Social Media, Social Science and Research Ethics
Social Media, Social Science and Research EthicsSocial Media, Social Science and Research Ethics
Social Media, Social Science and Research Ethics
 
Lecture series: Using trace data or subjective data, that is the question dur...
Lecture series: Using trace data or subjective data, that is the question dur...Lecture series: Using trace data or subjective data, that is the question dur...
Lecture series: Using trace data or subjective data, that is the question dur...
 
Sdal air education workforce analytics workshop jan. 7 , 2014.pptx
Sdal air education workforce analytics workshop jan. 7 , 2014.pptxSdal air education workforce analytics workshop jan. 7 , 2014.pptx
Sdal air education workforce analytics workshop jan. 7 , 2014.pptx
 
#AcAdvOnline Webinar
#AcAdvOnline Webinar#AcAdvOnline Webinar
#AcAdvOnline Webinar
 
Networked Scholars, or, Why on earth do academics use social media and why ...
Networked Scholars, or, Why on earth do academics use social media and why ...Networked Scholars, or, Why on earth do academics use social media and why ...
Networked Scholars, or, Why on earth do academics use social media and why ...
 
Social Networking, Online Communities & Research - WCHRI Rounds
Social Networking, Online Communities & Research - WCHRI RoundsSocial Networking, Online Communities & Research - WCHRI Rounds
Social Networking, Online Communities & Research - WCHRI Rounds
 
Ethical Challenges of Using Social Media Data In Research
Ethical Challenges of Using Social Media Data In Research Ethical Challenges of Using Social Media Data In Research
Ethical Challenges of Using Social Media Data In Research
 
Social Networking, Online Communities and Clinical Research
Working with Social Media Data: Ethics & good practice around collecting, using and storing data

  • 1. Working with Social Media Data: Ethics & Good Practice around Collecting, Using and Storing Data Nicola Osborne Digital Education Manager, EDINA Nicola.osborne@ed.ac.uk @suchprettyeyes
  • 2. Introductions: my social media work • Digital Education Manager at EDINA, University of Edinburgh. • Work on EDINA’s educational technology, innovation, digital and data projects for audiences across Scotland, UK and further afield. • Co-I on: PTAS-funded Managing Your Digital Footprints research strand (2014-2015); Ongoing (2015-) Managing Your Digital Footprint research team; PTAS-funded “A Live Pulse”: Yik Yak for understanding teaching, learning and assessment at Edinburgh project. • Co-tutor on ongoing Digital Footprint MOOC (2017-) • Previously EDINA Social Media Officer (2009-2015), providing expertise and advice on social media to colleagues across UoE for over 8 years. http://edina.ac.uk/
  • 3. Introduction: you and your work 1. Who are you? 2. What social media related research are you working on or hoping to work on? 3. What do you hope to get out of today’s session?
  • 4. Overview • Introduction & Design Considerations – Approach – Data accuracy • Ethical Considerations – Recommended ethical guidance – Terms & Conditions – and impact on Data – Consent and trust • Practical Considerations – Existing data sets – Available data tools – APIs – Options for analysis and visualisation • Storing and handling Data – Compliance with legal requirements – Sources of support • Recommended researchers, groups, and resources. • Q&A/Discussion – but questions welcome throughout!
  • 5. Where to start… • What is your research question(s)? • Are social media or social media communities the subject, or core to the subject? • Or, is it the space for recruitment or reaching an audience? • Or, is it just a convenient space for data collection?
  • 6. The Elephant (Blue Bird) in the Room Image ©Twitter.com 2012
  • 7. Research Design Considerations • Research approach to be taken • Appropriate data types to support your research – Streaming/live data OR – Archived / capture of data over time with asynchronous analysis • Ethical considerations • Consent process of subjects and their network • Etiquette considerations • Platform(s) to be used – Fit with target subjects – Terms & Conditions • Practical access limitations e.g. – Do tools for data capture exist? – Does an API exist? – What are the API limitations? – Costs of access • Your (researcher) or RAs expertise. • Long term research vision – do you have rights to use and reuse data in the ways you hope to?
  • 8. Possible Methods & Questions to Think About • Computational (See also Batrinca and Treleaven 2015): – Data access through APIs, screen scraping, established methods (e.g. DMI tools)? – Text and data mining and/or Natural Language Processing (NLP)? – Social network analysis and/or Actor Network Theory (ANT) analysis using nodes and edges in the network? – Sentiment analysis based on text mining/NLP or based on presence/absence of emojis and/or visual content? – Visual analysis and/or video or audio analysis for multimedia content? • Quantitative (See also OII 2013a, b & c): – Medium or large scale data? – Automated or survey/volunteered data collection? – Data cleansing process – how will you ensure that you have a good quality data set? – What kind of statistical analysis do you want to undertake? Tools might include SPSS, NVivo, Gephi, Tableau, etc. – Will you be comparing to existing data sets and/or undertaking trend analysis over time? – What standard tools in your field – for digital or non-digital data – can you use to collect or interpret your data? • Qualitative: – Manual collection? – Ethnographic approaches and/or participant observation – Focus groups or similar? – Critical/reflexive reading and coding of texts/content Batrinca, B. and Treleaven, P.C., 2015. Social Media Analytics: a survey of techniques, tools and platforms. In AI & Society, 30 (1). Pp. 89-116. https://doi.org/10.1007/s00146-014-0549-4 Oxford Internet Institute, 2013a. Quantitative Methods in Social Media Research: Big Data. In OII YouTube Channel, 15 March 2013. See: https://www.youtube.com/watch?v=6hmjj7_1sSY Oxford Internet Institute, 2013b. Quantitative Methods in Social Media Research: Populations and Sampling. In OII YouTube Channel, 15 March 2013. See: https://www.youtube.com/watch?v=6hmjj7_1sSY Oxford Internet Institute, 2013c. Space-Time as a Sampling Condition for New Media Research. In OII YouTube Channel, 15 March 2013. See: https://www.youtube.com/watch?v=HNxn0PqOc8k
  • 9. Is Social Media Data Representative? • Not all people use social media (and some of the least privileged groups in society are not online at all). • Most social media data collection methods favour English language data in mainstream US/Global sites. It is unusual to see multilingual research or research that acknowledges use of content including non-English text by primarily English speakers. • Privacy settings and publicness tend to reflect status and privilege. Accessing at-risk, vulnerable, heavily trolled, and/or niche interest groups is more difficult than obtaining public posts from middle class white male social media users. BAME communities, women’s groups, LGBTQ+ communities, etc. tend to make higher use of private groups, group moderation, and protective measures that require more qualitative and overt consent-based approaches. • Not all social media users are active. There is an “activity and agency bias” (Lutz and Hoffmann 2017) in much of the current research. Obtaining data on passive reads and engagement with content is extremely difficult through quantitative methods. It may be easier with participant observation. Lutz, C. and Hoffmann, C. P. 2017. The dark side of online participation: exploring non-, passive and negative participation. In Information, Communication & Society: AoIR Special Issue, 20 (6), pp. 876-897. http://dx.doi.org/10.1080/1369118X.2017.1293129
  • 10. Question/Discussion Which platform(s) are you intending to/are you working with? How did you select these social media spaces?
  • 11. Ethical Considerations • Visibility vs expectations of privacy: – Being “in public” is not consent to being researched; a user's imagined audience may be quite different (see AoIR guidance, Marwick and boyd 2011). – Are you engaging with private or “public” figures – expectations over visibility will vary significantly. • How possible is it to obtain informed consent for work undertaken with your chosen social media platform? How can consent be withdrawn? • How will your data be collected and used? (Attributed vs Pseudonyms vs Anonymous). • What personal data is being used? Does it put anyone at risk? • What is the risk of accidental exposure or re-identification? Text snippets, quotes and images may all be easily searchable. • Public – or previously public – data can change in sensitivities over time. • How will you handle/remove/retain subsequently deleted content? Marwick, A. and boyd, d., 2011. I tweet honestly, I tweet passionately: Twitter users, context collapse, and the imagined audience. In New Media & Society, 13 (1), pp. 114-133. DOI: 10.1177/1461444810365313.
  • 12. Recommended: AoIR Ethics Guidance • AoIR Ethics Guidance (2012): https://aoir.org/reports/ethics2.pdf • AoIR Ethics Chart – a quick guide to key issues: https://aoir.org/aoir_ethics_graphic_2016/ • AoIR Ethics Guidance (2002): https://aoir.org/reports/ethics.pdf • Annette Markham (co-author of AoIR guidance) on Impact Models for ethical decision making in data research and design: https://annettemarkham.com/2017/07/impact-model-ethics/
  • 13. Recommended: Social Media Research: A Guide to Ethics • Excellent concise research ethics guidance from the ESRC-funded “Social Media, Privacy and Risk: Towards More Ethical Research Methodologies” project at University of Aberdeen. • Includes pointers to further social media ethics resources. • Townsend, L. and Wallace, C. 2016. Social Media Research: A Guide to Ethics. Aberdeen: University of Aberdeen/ESRC Social Media Enhancement project. Available from: http://www.dotrural.ac.uk/social-media-research-ethics/
  • 14. “But the data is already public” In 2008 researchers released profile data (The T3 Data Set) from Facebook accounts of students at a US University, inadvertently making identifiable data public, as reported in Zimmer (2010). In this case the researchers: • Had employed RAs who were part of the Network being examined and had (various levels of) access to more information than a non-logged-in user of Facebook/user beyond the Network. • Had funding that mandated open publishing and sharing of results. • Had the university's consent, but not individuals' consent, for data collection. • Combined Facebook with university housing data in their data sets. • Obscured the identity of the university where students were based, but described key characteristics. • Attempted to make all data anonymous by removing identifying information (name, student id, etc.) but left network and behavioural information intact. • Asked other researchers using the data not to attempt to reidentify subjects. • Stated that “hackers” and “extreme effort” would be the only way to “crack” the data. The university was identified swiftly based purely on the codebook and other writings about the data, without requiring direct access to the data. Once the university was identified, other specific identifying data (nationality, race, home state, etc.), sometimes with only 1 individual in these groups, made re-identification of (some) students simple. After public scrutiny and identification of the university, the data set was swiftly withdrawn by the researchers. Zimmer, M. 2010. “But the data is already public”: on the ethics of research in Facebook. In Ethics and Information Technology, 12 (4), December 2010, pp. 313-325. https://link.springer.com/article/10.1007%2Fs10676-010-9227-5
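The re-identification risk described in this case (groups containing only 1 individual) can be checked before any data is shared. The following is a minimal sketch, not the method used by Zimmer or the original researchers: it counts how many records share each combination of indirectly identifying fields and flags combinations held by fewer than k records. The function name, field names and sample records are all hypothetical and invented for illustration.

```python
from collections import Counter


def small_groups(records, quasi_identifiers, k=5):
    """Return combinations of quasi-identifier values shared by fewer than k records."""
    counts = Counter(
        tuple(record.get(field) for field in quasi_identifiers)
        for record in records
    )
    return {combo: n for combo, n in counts.items() if n < k}


# Made-up records; the field names are assumptions, not from the T3 data set.
students = [
    {"home_state": "Ohio", "nationality": "US", "major": "History"},
    {"home_state": "Ohio", "nationality": "US", "major": "Physics"},
    {"home_state": "Montana", "nationality": "US", "major": "History"},
]

# With k=2, any combination held by a single student is flagged.
risky = small_groups(students, ["home_state", "nationality"], k=2)
```

Any combination returned here would need to be generalised, suppressed or withheld before release; a check like this addresses only one route to re-identification and does not cover network or behavioural data.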
  • 15. Terms & Conditions • Before undertaking any social media research understand the T&Cs and Developer T&Cs for the platform(s) you are looking at. • Understand how your research aligns with the T&Cs, and any possible issues of privacy, etiquette, or practical access. • If your work is in conflict with T&Cs either re-design your research (strongly recommended) or look carefully at risks and impacts. • You should not ignore any T&Cs for technical reasons. If there is a valid reason to ignore T&Cs for specific research reasons (such as research on deleted tweets), be prepared to justify that to ethics boards and peer reviewers. And understand that you may risk losing access to the platform and your research data if you are found to be in breach of T&Cs.
  • 16. Twitter Developer T&Cs of note (1) Section VII (Other Important Terms), A: User Protection: "Twitter Content, and information derived from Twitter Content, may not be used by, or knowingly displayed, distributed, or otherwise made available to:"… "any entity for the purposes of conducting or providing surveillance, analyses or research that isolates a group of individuals or any single individual for any unlawful or discriminatory purpose or in a manner that would be inconsistent with our users' reasonable expectations of privacy;" https://developer.twitter.com/en/developer-terms/agreement-and-policy
  • 17. Twitter Developer T&Cs of note (2) Section VII (Other Important Terms), C: Respect Users' Control and Privacy: "3. If Content is deleted, gains protected status, or is otherwise suspended, withheld, modified, or removed from the Twitter Service (including removal of location information), you will make all reasonable efforts to delete or modify such Content (as applicable) as soon as reasonably possible, and in any case within 24 hours after a request to do so by Twitter or by a Twitter user with regard to their Content." https://developer.twitter.com/en/developer-terms/agreement-and-policy
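In practice, honouring this clause means a locally stored data set has to be kept in step with deletions on the platform. The sketch below shows only the local half of that process, under stated assumptions: posts are held in a dictionary keyed by post ID, and a list of IDs known to have been deleted or protected is already available. How that list is obtained depends on the platform and tool, and is not shown. The function and variable names are hypothetical.

```python
def purge_deleted(stored_posts, deleted_ids):
    """Return a copy of stored_posts without any post whose ID is in deleted_ids."""
    deleted = set(deleted_ids)
    return {
        post_id: post
        for post_id, post in stored_posts.items()
        if post_id not in deleted
    }


# Made-up example data; IDs and text are placeholders.
stored = {
    "101": {"text": "first post"},
    "102": {"text": "second post"},
    "103": {"text": "third post"},
}
cleaned = purge_deleted(stored, ["102"])
```

Derived files (exports, backups, coded extracts) would need the same treatment for the removal to be meaningful.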
  • 18. Facebook Statement of Rights & Responsibilities Section 5: Protecting Other People's Rights "We respect other people's rights, and expect you to do the same. 1. You will not post content or take any action on Facebook that infringes or violates someone else's rights or otherwise violates the law. 2. We can remove any content or information you post on Facebook if we believe that it violates this Statement or our policies. 3. We provide you with tools to help you protect your intellectual property rights. To learn more, visit our How to Report Claims of Intellectual Property Infringement page. 4. If we remove your content for infringing someone else's copyright, and you believe we removed it by mistake, we will provide you with an opportunity to appeal. 5. If you repeatedly infringe other people's intellectual property rights, we will disable your account when appropriate. 6. You will not use our copyrights or Trademarks or any confusingly similar marks, except as expressly permitted by our Brand Usage Guidelines or with our prior written permission. 7. If you collect information from users, you will: obtain their consent, make it clear you (and not Facebook) are the one collecting their information, and post a privacy policy explaining what information you collect and how you will use it. 8. You will not post anyone's identification documents or sensitive financial information on Facebook. 9. You will not tag users or send email invitations to non-users without their consent. Facebook offers social reporting tools to enable users to provide feedback about tagging." https://www.facebook.com/terms.php
  • 19. Trust in Social Networks vs Trust in Research Research Ethics – Randall Munroe/xkcd (https://xkcd.com/1390/) Licensed under CC-BY-NC 2.5 Trust in social media networks is mixed, with users increasingly savvy about data use… However… • Social Media users can find observation by academic researchers more disconcerting than by the companies who own the platforms. • Research, depending on the topic, can feel like a judgement on behaviours making consent hugely important. • The burden on researchers to be clear about motives, funders, process, etc. is higher than on commercial companies. • There are parallels here to how individuals feel about e.g. Tesco Clubcard or Credit Card data capture vs. surveys and censuses.
  • 20. Question/Discussion What are the ethical concerns and considerations for your current (or previous) social media research?
  • 21. Obtaining Consent • Consent may be implicitly included for API data access in some terms and conditions BUT, when did you last read the terms and conditions? What about your research participants? So: • Obtain explicit consent wherever possible. • Be transparent if you are engaging in research in a space – with a pinned post, link to your participant information sheet, etc. • Consent can be tricky in anonymous and less traditional social media spaces (see e.g. Osborne 2017 for approaches used with Yik Yak). • Apply particular caution to gaining consent for screen shots, attributed posts, reproducing exact images or text of posts etc. Osborne, N. 2017. Addressing ethics of research in anonymous online spaces. In “A Live Pulse”: Yik Yak for understanding teaching, learning and assessment at Edinburgh [blog], 13th July 2017. http://yikyakresearch.blogs.edina.ac.uk/2017/07/13/addressing-ethics-of-research-in-anonymous-online-spaces/
  • 22.
  • 23. Some Common Ethics Pitfalls • Researcher assumes public data can be used in any way desired, without considering the subject(s) intent when originally sharing their profile/post etc. • Researcher explores conveniently available “public” data without realising that privacy settings may make more information available to them than is truly “public”. • Researcher is using “big” data under the belief that individuals will not be identifiable (as in the “But the data is already public” case). • Research subject(s) has shared data on a public site but is not aware of their own settings, or has not checked them lately, making implicit consent and the public nature of the data problematic. Discovering that they have been included in published research may be upsetting and problematic. • Research Ethics Committees and/or Journal Editorial Boards are unaware, or do not properly consider, that social media data includes real names, pseudonyms, locations and highly disclosive data, and do not ask the right questions around the consent process, collection, aggregation, storage and retention of data. • Researcher uses full text of a post as an “anonymous” example but this is then Googled, which identifies the original post/tweet/content and individual.
  • 24. Data Considerations • What kind of research approach are you taking? • Who or what is the subject of your research – what is the right social media space to capture appropriate data? • What scale of data are you looking to collect/harvest? (If working with big data see boyd & Crawford 2012) • Will you be sampling or looking to collect all data over a specific time period? • How sensitive is the topic? • What level and type of consent can you obtain from participants? • What kind of content? – Profiles – for network analysis, image analysis, qualitative review of content through profile components/data? – Posts – through API/data feed/harvesting or observation? Textual, visual, multimedia? Manual coding or text/data mining? – Comments/discussion – contents or threads of discussion? – Metadata – tags, likes, engagements? • Time bounds – how long do you expect to collect data for? • What use will you make of the data after capture? boyd, d. and Crawford, K., 2012. Critical questions for big data. In Information, Communication & Society special issue: A decade in internet time: the dynamics of the internet and society, 15 (5). http://dx.doi.org/10.1080/1369118X.2012.678878
  • 25. Sources of baseline data on usage, access, trends, literacies etc. • Oxford Internet Surveys: biennial data on UK public use and attitudes to the internet, including social media: http://oxis.oii.ox.ac.uk/research/dataset-request/ • Ofcom research and data: Regular reporting on UK public use and attitudes to media, including internet and social media: https://www.ofcom.org.uk/research-and-data/search. Includes: – Annual adult media use and attitudes, and children’s media literacy reporting: https://www.ofcom.org.uk/research-and-data/media-literacy-research; – Communications Market Report: annual overview of consumer use of communications of all types: https://www.ofcom.org.uk/research-and-data/multi-sector-research/cmr – Further regular and one-off data via the statistical release calendar: https://www.ofcom.org.uk/research-and-data/data/statistics • Pew Internet & American Life datasets: data on US public use, knowledge and understanding of the web, digital literacy, social media, etc: http://www.pewinternet.org/datasets/. For example: – Social Media Update 2016: http://www.pewinternet.org/2016/11/11/social-media-update-2016/
  • 26. Sources of Official Social Media Usage Data, Trends, Financials, etc. Best sources are quarterly earnings reports and presentations, typically including: monthly active users, usage trends, earnings, monetization strategies, financials, future plans: • Facebook & Instagram & WhatsApp: https://investor.fb.com/home/default.aspx • Twitter: https://investor.twitterinc.com/results.cfm • SnapChat: https://investor.snap.com/events-and-presentations/events • YouTube/Google via Alphabet: https://abc.xyz/investor/ • Flickr: – Currently owned by Oath, should be via Verizon once deal closes: http://www.verizon.com/about/investors – Historical up to 2017, via Yahoo captures in the Internet Archive: https://web.archive.org/web/*/https://investor.yahoo.net/index.cfm • LinkedIn: – Current via Microsoft: https://www.microsoft.com/en-us/investor/ – Historical up to 2016: https://news.linkedin.com/topic/earnings • Weibo: http://ir.weibo.com/phoenix.zhtml?c=253076&p=irol-irhome
  • 27. Privately Held Social Media • Crunchbase (https://www.crunchbase.com/) is a good source of information on shareholders/owners, acquisitions, finances, etc. • Alexa web rankings (owned by Amazon) give an overview of usage levels and trends based on ranking relative to other sites in the US, and globally. • Social Media sites’ “business” and “press” sites, official blogs and news releases are best for user data. • Some social media provide advertising APIs – which may be usable for research depending on T&Cs and data content – but not developer or open APIs, e.g. Snapchat: https://www.snap.com/en-GB/news/post/third-party-applications-and-the-snapchat-api/ • For example, for Pinterest: – data on usage from Pinterest: https://business.pinterest.com/en – Alexa data on usage: https://www.alexa.com/siteinfo/pinterest.com – investor data: https://www.crunchbase.com/organization/pinterest/investors/investors_list
  • 28. Data Quality & Reliability • Data sources and APIs can change regularly, and what is available may change over time (e.g. Twitter’s API moved from returning all tweets to “Top” tweets some years ago; Facebook have changed data structures multiple times). • Errors in automated data collection can be hard to spot until analysis is undertaken – sampling, trial data collection, and review of code by colleagues can all be useful. • Gaps in data may occur because there are genuine gaps in data creation/posting etc; because there are technical issues with the social media service; because of an error in your code; or because you are over your API rate limit for the minute/hour/day. • Data may change over time – Facebook and Instagram allow posts to be edited, so a request will capture one moment in time, not necessarily the original or final version. • Data may disappear over time. Notable example: Twitter’s terms and conditions on deletions mean that deleted tweets will not appear in a later API call. – Research tools obeying the T&Cs will also update and remove deleted tweets. – Research tools retaining deleted tweets are technically in breach of the T&Cs. • Acquisitions, mergers, and shutdowns of social media sites can lead to changed terms and conditions, changes to data availability and use, changes or removals of APIs and data access routes, changes to user presence in a space, and changes to acceptable norms within a space (important for qualitative work particularly).
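One way to spot the gaps described on the Data Quality & Reliability slide is to check the timestamps of what you have collected. The following is a minimal sketch only: the function name `find_gaps`, the one-hour threshold and the sample timestamps are all assumptions made for illustration, and a gap it reports may still be a genuine quiet period rather than a collection error or a rate limit.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, max_gap=timedelta(hours=1)):
    """Return (earlier, later) pairs where consecutive posts are more than max_gap apart."""
    ordered = sorted(timestamps)
    gaps = []
    # Compare each timestamp with the one that follows it.
    for earlier, later in zip(ordered, ordered[1:]):
        if later - earlier > max_gap:
            gaps.append((earlier, later))
    return gaps


# Hypothetical collection times for posts on one hashtag.
collected = [
    datetime(2017, 10, 1, 9, 0),
    datetime(2017, 10, 1, 9, 20),
    datetime(2017, 10, 1, 13, 5),
    datetime(2017, 10, 1, 13, 30),
]

gaps = find_gaps(collected)
for start, end in gaps:
    print(f"No posts captured between {start} and {end}")
```

A suitable threshold depends on how busy the space is; comparing flagged gaps against your collection logs helps separate quiet periods from technical problems.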
  • 29. Hidden pre-filtering and sampling • Not all social media posts are equally likely to be included in standard API endpoints – e.g. a Twitter user with few posts and few followers is unlikely to appear in results for a popular hashtag. – The standard "Streaming" and "Search" APIs include only a 1% sample of Tweets, and the accuracy of that sample varies depending on activity, time, etc. (See Morstatter et al 2013). • Privacy settings will reduce the accuracy of any data sampled from Facebook or other networks with more complex privacy settings, but it is hard to see what is being excluded. Morstatter, F., Pfeffer, J., Liu, H. and Carley, K.M., 2013. Is the Sample good enough? Comparing data from Twitter's streaming API with Twitter's Firehose. In ICWSM 2013 and eprint arXiv:1306.5204. Available from: https://arxiv.org/abs/1306.5204
  • 30. Question/Discussion Have you already tried obtaining data for the social media space you are using in your research? Have you faced any challenges or obstacles?
  • 31. Existing Data Sets • “The Zuckerberg files”: digital archive of all public comments by Mark Zuckerberg including social media and mainstream media content for research use: https://www.zuckerbergfiles.org/ • FiveThirtyEight Data: archive of data associated with FiveThirtyEight articles, including social media data sets: https://github.com/fivethirtyeight/data • Lumen database – tracking legal notices and complaints for removal of online materials (including social media content): https://www.lumendatabase.org/ • CSIRO (Australia’s national science agency) We Feel – emotions in Tweets – API: http://wefeel.csiro.au/#/api (see: http://datadrivenjournalism.net/resources/we_feel) • Stanford Large Network Dataset Collection – includes social network data sets: https://snap.stanford.edu/data/ • Network Repository – network datasets, including social media, Facebook and Twitter networks: http://networkrepository.com/ • DocNow – social justice social network archives: http://www.docnow.io/
  • 32. Cross-site data tools • North Carolina State University Libraries Social Media Archives Toolkit: https://www.lib.ncsu.edu/social-media-archives-toolkit; see also: https://github.com/NCSU-Libraries/Social-Media-Combine • Social Mention (search engine for social media) API: http://www.socialmention.com/api/ • Scrapebox (premium tool) YouTube Downloader: http://www.scrapebox.com/youtube-downloader and Social Account Scraper: http://www.scrapebox.com/social-account-scraper • ESRC COSMOS Open Data Tools (available but not updated since 2014): http://socialdatalab.net/software • Overview of Twitter data tools (Ahmed 2015): http://blogs.lse.ac.uk/impactofsocialsciences/2015/07/10/social-media-research-tools-overview/
  • 33. Recommended: DMI Tools The Digital Methods Initiative adds new (documented) tools all the time, including: • Censorship Explorer – determine censorship in various regions through URLs & proxies. • Disqus Comment Scraper – obtain data from the Disqus comment plugin. • Expand Tiny URLs – automatically expand large collections of Tiny URLs (e.g. from tweets). • Geo IP – translate URLs or IP addresses into geographic locations (e.g. for a blog). • Instagram Hashtag Explorer – retrieve Instagram media via specific hashtags. • Issue Crawler – uses URLs to analyse relationships and connections through links between URLs. • Netvizz (Facebook) – extracts data from Facebook around groups, pages, search. • Pinterest Scraper – scrapes Pinterest URLs and captures metadata of pins. • Tumblr – data capture based on Tumblr tags, retrieving metadata and co-incident tags. • Twitter Capture and Analysis Toolset (DMI-TCAT) – robust and reproducible tool for data capture and analysis of Twitter data. Source code available for local use. • YouTube Data Tools – extract data on YouTube channels and videos, e.g. channel networks. Access documentation and DMI tools at: https://wiki.digitalmethods.net/Dmi/ToolDatabase See also, DMI Protocols: https://wiki.digitalmethods.net/Dmi/DmiProtocols
  • 35. Internet Archive & WayBackMachine • Global archive capturing websites (to various levels of detail/depth) based on IA targets and user-submitted requests (since 2001). • You can request a site for archiving, or a group of sites. • Searchable resource OR can use exact URL to retrieve previous archived pages (WayBackMachine). • Collections exist for various social media collections, e.g: – 2016 US Presidential Election Social Media: https://archive.org/details/2016electiontwitter – Arab America on Social Media: https://archive.org/details/ArchiveIt-Collection-2797 – Gif Cities (Gifs from GeoCities): https://gifcities.org/ • Great for social media website changes, blogs, terms and conditions versions, etc. • Sites available in a range of archive formats (IA), or as viewable pages (WayBackMachine). • See: – https://archive.org/ – https://archive.org/web/
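Retrieving an archived page by exact URL, as described on the Internet Archive slide, can also be scripted. The sketch below only builds the addresses and does not contact the archive. It assumes the Wayback Machine's public availability endpoint (`https://archive.org/wayback/available`) and the `web.archive.org/web/<timestamp>/<url>` address pattern; the function names and the example page are illustrative, so check the Internet Archive's current documentation before relying on them.

```python
from urllib.parse import urlencode

# Assumed endpoint of the Wayback Machine availability service.
WAYBACK_AVAILABILITY = "https://archive.org/wayback/available"


def availability_query(page_url, timestamp=None):
    """Build a query asking which archived snapshot is closest to a given date."""
    params = {"url": page_url}
    if timestamp:
        # Timestamps are digits only, e.g. YYYYMMDD.
        params["timestamp"] = timestamp
    return f"{WAYBACK_AVAILABILITY}?{urlencode(params)}"


def snapshot_url(page_url, timestamp):
    """Build the address of an archived copy of page_url near the given timestamp."""
    return f"https://web.archive.org/web/{timestamp}/{page_url}"


# Illustrative use: looking for an older version of a terms and conditions page.
print(availability_query("twitter.com/tos", "20150101"))
print(snapshot_url("https://twitter.com/tos", "20150101"))
```

This is particularly useful for tracking how a service's terms and conditions read at the time your data was collected.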
  • 37. UK Web Archive • Run by the British Library (since 2004). • Indexes (UK/related) sites to a greater depth than the Internet Archive. • Smaller archive. • You can request a site for archiving. • Special Collections include: – UK Blogs: https://www.webarchive.org.uk/ukwa/collection/100698/page/1/source/collection – London Terror Attacks, 2005 (mainstream and social media commentary): https://www.webarchive.org.uk/ukwa/collection/100757/page/1/source/collection – Olympic & Paralympic Games 2012 (mainstream and social media): https://www.webarchive.org.uk/ukwa/collection/4325386/page/1/source/collection • See: https://www.webarchive.org.uk/ukwa/.
  • 38. Other Web Archive Resources • Rhizome: archiving for internet art, including interactive works engaging with/critiquing social media: http://rhizome.org/ • Note: EDINA are currently working on an archiving tool for researchers, ask me for more info on Site2Cite.
  • 39. Using APIs to obtain Data • APIs (Application Programming Interfaces) exist for most social media sites and allow direct requests for data. • Some unofficial APIs exist for sites without official/open APIs. Use only with caution as these frequently have privacy, security or legal issues. • Consider working with text and data mining colleagues, or developers, to seek additional ways to capture data such as: – Screen scraping (automated capture of pages from a user perspective). – Mobile data collection or data capture approaches to social media. – Internet archiving approaches using standard tools or code libraries
  • 40. Glossary: Data Request Terms • API: Application Programming Interface – a way to request data from a web service. • REST or RESTful API: REST stands for “Representational State Transfer” and means an API that uses HTTP (the protocol for accessing websites) requests (or “calls”) to: – GET – read access to content such as posts, users, etc. This is the main request you would use to retrieve data. – PUT – update or replace data. – POST – create new data (such as a post to a blog, a wiki page, etc.). – DELETE – delete content. • An API Endpoint – is essentially the way to address and structure what kind of request you are making. E.g. home_timeline vs user_timeline. Each endpoint provides a different entry to the data behind a web service. • In a REST GET request you may have: – Fields – the various fields of data you want to retrieve, e.g. link, message, post, etc. These are usually shown in the Developer Documentation. – Modifiers or Parameters – these act like filters, limiting the request in a specific way, e.g. only retrieving posts with a location attached. – Operators – are the various standard terms/labels for content and content types that you can use in your GET request to shape and customise it, for instance this might include “retweets_of” or “bio” or “has:links” etc. • Other types of APIs and M2M (Machine-to-Machine) interfaces exist including “SOAP” and “RPC”. • SDK stands for Software Development Kit and is used increasingly often as a way to package various requests for developers to use in web or mobile apps (SDK has been used as a term for the coding tools for the smartphone platforms iOS and Android for years).
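To make the glossary terms concrete, here is a minimal sketch of how an endpoint, fields and parameters combine into a single GET request address. The service `api.example.org`, the `posts` endpoint, and the `fields`, `has_location` and `limit` names are all hypothetical; a real service's developer documentation defines which names it accepts.

```python
from urllib.parse import urlencode


def build_get_url(base_url, endpoint, fields=None, **parameters):
    """Combine an endpoint, optional fields and filter parameters into a GET URL."""
    query = dict(parameters)
    if fields:
        # Many APIs accept the wanted fields as one comma-separated value.
        query["fields"] = ",".join(fields)
    url = f"{base_url}/{endpoint}"
    if query:
        url = f"{url}?{urlencode(query)}"
    return url


# Hypothetical request: up to 50 posts that have a location attached,
# returning only the message and link of each post.
url = build_get_url(
    "https://api.example.org/v1",
    "posts",
    fields=["message", "link"],
    has_location="true",
    limit=50,
)
print(url)
```

Requesting only the fields you need keeps responses smaller and supports the data minimisation discussed later in the session.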
  • 41. Locating or Requesting Social Media Data ProgrammableWeb (https://www.programmableweb.com/) is a great source for API information for social media sites: • Instagram Developer: https://www.instagram.com/developer/ – API Endpoints: https://www.instagram.com/developer/endpoints/ • Twitter Developer: https://developer.twitter.com/ – APIs: https://developer.twitter.com/en/docs – GNIP: http://support.gnip.com/apis/ – premium "Firehose" access. See also Twitter Enterprise: https://developer.twitter.com/en/enterprise – Free APIs cover 7 days of tweets; Premium APIs exist for 30-day search and full archive search. • Facebook for Developers: https://developers.facebook.com/ – API (Graph API): https://developers.facebook.com/docs/graph-api/ • YouTube Developers: https://developers.google.com/youtube/ – APIs (Comments and Comment Threads particularly useful): https://developers.google.com/youtube/v3/docs/ • Weibo API: http://open.weibo.com/wiki/API%E6%96%87%E6%A1%A3/en
  • 42. How do you make an API call? • For open RESTful APIs you can enter an HTTP request in any browser window, e.g. http://services.groupkt.com/state/get/USA/all • Most social media APIs now require you to register your app, request a key from them and for you to include the access tokens in your request. • In general API calls are made from within a small programme – this might be running on your machine or from a browser based coding tool. • Lots of existing tools based on social media APIs exist – see later slide for a sample of these. • Try it out: – Codecademy Twitter API tutorial: https://www.codecademy.com/en/tracks/twitter
  • 43. An API Endpoint is a bit like a vending machine… Image: “Vending machine priced by grams of fat, Google, San Jose, California.jpg” by Flickr user Cory Doctorow. You have to use the right machine to get hold of the item you want, then you have to enter the right code and the right price to get your candy. • Each item has a name, and a standard way to access it (in a vending machine this is the item code). • Each item has a value (in a vending machine this is the delicious edible contents of each item). • Each item requires some sort of trust exchange before you can access it (in a vending machine this is cash). • In an API that “E12” item code is actually going to look more like: https://api.twitter.com/1.1/statuses/user_timeline.json?screen_name=twitterapi&count=2 • In an API the price is usually a key/access token that is unique to you and your app – that indicates a legitimate request and who it’s from. Bonus: In APIs there is usually a huge range of data (research candy) to ask for, and lots of filtering options.
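The vending machine analogy can be traced through the user_timeline example address: the path is the item code, the query string holds the parameters, and the access token is the price. The sketch below takes the address apart and prepares a request without sending it. The bearer-token header and the placeholder token value are assumptions for illustration; the authentication a service actually requires is set out in its developer documentation.

```python
from urllib.parse import parse_qs, urlparse
from urllib.request import Request

EXAMPLE = (
    "https://api.twitter.com/1.1/statuses/user_timeline.json"
    "?screen_name=twitterapi&count=2"
)

parts = urlparse(EXAMPLE)

# The "item code": which endpoint of the service is being addressed.
endpoint = parts.path

# The parameters: whose timeline, and how many tweets to return.
params = parse_qs(parts.query)

# The "price": a token identifying you and your app (placeholder value,
# assumed here to be sent as a bearer token in the Authorization header).
request = Request(EXAMPLE, headers={"Authorization": "Bearer YOUR-ACCESS-TOKEN"})

print(endpoint)
print(params)
print(request.get_method())
```

Nothing is sent over the network here; passing `request` to `urllib.request.urlopen` would make the call, and it would fail without a valid token.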
  • 44. What will you get back from an API GET request? Assuming it has worked correctly, something like this… See the full example at: https://developer.twitter.com/en/docs/tweets/timelines/api-reference/get-statuses-user_timeline.html Each of these is a new field for a single tweet and its value. [] is an empty field (e.g. no hashtag on this tweet).
  • 45. This data can then be processed by your app, or simply retrieved and stored in a database or spreadsheet…
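As a rough illustration of moving a response into a spreadsheet-friendly form, the sketch below flattens a small JSON response into CSV rows. The sample response is invented and heavily simplified, loosely modelled on fields shown in the Twitter documentation (`created_at`, `id_str`, `text`, `entities.hashtags`); a real response contains many more fields, and the choice of columns and the semicolon separator are assumptions.

```python
import csv
import io
import json

# Invented, simplified response: two tweets, the first with no hashtags.
SAMPLE_RESPONSE = """
[
  {"created_at": "Mon Oct 02 09:15:00 +0000 2017", "id_str": "1001",
   "text": "First example post", "entities": {"hashtags": []}},
  {"created_at": "Mon Oct 02 09:20:00 +0000 2017", "id_str": "1002",
   "text": "Second example post #research #ethics",
   "entities": {"hashtags": [{"text": "research"}, {"text": "ethics"}]}}
]
"""

FIELDS = ["created_at", "id_str", "text", "hashtags"]


def flatten(tweet):
    """Turn one nested tweet record into a flat row with one value per column."""
    tags = [tag["text"] for tag in tweet.get("entities", {}).get("hashtags", [])]
    return {
        "created_at": tweet.get("created_at", ""),
        "id_str": tweet.get("id_str", ""),
        "text": tweet.get("text", ""),
        # An empty list of hashtags becomes an empty cell.
        "hashtags": ";".join(tags),
    }


rows = [flatten(tweet) for tweet in json.loads(SAMPLE_RESPONSE)]

# Write to an in-memory buffer; swap in open("tweets.csv", "w", newline="")
# to write a file instead.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDS)
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Keeping identifiers such as `id_str` as text rather than numbers avoids spreadsheets rounding very long IDs.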
  • 46. Recommended Tool: Martin Hawksey’s TAGS • Uses Google Docs to capture tweets based on a hashtag, search term, user, etc. • Can be automated to allow rolling capture. • Useful for capturing a sample of long term community dialogues or public discourse where Top Tweets/7 day limits will be acceptable. • Includes spreadsheet; visualisation; searchable archive - latter two options are only available if you make data (semi) public. • Uses Twitter API – takes “Top” rather than “Latest” tweets so accuracy depends on popularity of content/hashtags. • Well documented and supported by Martin. • A great way to dip your toe in the API water – you have to obtain a key the first time you run TAGs, and can access and look at the code it runs. You can also make more advanced use of the tool and automation connecting it to other visualisations and analysis tools. • See: https://tags.hawksey.info/ • Support: https://tags.hawksey.info/forums/
  • 48. Question/Discussion Do you have any experience or recommendations for social media data collection tools or approaches? Have you attended one of the Digital Scholarship sessions where CAHSS researchers can meet with developers and data specialists? [Recommended!]
  • 49. Analysis & Visualisation Further information, tutorials etc. online and/or running through Digital Scholarship and Schools Research Methods training. • Nvivo (http://www.qsrinternational.com/nvivo/) – Premium qualitative data analysis software with social media and multimedia support, collaborative working also supported. Feature rich. Training available. Available through UoE/CAHSS license: https://www.ed.ac.uk/information-services/computing/desktop-personal/software/main-software-deals/nvivo. • IBM SPSS (https://www.ibm.com/analytics/us/en/technology/spss/) – Premium data analysis tool for surveys and particularly for quantitative data, widely used in social sciences. Available through UoE license: https://www.ed.ac.uk/information-services/computing/desktop-personal/software/main-software-deals/spss. • Dedoose (http://www.dedoose.com/) – Premium qualitative data analysis software with simple interface, tagging, annotation and exploration options. • Chorus (http://chorusanalytics.co.uk/) – Free software for data harvesting and analytics for social science research using Twitter data. • Gephi (https://gephi.org/) – Visualisation and exploration of multiple data types, particularly good for network analysis. Feature rich so a bit of a learning curve. Free download. • D3 visualisation libraries (see: https://github.com/d3/d3/wiki/gallery) – Free collection of JavaScript libraries for use in data visualisation and exploration of multiple data types. • NodeXL (https://nodexl.codeplex.com/) – Network visualisation tool for Excel. Free. • TAGS Explorer (https://tags.hawksey.info/) – Twitter only visualisations of networks (using NodeXL) and searchable timeline archive explorations. Free. • Textal (http://www.textal.org/) – Text analysis tools for mobile use with Twitter streams, websites (inc. blogs), and documents. Free. • Tableau (https://www.tableau.com/) – Visualisation of multiple data sources and types. Free trial, otherwise monthly subscription.
A large number of open source tools and software packages are available. Search for these or look at the Journal of Open Research Software (https://openresearchsoftware.metajnl.com/) or the Journal of Open Source Software (http://joss.theoj.org/) for well documented research-driven examples. See also Tony Hirst’s OU Useful Blog (https://blog.ouseful.info/) for visualisation approaches. There are also many marketing packages for social media analysis which could be used/adapted for research where their processes are well documented.
  • 50. Appropriate Handling & Storage • Data is usually returned with unique identifiers that can be easily traced back to the original poster/subject. • The unique identifiers connect conversations and posts so are hard to strip away entirely – although you could try a one-way hash of the data to mask the identifiable information but retain connections. • Short posts and tweets are highly identifiable. Try Googling or searching Twitter for a recent tweet to see that in action. • Images and videos can also be relatively easily compared/reverse image searched and therefore identifiable. • Think about which fields you actually need to retain for your research question(s). • Plan how long you will keep your data, and how you will keep it secure - where and how you store your data really matters.
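The one-way hash suggested on the Appropriate Handling & Storage slide might look something like the sketch below, which uses a keyed hash (HMAC-SHA256) so that the same identifier always maps to the same masked value and reply connections survive. The records, field names and key are invented for illustration. This masks identifiers only: as the slide notes, the text of a short post can still be searched for, so this is not full anonymisation.

```python
import hashlib
import hmac

# Placeholder key: in practice use a long random secret, stored securely and
# separately from the data, since anyone holding it can re-link identifiers.
SECRET_KEY = b"replace-with-a-long-random-secret"

ID_FIELDS = ("user_id", "in_reply_to_user_id")


def pseudonymise(identifier, key=SECRET_KEY):
    """Return a keyed one-way hash of an identifier as a hex string."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def mask_record(record):
    """Copy a record, replacing identifier fields with their hashed values."""
    masked = dict(record)
    for field in ID_FIELDS:
        if masked.get(field) is not None:
            masked[field] = pseudonymise(masked[field])
    return masked


# Invented records: the second post is a reply to the author of the first.
posts = [
    {"post": "A", "user_id": "555", "in_reply_to_user_id": None},
    {"post": "B", "user_id": "777", "in_reply_to_user_id": "555"},
]

masked_posts = [mask_record(post) for post in posts]

# The reply still points at the same (masked) author as the first post.
print(masked_posts[1]["in_reply_to_user_id"] == masked_posts[0]["user_id"])
```

A keyed hash is used rather than a plain hash because numeric account IDs are easy to guess and re-hash; without the key, that kind of lookup is much harder.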
  • 51. Data Protection & GDPR • Be aware of current Data Protection (Data Protection Act 1998) guidance on the use, storage and retention of personal data. • From 25th May 2018 the General Data Protection Regulation (GDPR) comes into effect with: – Increased rights for individuals: to understand how their data is used, and rights of access, rectification, erasure, restriction of processing, portability, and objection to the use of their data. – Increased legal measures for organisations breaching GDPR guidance. • Ensure your Consent process, your Research Data Management plans, and your use, access and disposal of data are compliant. • By default social media APIs provide a lot of data: – What is the minimum data you require? – Removing unneeded data at the point of collection and/or data cleaning will help reduce any risks of exposure or non-compliance with data protection legislation. See: • Data Protection Act 1998: https://www.legislation.gov.uk/ukpga/1998/29/contents • ICO guidance: https://ico.org.uk/for-organisations/data-protection-reform/overview-of-the-gdpr/
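Removing unneeded data at the point of collection can be as simple as keeping an explicit list of the fields your research question requires and discarding everything else before storage. The sketch below assumes a hypothetical record and a hypothetical list of fields to keep; which fields are genuinely needed is a research design decision.

```python
# Hypothetical list of the only fields this study needs to retain.
KEEP_FIELDS = {"created_at", "text", "lang"}


def minimise(record, keep=KEEP_FIELDS):
    """Return a copy of a record containing only the fields named in keep."""
    return {name: value for name, value in record.items() if name in keep}


# Invented example of what an API might return for one post.
raw = {
    "created_at": "Mon Oct 02 09:15:00 +0000 2017",
    "text": "Example post",
    "lang": "en",
    "user_name": "example_user",
    "user_location": "Edinburgh",
    "coordinates": [55.95, -3.19],
}

stored = minimise(raw)
print(sorted(stored))
```

Listing what to keep, rather than what to drop, means new fields added by the service later are excluded by default.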
  • 52. Local Support • Research Data Mantra – self-led course on Research Data Management, including appropriate handling, storage and planning for onward preservation, sharing or destruction: http://mantra.edina.ac.uk/ • Data Store – secure storage for active research data, available to all staff and PGR students: https://www.ed.ac.uk/information-services/research-support/research-data-service/working-with-data/data-storage • Working with Sensitive Data – guidance and further resources on working with sensitive and personal data: https://www.ed.ac.uk/information-services/research-support/research-data-service/working-with-data/sensitive-data • Information Security Team – guidance on legal and technical approaches to keeping data secure and appropriately encrypted and disposed of: https://www.ed.ac.uk/infosec
  • 53. Making Research Data Open • If you have a consent process in place, ensure you request consent for any onward use you expect to make of your data. And ensure there is a process to withdraw consent for onward use. • Beware verbatim quoting in publications – it can be easy to search back to the original text. – Public figures who would consider their social media content a publication and part of their profile (e.g. politicians) are more appropriate to quote, where needed. – Even if anonymous/not attributed it is safer to paraphrase short comments where possible to make reverse searching more challenging. • Screenshots of posts often reveal the subject name, image, location, and their contacts. Only use these where appropriate, properly consented to, and where you are not placing your subjects at risk. • Consider the time lag between data collection and any publication. Is your consent from participants still valid if a year has passed? What about 2 years? Or 5 years? A teen participant may feel differently about data being exposed when they are, for instance, a newly qualified lawyer or medic with very different reputational considerations. See also: University of North Carolina at Chapel Hill and UoE Research Data Management and Sharing (Coursera): https://www.coursera.org/learn/data-management
  • 54. Courses and Information • DMI Digital Methods online course: https://wiki.digitalmethods.net/Digitalmethods/WebHome • UCL Why We Post: the Anthropology of Social Media course (FutureLearn): https://www.futurelearn.com/courses/anthropology-social-media • QUT Social Media Analytics: Using Data to Understand Public Conversations (FutureLearn): https://www.futurelearn.com/courses/social-media-analytics • Rutgers University Social Media Data Analytics (Coursera): https://www.coursera.org/learn/social-media-data-analytics • Doing Journalism with Data: First steps, skills and tools: http://learno.net/courses/doing-journalism-with-data-first-steps-skills-and-tools • UoE Digital Footprint MOOC – understand some of the challenging identity, privacy and ethical concerns around social media for you and your research subjects: https://www.coursera.org/learn/digital-footprint/
  • 55. Useful Niche Resources • Utrecht Data School Data Ethics Decision Aid (DEDA): https://dataschool.nl/research/deda/?lang=en • Programming Historian Data Mining the Internet Archive lesson: https://programminghistorian.org/lessons/data-mining-the-internet-archive • Insight News Lab Social Network Analysis and Visualisation for #RDAPlenary 3 (using ScraperWiki and OpenRefine): http://hujo.deri.ie/rdaplenarysn/ • Tony Hirst First Baby Steps to Anonymising Data with Open Refine: https://blog.ouseful.info/2015/01/23/anonymising-data-with-open-refine/ • Tony Hirst Social Interest Positioning – Visualising Facebook Friends’ Likes with Data Grabbed Using Google Refine: https://blog.ouseful.info/2012/01/04/social-interest-positioning-visualising-facebook-friends-likes/ • Tony Hirst Grabbing Twitter Search Results into Google Refine and Exporting Conversations into Gephi (needs updating for new Twitter API): https://blog.ouseful.info/2012/10/02/grabbing-twitter-search-results-into-google-refine-and-exporting-conversations-into-gephi/
  • 56. Local research and expertise (a small sampling thereof!) • Social media, Digital Ethnography, Sociological research methods – Kate Orton-Johnson (Sociology). • Social Media, Digital Labour – Karen Gregory (Sociology). • Communities on the Darknet, illicit markets and cultures – Angus Bancroft (Sociology). • Social media in education; bots; anonymity in social media – Sian Bayne (Research in Digital Education Centre, Moray House). • Digital cultural heritage learning and engagement – Jen Ross (Research in Digital Education Centre, Moray House); Claire Sowton (CAHSS); Melissa Terras (UCL/CAHSS). • Text and data mining of social media content – Claire Grover (Informatics); Richard Tobin (Informatics); Clare Llewellyn (Informatics; Neuropolitics Research, SPS). • Sharing of photography, autobiographical memory and distributed cognition (inc. social media) – Tim Fawns (Clinical Education, Centre for Medical Education, MVM). • Big data (inc. social media) in healthcare – Mhairi Aitken (Usher Institute, MVM). • Social media, Digital Footprint, blogging and Buddhism – Louise Connelly (Vet School). • Mobility, mobile technology, formal and informal education communities around the world – Michael Sean Gallagher (Research in Digital Education Centre, Moray House). • Playful learning in informal digital environments – Clara O’Shea (Research in Digital Education Centre, Moray House). • Social media and politics – Neuropolitics Research group: Laura Cram (Politics, SPS); Robin Hill (Informatics; SPS); Sujin Hong (SPS); Adam Moore (PPL). • Visualisation of big data, including network analysis – Benjamin Bach (Design Informatics). • Social Media and scholarly communities – Sara Shinton (IAD); James Stewart (SPS).
  • 57. Recommended work & groups researching in this area • UoE Beyond Text Network – interdisciplinary network for social media and multimedia researchers: https://www.wiki.ed.ac.uk/display/DIG/Beyond+Text • UoE Informatics Language Technology Group – text mining expertise working on projects including topic modelling and social media analysis: https://www.ltg.ed.ac.uk/ • Digital Methods Initiative (DMI) (European multi-organisation research group): https://wiki.digitalmethods.net/Dmi/DmiAbout • Microsoft Research Social Media Collective (US) – particularly danah boyd, Nancy Baym and Kate Crawford’s work: https://www.microsoft.com/en-us/research/group/social-media-collective/ • #NSMNSS: New social media, new social science? – great blog reflecting on social science methods around social media: http://nsmnss.blogspot.co.uk/ • Oxford Internet Institute – particularly strong on relationships to mainstream media environment: https://www.oii.ox.ac.uk/ • Visual Social Media Lab (Sheffield) – led by Farida Vis: http://visualsocialmedialab.org/ • DocNow – social justice social media archiving: http://www.docnow.io/ • Data Driven Journalism (European Journalism Centre and Netherlands): http://datadrivenjournalism.net/ • Analysing Social Media Collaboration (UK cross-institution group, site now dormant) – responsible for the high profile “Reading the Riots” Twitter analysis work in 2011: http://www.analysingsocialmedia.org/home • Michael Zimmer – influential work on privacy, leading projects on privacy and Facebook: http://www.michaelzimmer.org/ • Electronic Frontier Foundation – advocates with expertise on privacy and tracking in social media: https://www.eff.org/ • Centre for Social Media Research (University of Westminster): https://www.westminster.ac.uk/social-media-research • Digital Media and Society Research Group (Cardiff): https://www.cardiff.ac.uk/research/explore/research-units/digital-media-and-society • COSMOS (legacy page for Cardiff research group): http://www.cs.cf.ac.uk/cosmos/
  • 58. Recommended Journals • First Monday (University of Illinois at Chicago): http://firstmonday.org/index • New Media & Society (Sage): http://journals.sagepub.com/home/nms • Information, Communication & Society (Taylor & Francis): http://tandfonline.com/toc/rics20/current • Social Media + Society (Sage): http://journals.sagepub.com/home/sms • Big Data & Society (Sage): http://journals.sagepub.com/home/bds • Policy & Internet (Wiley): http://onlinelibrary.wiley.com/journal/10.1002/(ISSN)1944-2866 • Journal of Computer-Mediated Communication (Wiley): http://onlinelibrary.wiley.com/journal/10.1111/(ISSN)1083-6101 • Cyberpsychology, Behaviour, and Social Networking (Mary Ann Liebert Inc.): http://online.liebertpub.com/loi/CYBER • Journal of Broadcasting & Electronic Media (Taylor & Francis): http://www.tandfonline.com/toc/hbem20/current
  • 59. Relevant Upcoming Digital Scholarship Sessions • Digital Research Clinics and Resources (26th October 2017) • Cleaning Data with Open Refine (1st November 2017) • Regex: Regular Expressions (23rd November 2017) • Introduction to Sentiment Analysis: What it is and how to do it simply (14th December 2017) Look out for further sessions and/or contact the team with any specific requests: http://www.digital.cahss.ed.ac.uk/
  • 60. Questions & Discussion Or follow up after today: nicola.osborne@ed.ac.uk

Editor's notes

  1. An awful lot of social media research is on Twitter. Why do researchers use Twitter? It’s really easy to get data from. It feels influential and visible. Usage has gone up: around 45% of UK online adults use Twitter, 37% have an account and log in daily [http://www.rosemcgrory.co.uk/2017/01/03/uk-social-media-statistics-for-2017/]. But a lot of Twitter users NEVER post – 1% of accounts post 20% of all tweets. And most Twitter users have modest followings, impact and visibility, and their tweets won’t make search results on busy hashtags unless they have a connection. Twitter is no longer a serendipitous network; it is filtered and tailored so that it has some of the characteristics of Facebook in terms of visibility and “filter bubbles”.